“Stable Diffusion” is going to drive innovation in computer vision in 2023. It must be exciting!

Hi friends. Happy new year! Hope you are doing well. In September last year, I discovered a new computer vision model called “Stable Diffusion”. Since then, many AI researchers, artists and illustrators have been crazy about it because it can create high-quality images easily. The image above was also created by “Stable Diffusion”. This is great!

1. I created many kinds of images with “Stable Diffusion”. They are amazing!

The images below were created in my experiments with “Stable Diffusion” last year. I found that it can generate many kinds of images, from oil paintings to animation. With fine-tuning through “prompt engineering”, they get much better. In other words, if we input appropriate words or text into the model, it can generate the images we want more effectively.

2. “Prompt engineering” works very well

In order to generate the images we want, we need to input an appropriate “prompt” into the model. We call this “prompt engineering”, as I said before.

If you are a beginner at generating images, you can start with a short prompt such as “an apple on the table”. When you want an image that looks like an oil painting, you can simply add that to the prompt: “oil painting of an apple on the table”.

Let us divide each prompt into three categories:

  • Style
  • Physical object
  • The way the physical object is displayed (e.g. lighting)

So all we have to do is decide what goes into each category of our prompt and input it into the model, for example “oil painting of an apple on the table, volumetric light”. The results are the images below. Why don’t you try it yourself?
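If you want to try a prompt in code, here is a minimal sketch using the Hugging Face “diffusers” library. Please note that the library and the model id below are my assumptions for illustration; they are not necessarily the setup used to create the images in this article.

```python
# A minimal sketch of generating an image from a prompt with the "diffusers" library.
# The model id is one public Stable Diffusion checkpoint (an illustrative choice).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# style + physical object + how it is displayed
prompt = "oil painting of an apple on the table, volumetric light"
image = pipe(prompt).images[0]
image.save("apple.png")
```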

3. More research needed

Some computer vision researchers think “prompt engineering” can be optimized by computers, and they have developed a model to do so. In their research paper (1), they compare hand-made prompts with AI-optimized prompts (see the images below). Which do you like better? I am not sure the optimization always works perfectly, so I think more research is needed across many use cases.

I will update this article as the technology evolves. Stay tuned!

(1) Optimizing Prompts for Text-to-Image Generation, Yaru Hao, Zewen Chi, Li Dong, Furu Wei, Microsoft Research, 19 Dec 2022, https://arxiv.org/abs/2212.09611

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

“Stable Diffusion” is a game changer in computer vision. It is amazing!

Hi friends. Hope you are doing well. Today, I would like to introduce a new computer vision model called “Stable Diffusion”. It is open source software, which means you can use it for free: just download it, with no license fee. That is good news for anyone interested in computer vision. The image above was created by “Stable Diffusion”. It looks so good! I love it because it is very easy to create such beautiful images.

1. These images are amazing!

These images were created from the same text. When you look at the background of each image, you may be able to tell where the girl is. Yes, it is a cafe, because my text instructed the model to place her in a cafe. As you can see, this is a “text-to-image generative model”: we input some words or text into the model, and the model generates images based on that instruction. It is very interesting, as I feel like I can communicate with computers when I create these images.

2. It is “open source software”

If I had to pay a lot of money to use it, it would not be so impressive, because very few people could do that. Fortunately, however, it is open source software, so everyone can use it for free! If you want to integrate “Stable Diffusion” into your products, no problem. If you want to create an updated version of this software, you can do that, because it is open source. So I want to build my own products with it in the near future. Why don’t you try it yourself? If you are interested in “Stable Diffusion”, I recommend watching a YouTube interview with Emad Mostaque, founder of Stability AI (1), the company that created “Stable Diffusion”. The details of the release are provided here (2). Please check the terms of the license of this software, too.

3. It can change the direction of computer vision and beyond

The blog says, “This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.” I cannot predict exactly what will be achieved with this software. But I can say that many things which used to be impossible become possible with it. It means that “Stable Diffusion” enables all of us to create products, services and art that have never been seen before. This is definitely the “democratization of AI”. I expect a tsunami of new kinds of products, services and art to appear in the near future. It must be exciting!

I will update this article as this new software evolves. Stay tuned!

(1) The Man behind Stable Diffusion, https://www.youtube.com/watch?v=YQ2QtKcK2dA&t=942s

(2) Stable Diffusion Public Release, https://stability.ai/blog/stable-diffusion-public-release


“GRAPH ATTENTION NETWORKS” is awesome, as it has attention mechanisms

Today, I would like to introduce “GRAPH ATTENTION NETWORKS” (GAT) (1). I like it very much as it has attention mechanisms. Let us see how it works.

1. Which nodes should we pay more attention to?

As I said before, nodes in a GNN are updated by taking information from their neighbors. But you may be wondering whether some of that information is more important than the rest. In other words, which nodes should we pay more attention to? As the chart shows, the information from some nodes is more important than from others. In the chart, the thicker the red arrow from a sender node to the receiver node is, the more attention GAT should pay to that node. But how can we know which nodes should receive more attention?

2. Attention mechanism

Some of you may not know about “attention mechanisms”, so I will explain them in detail. They became popular when the natural language processing (NLP) model called the “transformer” introduced this mechanism in 2017. With it, the NLP model can understand which words are more important than others when it considers one specific word in a sentence. GAT introduces the same mechanism to understand which nodes it should pay more attention to when information is gathered from neighbors. The charts below explain how the attention mechanism works; they are taken from the original GAT research paper (1).

In order to understand which nodes GAT should pay more attention to, attention weights (red arrow) are needed. The bigger these weights are, the more attention GAT should pay. To calculate the attention weights, the features of the sender node (green arrow) and the receiver node (blue arrow) are first linearly transformed and concatenated. Then e_ij is calculated by a single-layer neural network (formula 1). This is called “self-attention”, and e_ij is called the “attention coefficient”. Once the attention coefficients of all sender nodes are obtained, we put them into the softmax function to normalize them (formula 2). Then the “attention weights” a_ij can be obtained (right illustration). If you want to know more, please check formula 3. Finally, the receiver node can be updated based on the attention weights a_ij (formula 4).
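To make formulas 1 to 4 concrete, here is a toy, single-head sketch in plain PyTorch. The graph, feature sizes and weights are random placeholders, and an ELU stands in for the nonlinearity; this is for illustration only, not the official GAT implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, F_in, F_out = 4, 3, 2                 # 4 nodes, 3 input features, 2 output features
H = torch.randn(N, F_in)                 # node features
A = torch.tensor([[1., 1., 0., 1.],      # adjacency matrix with self-loops
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [1., 0., 1., 1.]])

W = torch.randn(F_in, F_out)             # shared linear transformation
a = torch.randn(2 * F_out)               # weights of the single-layer attention network

Wh = H @ W                               # linearly transformed features
e = torch.full((N, N), float("-inf"))    # attention coefficients e_ij (formula 1)
for i in range(N):
    for j in range(N):
        if A[i, j] > 0:                  # attend only over neighbors
            e[i, j] = F.leaky_relu(torch.dot(a, torch.cat([Wh[i], Wh[j]])),
                                   negative_slope=0.2)

alpha = torch.softmax(e, dim=1)          # attention weights a_ij (formulas 2 and 3)
H_new = F.elu(alpha @ Wh)                # updated receiver features (formula 4)
print(alpha)
```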

3. Multi-head attention

GAT introduces multi-head attention, which means that GAT has several attention mechanisms (right illustration). K attention mechanisms execute the transformation of formula 4, and then their features are concatenated (formula 5). When we perform multi-head attention on the final layer of the network, instead of concatenation, GAT averages the results from each attention head and delays applying the softmax for the classification task (formula 6).
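In practice we rarely write this by hand, because PyG already ships a GATConv layer with multi-head attention built in. Below is a minimal two-layer sketch; the hidden size and the number of heads are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, num_classes, heads=8):
        super().__init__()
        # K heads whose outputs are concatenated (formula 5)
        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads, concat=True)
        # final layer: the heads are averaged instead of concatenated (formula 6)
        self.conv2 = GATConv(hidden_channels * heads, num_classes, heads=heads, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return F.log_softmax(self.conv2(x, edge_index), dim=1)
```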

Hope you enjoyed the article. I like GAT as it is easy to use and more accurate than the other GNNs I explained before. I will update this article soon. Stay tuned!

(1) GRAPH ATTENTION NETWORKS, Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio


Node Classification with Graph Neural Networks using PyG!

Last time, I introduced GCN (“GRAPH CONVOLUTIONAL NETWORKS”) in theory. Today, I would like to solve a problem with GCN. Before doing that, I need to choose the best framework for graph neural networks. It is “PyG” (1).

1. PyG (PyTorch Geometric)

Let us look at the explanation of PyG in its official documentation.

“PyG is a library built upon PyTorch to easily write and train Graph Neural Networks for a wide range of applications related to structured data. PyG is both friendly to machine learning researchers and first-time users of machine learning toolkits.”

I think it is the best choice for beginners because:

  • It is based on PyTorch, which is written in Python and widely used for deep learning tasks.
  • There are well-written documentation and notebook tutorials.
  • It has many “out of the box” functions, so we can start experimenting with GNNs immediately.

2. Prepare graph data

Our task is called “node classification”. It means that each node has its own class (e.g. age band, rating, income level, fail or success, default or no default, purchase or no purchase, cured or not cured, whatever you like). We would like to predict the class of each node based on the graph data.

Let me introduce the “Cora” dataset (2), a citation network. Each node represents a document, has a 1433-dimensional feature vector, and belongs to one of seven classes. Each edge represents a citation from one document to another. Our task is to predict the class of each document. Let us visualize each node as a dot below before training our GCN. We can see seven colours, as there are seven classes in this graph.
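For reference, the Cora graph is available out of the box in PyG, so loading it takes only a few lines (the root path below is just a placeholder):

```python
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="/tmp/Cora", name="Cora")
data = dataset[0]                  # the single graph in the dataset

print(dataset.num_node_features)   # 1433 features per document
print(dataset.num_classes)         # 7 classes
print(data)                        # node features x, edge_index, labels y, train/test masks
```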

3. GCN implementation with PyG

Let us train a GCN to analyse the graph data now. I explained how GCN works before, so if you missed it, please check it out.

This is a GCN implementation with PyG. PyG already includes GCN, so all we have to do is 1. import GCNConv and 2. create a class using GCNConv. That’s it! It looks easy if you are already familiar with PyTorch. If you want to run the whole notebook, it is available in the PyG official documentation. Here is the link.
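As a rough guide, here is a sketch of that pattern, following the two-layer GCN used in the PyG tutorials. The hidden size, dropout and training hyperparameters are illustrative choices, and it reuses the `dataset` and `data` objects loaded above.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, num_features, num_classes, hidden_channels=16):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN(dataset.num_node_features, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```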

Here is the result of our analysis. It looks good as nodes are classified into seven classes.

GCN can be applied to many tasks as it has a simple structure. Why don’t you try it by yourself with PyG today?

(1) PyG (PyTorch Geometric)  https://www.pyg.org/

(2) Revisiting Semi-Supervised Learning with Graph Embeddings, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, May 2016


“GRAPH CONVOLUTIONAL NETWORKS”: one of the most popular GNNs

Last time I explained how a GNN works: it gathers information from neighbors and aggregates it to predict the classes we are interested in. Today, I would like to go deeper into one of the most popular GNNs, called “GRAPH CONVOLUTIONAL NETWORKS” or “GCN” (1). Let us proceed step by step.

1. Adjacency Matrix

As you know, a graph has edges, or links, between nodes. Therefore, a graph can be specified with an adjacency matrix A and node features H. The adjacency matrix is very useful for representing the relationships among nodes: if the graph has a link from node “i” to node “j”, the element of A in row “i” and column “j” is “1”, otherwise it is “0”. This is shown in the chart below. If a node has a link to itself, the corresponding diagonal element of the adjacency matrix is “1”.
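As a tiny concrete example, here is the adjacency matrix of a 3-node graph with links between nodes 0 and 1 and between nodes 1 and 2, with self-loops on the diagonal (the graph itself is made up for illustration):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])

print(A[0, 1])   # 1.0 -> there is a link from node 0 to node 1
print(A[0, 2])   # 0.0 -> there is no link from node 0 to node 2
```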

2. Gather information from neighbors

Let me explain the chart below. 1. The node shown in red gathers information from each of its neighbors. 2. The information is aggregated to update the node. The aggregation can be a sum or an average.

As I said above, a graph can be specified with an adjacency matrix A and node features H (or X). I introduce W as the matrix of learnable weights and D as the degree matrix of A. Note that the diagonal elements of the adjacency matrix are “1” here (self-loops are included). σ is a non-linear function.

In GCN, the way to gather information is as follows (1).

It means that the information from each neighbor is weighted based on the degree of the sender (green node) and of the receiver (red node) as well. All the information is aggregated to update the receiver (red node).
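The propagation rule from (1) can be written as H' = σ(D^(-1/2) A D^(-1/2) H W), where A includes self-loops and D is its degree matrix. Below is a small numpy sketch of one such update, reusing the 3-node adjacency matrix A from above; the features and weights are random placeholders and a ReLU stands in for σ.

```python
H = np.random.randn(3, 4)                    # 3 nodes, 4 features each
W = np.random.randn(4, 2)                    # learnable weights: 4 -> 2 features

deg = A.sum(axis=1)                          # node degrees (self-loops included)
D_inv_sqrt = np.diag(deg ** -0.5)            # D^(-1/2)

H_new = np.maximum(0, D_inv_sqrt @ A @ D_inv_sqrt @ H @ W)   # sigma = ReLU
print(H_new)                                 # updated node features, shape (3, 2)
```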

In the formula below, GCN is also viewed more generally: the features of the neighborhood are directly aggregated with fixed weights (2).


“c_uv” represents the importance of node v to node u. It is a constant that depends directly on the entries of the adjacency matrix A.

That’s it! I hope you now understand how GCN works. In the next article, I would like to solve a problem with GCN. Stay tuned!

(1) SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS, Thomas N. Kipf & Max Welling, 22 Feb 2017

(2) Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, p. 79, Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković, 4 May 2021


Graph Neural Networks give us great flexibility in designing models for data analysis

Last time, I introduced Graph Neural Networks (GNN) as a key model for analyzing complex data. Let us see how a GNN works in detail.

1. What does graph data look like?

Unlike tabular data, a graph has edges between nodes. This is very interesting because many things are interrelated, for example…

  • Investors’ behaviors affect each other in financial markets
  • Rumors spread and impact people’s decisions in social networks
  • Consumers may prefer products that are already popular in the market
  • One marketing strategy affects the results of other marketing strategies within a company
  • In the board game “Go”, the outcome in one part of the board affects the outcome in other parts

These structures look just like the graph below, which is based on the karate club data (1). Each node represents a member of the club. The graph (2) shows us four groups in the club. There are edges between nodes, and these structures are very important when analyzing the data.

2. How can GNN models be trained?

Each node is expressed as a vector (for example [0 1 0 0 5]). This is called the “node features”, or just “features”, in machine learning. When models are trained, each node takes information from its neighbors and is updated based on this information. Yes, it looks simple! One way to take the information from neighbors is to sum it; another is to take the average. We iterate these updates until the loss function converges.

Note that we can sum or average in the same manner even if the structure of the graph changes. This is why GNNs are so flexible for designing models.
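Here is a bare-bones numpy sketch of one such update step, in which every node takes the sum and the average of its neighbors’ feature vectors; the graph and the features are made up for illustration.

```python
import numpy as np

# adjacency matrix of a 4-node graph (1 = edge) and 2-dimensional node features
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
H = np.array([[0., 1.],
              [1., 0.],
              [2., 2.],
              [3., 1.]])

H_sum  = A @ H                                  # "sum" of neighbor features
H_mean = H_sum / A.sum(axis=1, keepdims=True)   # "average" of neighbor features
print(H_mean)

# In a real GNN these aggregated messages are combined with learnable weights and
# a nonlinearity, and the update is repeated until the loss converges.
```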

3. How can the predictions from GNN models be obtained?

After training the models, we can obtain predictions based on the graph. With GNNs, there are three kinds of predictions.

  • node prediction: each node is classified according to its label. For example, in the karate club above, each member should be classified into one of the four groups shown in the chart above.
  • graph prediction: the graph is classified based on its whole structure. For example, a new antibiotic may be classified as to whether or not it works well as a treatment against certain diseases.
  • link prediction: when nodes represent customers or products, the edges between customers and products can represent past purchases. If we can create better node features based on the graph structure, recommendations about which products you may like can be provided more accurately.

I hope you now understand how a GNN works. It is very flexible to design. Next, I would like to explain what kinds of GNN models are popular in industry. Stay tuned!

(1) Wayne W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research, pp. 452–473, 1977.

(2) SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS, 22 Feb 2017, Thomas N. Kipf & Max Welling


Graph Neural Networks can be a game changer in Machine Learning!

Happy new year! As the new year begins, I have been thinking about what to do with AI this year. After exploring various options, I have decided to concentrate on “Graph Neural Networks” in 2022. I had heard the name “Graph Neural Networks” before, but since success stories were reported in several applications last year, I think it is the right time to work on them in 2022.

A graph is often represented as a diagram connecting dots, just like this.

The dots are called “nodes”, and there are “edges” between nodes. They are very important in “Graph Neural Networks”, or “GNN” for short.

The following can be expressed intuitively as graphs, so they can be analysed by GNNs:

  • Social networks
  • Molecular structures of drugs
  • The structure of the brain
  • Transportation systems
  • Communications systems

If you have a structure that emphasizes relationships between nodes or dots, you can express it as a graph and use a GNN for analysis. Due to their complexity, GNNs have not appeared much in AI applications, but in the last year I think we have seen a lot of successful results. The number of published papers on GNNs also seems to be increasing steadily.

In August of last year, DeepMind and Google researchers reported that they predicted arrival times at destinations using Google Maps data and improved the accuracy. The roads were turned into graphs by segment and analyzed using “Graph Neural Networks”. The structure of the model itself seems to be unexpectedly simple. For details, please see Section 3.2, Model Architecture, in the research paper (1).

There are many other successful cases. The field of drug discovery, in particular, seems to have high expectations for GNNs.

Theoretically, “Graph Neural Networks” are a fairly broad concept, covering various models. The theoretical framework is also deepening with the participation of leading researchers, and research is likely to accelerate further in 2022.

So “Graph Neural Networks” are very interesting to me. When I find good examples, I will post updates here. Stay tuned!

(1) ETA Prediction with Graph Neural Networks in Google Maps, 25 Aug 2021


“Keras” is a great library for creating AI. It is time to start learning it!

I started learning “Keras” in 2016 as it is easy to get started with. It is open source software, so there is no cost to use it. There are many pre-defined layers and models in Keras, so you do not need a computer science PhD to start using the library. That is why I recommend “Keras” to everyone who is interested in AI today. Let me explain how awesome “Keras” is!

1. “Keras” is a friend to beginners in AI

When beginners start programming AI, they want to produce awesome results from the AI models they create. This is important for staying motivated to continue learning AI, and “Keras” enables them to do that. We can start AI programming with simple models, without deep knowledge of AI. It is the best way to start for everyone who is interested in AI. I remember how glad I was when I created a computer vision model with “Keras” to classify digits from 0 to 9. Now it is your turn!
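If you want to try the same thing, here is a minimal sketch of such a digit classifier with Keras; the layer sizes and the number of epochs are illustrative choices, not the exact model I built back then.

```python
import tensorflow as tf
from tensorflow import keras

# MNIST: 28x28 grayscale images of handwritten digits 0-9
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```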

2. We can write state-of-the-art AI models with less code

While it is easy to create simple AI models with “Keras”, we can create more complex AI models with it, too. Currently I am researching a variational autoencoder model with Keras. It is one of the most famous generative models in the world. Keras provides a great template to start from. If you are interested in this, please check it out here.

3. We can deploy AI models built with “Keras” to Google Cloud

When you have finished creating your own AI models, you may want to build your own microservice with them. This is my example of that. A sentiment analysis model is created with “Keras on TensorFlow”. The model is deployed on Google Cloud Platform as a microservice. So when someone sends me an email (left), a reply email (right) is sent to them automatically. The reply reports whether the sentiment is positive or negative as the “probability of positive”. It works very well.

Actually, 99% of the AI models described in my articles are created with “Keras”. Although Keras used to support several backend libraries, it now has only one backend, “TensorFlow”, which is supported by Google. “Keras” is now tightly integrated into “TensorFlow”, which means that “Keras” has more and more opportunities to expand its capabilities within the “TensorFlow” ecosystem. I think it is time for you to start learning AI with “Keras”.

Why don’t you start learning “Keras” with us today? I will keep updating my articles about “Keras”. Stay tuned!


This is my favorite NLP model. It is small but works very well!

Since BERT was released by Google in October 2018, there have been many models that improve on the original BERT. Recently, I found ALBERT, which was released by Google Research last year, so I performed small experiments with ALBERT. Let us build a sentiment analysis model using the IMDB data. IMDB is movie review data, including the review content and its sentiment from each user. I prepared 25,000 training samples and 3,000 test samples. The result is very good. Let us see more details.

1. It is easy to train the model, as it has fewer parameters than BERT does.

BERT is very famous, as it delivers good performance on NLP (natural language processing) tasks. But for me it is a little too big. BERT has around 101 million parameters, which means it takes a long time to train and sometimes exceeds the memory capacity of GPUs. On the other hand, ALBERT has around 11 million parameters, so it is easy to train. It takes only about 1 hour to reach 90% accuracy when an NVIDIA Tesla P100 GPU is used. It is awesome!

In this experiment, max_length is 256 for each sample.

2. It is very accurate with little data

ALBERT is not only fast to train but also very accurate. I used the pre-trained ALBERT base model from TensorFlow Hub. Because it is pre-trained in advance, ALBERT is accurate with less data. For example, with only 500 training samples, its accuracy is over 80%! This is very good when we apply it to real problems, as we do not always have enough training data in practice.

max_length is 128 for each sample.

3. It is easily integrated into TensorFlow and Keras

Finally, I would like to point out that ALBERT is easy to integrate with TensorFlow Keras, the deep learning framework. All we have to do is import ALBERT as a “Keras layer”. If you want to know more, check TensorFlow Hub, which explains how to do it. I use TensorFlow Keras every day, so ALBERT has automatically become my favorite model.
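As a rough sketch, loading a pre-trained ALBERT encoder as a Keras layer looks like the following. The TensorFlow Hub handles below are illustrative, so please check the current handles and the exact usage on tfhub.dev before running this.

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the preprocessing model

# illustrative handles; check tfhub.dev for the current ALBERT models
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/albert_en_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/albert_en_base/3", trainable=True)

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
encoder_outputs = encoder(preprocess(text_input))
sentiment = tf.keras.layers.Dense(1, activation="sigmoid")(encoder_outputs["pooled_output"])

model = tf.keras.Model(text_input, sentiment)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```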

As I said, ALBERT is released on TensorFlow Hub and is free to use, so everyone can start using it easily. It is good for democratising “artificial intelligence”. I want to apply ALBERT to many applications in the real world. Stay tuned!

Cheers Toshi


This is my own AI, developed while staying home. It works well!

What do you do when you have to stay home under lockdown? Cooking? Yoga? Reading books? As for me, I would like to develop my own AI for myself. Here is a recipe for doing that.

Main goal: to understand what sentiment each e-mail has, positive or negative

Interface: Gmail, as it is my favorite

AI library: TensorFlow, as it is my favorite

Infrastructure: Google Cloud, as it is easy to deploy AI models

Connector between AI and email: Microsoft Power Automate, as it is easy to use

1. Develop the AI model

When developing an AI model, a massive amount of data is needed. I use the “IMDB review data”, as it is publicly available and easy to use. Although it is written in English, the model can accept 16 languages, including English, Japanese and Chinese! It is great. If you want to know how that is possible, I recommend reading this paper.

2. Connect email to the AI model

Once the model is developed, let us connect it to email so that the messages in emails can be sent to the AI model and the prediction can be sent back by email. There are several tools for doing that. I chose Microsoft Power Automate, as Microsoft is used everywhere around the world. It takes only 2–3 hours even if you are a beginner with this tool, as it requires only a few lines of code.

3. Let us see the predictions from the AI

This is an example of how my AI works. When I receive a message by e-mail, it is sent to the AI model automatically. The AI model receives the message, calculates the probability of “positive sentiment”, and this probability is sent back to an email address that is set up in advance.

I have used my AI for several days and confirmed that the predictions are fairly reasonable for advertisements and announcements. It is very interesting! I would like to develop the next version of my AI in the near future and post updates here. Stay tuned!
