This is my first machine intelligence model. It looks good so far!

[Image: DenseNet121 image-captioning example]

There are many images on the internet; a lot of people upload selfies to Instagram every day. There is also a lot of text data on the internet, because not only professional writers but many ordinary people express their opinions on blogs and in tweets. No one can see every image and every text on the internet, as the volume is huge. In addition, images and texts sometimes relate to each other: for example, people upload images and attach explanations of them. So I have always wondered how we can analyze both images and text at once. There are several methods for doing that, and I chose an image-captioning model because it is easy to understand how it works.

 

1. What is an image-captioning model?

Before starting this image-captioning project, I had worked on computer vision projects and natural language projects independently. Computer vision means, for example, classifying cats and dogs, or detecting a specific type of car and distinguishing it from other types. I have also developed natural language models, such as sentiment analysis of movie reviews. An image-captioning model is a combination of the two: a computer vision model and a natural language model. Let us see the chart below.

[Figure: how the image-captioning model works]

The computer takes a picture as input. Then the encoder extracts features from that picture; a "feature" means a characteristic of an object. Based on these features, the decoder generates sentences that describe what the picture shows. This is how our image-captioning model works.
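To make this flow concrete, here is a minimal sketch of the encoder step, assuming TensorFlow/Keras; the pretrained CNN and the image size are my own illustrative choices, not code taken from any particular template:

```python
import tensorflow as tf

# A pretrained CNN serves as the encoder. include_top=False removes the
# classification head, so the network returns spatial feature maps
# instead of class predictions.
encoder = tf.keras.applications.InceptionV3(include_top=False,
                                            weights='imagenet')

def extract_features(image_path):
    # Load one image and resize it to the input size InceptionV3 expects.
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    # Output shape (1, 8, 8, 2048): an 8x8 grid of 2048-dim feature
    # vectors that the decoder (with attention) can consume.
    return encoder(tf.expand_dims(img, 0))
```

The decoder then turns these feature vectors into words, one step at a time.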

 

2. How can we find a template for the image-captioning model and modify it?

I found a good environment for developing our image-captioning models: "Colab", provided by Google. It is free to use, there are many templates to start projects with, and a GPU is available for research and interactive use. It provides the computational power required to develop image-captioning models. I found the original image-captioning template in Colab. The template is awesome, as "the attention mechanism" is implemented. It uses InceptionV3 as the encoder and a GRU as the decoder. But I wanted to try other methods, so I modified the template a little, changing InceptionV3 to DenseNet121 and the GRU to an LSTM.
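Here is a rough sketch of the two substitutions, again assuming TensorFlow/Keras; the vocabulary and layer sizes are illustrative values I picked, not the template's exact ones:

```python
import tensorflow as tf

# Swap 1: DenseNet121 replaces InceptionV3 as the encoder.
# include_top=False keeps the convolutional feature maps
# (1024 channels for DenseNet121) instead of class scores.
encoder = tf.keras.applications.DenseNet121(include_top=False,
                                            weights='imagenet')

# Swap 2: an LSTM replaces the GRU inside the decoder. At each step the
# decoder embeds the previous word, mixes it with the attention-weighted
# image features, and predicts a distribution over the vocabulary.
vocab_size, embedding_dim, units = 5000, 256, 512  # illustrative sizes
embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
decoder_rnn = tf.keras.layers.LSTM(units,
                                   return_sequences=True,
                                   return_state=True)
output_layer = tf.keras.layers.Dense(vocab_size)  # a score for each word
```

One practical difference to keep in mind: an LSTM carries two states (hidden and cell) where a GRU carries one, so the decoder's state handling needs a small adjustment. Now let us see how it works in my experiment!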

 

3. The results after three hours of training

Here is one of the outputs from my experiment with the image-captioning model. It says, "a couple of two sugar covered in chocolate frosting are laid on top of a wooden table". Although that is not perfect, it works quite well. With more data and more computation time, it should become more accurate.

[Image: example caption generated by the model]

 

This is the first step toward machine intelligence, and of course there is a long way to go. But by combining images and text, I believe we can develop many cool applications in the future. In addition, I found that "the attention mechanism" is very powerful for extracting relevant information. I would like to focus on this mechanism to improve our algorithms going forward. Stay tuned!

 


 

When you need AI consulting, please visit the TOSHI STATS website.

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software.



We are starting an AI Lab in the company to research the "attention mechanism" in deep learning

As I said before, I completed the online course "deeplearning.ai". It is an awesome course, and I recommend it to everyone. There are many topics we can learn in the course, and one of the most interesting to me is the "attention mechanism" in neural translation. So I would like to explain it in detail. Do not worry, I will not use mathematics in this article. Let us start.

 

The definition of the attention mechanism is: "The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step." This may feel natural when we consider how we translate from one language to another. Yes, human beings pay more attention to specific objects than to others when those objects are more interesting to them. When we are hungry, we tend to look for the sign "restaurant" or "food court" and do not care about the sign "library", right?

We want computers to do the same thing in translation. Let me consider it again. When we translate English into our mother tongue, such as Japanese, we look at the whole sentence first, and then decide which words are important to us. We do not translate word by word. In other words, we pay more attention to specific words than to others. So we want to introduce the same method when computers perform neural translation.
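To make the idea tangible, here is a tiny NumPy sketch of attention in the additive style of (1); all matrices here are random placeholders for parameters that a real model would learn:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One encoder state per source word (4 words, 8 dimensions each),
# plus the decoder's state at the current translation step.
encoder_states = np.random.randn(4, 8)
decoder_state = np.random.randn(8)

# Learned parameters in a real model; random here just to show shapes.
W1, W2 = np.random.randn(8, 8), np.random.randn(8, 8)
v = np.random.randn(8)

# Score each source word against the current decoder state, then
# normalize: the weights say how much attention each word receives.
scores = np.tanh(encoder_states @ W1 + decoder_state @ W2) @ v
weights = softmax(scores)            # sums to 1 over the 4 source words
context = weights @ encoder_states   # attention-weighted summary vector
```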

 

The attention mechanism was originally introduced in September 2014 (1). Since then, many attention mechanisms have been proposed. One of the strongest attention models is the "Transformer" from Google Brain, published in June 2017 (2). I think you use Google Translate every day, and it performs very well. But the Transformer is even better than the model used in Google Translate; the comparison in (2) shows the difference between GNMT (Google Translate's model) and the Transformer.
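Under the hood, the Transformer is built on "scaled dot-product attention" from (2). Here is a minimal NumPy sketch of that single operation; the shapes are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in (2).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                # weighted sum of the values

Q = np.random.randn(3, 8)  # 3 queries of dimension d_k = 8
K = np.random.randn(5, 8)  # 5 keys
V = np.random.randn(5, 8)  # 5 values, one per key
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 8)
```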

Fortunately, Google provides a framework to facilitate this kind of AI research. It is called "Tensor2Tensor (T2T)". It is open source and can be used without any fees, which means you can try it yourself! I decided to set up an "AI Lab" in my company and adopt this framework to research attention mechanisms. It comes with many pre-trained models, including the "Transformer". Why don't you join us?
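If you want to try it first, a quick way to see what T2T ships with is to list its registered problems and models. This is a small sketch assuming Tensor2Tensor is installed (pip install tensor2tensor) and that the registry calls match its official introductory notebook:

```python
# Assumes: pip install tensor2tensor
from tensor2tensor import problems
from tensor2tensor.utils import registry

print(problems.available())    # registered datasets/tasks, e.g. translation
print(registry.list_models())  # registered models, including "transformer"
```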

 

I used translation as the example to explain how the attention mechanism works, but it can be applied to many other fields, such as object detection, which is used in face recognition and self-driving cars. It is exciting to consider what can be achieved with the attention mechanism. I will post updates on our progress, so stay tuned!


When you need AI consulting, do not hesitate to contact TOSHISTATS.

 

(1) "Neural Machine Translation by Jointly Learning to Align and Translate", Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, September 2014.

(2) "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, June 2017.


Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.