This is our cross-lingual intelligent system. It is smarter than I am, as it recognizes 16 languages!

When I lived in Kuala Lumpur, Malaysia, I always felt I was in a multilingual environment. Most people speak Bahasa Malaysia, but when they talk to me, they speak English. Some of them understand Japanese. Chinese Malaysians speak Mandarin or Cantonese, and Indian Malaysians speak Hindi or other languages. Yes, Asia is truly a "multilingual environment". Since then I have been wondering how we could develop a system that accepts many languages as input. Now I have found it.

This is the first cross-lingual intelligent system by TOSHI STATS. It accepts 16 languages and performs sentiment analysis. Let me explain the details.

 

 

1. Inputs in 16 languages

As we use the Multilingual Universal Sentence Encoder (1) from TensorFlow Hub, the system accepts 16 languages ( See the list below ). Usually, a Japanese system cannot take English as input and an English system cannot accept Japanese, but this system handles both and works well. This is amazing! The first screenshot of our system shows input in English and the second shows input in Japanese.

(Screenshot: the system with English input)

(Screenshot: the system with Japanese input)

We do not need a separate system for each language. The secret is that the model maps every sentence into the same embedding space, even though the sentences are written in different languages.
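To make this concrete, here is a minimal sketch of how a multilingual sentence encoder of this kind can be used; the TensorFlow Hub URL, model version, and example sentences are assumptions for illustration, not the exact setup of our system.

```python
# A minimal sketch (assumes a TF2 environment with tensorflow_text installed).
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  -- registers the ops the encoder needs

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

sentences = [
    "Naomi Osaka cruises to victory in Pan Pacific Open final",  # English
    "大坂なおみ、パンパシフィックオープン決勝で快勝",                # Japanese
]
vectors = embed(sentences).numpy()  # shape: (2, 512) -- both languages land in one space

# The cosine similarity is high because the two sentences mean the same thing.
sim = np.dot(vectors[0], vectors[1]) / (
    np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1]))
print(sim)
```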

 

 

2. Transfer learning from English to other languages

I think this is the biggest breakthrough in the system. Because all languages share the same embedding space, we can train the model in English and transfer its knowledge to other languages. For example, there is plenty of text data for training models in English but only a little in Japanese. In such a case, it is difficult to train models effectively in Japanese, but we can train a model in English and use it on Japanese text. It is great! Of course, we can also train the model in another language and transfer it to the rest. It is extraordinary, as it enables us to transfer knowledge and expertise from one language to another.
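Here is a minimal sketch of that transfer, assuming the `embed` encoder from the sketch above; the sentences and the scikit-learn classifier are illustrative, not our production pipeline.

```python
# Train on English labels only, then score Japanese text with the same classifier.
from sklearn.linear_model import LogisticRegression

en_texts = ["I love this product", "This is terrible", "Absolutely wonderful"]
en_labels = [1, 0, 1]  # 1 = positive, 0 = negative

clf = LogisticRegression().fit(embed(en_texts).numpy(), en_labels)

# The classifier has never seen Japanese, but the sentences share the embedding space.
ja_texts = ["この製品が大好きです", "これはひどい"]
print(clf.predict_proba(embed(ja_texts).numpy())[:, 1])  # probability of positive
```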

 

 

3. Experiment and result

I chose one news title (2) from The Japan Times and performed sentiment analysis with the system. The title is "Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil". I think it should be positive.

This English sentence is translated into the other 15 languages by Google Translate. Each translation is then input to the system and we measure the "probability of positive sentiment". Here is the result: 90% of the scores are above 0.8, which means that in most languages the system recognizes the sentence as clearly "positive". This is amazing! It works well across all 16 languages even though the model is trained only in English.

(Figure: probability of positive sentiment for each of the 16 languages)

 

 

While developing this cross-lingual intelligent system, I realized it is already smarter than I am: apart from Japanese and English, I do not know what the sentences in the other 14 languages mean. Based on this method, we can develop many intelligent systems that would have been difficult to build a year ago. I will keep you updated on the progress of our intelligent system. Stay tuned!

 

 

1. Multilingual Universal Sentence Encoder for Semantic Retrieval, Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil, Google, July 9, 2019

2. Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil, The Japan Times, Sep 22, 2019.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software

BERT also works very well as a feature extractor in NLP!

Two years ago, I developed car classification models with ResNet. I used transfer learning because I could prepare only a small number of images. The model was already pre-trained on a huge dataset such as ImageNet, so I could extract features from each car image and train classification models on top of them. It worked very well. If you are interested, please see that article.

So I wondered how BERT (1) works as a feature extractor. If it works well, it can be applied to many downstream tasks with ease. Let us try the experiment here. BERT is one of the best Natural Language Processing (NLP) models, developed by Google. I wrote about how BERT works in an earlier article. It is amazing!

Let me explain features a little. A feature here means "how texts are represented as vectors". Each word is converted to a token ID before being input to BERT, and then the whole sentence is converted into 768-dimensional vectors by BERT. In this experiment, feature extraction is done with the TensorFlow Hub module of BERT. Its website says there are two kinds of outputs from BERT…

It means that when text data is input to BERT, the model returns two types of vectors. One is "one vector for each sentence" (the pooled output); the other is "a sequence of vectors for each sentence", one per token. In this task we need "one vector for each sentence", because it is a classification task and one vector per sentence is enough as input to the classification model. We can see the first 3 vectors out of the 3,503 samples below.
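For illustration, here is a rough sketch of extracting those two outputs from a BERT module on TensorFlow Hub in the TF1 style of that time; the module URL is an assumption, and the tokenization step that produces the three ID arrays with BERT's own tokenizer is omitted.

```python
import tensorflow as tf
import tensorflow_hub as hub

BERT_URL = "https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1"
MAX_SEQ_LEN = 128

# Placeholders stand in for the tokenized sentences (token ids, mask, segment ids).
input_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])
input_mask = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])
segment_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])

bert = hub.Module(BERT_URL, trainable=False)  # fixed weights: a pure feature extractor
outputs = bert(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens", as_dict=True)

pooled_output = outputs["pooled_output"]      # (batch, 768): one vector per sentence
sequence_output = outputs["sequence_output"]  # (batch, seq_len, 768): one vector per token
```

The `pooled_output` vectors are the 768-length features that feed the classification model described next.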

This is the training result of the classification model. Accuracy is 82.99% at epoch 105. Although this is reasonable, it is worse than the 88.58% reported in the last article. The difference can be regarded as the advantage of fine-tuning: in this experiment the weights of BERT are fixed and there is no fine-tuning. So if you need more accuracy, try fine-tuning, just like in the experiment in the last article.

BERT stands for "Bidirectional Encoder Representations from Transformers", and it looks good as a tool for feature extraction. In particular, the multilingual model can be used for 104 languages. It is amazing!

I will perform other experiments with BERT in future articles. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

BERT performs very well in the classification task in Japanese, too!

As I promised in the last article, I performed experiments on classifying news titles in Japanese. The result is very good, as I expected. Let me explain the details.

I use the "livedoor news corpus" (2) for this experiment. There are five classes of news titles in this experiment: life, movies, sports, chat, and electronics. Here are the details of the classes. I would like to classify each news title into the correct class.

Then I trained the BERT (1) model on a sample of news titles written in Japanese. Here is the result. The BERT model I used is the multilingual model; all I had to do was fine-tune it for my task. As you can see below, the accuracy is about 88%. This is very good considering I used very small sample data (3,503 titles for training, 876 for test). Training took less than one minute on Colab with a GPU.

With 3 epochs, I confirmed that the accuracy is over 88%.
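As a rough sketch of what the fine-tuning setup looks like (in practice I relied on the reference training flow, so the TF Hub module URL, classification layer, and learning rate below are assumptions): unlike the fixed-feature approach of the previous article, BERT's own weights are trainable here.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 5
MAX_SEQ_LEN = 128
BERT_URL = "https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1"

input_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])
input_mask = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])
segment_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LEN])
labels = tf.placeholder(tf.int32, [None])

bert = hub.Module(BERT_URL, trainable=True)   # trainable=True is what makes this fine-tuning
pooled = bert(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens", as_dict=True)["pooled_output"]

logits = tf.layers.dense(pooled, NUM_CLASSES)  # small classification head on top of BERT
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
# A small learning rate and only a few epochs are typical for fine-tuning.
train_op = tf.train.AdamOptimizer(learning_rate=2e-5).minimize(loss)
```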

Let me take 10 validation samples and look at each of them. These samples were not used for training, so they are new to the model. Nine out of ten are classified correctly. That is pretty good, isn't it?

The beauty is that the pre-trained model is not specific to Japanese. As it is a multilingual model, it should work in many languages with the same kind of fine-tuning I did for Japanese. Therefore it should work in your language, too!

How about this experiment? I will continue to experiment with BERT on many natural language tasks and update my articles soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language
  2. livedoor news corpus CC BY-ND 2.1 JP

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

BERT performs near the state of the art in question answering! I have now confirmed it

Today I write about BERT, a new natural language model, again, because it works so well on the question answering task. In my last article I explained how BERT works, so if you are new to BERT, please read that first.

For this experiment, I use the SQuAD v1.1 data, as it is very famous in the field of question answering. Here is an explanation from its authors.

“Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.” (This is from SQuAD2.0, a new version of Q&A data)

This is a very challenging task for computers to answer correctly. How does BERT do on it? As you can see below, BERT recorded an F1 score of 90.70 after one hour of training on a TPU on Colab in our experiment. It is amazing because, based on the SQuAD 1.1 leaderboard below, this would rank third or fourth among top universities and companies, although the leaderboard setting may differ from our experiment. It is also notable that BERT is about as good as a human!
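To give a feel for how BERT is adapted to this task, here is a rough sketch of the span-prediction head used for SQuAD-style question answering; the shapes and variable names are illustrative, and `sequence_output` stands in for BERT's token-level output (one 768-dimensional vector per token).

```python
import tensorflow as tf

HIDDEN = 768
# In the real model this tensor comes from BERT; a placeholder keeps the sketch self-contained.
sequence_output = tf.placeholder(tf.float32, [None, None, HIDDEN])  # (batch, seq_len, 768)

start_vec = tf.get_variable("start", [HIDDEN])
end_vec = tf.get_variable("end", [HIDDEN])

# Score every token in the passage as a possible start or end of the answer span.
start_logits = tf.einsum("bsh,h->bs", sequence_output, start_vec)  # (batch, seq_len)
end_logits = tf.einsum("bsh,h->bs", sequence_output, end_vec)

# The predicted answer is the span (i, j), i <= j, maximizing start_logits[i] + end_logits[j];
# EM and F1 compare that span against the human-annotated answer.
```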

I tried both the Base model and the Large model with different batch sizes. The Large model is better than the Base model by around 3 points. The Large model takes around 60 minutes to complete training, while the Base model takes around 30 minutes. I used a TPU on Google Colab for training. Here is the result. EM means "exact match".

Question answering can be applied to many business tasks, such as information extraction from documents and automation of customer service centers. It will be exciting when we can apply BERT to business in the near future.

 

Next, I would like to perform text classification of news titles in Japanese, because BERT has a multilingual model which works in 104 languages. As I live in Tokyo now, it is easy to find good data for this experiment. I will update my article soon. Stay tuned!

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, arXiv preprint arXiv:1810.04805, 2018

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

 

“BERT” can be a game changer to accelerate digital transformation!

Since Q1 of 2019 is coming to an end, I would like to talk about one of the biggest innovations of deep learning in Natural Language Processing (NLP). It is called "BERT" and was presented by Google AI in October 2018. As far as I know, it is the first model to perform very well on many language tasks, such as sentiment analysis and question answering, without any change to the model itself. It is amazing! Let us start now.

1. How does BERT work?

The secrets of BERT are its structure and its method of training. BERT uses the Transformer as its main building block. I have mentioned the Transformer before; it is a structure for extracting information from sequential data, and its key is the attention mechanism, which measures "how much we should pay attention to each word in the sentence". If you want to know more, that article is a good reference. Now let us move on to how BERT is trained. BERT stands for "Bidirectional Encoder Representations from Transformers". For example, the word "bank" has different meanings in "bank account" and "bank of the river". When a model can learn from data in only the forward direction, it is difficult to distinguish these meanings of "bank", but if it can learn in both the forward and backward directions, it can do so. This is the secret that lets BERT achieve state-of-the-art results on many NLP tasks without modifications. This is the chart from the research paper (1).
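For readers who like to see the idea in code, here is a tiny NumPy sketch of the scaled dot-product attention at the heart of the Transformer; the shapes are illustrative and much smaller than BERT's real dimensions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query word attends to every key word; the softmax weights say
    how much attention it pays to each of them."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # similarity between words
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                                       # weighted mix of the values

x = np.random.randn(4, 8)   # 4 words, 8-dimensional vectors
print(scaled_dot_product_attention(x, x, x).shape)           # (4, 8)
```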

2. How can we apply BERT to our own tasks?

BERT is so large that it needs a lot of data and computing resources such as GPUs/TPUs, so training BERT from scratch takes time and money. But there is no need to worry: Google has released a number of pre-trained BERT models. This is great because we can use them as base models, and all we have to do is a small amount of additional training to adapt them to our own tasks, such as text classification. This is called "fine-tuning". The pre-trained models are open source and available to everyone. If you want to know more, please see the blog post. The beauty is that one of the pre-trained models is a multilingual model which works in 104 languages without any modification. It is amazing! So it works in your language, too!

3. Can BERT accelerate digital transformation in our daily lives?

I think yes, because we are surrounded by a massive amount of documents: contracts, customer reports, emails, financial reports, regulatory instructions, newspapers, and so on. It is impossible for humans to read everything and extract the needed information in real time. With BERT, we can develop much better applications to handle large volumes of text and extract the needed information efficiently. It is very exciting to consider how many applications could be created with BERT in the near future.

I hope you enjoyed this article. I am now researching BERT intensively and will post an update soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

More than 10X faster! You may have access to super-powered computers, too!

 

I started using deep learning four years ago and have run countless experiments since then. I thought I knew most of what there is to know about deep learning, but I found I was wrong when I tried a new computational engine called the "TPU" today. I want to share this experience, as it is useful for everyone who is interested in artificial intelligence. Let us start.

 

1. TPU is more than 10X faster

Deep learning is one of the most powerful techniques in artificial intelligence; Google uses it in many products such as Google Translate. The problem is that deep learning needs a massive amount of computational power. For example, let us develop a classifier that tells which digit from 0 to 9 an image shows. This is the MNIST dataset, the "hello world" of deep learning. I want to classify each image automatically.

(Image: MNIST handwritten-digit samples)

Two years ago, I trained it on my MacBook Air 11; it took around 80 minutes to complete training. MNIST is one of the simplest training datasets in computer vision, so if I want to develop a more complex system, such as a self-driving car, my MacBook Air is useless, as it would take far longer to compute. Fortunately, I can now try the TPU, a processor specialized for deep learning. I found it is incredibly fast: it completes the training in less than one minute! 80 minutes vs. 1 minute. I tried many times, and the result was always the same. So I checked the computation speed; it reports more than 160 TFLOPS. That is faster than the top supercomputers of 2005, and the TPU is the fastest processor I have ever tried. This is amazing.

(Screenshot: measured TPU speed, more than 160 TFLOPS)

 

2.  TPU is easy to use!

Being super fast is not enough: the TPU should also be easy to use. If you had to rewrite your code to use a TPU, you might hesitate to use it. If you use TensorFlow, the open-source deep learning framework from Google, there is no problem; only small modifications are needed. If you use other frameworks, you need to wait until they support TPUs, and I am not sure when that will happen. In my case, I mainly use tf.keras on TensorFlow, so there is nothing to worry about. You can see the code of my experiment here.
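For reference, here is a minimal sketch of the kind of small modification that was needed at the time (the TF 1.x `tf.contrib.tpu` API on Colab); the model and hyperparameters are illustrative, not my exact notebook.

```python
import os
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train / 255.0).reshape(-1, 784).astype("float32")

# An ordinary tf.keras model ...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# ... wrapped so that training runs on the Colab TPU instead of the CPU.
tpu_address = "grpc://" + os.environ["COLAB_TPU_ADDR"]
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu_address)))

tpu_model.fit(x_train, y_train, epochs=5, batch_size=1024)
```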

 

 

3. TPU is available on colab for free!

Now we know the TPU is fast and easy to use. I also need inexpensive tools for my business, but there is nothing to worry about: TPUs are provided for free on Google Colab, a web-based experiment environment. Although there are some limitations (such as a maximum session time of 12 hours), it is fine for developing minimum viable models. If you need TPUs for formal projects, Google also provides paid services. So we can choose free or paid services depending on our needs. I recommend using the free TPU on Colab to learn how TPUs work.

 

 

Technically, this TPU is v2. Google has already announced TPU v3, a more powerful TPU, so these services may become even more powerful in the near future. Today's experience is just the beginning of the story. Do you want to try a TPU?

1). MNIST with tf.Keras and TPUs on colab

https://github.com/TOSHISTATS/MNIST-with-tf.Keras-and-TPUs-on-colab/blob/master/MNIST_TPU_20181009.ipynb

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

We start AI Lab in the company and research “attention mechanism” in deep learning

As I said before, I completed the online course "deeplearning.ai". This is an awesome course that I recommend to everyone. There are many topics to learn in the course; one of the most interesting for me is the "attention mechanism" in neural translation, so I would like to explain it in detail. Do not worry, I will not use mathematics in this article. Let us start.

 

The definition of the attention mechanism is: "The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step." This feels natural when we consider how we translate from one language to another. Human beings pay more attention to specific objects than others when those objects are more interesting to them. When we are hungry, we tend to look for the sign for a "restaurant" or "food court" and do not care about the sign for a "library", right?

We want to apply the same idea to translation by computers. When we translate English into our mother tongue, such as Japanese, we look at the whole sentence first and then decide which words are important to us; we do not translate word by word. In other words, we pay more attention to specific words than to others. So we want to introduce the same method into neural translation by computers.

 

The attention mechanism was originally introduced (1) in September 2014, and many variants have been proposed since then. One of the strongest attention models is the "Transformer", presented by Google Brain in June 2017. You probably use Google Translate every day, and it performs very well, but the Transformer is better than the model used in Google Translate. This chart shows the difference between GNMT (Google Translate) and the Transformer (2).

Fortunately, Google provides a framework to facilitate AI research called "Tensor2Tensor (T2T)". It is open source and can be used without any fees, which means you can try it yourself! I have decided to set up an "AI Lab" in my company and adopt this framework to research the attention mechanism. It includes many pre-trained models, including the "Transformer". Why don't you join us?

 

I used translation as the example to explain how the attention mechanism works, but it can be applied to many other fields, such as object detection, which is used in face recognition and self-driving cars. It is exciting to consider what can be achieved with the attention mechanism. I will keep you updated on the progress. Stay tuned!

 

 

When you need AI consulting, do not hesitate to contact TOSHISTATS.

 

(1) Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, Sep 2014

(2) Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, June 2017

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

Let us consider “Brain as a service” again now!


Two years ago, I wrote an article about the computer Go player "AlphaGo" and talked about "Brain as a service" in the future, because AlphaGo is so strong and can improve itself through reinforcement learning with self-play. Now I am even more confident that "Brain as a service" will be available in the near future. Let us consider why I think so.

 

1. Self-play without human intervention

In October 2017, DeepMind released a new version of the computer Go player, "AlphaGo Zero". The previous version of AlphaGo learned from human play in the early stage of training, but AlphaGo Zero can improve itself without human intervention or knowledge. Starting from nothing, it became stronger than the human Go champion. This is incredible! Of course, the real world is not the game of Go, so we would have to adapt self-play to apply it to real-life problems. But fundamentally, there are many chances to improve our society with self-play, as it can provide super-human solutions if it is correctly implemented. AlphaGo Zero proves this is true.

 

2. Reinforcement learning can be researched anywhere on the earth

Now I am researching OpenAI Gym, which is an environment/simulator for reinforcement learning (RL). It is provided by OpenAI, a nonprofit organization established by Elon Musk and Sam Altman. OpenAI provides not only theoretical research results but also the code to implement them in our own systems. It means that as long as we have access to the internet, we can start our own reinforcement learning research based on OpenAI Gym. No capital is required, as the code of OpenAI Gym is provided for free; just download and use it, as it is open-source software. RL applications like AlphaGo can be developed anywhere in the world. If you want to try it, go to the OpenAI Gym website and set up OpenAI Gym yourself. You can enjoy cool results of reinforcement learning!
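If you want a feel for the API, here is a minimal sketch of the classic Gym loop with a random agent; the environment name and the pre-0.26 calling convention are assumptions about a typical setup, and no learning happens yet.

```python
import gym

env = gym.make("CartPole-v1")
for episode in range(3):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()                   # random action for now
        observation, reward, done, info = env.step(action)   # environment responds
        total_reward += reward
    print(f"episode {episode}: total reward = {total_reward}")
env.close()
```

An RL algorithm simply replaces the random `sample()` call with a learned policy.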

 

3. It will be easier to obtain data from the world

Last week, Google announced "The real world as your playground: Build real-world games with Google Maps APIs". It means that any game developer can create real-world games using Google Maps, and we can access countless 3D buildings, roads, landmarks, and parks all over the world as digital assets. This is amazing! But this should not be seen as a matter of games only; it is just one example of how we can obtain data from the world, because we can create real-world computer-vision simulators with this service. In addition, I would like to mention blockchain briefly. Blockchain can be used to connect the world in a transparent manner, and I imagine that much of the data inside companies and organizations will become accessible more easily through blockchain in the near future. Therefore, we will be able to accelerate AI development with far more data than we have now. This is exciting!

 

 

” These things must be a trigger to change the landscape of our business, societies and lives. Because suddenly computers can be sophisticated enough to work just like our brain.  AlphaGo teaches us that it may happen when a few people think so. Yes, this is why I think that the age of “Brain as a Service” will come in near future.  How do you think of that?”

This is what I said two years ago. Of course, it is impossible to predict when "Brain as a Service" will be available, but I am sure we are moving in this direction step by step. Do you agree?

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region market or investment.

Data from third-party sources may have been used in the preparation of this material and I, Author of the article has not independently verified, validated such data. I and TOSHI STATS.SDN.BHD. accept no liability whatsoever for any loss arising from the use of this information and relies upon the comments, opinions and analyses in the material is at the sole discretion of the user. 

“Monte Carlo tree search” is the key in AlphaGo Zero!

In October last year, Google DeepMind released "AlphaGo Zero". It is stronger than all previous versions of AlphaGo, although this new version uses no human knowledge of Go for training: it performs self-play and gets stronger by itself. I was very surprised to hear the news, because in general we need a lot of data to train a model.

Today, I would like to consider why AlphaGo Zero works so well from the viewpoint of a Go player, as I have played the game for fun for many years. I am not a professional Go player, but I have knowledge of both Go and deep learning, so this is a good opportunity for me to look into it.

When I play Go, I often decide the next move based on intuition, because I am confident that "it is right". But in a more complex situation, when I am not sure what the best move is, I have to try out in my mind (not on the real board) many paths that my opponent and I could take each turn, and choose the best move based on those trials. We call this "yomi" in Japanese. Unfortunately, I sometimes perform "yomi" incorrectly and then make the wrong move. Professional Go players perform "yomi" much more accurately than I do; this is the key to being a strong Go player.

 

So I wondered how AlphaGo Zero performs "yomi" effectively. I think this is the key to understanding AlphaGo Zero. Let me consider the following points.

 

1. Monte Carlo tree search (MCTS) performs "yomi" effectively.

The next move can be decided by the policy/value network alone, but there might be a better move, so we need to search for it. MCTS is used for this search in AlphaGo Zero. Based on the paper, MCTS can find better moves than the ones originally suggested by the policy/value network. DeepMind says that MCTS works as a "powerful policy improvement operator" and that an "improved MCTS-based policy" can be obtained. This is great, as it means AlphaGo Zero can perform "yomi" just like we do.
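To make the search step a little more concrete, here is a small sketch of the PUCT-style selection rule that AlphaGo Zero's MCTS uses to balance the network's suggestion against exploration; the data structure and the constant are illustrative, not the paper's exact implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List

C_PUCT = 1.5  # exploration constant (illustrative value)

@dataclass
class Node:
    prior: float                 # P(s, a): move probability from the policy network
    visits: int = 0              # N(s, a): how often the search has tried this move
    value: float = 0.0           # Q(s, a): average outcome seen below this move
    children: List["Node"] = field(default_factory=list)

def select_child(node: Node) -> Node:
    """Pick the child move maximizing Q + U: exploit strong moves, but keep exploring."""
    total_visits = sum(c.visits for c in node.children)
    def puct(c: Node) -> float:
        u = C_PUCT * c.prior * math.sqrt(total_visits) / (1 + c.visits)
        return c.value + u
    return max(node.children, key=puct)

# Two candidate moves: one well explored, one barely tried but with some prior probability.
root = Node(prior=1.0, children=[Node(prior=0.7, visits=10, value=0.2),
                                 Node(prior=0.3, visits=1, value=0.1)])
print(select_child(root).prior)
```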

 

2. A whole game can be played by self-play without human knowledge.

I wondered how a whole game of Go can be played without human knowledge. The paper explains it as follows: "Self-play with search—using the improved MCTS-based policy to select each move, then using the game-winner z as a sample of the value—may be viewed as a powerful policy evaluation operator." So just by playing games against itself, the winner of each game is obtained as a training sample, and these results are used in the next learning step. Therefore, the "yomi" of AlphaGo Zero becomes more and more accurate.

 

 

3. This training algorithm learns from scratch very efficiently

Computers are very good at running simulations many times automatically, so without any prior human knowledge, AlphaGo Zero gets stronger and stronger as it plays itself again and again. Based on the paper, starting from random play, AlphaGo Zero outperformed the previous version of AlphaGo, the one that beat Lee Sedol in March 2016, after just 72 hours of training. This is incredible: only 72 hours were needed to develop, from scratch and without human knowledge, a model that beats professional players.

 

 

Overall, AlphaGo Zero is incredible. If the AlphaGo Zero training algorithm could be applied to our businesses, an AI professional businessperson might be created in 72 hours without human knowledge. That would be incredibly powerful!

I hope you enjoyed this story of how AlphaGo Zero works. This time I gave an overview of the mechanism of AlphaGo Zero. If you are interested in more details, I recommend watching the video by DeepMind. In my next article, I would like to go a little deeper into MCTS and the training of the models. It will be exciting! See you again soon!

 

Mastering the Game of Go without Human Knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, Nature, Vol. 550, 19 October 2017

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

This is incredible! Semantic segmentation from scratch with just 700 images, on a MacBook Air!


You may have seen pairs of images like the one below before. The images are segmented by color based on the objects in them; this is called "semantic segmentation". It is studied by many AI researchers now because it is critically important for self-driving cars and robotics.

(Example: an urban scene and its color-coded semantic segmentation)

Unfortunately, however, it is not easy for startups like us to perform this task. Like other computer vision tasks, semantic segmentation needs massive numbers of images and large computing resources, which is sometimes difficult in tight-budget projects. When we cannot collect many images, we are likely to give up.

 

This situation can be changed by a new algorithm called "Fully Convolutional DenseNets for Semantic Segmentation" (in short, "Tiramisu") (1). Technically, it is a network built from many "DenseNet" blocks (2); the DenseNet paper was awarded the CVPR Best Paper award in July 2017. This is the structure of the model as drawn in the research paper (1).

(Figure: structure of the FC-DenseNet "Tiramisu" model, from the research paper)

I wanted to confirm how this model works with a small volume of images, so I obtained an urban-scene image set called the "CamVid Database" (3). It has 701 scene images with colour-labeled counterparts; I chose 468 images for training and 233 for testing. This is very little data for a computer vision task, which usually needs 10,000-100,000 or more images to train from scratch. In my experiment I did not use pre-trained models, nor did I use a GPU. My only weapon was a MacBook Air 13 (Core i5), just like many business people and students use. But the new algorithm works extremely well. Here is an example of the results.

(Examples: predicted segmentations alongside their ground-truth labels)

The "prediction" looks similar to the "ground truth", which is the correct answer in my experiment. Overall accuracy is around 83% for classification over 33 classes (at the 45th training epoch). This is incredible given how little data is available here. Although the prediction misses some parts, such as poles, I am confident we can gain more accuracy when more data and resources are available. Here is the training result; it took around 27 hours. (Technically, I used "FC-DenseNet56"; please read the research paper (1) for details.)

(Training curves for FC-DenseNet56)

Added on 18th August 2017: if you are interested in the Keras code, please see this GitHub repository.
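As a flavor of what that code contains, here is a minimal sketch of a single dense block, the building unit of the Tiramisu; the layer sizes are illustrative, and the real model in the repository stacks many such blocks with downsampling and upsampling paths.

```python
from tensorflow.keras import Input, Model, layers

def dense_block(x, num_layers=4, growth_rate=12):
    """Each new layer's output is concatenated with everything computed so far."""
    features = [x]
    for _ in range(num_layers):
        h = layers.BatchNormalization()(x)
        h = layers.Activation("relu")(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)
        x = layers.Concatenate()(features)   # the "dense" connectivity
    return x

inputs = Input(shape=(224, 224, 3))
outputs = dense_block(inputs)
Model(inputs, outputs).summary()
```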

 

This experiment was inspired by the awesome MOOC "fast.ai" by Jeremy Howard. I strongly recommend this course if you are interested in deep learning; it is free, it is light on math, and it is easy to understand for people who are not pursuing a Ph.D. in computer science.

I will continue to research this model and others in computer vision. I hope to provide updates soon. Thanks for reading!

 

 

1. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (Simon Jegou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio), 5 Dec 2016

 

2. Densely Connected Convolutional Networks(Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten),  3 Dec 2016

 

3. Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
Brostow, Shotton, Fauqueur, Cipolla (bibtex)

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software