More than 10X faster! You can have access to super-powered computers, too!

 

I started using deep learning four years ago and have run countless experiments since then. I thought I knew most of what there was to know about deep learning. But I found I was wrong when I tried a new computational engine called the "TPU" today. I want to share my experience, as it should be useful for everyone who is interested in artificial intelligence. Let us start.

 

1. TPU is more than 10X faster

Deep learning is one of the most powerful algorithms in artificial intelligence. Google uses it in many products, such as Google Translate. The problem is that deep learning needs a massive amount of computational power. For example, let us develop a classifier that tells which digit, from 0 to 9, appears in an image. This is the MNIST dataset, the "hello world" of deep learning. I want to classify each digit automatically by computer.

MNIST-TPU
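For reference, a classifier like this can be written in a few lines of tf.keras. The sketch below is a minimal version for illustration; the exact architecture in my notebook (linked at the end of this post) may differ.

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits (28x28 pixels).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

# A small fully connected classifier for the 10 digit classes.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```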

Two years ago, I did this on my MacBook Air 11. It took around 80 minutes to complete training. MNIST is one of the simplest training datasets in computer vision, so if I want to develop a more complex system, such as a self-driving car, my MacBook Air is useless because the calculation would take far longer. Fortunately, I can now try the TPU, a processor specialized for deep learning. I found it is incredibly fast: it completes the training in less than one minute. 80 minutes vs 1 minute. I tried many times, but the result was always the same. So I checked the reported computation speed: more than 160 TFLOPS. That is faster than the top supercomputers of 2005, and the TPU is the fastest processor I have ever tried. This is amazing.

TPU TFLOPS

 

2.  TPU is easy to use!

Being super fast is not enough; the TPU also needs to be easy to use. If you had to rewrite your code to use a TPU, you might hesitate to adopt it. If you use TensorFlow, the open-source deep learning framework by Google, there is no problem: only small modifications are needed. If you use other frameworks, you need to wait until they support the TPU, and I am not sure when that will happen. In my case, I mainly use tf.keras on TensorFlow, so there is nothing to worry about. You can see the code of my experiment here.
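My notebook (linked above) uses the TPU API that was available at the time. In recent TensorFlow 2.x releases the pattern looks roughly like the sketch below: connect to the Colab TPU, create a distribution strategy, and build the model inside its scope.

```python
import tensorflow as tf

# Connect to the Colab TPU runtime and create a distribution strategy (TF 2.x style API).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope; the training code stays the same.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```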

 

 

3. TPU is available on Colab for free!

Now we have found that the TPU is fast and easy to use. I also need inexpensive tools in my business, but there is no need to worry about that either: the TPU is provided for free on Google Colab, a web-based experiment environment. Although there are some limitations (such as a maximum runtime of 12 hours), I think it is fine for developing minimum viable models. If you need TPUs for formal projects, Google also provides a paid service. So we can use TPUs as a free service or a paid service, depending on our needs. I recommend using the TPU on Colab for free first, to learn how it works.

 

 

Technically, this TPU is v2. Google has already announced TPU v3, a more powerful version, so these services may become even more powerful in the near future. Today's experience is just the beginning of the story. Do you want to try the TPU?

 

 

 

 

1). MNIST with tf.Keras and TPUs on colab

https://github.com/TOSHISTATS/MNIST-with-tf.Keras-and-TPUs-on-colab/blob/master/MNIST_TPU_20181009.ipynb

 

 

 

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software


How can we develop machine intelligence with little data in text analysis?


Whenever you want to create a machine intelligence model, the first question to ask is "Where is my data?". It is usually difficult to find good data for building models, because collecting it is time-consuming and can be costly. Unless you work at a company such as Google or Facebook, this can be a headache. Fortunately, there is a good way to solve this problem: "transfer learning". Let us find out more!

1. Transfer learning

When we train machine intelligence models, we usually use "supervised learning". It means we need "teachers" who can tell the model which answer is right. For example, to classify whether an image shows a cat or a dog, we need to tell the computer "this is a cat and that is a dog". It is a powerful method of learning that achieves high accuracy, so most current AI applications are developed with supervised learning. But a problem arises here: there is very little data available for supervised learning. While we have many images on our smartphones, each image carries no label about what it shows, so we would need to add this information to each image manually. That takes a long time, because a massive number of images is needed for training. I explained this a little for computer vision in my blog before. We can say the same thing about text analysis, or natural language processing: we have many tweets on the internet, but no one tells you which have positive and which have negative sentiment, so we would need to label each tweet as "positive or negative" by ourselves, and no one wants to do that. This is where "transfer learning" comes in. You do not need to train from scratch; you simply transfer the results of someone who did similar training before you to your own model. The beauty of transfer learning is that we need only a little data for our training. No need for a massive amount of data anymore. It makes preparing data far easier for us!

Cats and dogs

2. “Transformer”

This model (1), created by Google Brain, was one of the most sophisticated models for machine translation in 2017. It achieved state-of-the-art accuracy in neural machine translation at the time it was published. The key architecture of the Transformer is "self-attention". It tells the model where to pay attention among all the words in a sentence, regardless of their respective positions, by using a "query, key, and value" mechanism. The research paper "Attention Is All You Need" is available here. The self-attention mechanism takes time to explain in detail; if you want to know more, this blog is strongly recommended. I just want to say that the self-attention mechanism might be a game changer for developing machine intelligence in the future.
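To give a rough feel for the query, key, and value mechanism, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of self-attention. The shapes and names are my own simplification, not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (sequence_length, d_k); V: (sequence_length, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # how strongly each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the values

# Toy example: 3 "words", each represented by a 4-dimensional vector.
x = np.random.randn(3, 4)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V all come from the same sentence
print(out.shape)  # (3, 4)
```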

3.  Transfer learning based on “Transformer”

It has been more than one year since the Transformer was published, and there are now several variations based on it. I found a good model for the "transfer learning" I mentioned earlier in this article: the "Universal Sentence Encoder" (2). On its website, we can find a good explanation of what it is.

“The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.”

The model takes sentences, phrases or short paragraphs and outputs vectors that can be fed into the next stage of processing. "universal-sentence-encoder-large" is trained with the Transformer (the lighter variant is trained with a different model). The beauty is that the Universal Sentence Encoder has already been trained by Google, and the results are available for us to perform transfer learning ourselves. This is great! This chart shows how it works.

Sentence encoder

The Google team claimed that "With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task." So let me confirm how it works with a little data. I performed a small experiment based on this awesome article. I modified the classification model and changed the number of training samples. With only 100 training examples, I could achieve 79.2% accuracy; with 300, 95.8% accuracy. This is great! I believe these results come from the power of transfer learning with the Universal Sentence Encoder.
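For readers who want to try something similar, here is a minimal sketch of loading the Universal Sentence Encoder from TensorFlow Hub and training a small classifier on top of the frozen embeddings. The hub URL, module version and tiny dataset here are assumptions for illustration; my actual experiment follows the article linked above.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained encoder from TF Hub (URL/version assumed; check tfhub.dev for the current one).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

sentences = ["This movie was wonderful!", "What a waste of two hours."]
labels = [1, 0]  # 1 = positive, 0 = negative

# Each sentence becomes a 512-dimensional vector.
embeddings = embed(sentences).numpy()

# A tiny classifier on top of the frozen embeddings: this is the "transfer learning" part.
clf = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(embeddings, tf.constant(labels), epochs=10, verbose=0)
```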


In this article, I introduced transfer learning and performed a small experiment with the latest model, the Universal Sentence Encoder. It looks very promising so far. I would like to continue my transfer learning experiments and update the results here. Stay tuned!

 

When you need AI consulting, please visit the TOSHI STATS website.

 

 

 

 

 

  1. Attention Is All You Need,  Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Google, 12 June 2017.
  2. Universal Sentence Encoder,  Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil,  Google, 29 March 2018

 

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

 

This is my first machine intelligence model. It looks good so far!


There are many images on the internet; a lot of people upload selfies to Instagram every day. There is also a lot of text data on the internet, because not only professional writers but many ordinary people express their opinions on blogs and tweets. No one can see every image and text on the internet, as the volume is huge. In addition, images and text are sometimes related to each other; for example, people upload images and add explanations of them. So I have always wondered how we can analyze both images and text at once. There are several methods to do that. I chose the image-captioning model among them, as it is easy to understand how it works.

 

1. What is an image-captioning model?

Before starting the image-captioning project, I had worked on computer vision projects and natural language projects independently. Computer vision means, for example, classifying cats and dogs, or detecting a specific type of car and distinguishing it from other types. I have also developed natural language models, such as sentiment analysis of movie reviews. An image-captioning model is a combination of a computer vision model and a natural language model. Let us see the chart below.

image-captioning

A computer takes a picture as input. Then the encoder extracts features from the picture; a "feature" means a characteristic of an object. Based on these features, the decoder generates a sentence that describes what the picture shows. This is how our image-captioning model works.

 

2. How can we find a template for the image-captioning model and modify it?

I found a good framework for developing our image-captioning models: "Colab", provided by Google. Even though it is free to use, there are many templates to start projects with, and a GPU is available for research and interactive use. It provides the computational power required for developing image-captioning models. I found the original image-captioning template on Colab. The template is awesome, as "the attention mechanism" is implemented. It uses InceptionV3 as the encoder and a GRU as the decoder, but I would like to try other methods, so I modified this template a little to change InceptionV3 to DenseNet121 and the GRU to an LSTM. Let us see how it works in my experiment!
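The two swaps themselves are small. Below is a rough sketch of what they look like in tf.keras; the layer sizes and variable names are my own, and the full decoder with attention follows the Colab template.

```python
import tensorflow as tf

# Encoder swap: DenseNet121 instead of InceptionV3, used as a frozen feature extractor.
image_model = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet")
encoder = tf.keras.Model(image_model.input, image_model.output)  # outputs a grid of feature vectors

# Decoder swap: an LSTM instead of a GRU, generating the caption word by word.
embedding_dim, units, vocab_size = 256, 512, 5000  # sizes are illustrative
word_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
decoder_rnn = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
output_layer = tf.keras.layers.Dense(vocab_size)   # scores over the vocabulary for the next word
```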

 

3. The results after three hours of training

Here is one of the outputs from my experiment with our image-captioning model. It says "a couple of two sugar covered in chocolate frosting are laid on top of a wooden table". Although it is not perfect, it works fairly well. With more data and computation time, it should become more accurate.


 

This is the first step toward machine intelligence. Of course, there is a long way to go. But by combining images and text, I believe we can develop many cool applications in the future. In addition, I found that the attention mechanism is very powerful for extracting relevant information. I would like to focus on this mechanism to improve our algorithms going forward. Stay tuned!

 

(1) Olah&Carter, “Attention and Augmented Recurrent Neural Networks“, Distill, 2016.

 

When you need AI consulting, please visit the TOSHI STATS website.

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

 

 

 

 

We are starting an AI Lab in the company to research the "attention mechanism" in deep learning

As I said before, I completed the online course "deeplearning.ai". This is an awesome course that I recommend to everyone. There are many topics we can learn in the course; one of the most interesting for me is the "attention mechanism" in neural translation. So I would like to explain it in some detail. Do not worry, as I do not use mathematics in this article. Let us start.

 

The definition of the attention mechanism is: "The attention mechanism tells a neural machine translation model where it should pay attention at each step." This feels natural when we consider how we translate from one language to another. Human beings pay more attention to specific objects than to others when those objects are more interesting to them. When we are hungry, we tend to look for the sign of a "restaurant" or "food court" and do not care about the sign of a "library", right?

We want to apply the same idea to translation by computers. Consider it again: when we translate English into our mother tongue, such as Japanese, we look at the whole sentence first and then work out which words are important to us; we do not translate word by word. In other words, we pay more attention to specific words than to others. So we want to introduce the same method when computers perform neural translation.

 

The attention mechanism was originally introduced (1) in September 2014, and many variants have been introduced since then. One of the strongest attention models is the "Transformer", by Google Brain, from June 2017. I think you use Google Translate every day; it performs very well. But the Transformer is better than the model used in Google Translate at that time. This chart shows the difference between GNMT (Google Translate) and the Transformer (2).
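For readers who do want to peek at the mechanics in code (feel free to skip this), here is a minimal NumPy sketch of the original additive attention (1): the current decoder state is scored against every encoder state, the scores become weights through a softmax, and the weighted sum is the context the decoder "pays attention to". The matrix names and sizes are my own illustration.

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, W, U, v):
    """Bahdanau-style attention: score each encoder state against the current decoder state."""
    # encoder_states: (source_len, hidden); decoder_state: (hidden,)
    scores = np.tanh(encoder_states @ U.T + decoder_state @ W.T) @ v   # one score per source word
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                                  # attention weights over source words
    context = weights @ encoder_states                                 # weighted summary of the source
    return context, weights

hidden, attn = 8, 16
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, hidden))   # 5 source words
dec = rng.standard_normal(hidden)        # current decoder state
W = rng.standard_normal((attn, hidden))
U = rng.standard_normal((attn, hidden))
v = rng.standard_normal(attn)
context, weights = additive_attention(dec, enc, W, U, v)
print(weights)  # the highest weight marks the word the model attends to at this step
```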

Fortunately, Google provides a framework to facilitate AI research. It is called "Tensor2Tensor (T2T)". It is open source and can be used without any fees, which means you can try it by yourself! I have decided to set up an "AI Lab" in my company and introduce this framework to research the attention mechanism. There are many pre-trained models, including the Transformer. Why don't you join us?

 

I used translation as the example to explain how the attention mechanism works, but it can be applied to many other fields, such as object detection, which is used in face recognition and self-driving cars. It is exciting to consider what can be achieved with the attention mechanism. I will keep you updated on our progress, so stay tuned!

 

 

When you need AI consulting,  do not hesitate to contact TOSHISTATS

 

(1) NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE.  By Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio in Sep 2014

(2) Attention Is All You Need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, in June 2017

 

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

Let us consider “Brain as a service” again now!


Two years ago, I wrote an article about the computer Go player "AlphaGo" and talked about "Brain as a service" in the future, because AlphaGo is so strong and can improve itself through reinforcement learning with self-play. Now I am even more confident that "Brain as a service" will be available in the near future. Let us consider why I think so.

 

1. Self-play without human intervention

In October 2017, DeepMind released a new version of the computer Go player, "AlphaGo Zero". The previous version of AlphaGo learned from human play in the early stages of training, but AlphaGo Zero can improve itself without human intervention or knowledge. Starting from nothing, it became stronger than human Go champions. This is incredible! Of course, the real world is not the game of Go, so we would need to adapt self-play to apply it to real-life problems. But fundamentally, there are many chances to improve our society by using self-play, as it can provide super-human solutions if it is correctly implemented. AlphaGo Zero proves this is possible.

 

2. Reinforcement learning can be researched anywhere on earth

I am now researching OpenAI Gym, an environment/simulator for reinforcement learning (RL). It is provided by OpenAI, a nonprofit organization established by Elon Musk and Sam Altman. OpenAI provides not only theoretical research results but also the code to implement them in our own systems. It means that as long as we have access to the internet, we can start our own reinforcement learning research based on OpenAI Gym. No capital is required, as the OpenAI Gym code is provided for free; just download and use it, since it is open-source software. RL applications like AlphaGo can be developed anywhere in the world. If you want to try it, go to the OpenAI Gym website and set up OpenAI Gym yourself. You can enjoy cool reinforcement learning results!
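To show how low the barrier is, here is a minimal sketch of the standard Gym interaction loop with a random agent on CartPole. It uses the classic pre-0.26 Gym API; newer versions of the library return values in a slightly different form.

```python
import gym

# Create a classic control environment and run one episode with random actions.
env = gym.make("CartPole-v1")
obs = env.reset()
total_reward, done = 0.0, False

while not done:
    action = env.action_space.sample()          # a random agent; an RL algorithm would choose here
    obs, reward, done, info = env.step(action)  # pre-0.26 Gym API: (obs, reward, done, info)
    total_reward += reward

print("episode reward:", total_reward)
env.close()
```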

 

3. It will be easier to obtain data from the world

Google said "The real world as your playground: Build real-world games with Google Maps APIs" last week. It means that any game developer can create real-world games using Google Maps. We can access countless 3D buildings, roads, landmarks, and parks all over the world as digital assets. This is amazing! But this should not be considered just a matter of games. It is one example of how we can obtain data from the world, because we can create real-world computer-vision simulators with this service. In addition, I would like to mention blockchain briefly. Blockchain can be used to connect the world in a transparent manner, and I imagine that much of the data inside companies and organizations will become accessible more easily through blockchain in the near future. Therefore we will be able to accelerate AI development with far more data than we have now, at a rapid pace. This is exciting!

 

 

” These things must be a trigger to change the landscape of our business, societies and lives. Because suddenly computers can be sophisticated enough to work just like our brain.  AlphaGo teaches us that it may happen when a few people think so. Yes, this is why I think that the age of “Brain as a Service” will come in near future.  How do you think of that?”

This is what I said two years ago. Of course, it is impossible to predict when "Brain as a Service" will be available. But I am sure we are moving in this direction step by step. Do you agree?

 

 

 

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region market or investment.

Data from third-party sources may have been used in the preparation of this material and I, Author of the article has not independently verified, validated such data. I and TOSHI STATS.SDN.BHD. accept no liability whatsoever for any loss arising from the use of this information and relies upon the comments, opinions and analyses in the material is at the sole discretion of the user. 

"Monte Carlo tree search" is the key in AlphaGo Zero!

In October last year, Google DeepMind released "AlphaGo Zero". It is stronger than all previous versions of AlphaGo, even though this new version uses no human knowledge of Go for training; it performs self-play and gets stronger by itself. I was very surprised to hear the news, because in general we need a lot of data to train a model.

Today, I would like to consider why AlphaGo Zero works so well from the viewpoint of a Go player, as I have played the game for fun for many years. I am not a professional Go player, but I have knowledge of both Go and deep learning, so this is a good opportunity for me to think it through.

When I play Go, I often decide my next move based on intuition, because I am confident that "it is right". But in a more complex situation, when I am not so sure what the best move is, I try out in my mind (not on the real board) many sequences of moves that I and my opponent could take in turn, and choose the best move based on those trials. We call this "yomi" in Japanese. Unfortunately, I sometimes perform yomi wrongly and then make a bad move. Professional Go players perform yomi much more accurately than I do; this is the key to being a strong Go player.

 

So I wondered how AlphaGo Zero can perform "yomi" effectively. I think this is the key to understanding AlphaGo Zero. Let me consider these points.

 

1. Monte Carlo tree search (MCTS) performs "yomi" effectively.

The next move could be decided by the policy/value network alone, but there might be a better move, so we need to search for it. MCTS is used for this search in AlphaGo Zero. According to the paper, MCTS can find moves that are better than the move originally suggested by the policy/value network. DeepMind says MCTS works as a "powerful policy improvement operator", and an "improved MCTS-based policy" can be obtained. This is great, as it means that AlphaGo Zero can perform "yomi" just like us.
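For intuition, here is a minimal sketch of the kind of selection rule used inside this search: each candidate move is scored by its average value plus an exploration bonus driven by the policy network's prior and the visit counts. The constants and array layout are my own simplification, not DeepMind's implementation.

```python
import numpy as np

def select_action(prior, visit_count, total_value, c_puct=1.5):
    """Pick the next move to explore at a node, AlphaGo-Zero style.

    prior:       P(s, a) from the policy network, shape (num_moves,)
    visit_count: N(s, a), how often each move has been explored
    total_value: W(s, a), sum of values backed up through each move
    """
    q = np.where(visit_count > 0, total_value / np.maximum(visit_count, 1), 0.0)  # mean value Q(s, a)
    u = c_puct * prior * np.sqrt(visit_count.sum() + 1) / (1 + visit_count)       # exploration bonus
    return int(np.argmax(q + u))   # moves with high value, or high prior but few visits, get tried

# Toy example with 3 candidate moves.
prior = np.array([0.6, 0.3, 0.1])
visits = np.array([10, 2, 0])
values = np.array([5.5, 1.2, 0.0])
print(select_action(prior, visits, values))
```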

 

2. A whole game can be played by self-play, without human knowledge.

I wondered how a whole game of Go can be played without human knowledge. The paper explains it as follows: "Self-play with search—using the improved MCTS-based policy to select each move, then using the game-winner z as a sample of the value—may be viewed as a powerful policy evaluation operator." So just by playing games against itself, the winner of each game is obtained as a training sample. These results are used in the next learning step, and therefore the "yomi" of AlphaGo Zero becomes more and more accurate.

 

 

3. This training algorithm learns very efficiently from scratch

Computers are very good at running simulations many times automatically, so without any prior human knowledge, AlphaGo Zero becomes stronger and stronger as it performs self-play again and again. According to the paper, starting from random play, AlphaGo Zero outperformed the previous version of AlphaGo, the one that beat Lee Sedol in March 2016, after just 72 hours of training. This is incredible, because only 72 hours were needed to develop a model that beats professional players, from scratch and without human knowledge.

 

 

Overall, AlphaGo Zero is incredible. If the AlphaGo Zero training algorithm could be applied to our businesses, an AI business professional might be created in 72 hours without human knowledge. That would be incredibly powerful!

I hope you enjoyed this story of how AlphaGo Zero works. This time I gave an overview of the mechanism of AlphaGo Zero; if you are interested in more detail, I recommend watching the video by DeepMind. In my next article, I would like to go a little deeper into MCTS and model training. It should be exciting! See you again soon!

 

Mastering the Game of Go without Human Knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis
Published in Nature, Vol 550, 19 October 2017

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

This could be a new way to train deep reinforcement learning models in 2018!


As the end of the year approaches, I would like to consider what comes next in artificial intelligence next year. So this week I reviewed several research papers and found something interesting: the "genetic algorithm (GA)". The paper (1) explains that GAs can be applied to deep reinforcement learning to optimize the parameters. This should be exciting for many deep learning researchers and programmers.

 

According to Wikipedia, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection. As I worked on a project using GAs in Tokyo more than 10 years ago, I would like to revisit GAs in the context of deep learning in 2018. To prepare for that, let me explain the major components of a GA below.

  • Gene: The object that we modify in order to optimize. It plays a similar role to a human gene: to adapt to the environment around them, genes are modified through the GA operations explained below.
  • Generation: The collection of individual genes at a certain point in time. As time passes, new generations are created one after another.
  • Selection: Based on fitness values, better genes are selected to create the next generation. There are many ways to do this; the best gene may be kept as it is.
  • Crossover: Parts of different genes are exchanged with each other. It accelerates diversification among genes.
  • Mutation: Some components of a gene are randomly changed into others. Mutation and crossover are both inspired by the processes of evolution in nature.

 

This image explains GAs very clearly (2). Based on fitness values, genes with higher scores are selected. When reproduction is performed, some of the selected genes remain unchanged (the elite strategy), while crossover and mutation are applied to the rest to create the next generation. You can see that the fitness values in generation t+1 are higher than in generation t. This is the basic framework of a GA; there are many variations in how the next generation is created.


Genetic algorithm

I put simple Python code for a GA applied to portfolio management on GitHub. If you are interested in GAs in more detail, please look at it here.
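Separately from the portfolio code above, here is a generic minimal sketch of one GA loop with selection, an elite strategy, one-point crossover and mutation, run on a toy fitness function. Everything in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(genes):
    """Toy fitness: genes closer to all-ones score higher."""
    return genes.sum(axis=1)

def next_generation(population, elite=2, mutation_rate=0.05):
    scores = fitness(population)
    order = np.argsort(scores)[::-1]                      # best genes first
    new_pop = [population[i] for i in order[:elite]]      # elite strategy: keep the best unchanged
    while len(new_pop) < len(population):
        p1, p2 = population[rng.choice(order[:10], size=2, replace=False)]
        cut = rng.integers(1, population.shape[1])        # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        flip = rng.random(child.shape) < mutation_rate    # mutation: flip a few bits
        child = np.where(flip, 1 - child, child)
        new_pop.append(child)
    return np.array(new_pop)

population = rng.integers(0, 2, size=(20, 16))            # 20 genes, 16 binary components each
for generation in range(30):
    population = next_generation(population)
print("best fitness:", fitness(population).max())
```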

 

Although GAs have a long history, their application to deep learning is relatively new. At TOSHI STATS, which is an AI start-up, I will continue to research how GAs can be applied to deep learning so that optimization can be done effectively. I hope I can update you soon in 2018. Happy new year to everyone!

 

 

 

1. Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Uber AI Labs, 18 December 2017

2. IBA laboratory, a research laboratory of Genetic and Evolutionary Computations (GEC) of the Graduate School of Engineering, The University of Tokyo, Japan.

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

 

This is incredible! Semantic segmentation with just 700 images, from scratch, on a MacBook Air!


You may have seen pairs of images like the ones below before. The images are segmented by color based on the objects in them; this is called "semantic segmentation". It is studied by many AI researchers now because it is critically important for self-driving cars and robotics.

Semantic segmentation example

Unfortunately, however, it is not easy for startups like us to perform this task. Like other computer vision tasks, semantic segmentation needs massive numbers of images and large computing resources, which can be difficult in tight-budget projects. When we cannot collect many images, we are likely to give up.

 

This situation can be changed by a new algorithm called "Fully Convolutional DenseNets for Semantic Segmentation" (known as "Tiramisu" for short) (1). Technically, this is a network built from many "DenseNet" blocks (2); DenseNet was awarded the CVPR Best Paper award in July 2017. This is the structure of the model as shown in the research paper (1).

Tiramisu1
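To give a flavor of the building block, here is a minimal tf.keras sketch of a DenseNet-style dense block, where every layer receives the concatenation of all previous feature maps. This is a simplification for illustration, not the code of my experiment (which is linked further below).

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    """DenseNet-style block: every layer sees the concatenation of all previous feature maps."""
    for _ in range(num_layers):
        y = tf.keras.layers.BatchNormalization()(x)
        y = tf.keras.layers.Activation("relu")(y)
        y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = tf.keras.layers.Concatenate()([x, y])   # dense connectivity: concatenate, do not add
    return x

# Toy usage: a 64x64 RGB input grows from 3 to 3 + 4*16 = 67 feature channels.
inputs = tf.keras.Input(shape=(64, 64, 3))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```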

I would like to confirm how this model works with a small volume of images, so I obtained an urban-scene image set called the "CamVid Database" (3). It has 701 scene images with colour-labeled counterparts. I chose 468 images for training and 233 images for testing. This is very little data for a computer vision task, as training from scratch usually needs more than 10,000-100,000 images. In my experiment, I do not use pre-trained models, and I do not use a GPU for computation either. My weapon is just a MacBook Air 13 (Core i5), just like many business people and students have. But the new algorithm works extremely well. Here is an example of the results.



The "prediction" looks similar to the "ground truth", which is the right answer, in my experiment. Overall accuracy is around 83% for classification of 33 classes (at the 45th epoch of training). This is incredible, as only a little data is available here. Although the prediction misses some parts, such as poles, I am confident we can gain more accuracy when more data and resources are available. Here is the training result; it took around 27 hours. (Technically I use "FC-DenseNet56"; please read the research paper (1) for details.)



Added on 18th August 2017: If you are interested in the code with Keras, please see this GitHub.

 

This experiment is inspired by an awesome MOOC called "fast.ai" by Jeremy Howard. I strongly recommend this course if you are interested in deep learning; it is free, so there is no barrier. It keeps the math light and is easy to understand for people who are not pursuing a Ph.D. in computer science.

I will continue to research this model and others in computer vision. I hope I can provide updates soon. Thanks for reading!

 

 

1.The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (Simon Jegou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio),  5 Dec 2016

 

2. Densely Connected Convolutional Networks(Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten),  3 Dec 2016

 

3. Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
Brostow, Shotton, Fauqueur, Cipolla (bibtex)

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software

 

Let us develop a car classification model by deep learning with TensorFlow & Keras

For nearly one year, I have been using TensorFlow and considering what I can do with it. Today I am glad to announce that I have developed a computer vision model trained on real-world images. It is a classification model for automobiles that can distinguish four kinds of cars. It is trained with few images on a normal laptop like a MacBook Air, so you can reproduce it without preparing extra hardware. This technology is called "deep learning". Let us start this project and go deeper now.

 

1. What should we classify by using images?

This is the first thing we should consider when developing a computer vision model, and it depends on the purpose of your business. In the health care industry, it may be signs of disease in the human body. In manufacturing, it may be images of malfunctioning parts in plants. In the agriculture industry, the condition of farmland could be classified to detect problems. In this project, I would like to use my computer vision model for urban transportation in the near future. I live in Kuala Lumpur, Malaysia, which suffers from huge traffic jams every day; other cities in ASEAN have the same problem. So we need to identify, predict and optimize car traffic in urban areas. As the first step, I would like to classify four classes of cars in images automatically by computer.

 

 

2. How can we obtain images for training?

This is always the biggest problem when developing computer vision models with deep learning. To make our models accurate, a massive number of images should be prepared, which is usually difficult or impossible unless you are in a big company or laboratory. But do not worry: there is a good solution to the problem, called a "pre-trained model". This is a model that has already been trained on a huge number of images, so all we have to do is adjust it to our specific purpose or business use. Pre-trained models are available as open-source software. We use ResNet50, which is one of the best pre-trained models in computer vision. With this model, we do not need to prepare a huge volume of images; I prepared 400 images for training and 80 images for validation (100 and 20 images per class, respectively). Then we can start developing our computer vision model!
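As a sketch of the approach, here is roughly what a frozen ResNet50 base with a new head for the four car classes looks like in tf.keras. The directory names, image size and hyperparameters are illustrative assumptions; the code I actually used is linked at the end of this post.

```python
import tensorflow as tf

# Pre-trained ResNet50 without its final classification layer, frozen as a feature extractor.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

# New head for our 4 car classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Illustrative data pipeline: 'data/train' and 'data/val' are assumed folder names,
# with one sub-folder per car class.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.resnet50.preprocess_input)
train = datagen.flow_from_directory("data/train", target_size=(224, 224), batch_size=20)
val = datagen.flow_from_directory("data/val", target_size=(224, 224), batch_size=20)

model.fit(train, validation_data=val, epochs=20)
```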

 

3. How can we keep the model accurate when classifying images?

If the model frequently gives wrong classification results, it is useless. I would like to keep the accuracy ratio above 90% so that we can rely on the results from our model. To achieve accuracy above 90%, more training is usually needed. This training runs for 20 epochs, which takes around 120 minutes to complete on my MacBook Air 13. You can see the progress of the training here. This is done with TensorFlow and Keras, as they are our main libraries for deep learning. At the 19th epoch, the highest accuracy (91.25%) is achieved (in the red box). So the model is reasonably accurate!

Res 0.91

 

Based on this project, our model, trained with few images, can keep accuracy above 90%. Although whether higher accuracy can be achieved depends on the training images, 90% accuracy is a good starting point for collecting more images and working towards 99% accuracy in the future. When you are interested in classifying something, you can start developing your own model, as only 100 images per class are needed for training. You can collect them yourself and run the model on your own computer. If you need the code I used, you can see it here. Do you like it? Let us start now!

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software

Can your computer see many objects better than you in 2017?


Happy new year, everyone. I am very excited that the new year is here, because this year artificial intelligence (AI) will come much closer to us in our daily lives. Smartphones can answer your questions accurately, self-driving cars can run without human drivers, AI game players can compete with human players, and so on. It is incredible, isn't it!

However, in most cases, the programs behind these products are developed by giant IT companies, such as Google and Microsoft. They have almost unlimited data and computing resources, so they can make better programs. How about us? We have small data and limited computing resources, unless we have enough budget to use cloud services. Is it difficult to make good programs on our laptop computers by ourselves? I do not think so, and I would like to try it myself first.

I would like to make a program to classify cats and dogs in images. To do that, I found a good tutorial (1); I use the code from this tutorial for my experiment. Let us start now and see how it can be done.

cats-and-dogs

To build an AI model that classifies cats and dogs, we need many images of cats and dogs. Once we have the data, we train the model so that it can classify cats and dogs correctly. But we have two problems here.

1. We need a massive amount of image data of cats and dogs

2. We need high-performance computing resources such as a GPU

About training artificial intelligence models, it is sometimes said that "with massive data sets, it takes several days or a week to complete training". In many cases, we cannot do that. So what should we do?

Do not worry: we do not need to create the model from scratch. Many big IT companies and famous universities have already trained AI models and made them public for everyone to use; these are called "pre-trained models". So all we have to do is take the outputs of a pre-trained model and make adjustments for our own purpose. In this experiment, our purpose is to identify cats and dogs by computer.

I follow the code by François Chollet, the creator of Keras, and run it on my MacBook Air 11, a normal Mac with no additional resources. I prepared only 1,000 images each of cats and dogs. It takes 70 minutes to train the model, and the result is around an 87% accuracy rate. That is great, as it is done on a normal laptop rather than on servers with GPUs.
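The idea in that tutorial, roughly sketched below, is to reuse a pre-trained convolutional base (VGG16 there) as a frozen feature extractor and train only a small classifier on top with our 2,000 images. The folder names and sizes here are illustrative assumptions; see the tutorial (1) for the original code.

```python
import tensorflow as tf

# Pre-trained convolutional base (the tutorial uses VGG16), frozen.
conv_base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                        input_shape=(150, 150, 3))
conv_base.trainable = False

# Small classifier on top: this is the only part we train with our 2,000 images.
model = tf.keras.Sequential([
    conv_base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # cat vs dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 'data/train' is an assumed folder with 'cats' and 'dogs' sub-folders.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train = datagen.flow_from_directory("data/train", target_size=(150, 150),
                                    batch_size=20, class_mode="binary")
model.fit(train, epochs=10)
```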

 

 

Based on this experiment, I found that artificial intelligence models can be developed on my Mac with little data to solve our own problems. I would like to do more tuning to obtain a higher accuracy rate; there are several methods to make it better.

Of course, this is just the beginning of the story. Not only "cats and dogs" classification but also many other problems can be solved the way I experimented here. When pre-trained models are available, they give us great potential to solve our own problems. Don't you agree? Let us try many things with pre-trained models this year!

 

 

1.Building powerful image classification models using very little data

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software