This is incredible! Semantic segmentation from scratch with just 700 images, on a MacBook Air!


You may have seen pairs of images like the ones below before. The images are segmented by color based on the objects in them; this is called “semantic segmentation”. It is studied by many AI researchers now because it is critically important for self-driving cars and robotics.

[Image: example pair of a scene and its semantic segmentation]

Unfortunately, however, it is not easy for startups like us to perform this task. Like other computer vision tasks, semantic segmentation needs massive amounts of images and computing resources, which is sometimes difficult in tight-budget projects. When we cannot collect many images, we are likely to give up.

 

This situation can be changed by a new algorithm called “Fully Convolutional DenseNets for Semantic Segmentation”, or “Tiramisu” for short (1). Technically, it is a network built from many “DenseNet” blocks (2); DenseNet was awarded the CVPR Best Paper award in July 2017. This is the structure of the model as presented in the research paper (1).
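Since dense blocks are the heart of this architecture, here is a minimal sketch of one DenseNet-style block in Keras. The layer count and growth rate below are illustrative only, not the exact FC-DenseNet56 configuration; please refer to the paper (1) for the real architecture.

```python
# A toy DenseNet-style "dense block": every layer's output is
# concatenated with all previous feature maps before the next layer.
from tensorflow.keras import Input, Model, layers

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])  # dense connectivity
    return x

inp = Input(shape=(32, 32, 48))
out = dense_block(inp)
model = Model(inp, out)
print(model.output_shape)  # channels grow by growth_rate per layer: 48 + 4 * 12 = 96
```

Because every layer can reuse all earlier feature maps, each layer only needs to add a small number of new channels, which keeps the parameter count low; this is one reason the model trains well with little data.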

[Figure: Tiramisu (FC-DenseNet) architecture, from the research paper (1)]

I would like to confirm how this model works with a small number of images, so I obtained an urban-scene image set called the “CamVid Database” (3). It has 701 scene images with color-labeled annotations. I chose 468 images for training and 233 images for testing. This is very little data for computer vision, which usually needs 10,000–100,000 images to train a model from scratch. In this experiment, I do not use a pre-trained model, and I do not use a GPU for computation either. My only weapon is a MacBook Air 13 (Core i5), just like many business people and students. Yet the new algorithm works extremely well. Here are example results.

[Images: prediction and ground-truth examples]

The “prediction” looks similar to the “ground truth”, which is the right answer in this experiment. Overall accuracy is around 83% for classification across 33 classes (at the 45th training epoch). This is incredible given how little data is available. Although the prediction misses some parts, such as poles, I am confident we can gain more accuracy when more data and resources are available. Here is the training result; it took around 27 hours. (Technically, I use “FC-DenseNet56”; please read the research paper (1) for details.)

[Figures: training curves]

Added on 18th August 2017: If you are interested in the code with Keras, please see this GitHub.

 

This experiment was inspired by an awesome MOOC, “fast.ai” by Jeremy Howard. I strongly recommend this course if you are interested in deep learning; no problem, as it is free. It has less math and is easy to understand for people who are not pursuing a Ph.D. in computer science.

I will continue to research this model and others in computer vision. I hope I can provide updates soon. Thanks for reading!

 

 

1. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio), 5 Dec 2016

 

2. Densely Connected Convolutional Networks (Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten), 3 Dec 2016

 

3. Segmentation and Recognition Using Structure from Motion Point Clouds (Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla), ECCV 2008

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software

 


Let us develop a car classification model by deep learning with TensorFlow & Keras

For nearly one year, I have been using TensorFlow and considering what I can do with it. Today I am glad to announce that I have developed a computer vision model trained on real-world images. It is a classification model in which four kinds of cars can be distinguished. It is trained with few images, on a normal laptop such as a MacBook Air, so you can reproduce it without preparing extra hardware. This technology is called “deep learning”. Let us start this project and go deeper now.

 

1. What should we classify by using images?

This is the first thing to consider when we develop a computer vision model, and it depends on the purpose of your business. If you are in the health care industry, it may be signs of disease in the human body. If you are in manufacturing, it may be images of malfunctioning parts in plants. If you are in agriculture, it may be the condition of farmland. In this project, I would like to use my computer vision model for urban transportation in the near future. I live in Kuala Lumpur, Malaysia, which suffers from huge traffic jams every day; other cities in ASEAN have the same problem. So we need to identify, predict, and optimize car traffic in urban areas. As the first step, I would like to classify four classes of cars in images automatically.

 

 

2. How can we obtain images for training?

Obtaining images is always the biggest problem in developing a computer vision model with deep learning. To make our models accurate, a massive number of images should be prepared, which is usually difficult or impossible unless you are in a big company or laboratory. But do not worry; there is a good solution: a “pre-trained model”. This is a model that has already been trained on a huge number of images, so all we have to do is adjust it to our specific purpose or business usage. Pre-trained models are available as open-source software. We use ResNet50, one of the best pre-trained models in computer vision. With it, we do not need to prepare a huge volume of images; I prepared 400 images for training and 80 for validation (100 and 20 images per class, respectively). Then we can start developing our computer vision model!
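A minimal sketch of this transfer-learning setup with Keras. The input size, optimizer, and classifier head below are my assumptions for illustration, not necessarily what the actual project code uses.

```python
# Transfer learning: reuse ResNet50's ImageNet features, train only a
# small new classifier head for our 4 car classes.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load ResNet50 pre-trained on ImageNet, without its final classifier.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained features

# Add a small classifier head for our 4 car classes.
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(4, activation="softmax")(x)
model = Model(base.input, outputs)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the base means only the small head is trained, which is why a few hundred images and a laptop CPU are enough.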

 

3. How can we keep the model accurate in classifying images?

If the model frequently gives wrong classification results, it is useless. I would like to keep the accuracy ratio over 90% so that we can rely on the model's results. To achieve accuracy over 90%, more training is usually needed. This training runs for 20 epochs, which takes around 120 minutes to complete on my MacBook Air 13. You can see the progress of the training here. It is done with TensorFlow and Keras, our main libraries for deep learning. At the 19th epoch, the highest accuracy (91.25%) is achieved (in the red box), so the model must be reasonably accurate!
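One practical way to keep the weights from the best epoch (here, the 19th) is a Keras checkpoint callback; this is a sketch of that idea, and the file name and monitored metric are my assumptions, not necessarily what the original code uses.

```python
# Save the model only when validation accuracy improves, so the final
# saved file always holds the best epoch seen during training.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    "car_model_best.h5",      # hypothetical output path
    monitor="val_accuracy",   # track validation accuracy per epoch
    save_best_only=True,      # overwrite only on improvement
)
# Typical usage (train_data / val_data are your prepared generators):
# model.fit(train_data, epochs=20, validation_data=val_data,
#           callbacks=[checkpoint])
```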

[Figure: training progress log]

 

Based on this project, our model, which is trained with few images, can keep accuracy over 90%. Although whether higher accuracy can be achieved depends on the training images, 90% is a good starting point for reaching 99% accuracy with more images in the future. If you are interested in classifying something, you can start developing your own model, as only 100 images per class are needed for training. You can collect them yourselves and run your model on your own computer. If you need the code I use, you can see it here. Do you like it? Let us start now!

 


Can your computers see many objects better than you in 2017?


Happy New Year, everyone. I am very excited that the new year has come, because this year artificial intelligence (AI) will get much closer to us in our daily lives. Smartphones can answer your questions accurately, self-driving cars can run without human drivers, AI game players can compete with human players, and so on. It is incredible, isn't it?

However, in most cases, these programs are developed by giant IT companies such as Google and Microsoft. They have almost unlimited data and computing resources, so it is possible for them to make better programs. How about us? We have small data and limited computing resources unless we have enough budget for cloud services. Is it difficult to make good programs on our own laptop computers? I do not think so, and I would like to try it myself first.

I would like to make a program that classifies cats and dogs in images. To do that, I found a good tutorial (1); I use the code from this tutorial and perform my experiment. Let us start now and see how we can do it.

[Image: sample cat and dog images]

To build an AI model that classifies cats and dogs, we need many images of cats and dogs. Once we have the data, we train the model so that it can classify cats and dogs correctly. But we have two problems:

1. We need a massive amount of image data of cats and dogs.

2. We need high-performance computing resources such as GPUs.

To train artificial intelligence models, it is sometimes said that “with massive data sets, it takes several days or a week to complete training”. In many cases, we cannot do that. So what should we do?

Do not worry; we do not need to create the model from scratch. Many big IT companies and famous universities have already trained AI models and made them public for everyone to use. These are called “pre-trained models”. So all we have to do is take the output of a pre-trained model and adjust it for our own purpose. In this experiment, our purpose is to identify cats and dogs.

I follow the code by François Chollet, the creator of Keras, and run it on my MacBook Air 11. It is a normal Mac with no additional hardware. I prepared only 1,000 images each of cats and dogs. It takes 70 minutes to train the model, and the result is around 87% accuracy. That is great, as it is done on a normal laptop rather than servers with GPUs.
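Besides the pre-trained model, the tutorial (1) stretches its small data set with on-the-fly image augmentation. Here is a minimal sketch of that idea with Keras; the parameter values are illustrative, not necessarily the tutorial's exact settings.

```python
# Image augmentation: each pass over the data yields randomly rotated
# and flipped variants, so 1,000 images behave like many more.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixel values into [0, 1]
    rotation_range=40,     # random rotations up to 40 degrees
    horizontal_flip=True,  # random left-right flips
)

# Demonstrate on random stand-in "images" (4 images of 150x150 RGB).
images = np.random.randint(0, 256, size=(4, 150, 150, 3)).astype("float32")
batch = next(datagen.flow(images, batch_size=4, shuffle=False))
print(batch.shape)  # (4, 150, 150, 3)
```

In real use you would point `flow_from_directory` at your cat and dog folders instead of a random array.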

 

 

Based on this experiment, I found that artificial intelligence models can be developed on my Mac with little data to solve our own problems. I would like to do more tuning to obtain a higher accuracy rate; there are several methods to make it better.

Of course, this is only the beginning of the story. Not only “cats and dogs” classification but also many other problems can be solved in the way I experimented with here. When pre-trained models are available, they give us great potential to solve our own problems. Could you agree with that? Let us try many things with pre-trained models this year!

 

 

1. Building powerful image classification models using very little data

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html


How can computers see objects? It is done by probability


Do you know how computers can see the world? It is very important, as self-driving cars will be available in the near future; if you do not know how they see, you cannot be brave enough to ride in them. So let me explain for a while.

 

1. An image can be expressed as a sequence of numbers

I believe you have heard the word “RGB”. R stands for red, G for green, and B for blue. Every color is created by a mix of the three colors R, G, and B, and each of them takes a value from 0 to 255. Therefore each point in an image, called a “pixel”, has a vector such as [255, 35, 57], and each image can be expressed as a sequence of numbers. This sequence of numbers is fed into the computer so that it can understand what the image is.
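As a tiny illustration, here is a 2×2 “image” represented as such an array of numbers (the pixel values are arbitrary):

```python
# A 2x2 image: each pixel is a [R, G, B] triple of values in 0-255.
import numpy as np

image = np.array([
    [[255,  35,  57], [  0,   0,   0]],    # a reddish pixel, a black pixel
    [[255, 255, 255], [ 12, 200,  90]],    # a white pixel, a greenish pixel
], dtype=np.uint8)

print(image.shape)  # (2, 2, 3): height, width, and the 3 RGB channels
```

A real photo works exactly the same way, just with far more pixels; this array is what the computer actually "sees".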

 

2. A convnet and a classifier learn and classify images

Once images are fed into the computer, a convnet is used to analyze the data. A convnet (convolutional neural network) is one of the famous deep learning algorithms and is frequently used for computer vision. The basic process of image classification is as follows.

[Figure: image classification pipeline]

  • The image is fed into the computer as a sequence of numbers
  • A convolutional neural network identifies features that represent the object in the image
  • The features are obtained as a vector
  • The classifier provides a probability for each candidate object
  • The object in the image is classified as the candidate with the highest probability

In this case, the probability of “dog” is the highest, so the computer classifies the image as a dog. Of course, each image has a different set of probabilities, so the computer can understand what each one is.
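The last two steps can be sketched in a few lines; the class names and scores below are made up for illustration:

```python
# The classifier turns raw scores into probabilities with softmax,
# then the image is labeled with the highest-probability class.
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return e / e.sum()

classes = ["cat", "dog", "car", "bird"]
scores = np.array([1.2, 3.5, 0.3, -0.8])  # hypothetical classifier output

probs = softmax(scores)          # probabilities sum to 1
label = classes[int(np.argmax(probs))]
print(label)  # "dog": the class with the highest probability
```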

 

3. This is the basic process of computer vision. To achieve higher accuracy, many researchers have been developing better algorithms and processing methods intensively. I believe the most advanced computer vision algorithms are about to surpass human sight. Could you look at the famous experiment in which a researcher tested his own sight (1)? His error rate was 5.1%.

Now I am very interested in computer vision and focus on this field in my research. I hope I can share new findings in the near future.

 

1. What I learned from competing against a ConvNet on ImageNet (Andrej Karpathy, a Research Scientist at OpenAI), 2 Sep 2014

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

 

 
