This is incredible! Semantic segmentation with just 700 images, from scratch, on a MacBook Air!


You may have seen pairs of images like the ones below before. Each image is segmented by color according to the objects in it, so that every pixel is assigned to a class such as road, car or sky. This task is called “semantic segmentation”. Many AI researchers study it now because it is critically important for self-driving cars and robotics.

[Figure: an image and its semantic segmentation]
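In a colour-labelled segmentation image, each colour stands for one class, so training code usually converts the colours into class indices first. Here is a minimal sketch in plain Python, assuming the label image is stored as rows of (R, G, B) tuples. The three-entry palette and the index 255 for unknown colours are hypothetical, not the actual colour table of any dataset.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette for illustration only: a real dataset ships its own
# colour table with one entry per class.
PALETTE: Dict[RGB, int] = {
    (128, 64, 128): 0,   # e.g. road
    (128, 128, 128): 1,  # e.g. sky
    (64, 0, 128): 2,     # e.g. car
}
UNKNOWN = 255  # assumed index for colours that are not in the palette


def colours_to_classes(label: List[List[RGB]],
                       palette: Dict[RGB, int] = PALETTE) -> List[List[int]]:
    """Map every pixel colour of a label image to its class index."""
    return [[palette.get(pixel, UNKNOWN) for pixel in row] for row in label]


label_image = [
    [(128, 64, 128), (128, 128, 128)],
    [(64, 0, 128), (10, 20, 30)],
]
print(colours_to_classes(label_image))  # [[0, 1], [2, 255]]
```

The network is then trained to predict these per-pixel class indices, and the predictions can be mapped back to colours for display.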

Unfortunately, however, it is not easy for startups like us to perform this task. Like other computer vision tasks, semantic segmentation needs massive numbers of images and large computing resources, which are sometimes hard to obtain in tight-budget projects. If we cannot collect many images, we are likely to give up.

 

A new algorithm may change this situation. It is called “Fully Convolutional DenseNets for Semantic Segmentation” (nicknamed “Tiramisu” (1)). Technically, this network is built from many “DenseNet” (2) blocks; the DenseNet paper won the CVPR Best Paper Award in July 2017. Below is the structure of the model as shown in the research paper (1).

[Figure: structure of the Tiramisu model, from the research paper (1)]
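The key idea of DenseNet (2) is that each layer inside a dense block receives the concatenation of all earlier feature maps as its input and adds a fixed number of new feature maps, called the growth rate. The following minimal sketch tracks only channel counts, not real tensors, to show how the number of channels grows through one block. The values 48, 4 and 12 are illustrative assumptions; please check the paper (1) for the exact settings of each model.

```python
from typing import List


def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> List[int]:
    """Channel count seen by each layer of a dense block, followed by the block output.

    Only channel counts are tracked here, not real feature maps.
    """
    channels = in_channels
    history = []
    for _ in range(num_layers):
        history.append(channels)  # input = concatenation of all earlier feature maps
        channels += growth_rate   # this layer's new maps are concatenated onto the stack
    history.append(channels)      # output = block input + every layer's new maps
    return history


# Illustrative values only: 48 input channels, 4 layers, growth rate 12.
print(dense_block_channels(48, 4, 12))  # [48, 60, 72, 84, 96]
```

Because feature maps are concatenated rather than added, the channel count grows linearly with the number of layers, which is why the growth rate is kept small.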

I wanted to confirm how this model works with a small volume of images, so I obtained an urban-scene image set called the “CamVid Database” (3). It has 701 scene images, each with a colour-labelled counterpart. I chose 468 images for training and 233 for testing. This is very little data for a computer vision task, which usually needs more than 10,000 to 100,000 images to train from scratch. In my experiment, I did not use pre-trained models, and I did not use a GPU for computation either. My only weapon was a MacBook Air 13 (Core i5), the kind of laptop many business people and students use. Yet the new algorithm works extremely well. Here are some example results.

[Figure: example results, ground truth vs. prediction]

“Prediction” looks similar to “ground truth”, which is the right answer in my experiment. Overall accuracy is around 83% for classification into 33 classes (at the 45th epoch of training). This is incredible given how little data is available here. Although the prediction misses some parts such as poles, I am confident that accuracy will improve when more data and resources are available. Here are the training results. Training took around 27 hours. (Technically, I use “FC-DenseNet56”. Please read the research paper (1) for details.)

[Figure: training results]

Added on 18th August 2017: If you are interested in the code with Keras, please see this GitHub repository.

 

This experiment was inspired by an awesome MOOC called “fast.ai” by Jeremy Howard. I strongly recommend this course if you are interested in deep learning. It is free, so there is nothing to lose. It has less math and is easy to understand for people who are not aiming for a Ph.D. in computer science.

I will continue to research this model and others in computer vision. I hope to provide updates soon. Thanks for reading!

 

 

1. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio), 5 Dec 2016

 

2. Densely Connected Convolutional Networks (Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten), 3 Dec 2016

 

3. Segmentation and Recognition Using Structure from Motion Point Clouds (Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla), ECCV 2008

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software

 


How can computers see objects? They do it with probability


Do you know how computers can see the world? This matters because self-driving cars will be available in the near future, and if you do not know how they see, you may not feel brave enough to ride in one. So let me explain.

 

1. An image can be expressed as a sequence of numbers

I believe that you have heard the word “RGB”. R stands for red, G stands for green and B stands for blue. Every color on a screen is created by mixing these three colors. Each of R, G and B takes a value from 0 to 255. Therefore each point in an image, which is called a “pixel”, has a vector such as [255, 35, 57]. So each image can be expressed as a sequence of numbers, and this sequence is fed into computers so that they can understand what the image shows.
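To make this concrete, here is a small sketch in plain Python: a made-up 2 x 2 image whose pixels are [R, G, B] vectors, flattened into the single sequence of numbers that a program would work with.

```python
from typing import List

# A tiny 2 x 2 image: each pixel is an [R, G, B] vector with values from 0 to 255.
image = [
    [[255, 35, 57], [0, 0, 0]],
    [[255, 255, 255], [12, 200, 90]],
]


def to_sequence(img: List[List[List[int]]]) -> List[int]:
    """Flatten rows of RGB pixels into one sequence of numbers."""
    return [value for row in img for pixel in row for value in pixel]


sequence = to_sequence(image)
print(len(sequence))  # 12 numbers = 2 rows x 2 columns x 3 colour channels
print(sequence[:3])   # [255, 35, 57], the first pixel
```

A real photo works the same way, only with far more numbers: a 640 x 480 image has 640 x 480 x 3 = 921,600 of them.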

 

2. Convnet and classifier learn and classify images

Once images are fed into computers, a convnet is used to analyze the data. The convnet is one of the most famous deep learning algorithms and is frequently used for computer vision. The basic process of image classification is as follows.

[Figure: basic process of image classification]

  • The image is fed into the computer as a sequence of numbers
  • A convolutional neural network identifies features that represent the object in the image
  • The features are obtained as a vector
  • A classifier provides the probability of each candidate object
  • The object in the image is classified as the candidate with the highest probability

In this case, the probability of “dog” is the highest, so the computer concludes “it is a dog”. Of course, each image produces a different set of probabilities, which is how computers can tell what is in it.
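The last two steps of the list, turning classifier outputs into probabilities and picking the highest, can be sketched with a softmax, which is a common choice for the classifier's output. The raw scores below are made-up numbers standing in for what a real convnet and classifier would compute.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw classifier scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: List[float], labels: List[str]) -> Tuple[str, List[float]]:
    """Return the label with the highest probability, plus all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs


label, probs = classify([2.0, 1.0, 0.1], ["dog", "cat", "bird"])
print(label)  # dog
```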

 

3. This is the basic process of computer vision. To achieve higher accuracy, many researchers have been intensively developing better algorithms and processing methods. I believe that the most advanced computer vision algorithms are about to surpass human sight. Have a look at the famous experiment in which a researcher competed against a convnet using his own eyes (1). His top-5 error rate was 5.1%.
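The 5.1% in that experiment is a top-5 error rate: a guess counts as correct if the true label is among the first five guesses. A minimal sketch of this metric, using hypothetical guesses for four images:

```python
from typing import List


def top_k_error(ranked_guesses: List[List[str]], true_labels: List[str], k: int = 5) -> float:
    """Fraction of images whose true label is NOT among the first k guesses."""
    misses = sum(1 for guesses, truth in zip(ranked_guesses, true_labels)
                 if truth not in guesses[:k])
    return misses / len(true_labels)


# Hypothetical guesses for four images, best guess first.
guesses = [
    ["dog", "wolf", "cat", "fox", "bear"],
    ["car", "truck", "bus", "van", "bike"],
    ["cat", "lynx", "dog", "fox", "owl", "tiger"],  # "tiger" is only the 6th guess
    ["apple", "pear", "plum", "fig", "kiwi"],
]
truths = ["fox", "bus", "tiger", "banana"]
print(top_k_error(guesses, truths))  # 0.5
```

With k=1 the same function gives the stricter top-1 error, where only the single best guess counts.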

I am now very interested in computer vision and am focusing on this field in my research. I hope to share new findings in the near future.

 

1. What I learned from competing against a ConvNet on ImageNet, Andrej Karpathy, a Research Scientist at OpenAI, Sep 2 2014

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

 

 


 

Let us overview the variations of deep learning now!


This weekend, I researched recurrent neural networks (RNN) because I want to develop a small chatbot. I also ran a convnet program to check how accurate it is. So I think it is good timing to overview the variations of deep learning, because this makes it easier to learn each network in detail.

 

1. Fully connected network

This is the basic form of deep learning. When you hear the word “deep learning”, it means a “fully connected network” in most cases. Let us look again at the program in my article from last week. You can see “fully_connected” in it. This network is loosely inspired by the network of neurons in our brain.

[Figure: Deep Learning]
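The arithmetic of a fully connected layer is simple: every output unit computes a weighted sum of all inputs plus a bias, optionally followed by an activation function. The sketch below is plain Python, not the `fully_connected` function from the program; the function name `dense_forward` and all weights are hypothetical.

```python
from typing import Callable, List, Optional


def relu(z: float) -> float:
    return max(0.0, z)


def dense_forward(inputs: List[float], weights: List[List[float]], biases: List[float],
                  activation: Optional[Callable[[float], float]] = None) -> List[float]:
    """One fully connected layer: every output unit uses every input value."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum + bias
        outputs.append(activation(z) if activation else z)
    return outputs


# Hypothetical weights: 2 inputs -> 2 hidden units (ReLU) -> 1 output unit.
hidden = dense_forward([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5], relu)
output = dense_forward(hidden, [[1.0, 2.0]], [0.0])
print(hidden, output)  # [0.0, 3.5] [7.0]
```

Stacking several such layers gives a “deep” network; training means adjusting the weights and biases so that the outputs match the correct answers.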

 

2. Convolutional neural network (Convnet)

This is mainly used for image recognition and computer vision. There are many variations of convnets designed to achieve higher accuracy. Do you remember the TED presentation I recommended before? Watch it again if you want to learn more about convnets.
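The core operation of a convnet is sliding a small kernel over the image and summing the element-wise products at each position. This plain-Python sketch handles a single channel with no padding and stride 1; the edge-detecting kernel is a hypothetical hand-picked example, whereas a real convnet learns its kernels during training.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    Like most deep learning libraries, this does not flip the kernel, so it is
    strictly a cross-correlation, which is what convnets call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A dark-to-bright vertical edge and a hypothetical 2 x 2 edge-detecting kernel.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
print(conv2d(img, edge_kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The output is large only where the edge is, which is how the early layers of a convnet pick out simple features that later layers combine into objects.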

 

3. Recurrent neural network (RNN)

The biggest advantage of an RNN is that it does not need fixed-size input (a typical convnet does). Therefore it is frequently used in natural language processing, as our sentences are sometimes very short and sometimes very long. In other words, an RNN can handle sequences of input data effectively. To overcome the difficulties of learning its parameters, many kinds of RNN, such as LSTM, have been developed and are used now.

[Figure: RNN]
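The reason an RNN accepts input of any length is that it applies the same weights at every time step and carries a hidden state forward. The sketch below is a deliberately tiny scalar RNN in plain Python; the weights `w_x`, `w_h` and `b` are hypothetical fixed numbers, whereas a real RNN uses learned weight matrices and vector states.

```python
import math
from typing import List


def rnn_final_state(sequence: List[float], w_x: float = 0.5, w_h: float = 0.8,
                    b: float = 0.0) -> float:
    """Scalar RNN: the same weights are reused at every time step, so any length works."""
    h = 0.0  # initial hidden state
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)  # new state depends on input and previous state
    return h


short_sentence = [1.0, -0.5]
long_sentence = [0.2, 0.7, -1.0, 0.3, 0.9, -0.4]
print(rnn_final_state(short_sentence), rnn_final_state(long_sentence))
```

Both sequences are summarized into a hidden state of the same size, which a classifier can then use regardless of the original length.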

 

4. Reinforcement learning (RL)

In reinforcement learning, the output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward.

  • The goal in selecting each action is to maximize the expected sum of the future rewards. We usually use a discount factor for delayed rewards so that we don’t have to look too far into the future.

This is a good explanation, taken from lecture_slides-lec1 p46 of “Neural Networks for Machine Learning” by Geoffrey Hinton on Coursera.
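The “expected sum of the future rewards” with a discount factor can be written as G = r0 + gamma·r1 + gamma²·r2 + ..., where gamma is between 0 and 1. A minimal sketch of this sum for one episode, with hypothetical rewards and gamma = 0.9:

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Sum of rewards where a reward t steps in the future is weighted by gamma ** t."""
    g = 0.0
    for r in reversed(rewards):  # work backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g


# Hypothetical episode: nothing for two steps, then a reward of 1.
print(discounted_return([0.0, 0.0, 1.0]))  # about 0.81 (= 0.9 ** 2)
```

A smaller gamma shrinks distant rewards faster, which is exactly how the discount factor lets the agent avoid looking too far into the future.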

 

 

Many researchers all over the world have been developing new models, so new kinds of networks may be added in the near future. Until then, these models can be considered building blocks for implementing deep learning algorithms to solve our problems. Let us use them effectively!

 


This is our new platform, provided by Google. It is amazingly accurate!


In deep learning projects for digital marketing, we need superior tools for data analysis and deep learning. I have been watching “TensorFlow”, open source software provided by Google, since it was published in November 2015. According to one of the latest surveys by KDnuggets, “TensorFlow” is the top-ranked tool for deep learning (H2O, which our company uses as its main AI engine, is also getting popular) (1).

I tried an image recognition task with TensorFlow to see how it works. These are the results of my experiment. MNIST, a dataset of handwritten digits from 0 to 9, is used for the experiment. I chose a convolutional network to perform it. Can TensorFlow classify the digits correctly?

[Figure: MNIST]

I set up the TensorFlow program in Jupyter like this. It comes from the TensorFlow tutorials.

[Figure: TensorFlow program in Jupyter]

 

This is the result. It was obtained after 80 minutes of training. My machine is a MacBook Air 11 (1.4 GHz Intel Core i5, 4 GB memory).

[Figure: result after training, showing the accuracy rate]

Can you see the accuracy rate? It is 0.9929, so the error rate is just 0.71%! It is amazing!
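To put 0.71% in perspective, here is the arithmetic as a small sketch, assuming the accuracy was measured on the standard MNIST test set of 10,000 images (the post does not state the test-set size).

```python
from typing import Tuple


def error_summary(accuracy: float, n_test: int = 10_000) -> Tuple[float, int]:
    """Return the error rate and the approximate number of misclassified test images."""
    error_rate = 1.0 - accuracy
    return error_rate, round(error_rate * n_test)


rate, wrong = error_summary(0.9929)
print(f"{rate:.2%} error, about {wrong} of 10,000 images")  # 0.71% error, about 71 of 10,000 images
```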


Based on my experiment, TensorFlow is an awesome tool for deep learning. I found that many other algorithms, such as LSTM and reinforcement learning, are available in TensorFlow. The more algorithms we have, the more flexible our strategy for digital marketing solutions can be.

 

We have obtained an awesome tool for deep learning. From now on, we can analyze lots of data with TensorFlow. I will provide good insights from the data in our project to promote digital marketing. As I said before, “TensorFlow” is open source software, so it is free to use in our businesses with no fees to pay. This is a big advantage for us!

I cannot say TensorFlow is a tool for beginners, because you build deep learning models with it by writing programs. (H2O can be operated without programming through its GUI.) If you are familiar with Python or a similar language, it is for you! You can download and use it without paying any fees, so you can try it yourself. This is my strong recommendation!

 

TensorFlow: Large-scale machine learning on heterogeneous systems

1. R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

 

 
