We started an AI Lab in the company to research the “attention mechanism” in deep learning

As I mentioned before, I completed the online course “deeplearning.ai“. It is an awesome course that I recommend to everyone. It covers many topics, and one of the most interesting for me is the “attention mechanism” in neural translation. So I would like to explain it in detail. Do not worry, I will not use mathematics in this article. Let us start.

 

The attention mechanism is defined as follows: “The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step”. This feels natural when we consider how we translate from one language to another. Yes, human beings pay more attention to the objects that interest them most. When we are hungry, we tend to look for the sign of a “restaurant” or “food court” and ignore the sign of a “library”, right?

We want computers to translate in the same way. Let me consider this again. When we translate English into our mother tongue, such as Japanese, we look at the whole sentence first, then decide which words are important to us. We do not translate word by word. In other words, we pay more attention to specific words than to others. So we want to introduce the same method into neural translation by computers.
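
For readers who want to peek under the hood anyway, the idea of “paying more attention to specific words” can be sketched in a few lines of plain Python. This is only a toy, not a real NMT model: the vectors and names below are invented for illustration. Each source word gets a score, softmax turns the scores into weights that sum to 1, and the translation step uses the weighted blend.

```python
import math

def attention(query, encoder_states):
    """Toy attention step: score each encoder state against the query,
    turn the scores into weights with softmax, and blend the states."""
    # Dot-product scores: a higher score means "pay more attention here".
    scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
    # Softmax turns the scores into weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Context vector: the encoder states blended by their weights.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(len(query))]
    return weights, context

# Three toy "word" vectors; the query resembles the second word the most.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention([0.0, 1.0], states)
```

The second word gets the largest weight here, which is exactly the “look at the important words first” behavior described above.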

 

The attention mechanism was originally introduced in September 2014 (1). Since then, many attention mechanisms have been proposed. One of the strongest attention models is the “Transformer”, by Google Brain, from June 2017. You probably use Google Translate every day, and it performs very well. But the Transformer is better than the model used in Google Translate. This chart shows the difference between GNMT (Google Translate) and the Transformer (2).

Fortunately, Google provides a framework to facilitate AI research, called “Tensor2Tensor (T2T)“. It is open source and can be used without any fees, which means you can try it by yourself! I decided to set up an “AI Lab” in my company and introduce this framework to research the attention mechanism. It includes many pre-trained models, including the “Transformer”. Why don’t you join us?

 

I used translation as the example to explain how the attention mechanism works. But it can be applied to many other fields, such as object detection, which is used in face recognition and self-driving cars. It is exciting to consider what can be achieved with the attention mechanism. I will keep you updated on our progress. So stay tuned!

 

 

When you need AI consulting, do not hesitate to contact TOSHISTATS

 

(1) Neural Machine Translation by Jointly Learning to Align and Translate, by Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, September 2014

(2) Attention Is All You Need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, June 2017

 

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.


Let us consider “Brain as a service” again now!


Two years ago, I wrote an article about the computer Go player “AlphaGo“ and talked about “Brain as a service” in the future, because AlphaGo is so strong and can improve itself through reinforcement learning with self-play. Now I am more confident that “Brain as a service” will be available in the near future. Let us consider why I think so.

 

1. Self-play without human intervention

In October 2017, DeepMind released a new version of its computer Go player, “AlphaGo Zero“. The previous version of AlphaGo learned from human play at the early stage of training. But AlphaGo Zero can improve itself without human intervention or knowledge. Starting from nothing, it became stronger than the human Go champion. This is incredible! Of course, the real world is not the game of Go, so we need to modify self-play to apply it to our real-life problems. But fundamentally, there are many opportunities to improve our society using self-play, as it can provide super-human solutions if correctly implemented. AlphaGo Zero proves this is true.

 

2. Reinforcement learning can be researched anywhere on earth

I am now researching OpenAI Gym, which is an environment/simulator for reinforcement learning (RL). It is provided by OpenAI, a nonprofit organization established by Elon Musk and Sam Altman. OpenAI provides not only theoretical research results but also the code to implement them in our own systems. This means that as long as we have access to the internet, we can start our own reinforcement learning research based on OpenAI Gym. No capital is required, as OpenAI Gym is open-source software and free to download and use. Applications of RL like AlphaGo can be developed anywhere in the world. If you want to try it, go to the OpenAI Gym website and set it up yourself. You can enjoy cool reinforcement learning results!
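
To give a flavor of what such research looks like, here is a minimal tabular Q-learning loop in plain Python. It does not require Gym at all; the toy LineWorld class below merely mimics Gym’s env.reset()/env.step() interface, and every name in it is mine, not OpenAI’s.

```python
import random

class LineWorld:
    """Tiny stand-in for a Gym environment: walk right from cell 0 to cell 4.
    It mimics OpenAI Gym's env.reset() / env.step(action) interface."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                 # action: 0 = left, 1 = right
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4                # reaching cell 4 ends the episode
        return self.pos, (1.0 if done else 0.0), done, {}

random.seed(0)
env = LineWorld()
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}

def greedy(s):                              # best-known action, random on ties
    if Q[(s, 0)] == Q[(s, 1)]:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(s, a)])

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy: mostly exploit, sometimes explore
        action = random.choice((0, 1)) if random.random() < 0.2 else greedy(state)
        nxt, reward, done, _ = env.step(action)
        # one-step Q-learning update (learning rate 0.5, discount 0.9)
        target = reward + 0.9 * max(Q[(nxt, 0)], Q[(nxt, 1)])
        Q[(state, action)] += 0.5 * (target - Q[(state, action)])
        state = nxt

policy = {s: greedy(s) for s in range(5)}   # learned behavior: always go right
```

After 200 episodes the agent has learned, purely from trial and error, to always move right. Swapping LineWorld for a real Gym environment is essentially a one-line change, which is the point of Gym’s simple interface.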

 

3. It will become easier to obtain data from the world

Last week, Google announced “The real world as your playground: Build real-world games with Google Maps APIs”. It means that any game developer can create real-world games using Google Maps, with access to countless 3D buildings, roads, landmarks, and parks all over the world as digital assets. This is amazing! But it should not be considered just a matter of games; it is one example of how we can obtain data from the world, because we can create real-world computer-vision simulators with this service. In addition, I would like to mention blockchain briefly. Blockchain can be used to connect the world in a transparent manner. I imagine that much of the data inside companies and organizations will become accessible more easily through blockchain in the near future. Therefore, we will be able to accelerate AI development with far more data than we have now, at a rapid pace. This is exciting!

 

 

“These things must be a trigger to change the landscape of our business, societies and lives, because suddenly computers can be sophisticated enough to work just like our brain. AlphaGo teaches us that it may happen even when few people think so. Yes, this is why I think that the age of “Brain as a Service” will come in the near future. What do you think?”

This is what I said two years ago. Of course, it is impossible to predict when “Brain as a Service” will be available. But I am sure we are moving in this direction step by step. Do you agree?

 

 

 

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I and TOSHI STATS SDN. BHD. accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

“Monte Carlo tree search” is the key to AlphaGo Zero!

In October last year, Google DeepMind released “AlphaGo Zero“. It is stronger than all previous versions of AlphaGo, although this new version uses no human knowledge of Go for training. It performs self-play and gets stronger by itself. I was very surprised to hear the news, because in general we need a lot of data to train a model.

Today, I would like to consider why AlphaGo Zero works so well from the viewpoint of a Go player, as I have played Go for entertainment for many years. I am not a professional Go player, but I have expertise in both Go and deep learning, so this is a good opportunity for me to consider it.

When I play Go, I often decide my next move based on intuition, because I am confident that “it is right”. But when the situation is more complex and I am not so sure what the best move is, I try out in my mind (not on the real board) the many paths that my opponent and I could take each turn, and choose the best move based on those trials. We call this “Yomi” in Japanese. Unfortunately, I sometimes perform “Yomi” wrongly and make a wrong move. Professional Go players perform “Yomi” much more accurately than I do. This is the key to being a strong Go player.

 

So I wonder how AlphaGo Zero can perform “Yomi” effectively. I think this is the key to understanding AlphaGo Zero. Let me consider these points.

 

1. Monte Carlo tree search (MCTS) performs “Yomi” effectively

The next move can be decided by the policy/value function. But there might be a better move, so we need to search for it. MCTS is used for this search in AlphaGo Zero. According to the paper, MCTS can find a better move than the one originally chosen by the policy/value function. DeepMind says MCTS works as a “powerful policy improvement operator”, yielding an “improved MCTS-based policy”. This is great, as it means that AlphaGo Zero can perform “Yomi” just like us.
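
To make “Yomi” by search more concrete, here is a toy Monte Carlo tree search in Python. Note the caveats: this is classic UCT with random rollouts on a simple take-away game (take 1-3 stones; whoever takes the last stone wins), not AlphaGo Zero’s neural-network-guided variant, and all the names are invented for illustration.

```python
import math, random

def moves(n):                                   # legal moves: take 1, 2 or 3 stones
    return [m for m in (1, 2, 3) if m <= n]

def rollout(n):
    """Play random moves to the end; return +1 if the player to move
    at pile size n wins, -1 otherwise."""
    sign = 1                                    # +1 while it is "our" turn
    while n > 0:
        n -= random.choice(moves(n))
        sign = -sign
    return -sign                                # whoever took the last stone won

class Node:
    def __init__(self, n, parent=None):
        self.n, self.parent = n, parent
        self.children = {}                      # move -> child Node
        self.visits, self.value = 0, 0.0        # value is from the mover's viewpoint

def mcts(pile, iters=2000):
    random.seed(1)                              # deterministic for illustration
    root = Node(pile)
    for _ in range(iters):
        node = root
        # 1. selection: descend with UCB1 while fully expanded
        while node.n > 0 and len(node.children) == len(moves(node.n)):
            node = max(node.children.values(),
                       key=lambda c: -c.value / c.visits
                       + 1.4 * math.sqrt(math.log(node.visits) / c.visits))
        # 2. expansion: try one untested move
        if node.n > 0:
            m = random.choice([m for m in moves(node.n) if m not in node.children])
            node.children[m] = Node(node.n - m, node)
            node = node.children[m]
        # 3. simulation: random playout (a terminal node is a loss for its mover)
        result = rollout(node.n) if node.n > 0 else -1
        # 4. backpropagation: flip the sign at every level up the tree
        while node is not None:
            node.visits += 1
            node.value += result
            result = -result
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

best_move = mcts(5)   # taking 1 leaves 4 stones, a lost position for the opponent
```

With 5 stones on the pile, search discovers that taking 1 is the only winning move. AlphaGo Zero replaces the random rollouts with value-network evaluations and biases selection with the policy network, but the select/expand/evaluate/backpropagate loop is the same.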

 

2. A whole game can be played by self-play without human knowledge

I wondered how a whole game of Go could be played without human knowledge. The paper explains it as follows: “Self-play with search—using the improved MCTS-based policy to select each move, then using the game winner z as a sample of the value—may be viewed as a powerful policy evaluation operator.” So just by playing games against itself, the winner of each game is obtained as a sample. These results are used in the next learning process. Therefore, the “Yomi” performed by AlphaGo Zero becomes more and more accurate.

 

 

3. This training algorithm is very efficient for learning from scratch

Computers are very good at running simulations many times automatically. So, without human knowledge in advance, AlphaGo Zero becomes stronger and stronger as it performs self-play many times. According to the paper, starting from random play, AlphaGo Zero outperformed the previous version of AlphaGo, the one that beat Lee Sedol in March 2016, after just 72 hours of training. This is incredible, because only 72 hours were required to develop, from scratch and without human knowledge, a model that beats professional players.

 

 

Overall, AlphaGo Zero is incredible. If the AlphaGo Zero training algorithm could be applied to our businesses, an AI professional businessman might be created in 72 hours without human knowledge. That would be incredibly sophisticated!

I hope you enjoyed the story of how AlphaGo Zero works. This time I gave an overview of the mechanism of AlphaGo Zero. If you are interested in more details, I recommend watching the video by DeepMind. In my next article, I would like to go a little deeper into MCTS and the training of the models. It will be exciting! See you again soon!

 

Mastering the game of Go without human knowledge, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis
Published in Nature, Vol 550, 19 October 2017


This could be a new way to train deep reinforcement learning in 2018!


As the end of the year approaches, I would like to consider what comes next in artificial intelligence next year. So this week I reviewed several research papers and found something interesting: the “Genetic Algorithm (GA)”. The paper (1) explains that GA can be applied to deep reinforcement learning to optimize parameters. This must be exciting for many deep learning researchers and programmers.

 

According to Wikipedia, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection. As I worked on a project using GA in Tokyo more than 10 years ago, I would like to revisit GA in the context of deep learning in 2018. To prepare for that, let me explain the major components of GA below.

  • Gene: The instrument that we can modify for optimization, with a function similar to a human gene. To adapt to the environment, genes are modified through the GA operations explained below.
  • Generation: The collection of individual genes at a certain period. As time passes, new generations are created recurrently.
  • Selection: Based on fitness values, better genes are selected to create the next generation. There are many ways to do this; the best gene may be retained as-is.
  • Crossover: Parts of one gene are exchanged with parts of another. This accelerates diversification among genes.
  • Mutation: Part of a gene is changed into something else. Mutation and crossover are both inspired by the process of natural evolution.
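
The components above can be sketched in a few lines of Python. This toy maximizes the number of 1-bits in a bit string (the classic “OneMax” problem); the parameter values are arbitrary choices for illustration, not tuned settings.

```python
import random

def genetic_algorithm(length=20, pop_size=30, generations=40, seed=42):
    """Toy GA maximizing the number of 1-bits in a bit string ("OneMax")."""
    rng = random.Random(seed)
    fitness = sum                                   # fitness = count of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_gen = pop[:2]                          # elite strategy: keep the 2 best
        while len(next_gen) < pop_size:
            # selection: two parents from the fitter half of the population
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)
            # crossover: swap tails at a random cut point
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # mutation: occasionally flip a single bit
            if rng.random() < 0.2:
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            next_gen.append(child)
        pop = next_gen
    return max(pop, key=fitness)

best = genetic_algorithm()
```

After 40 generations the best gene is all (or nearly all) ones. Replacing the fitness function with, say, the score of a reinforcement learning agent is the idea behind the Uber AI Labs paper cited below.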

 

This image explains GA clearly (2). Based on fitness values, genes with higher scores are selected. During reproduction, some of the selected genes remain unchanged (the elite strategy). Crossover and mutation are applied to the rest of the genes to create the next generation. You can see that the fitness values in generation t+1 are higher than in generation t. This is the basic framework of GA; there are many variations in how the next generation is created.


Genetic algorithm

I put simple Python code for GA-based portfolio management on GitHub. If you are interested in GA in more detail, please look at it here.

 

Although GA has a long history, its application to deep learning is relatively new. At TOSHI STATS, an AI start-up, I continue to research how GA can be applied to deep learning so that optimization can be performed effectively. I hope to update you soon in 2018. Happy new year to everyone!

 

 

 

1. Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Uber AI Labs, 18 December 2017

2. IBA laboratory, a research laboratory of Genetic and Evolutionary Computations (GEC) of the Graduate School of Engineering, The University of Tokyo, Japan.

 

 


 

This is incredible! Semantic segmentation from just 700 images, from scratch, on a MacBook Air!


You may have seen pairs of images like the ones below before. The images are segmented by color based on the objects in them. This is called “semantic segmentation”. It is studied by many AI researchers now because it is critically important for self-driving cars and robotics.

segmentaion1

Unfortunately, however, it is not easy for startups like us to perform this task. Like other computer vision tasks, semantic segmentation needs massive numbers of images and computing resources, which can be difficult in tight-budget projects. When we cannot collect many images, we are likely to give up.

 

This situation can be changed by a new algorithm called “Fully Convolutional DenseNets for Semantic Segmentation” (in short, “Tiramisu”) (1). Technically, this is a network built out of many dense blocks from “DenseNet” (2), the paper that was awarded the CVPR 2017 Best Paper award in July 2017. This is the structure of the model as described in the research paper (1).
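
To see what “built out of many dense blocks” means, here is a numpy toy of the DenseNet connectivity pattern: every layer receives the concatenation of all previous feature maps and contributes a fixed number of new channels (the “growth rate”). Random weights and a plain ReLU stand in for the real convolutions; this is an illustration of the wiring, not the Tiramisu code.

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12, seed=0):
    """Toy DenseNet block: every layer sees the concatenation of ALL previous
    feature maps and adds `growth_rate` new channels. Random weights and a
    plain ReLU stand in for the real convolution layers."""
    rng = np.random.default_rng(seed)
    features = [x]                                  # list of (H, W, C_i) maps
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)     # dense connectivity
        w = rng.standard_normal((inp.shape[-1], growth_rate)) * 0.1
        features.append(np.maximum(inp @ w, 0.0))   # 1x1-conv-like mix + ReLU
    return np.concatenate(features, axis=-1)

x = np.ones((8, 8, 16))                             # toy 8x8 feature map, 16 channels
out = dense_block(x)                                # 16 + 4 * 12 = 64 channels
```

Because every layer reuses all earlier features, each layer only needs to learn a small number of new channels, which is one reason these networks train well on little data.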


I wanted to confirm how this model works with a small volume of images, so I obtained an urban-scene image set called the “CamVid Database” (3). It has 701 scene images with colour-labeled counterparts. I chose 468 images for training and 233 images for testing. This is very little data for computer vision tasks, which usually need more than 10,000-100,000 images to train each task from scratch. In my experiment, I did not use pre-trained models, and I did not use a GPU for computation either. My weapon was just a MacBook Air 13 (Core i5), like many business people and students have. But the new algorithm works extremely well. Here is an example of the results.



“Prediction” looks similar to “ground truth”, which is the right answer in my experiment. Overall accuracy is around 83% for classification over 33 classes (at the 45th epoch of training). This is incredible, given how little data is available. Although the prediction misses some parts, such as poles, I am confident we can gain more accuracy when more data and resources are available. Here is the training result; it took around 27 hours. (Technically, I used “FC-DenseNet56”; please read the research paper (1) for details.)



Added on 18th August 2017: If you are interested in the code with Keras, please see this GitHub repository.

 

This experiment was inspired by an awesome MOOC called “fast.ai” by Jeremy Howard. I strongly recommend watching this course if you are interested in deep learning. No problem, as it is free. It has less math and is easy to understand for people who are not pursuing a Ph.D. in computer science.

I will continue to research this model and others in computer vision. I hope to provide updates soon. Thanks for reading!

 

 

1.The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (Simon Jegou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio),  5 Dec 2016

 

2. Densely Connected Convolutional Networks (Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten), 3 Dec 2016

 

3. Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
Brostow, Shotton, Fauqueur, Cipolla (bibtex)

 

 


 

Let us develop a car classification model by deep learning with TensorFlow & Keras

For nearly one year, I have been using TensorFlow and considering what I can do with it. Today I am glad to announce that I have developed a computer vision model trained on real-world images. It is a classification model for automobiles, in which 4 kinds of cars can be classified. It is trained on few images, on a normal laptop like a MacBook Air, so you can reproduce it without preparing extra hardware. This technology is called “deep learning”. Let us start this project and go deeper now.

 

1. What should we classify in images?

This is the first thing to consider when we develop a computer vision model, and it depends on the purpose of your business. If you are in the health care industry, it may be signs of disease in the human body. If you are in manufacturing, it may be images of malfunctioning parts in plants. If you are in the agriculture industry, the condition of farmland could be classified to flag problems. In this project, I would like to use my computer vision model for urban transportation in the near future. I live in Kuala Lumpur, Malaysia, which suffers from huge traffic jams every day. Other cities in ASEAN have the same problem. So we need to identify, predict and optimize car traffic in urban areas. As the first step, I would like to classify four classes of cars in images automatically.

 

 

2. How can we obtain images for training?

This is always the biggest problem in developing computer vision models with deep learning. To make our models accurate, a massive number of images should be prepared. This is usually difficult or impossible unless you are in a big company or laboratory. But do not worry: we have a good solution for this problem, called a “pre-trained model”. This is a model that has already been trained on a huge number of images, so all we have to do is adjust it to our specific purpose or business usage. Pre-trained models are available as open source software. We use ResNet50, which is one of the best pre-trained models in computer vision. With this model, we do not need to prepare a huge volume of images. I prepared 400 images for training and 80 images for validation (100 and 20 images per class, respectively). Then we can start developing our computer vision model!
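
Framework details aside, the pre-trained-model trick boils down to this: freeze the backbone, train only a small new head. Here is a framework-free numpy sketch of that pattern; the random “backbone” and the toy data below are stand-ins for ResNet50 and the car images, invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone (ResNet50 in this article):
# its weights are fixed and never updated during our training.
W_frozen = rng.standard_normal((8, 16)) * 0.5

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)        # frozen feature extractor

# Toy 2-class data: the class is decided by the sign of the first input.
X = rng.standard_normal((200, 8))
y = (X[:, 0] > 0).astype(float)

# The only trainable part: a small logistic-regression "head".
w, b = np.zeros(16), 0.0
feats = backbone(X)                             # computed once; the backbone never changes
for _ in range(500):                            # plain gradient descent on the head
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    grad = p - y
    w -= 0.1 * feats.T @ grad / len(X)
    b -= 0.1 * grad.mean()

accuracy = (((feats @ w + b) > 0) == (y == 1)).mean()
```

Even though the backbone was never trained for this task, a tiny trained head on top is enough to classify the toy data well. That is why 100 images per class can suffice when ResNet50 supplies the features.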

 

3. How can we keep the model accurate enough to classify the images?

If the model frequently gives wrong classification results, it is useless. I would like to keep the accuracy ratio over 90% so that we can rely on the results of our model. To achieve accuracy over 90%, more training is usually needed. In this training there are 20 epochs, which take around 120 minutes to complete on my MacBook Air 13. You can see the progress of the training here. This is done with TensorFlow and Keras, our main libraries for deep learning. At the 19th epoch, the highest accuracy (91.25%) is achieved (in the red box). So the model is reasonably accurate!


 

Based on this project, our model, trained with few images, can keep accuracy over 90%. Although whether higher accuracy can be achieved depends on the training images, 90% accuracy is a good starting point for reaching 99% accuracy with more images in the future. If you are interested in classifying something, you can start developing your own model, as only 100 images per class are needed for training. You can collect them yourself and run your model on your own computer. If you need the code I used, you can see it here. Do you like it? Let us start now!

 


Can your computers see many objects better than you in 2017?


Happy new year, everyone! I am very excited that the new year is here, because this year artificial intelligence (AI) will come much closer to us in our daily lives. Smartphones can answer your questions accurately. Self-driving cars can run without human drivers. Many AI game players can compete with human players, and so on. It is incredible, isn’t it?

However, in most cases these programs are developed by giant IT companies, such as Google and Microsoft. They have almost unlimited data and computing resources, so they can build better programs. How about us? We have little data and limited computing resources, unless we have enough budget to use cloud services. Is it difficult to make good programs on our laptops by ourselves? I do not think so, and I would like to try it myself first.

I would like to make a program to classify cats and dogs in images. To do that, I found a good tutorial (1). I use the code from this tutorial and perform my own experiment. Let us start now. How can we do that? It is amazing.


To build an AI model that classifies cats and dogs, we need many images of cats and dogs. Once we have the data, we train the model so that it can classify cats and dogs correctly. But we face two problems:

1. We need a massive amount of image data of cats and dogs

2. We need high-performance computing resources like GPUs

About training artificial intelligence models, it is sometimes said that “with massive data sets, it takes several days or a week to complete training the models”. In many cases, we cannot do that. So what should we do?

Do not worry about that. We do not need to create the model from scratch. Many big IT companies and famous universities have already trained AI models and made them public for everyone to use. These are called “pre-trained models”. So all we have to do is take the outputs of a pre-trained model and adjust them for our own purpose. In this experiment, our purpose is to identify cats and dogs by computer.

I follow the code by François Chollet, the creator of Keras, and run it on my MacBook Air 11. It is a normal Mac with no additional resources. I prepared only 1000 images each of cats and dogs. It takes 70 minutes to train the model, and the result is around 87% accuracy. That is great, as it is done on a normal laptop rather than servers with GPUs.
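
One reason the tutorial gets so far with so few images is data augmentation: each image is randomly flipped and shifted to create extra training copies. Here is a minimal numpy sketch of the idea; the real tutorial uses Keras’ ImageDataGenerator, and this toy version and its parameters are mine.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped / shifted copy of one image, mimicking
    the kind of augmentation Keras' ImageDataGenerator performs."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                  # horizontal flip
    dx = int(rng.integers(-2, 3))           # small horizontal shift
    return np.roll(out, dx, axis=1)

rng = np.random.default_rng(0)
image = np.arange(64, dtype=float).reshape(8, 8)    # stand-in for a photo
batch = np.stack([augment(image, rng) for _ in range(10)])
```

From one image we get ten slightly different training examples, which is effectively free extra data; 1000 images per class go a lot further this way.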

 

 

Based on this experiment, I found that artificial intelligence models can be developed on my Mac with little data to solve our own problems. I would like to do more tuning to obtain a higher accuracy rate; there are several methods to make it better.

Of course, this is just the beginning of the story. Not only “cats and dogs classification” but also many other problems can be solved the way I experimented here. When pre-trained models are available, they give us great potential to solve our own problems. Do you agree? Let us try many things with pre-trained models this year!

 

 

1.Building powerful image classification models using very little data

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html


How can computers see objects? It is done by probability


Do you know how computers can see the world? This is very important, as self-driving cars will be available in the near future; if you do not know how they see, you cannot be brave enough to ride in one. So let me explain it for a while.

 

1. An image can be expressed as a sequence of numbers

I believe you have heard the word “RGB“. R stands for red, G stands for green, and B stands for blue. Every color is created by a mix of the three colors R, G and B, and each of them has a value somewhere from 0 to 255. Therefore each point in an image, which is called a “pixel”, has a vector such as [255, 35, 57]. So each image can be expressed as a sequence of numbers, and this sequence of numbers is fed into the computer so it can understand what the image is.
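
This can be seen directly with numpy. The tiny 2x2 “image” below is invented for illustration; its bottom-right pixel is the [255, 35, 57] vector from the text.

```python
import numpy as np

# A tiny 2x2 "image": each pixel is an (R, G, B) triple of values 0-255.
image = np.array([[[255,   0,   0], [  0, 255,   0]],
                  [[  0,   0, 255], [255,  35,  57]]], dtype=np.uint8)

height, width, channels = image.shape       # (2, 2, 3)
pixel = image[1, 1]                         # the [255, 35, 57] pixel from the text
flat = image.ravel()                        # the sequence of numbers fed to a network
```

A real photo is exactly the same thing, just with millions of pixels instead of four.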

 

2. A convnet and a classifier learn and classify images

Once images are fed into the computer, a convnet is used to analyze the data. The convnet is one of the most famous deep learning algorithms and is frequently used for computer vision. The basic process of image classification works as follows.


  • The image is fed into the computer as a sequence of numbers
  • A convolutional neural network identifies features that represent the object in the image
  • The features are obtained as a vector
  • A classifier provides the probability of each candidate object
  • The object in the image is classified as the object with the highest probability

In this case, the probability of “dog” is the highest, so the computer classifies the image as a dog. Of course, each image has a different set of probabilities, which is how computers understand what they see.
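
The last two steps can be sketched as follows. The scores are hypothetical numbers chosen for illustration; a softmax turns them into probabilities that sum to 1, and the highest one wins.

```python
import numpy as np

def classify(scores, labels):
    """Turn raw classifier scores into probabilities with softmax,
    then report the label with the highest probability."""
    exps = np.exp(scores - scores.max())    # subtract the max for stability
    probs = exps / exps.sum()               # probabilities now sum to 1
    return dict(zip(labels, probs)), labels[int(np.argmax(probs))]

# Hypothetical scores for three candidate objects (invented for illustration).
probs, label = classify(np.array([2.0, 0.5, 0.1]), ["dog", "cat", "bird"])
```

Here “dog” receives the largest probability, so the computer answers “it is a dog”, exactly as in the example above.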

 

3. This is the basic process of computer vision. To achieve higher accuracy, many researchers have been intensively developing better algorithms and processing methods. I believe the most advanced computer vision algorithms are about to surpass human sight. Have a look at the famous experiment in which a researcher competed against a ConvNet with his own eyes (1); his error rate was 5.1%.

Now I am very interested in computer vision and focus on this field in my research. I hope to share new findings in the near future.

 

1. What I learned from competing against a ConvNet on ImageNet, Andrej Karpathy, a Research Scientist at OpenAI, Sep 2, 2014

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

 

 


 

Is this a real human voice? It is amazing, as it is generated by computers


As I shared in an article this week, I found an exciting system that generates voices by computer. When I heard the voice, I was very surprised, as it sounds so real. I recommend you listen to the samples on the website here; there are English and Mandarin versions. It was created by DeepMind, which has one of the best artificial intelligence research arms in the world. What makes it possible? Let us see.

 

1. Computers learn our voices deeper and deeper

According to DeepMind’s explanation, they use “WaveNet, a deep neural network for generating raw audio waveforms”. They also mention “PixelRNN and PixelCNN”, which they invented earlier this year. (The research won a Best Paper award at ICML 2016, one of the biggest international conferences on machine learning.) By applying the ideas of PixelRNN and PixelCNN to voice generation, computers can learn the waveform of voices in far more detail than with previous methods, which enables them to generate more natural voices. This is how WaveNet was born.
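
The building block that lets WaveNet model raw waveforms is the causal (dilated) convolution: the output at time t may look only at the present and the past, never the future. Here is a toy numpy version; the kernel and dilation values are arbitrary, and this is a sketch of the idea, not DeepMind’s implementation.

```python
import numpy as np

def causal_conv(x, kernel, dilation):
    """1-D causal convolution: output[t] combines x[t], x[t-d], x[t-2d], ...
    so it can never look into the future. Stacking such layers with growing
    dilation is the core building block of WaveNet."""
    y = np.zeros_like(x)
    for i, w in enumerate(kernel):          # each tap looks further into the past
        shift = i * dilation
        if shift == 0:
            y += w * x
        else:
            y[shift:] += w * x[:-shift]
    return y

x = np.random.default_rng(0).standard_normal(32)    # toy "waveform"
out = causal_conv(x, kernel=[0.5, 0.3, 0.2], dilation=2)
```

Changing a sample in the input never changes any earlier output sample, which is exactly the property that lets the network generate audio one sample at a time.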

As a result of learning raw audio waveforms, computers can generate voices that sound remarkably real. Look at the metrics below: the score of WaveNet is not so different from the score of human speech (1). It is amazing!

[Chart: speech quality scores of WaveNet compared with human speech (1)]
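The core idea is autoregressive: each audio sample is predicted from the samples before it, using stacks of dilated causal convolutions so the network can look far into the past. Here is a minimal numpy sketch of that idea, with made-up weights and a toy waveform; it only illustrates the mechanism, not DeepMind's actual implementation:

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation):
    """Apply a 1-D causal convolution with the given dilation.
    Each output sample depends only on current and past inputs."""
    k = len(weights)
    pad = (k - 1) * dilation  # left-pad so the output stays causal
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(weights[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

# Stacking layers with dilations 1, 2, 4, 8 doubles the context
# at every layer, letting the network see far into the past.
x = np.sin(np.linspace(0, 8 * np.pi, 64))  # toy "waveform"
h = x
receptive_field = 1
for d in (1, 2, 4, 8):
    h = np.tanh(causal_dilated_conv(h, [0.5, 0.5], d))
    receptive_field += d  # a kernel of size 2 adds `d` past samples
print(receptive_field)  # 16 samples of context after four layers
```

With more layers the receptive field keeps doubling, which is how WaveNet can condition each new sample on thousands of previous ones.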

2. Computers can generate both men's and women's voices

Because computers learn the waveforms of our voices in more detail, they can create both men's and women's voices. You can listen to each of them on the web, too. DeepMind says, "Similarly, we could provide additional inputs to the model, such as emotions or accents" (2). I would like to listen to those, too!

 

3. Computers can generate not only voices but also music!

In addition, WaveNet can create music. I listened to the piano music generated by WaveNet and liked it very much, as it sounds so real. You can try it on the web, too. If we consider music and voices as just audio waveform data, it is natural that WaveNet can generate not only voices but also music.

 

If we can use WaveNet in digital marketing, it will be awesome! Every promotion, instruction, and piece of guidance for customers could be delivered in WaveNet's voice! Customers may not even recognize that it is a computer-generated voice. Background music could be optimized for each customer by WaveNet, too! In my view, this algorithm could be applied to many other problems, such as detecting cyber-security attacks, detecting anomalies in engine vibrations, and analyzing earthquakes, as long as the data takes the form of a wave. I want to try many things myself!

Did you listen to the voices generated by WaveNet? I believe that in the near future, computers will be able to learn how I speak and generate my voice just as I would say it. It must be exciting!

 

 

1, 2. "WaveNet: A Generative Model for Raw Audio", DeepMind

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

 

 


Let us overview the variations of deep learning now!


This weekend, I researched recurrent neural networks (RNNs), as I want to develop a small chatbot. I also ran a convnet program, as I wanted to confirm how accurate they are. So I think this is a good time to overview the variations of deep learning, because it makes it easier to learn each network in detail.

 

1. Fully connected network

This is the basis of deep learning. When you hear the words "deep learning", they mean a fully connected network in most cases. Look at the program in my article from last week again; you can see "fully_connected" in it. This network is loosely modeled on the networks of neurons in our brain.

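"Fully connected" simply means every input unit is connected to every output unit. As an illustration only, with random weights and hypothetical layer sizes, a two-layer fully connected network can be sketched in a few lines of numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(x, w, b):
    """One fully connected layer: every input feeds every output,
    followed by a ReLU activation."""
    return np.maximum(w @ x + b, 0.0)

# Two stacked layers: 4 inputs -> 8 hidden units -> 3 outputs.
x = rng.normal(size=4)
h = fully_connected(x, rng.normal(size=(8, 4)), np.zeros(8))
y = fully_connected(h, rng.normal(size=(3, 8)), np.zeros(3))
print(y.shape)  # (3,)
```

In a real network the weights would be learned from data rather than drawn at random, but the layer structure is exactly this.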

 

2. Convolutional neural network (Convnet)

This is mainly used for image recognition and computer vision. There are many variations of convnets designed to achieve higher accuracy. Do you remember the TED presentation I recommended before? Watch it again if you want to learn more about convnets.
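What makes a convnet different from a fully connected network is that a small filter slides across the image, reusing the same weights at every position. A minimal sketch with a toy hand-picked edge filter (not a trained network):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over the image; each output pixel is a
    weighted sum of the patch under the filter (no padding)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0              # left half dark, right half bright
edge = np.array([[-1.0, 1.0]])  # responds to vertical brightness changes
response = conv2d(image, edge)
print(response.max())           # strongest response at the edge: 1.0
```

A real convnet learns many such filters from data and stacks them in layers, but the sliding-window computation is the same.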

 

3. Recurrent neural network (RNN)

The biggest advantage of an RNN is that it does not need a fixed-size input (a convnet does). Therefore it is frequently used in natural language processing, as our sentences are sometimes very short and sometimes very long. In other words, an RNN can handle sequences of input data effectively. To overcome the difficulties that arise when its parameters are trained, many kinds of RNNs have been developed and are in use now.

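The reason an RNN accepts any sequence length is that the same weights are reused at every time step. A minimal sketch with made-up sizes (5 hidden units, 3-dimensional inputs) showing that a 2-step and a 9-step sequence are encoded by the very same network:

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(5, 5))  # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(5, 3))  # input-to-hidden weights

def rnn_encode(sequence):
    """Fold a sequence of vectors into one hidden state, applying
    the same weights at every step -- so any length works."""
    h = np.zeros(5)
    for x in sequence:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

short = [rng.normal(size=3) for _ in range(2)]  # 2-step sequence
long_ = [rng.normal(size=3) for _ in range(9)]  # 9-step sequence
print(rnn_encode(short).shape, rnn_encode(long_).shape)  # (5,) (5,)
```

Variants such as LSTMs and GRUs replace the simple tanh update with gated updates to make training easier, but they share this same weight-sharing structure.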

 

4. Reinforcement learning (RL)

  • The output is an action or sequence of actions, and the only supervisory signal is an occasional scalar reward.

  • The goal in selecting each action is to maximize the expected sum of the future rewards. We usually use a discount factor for delayed rewards so that we don’t have to look too far into the future.

This is a good explanation, taken from the lecture slides (lec1, p. 46) of “Neural Networks for Machine Learning” by Geoffrey Hinton on Coursera.
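The "discounted sum of future rewards" mentioned above is simple to compute. A tiny sketch with a made-up reward sequence and discount factor, just to make the idea concrete:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each discounted by gamma per step,
    so distant rewards count for less."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# An occasional scalar reward: nothing, nothing, then +10.
# The agent two steps earlier values it at 10 * 0.9 * 0.9.
print(discounted_return([0.0, 0.0, 10.0], gamma=0.9))  # ~8.1
```

The goal of the agent is to choose actions that maximize the expected value of this quantity.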

 

 

Many researchers all over the world are developing new models, so new kinds of networks may be added in the near future. Until then, these models can be considered the building blocks for implementing deep learning algorithms to solve our problems. Let us use them effectively!

 
