This is my favorite NLP model. It is small but works very well!

Since Google released BERT in October 2018, many models have been proposed to improve on the original. Recently I found ALBERT, which Google Research released last year, so I ran a small experiment with it. Let us build a sentiment analysis model using IMDB data. IMDB is a movie review dataset that contains the text of each user's review and its sentiment. I prepared 25,000 training samples and 3,000 test samples. The result is very good. Let us look at the details.

1. It is easy to train because it has fewer parameters than BERT

BERT is very famous because it performs well on NLP (natural language processing) tasks. But for me it is a little too big. BERT base has around 110 million parameters, which means that training takes a long time and sometimes exceeds the memory of the GPU. ALBERT base, on the other hand, has only around 11 million parameters, so it is easy to train. It takes only about one hour to reach 90% accuracy on an NVIDIA Tesla P100 GPU. It is awesome!

In this experiment, max_length is 256 for each sample.

2. It is very accurate with little data

ALBERT is not only fast to train but also very accurate. I used the pre-trained ALBERT base model from TensorFlow Hub. Because it is pre-trained, ALBERT is accurate even with little data. For example, with only 500 training samples, its accuracy is over 80%! This is very useful for real problems, as we do not always have enough training data in practice.

max_length is 128 for each sample.
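
Both experiments above fix max_length (256 in the first, 128 in the second), which means every review is cut or padded to the same number of tokens before it reaches the model. Here is a minimal sketch of that step in plain Python. The function name pad_or_truncate, the padding id 0 and the token ids are my own assumptions for illustration; a real tokenizer defines its own ids and also keeps its special end token when it truncates.

```python
from typing import List

PAD_ID = 0  # hypothetical padding id; real tokenizers define their own


def pad_or_truncate(token_ids: List[int], max_length: int, pad_id: int = PAD_ID) -> List[int]:
    """Return exactly max_length ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


short_review = [101, 2023, 3185, 102]  # made-up token ids for illustration
long_review = list(range(1, 301))      # 300 made-up token ids

print(len(pad_or_truncate(short_review, 128)))  # 128
print(len(pad_or_truncate(long_review, 256)))   # 256
```

A smaller max_length makes training faster and lighter on memory, at the price of dropping the end of long reviews.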

3. It integrates easily with TensorFlow and Keras

Finally, I would like to point out that ALBERT is easy to integrate with TensorFlow Keras, the deep learning framework. All we have to do is import ALBERT as a “Keras layer”. If you want to know more, check TensorFlow Hub, which explains how to do it. I use TensorFlow Keras every day, so ALBERT naturally became my favorite model.

As I said, ALBERT is available on TensorFlow Hub and is free to use, so everyone can start using it easily. That is good for democratising “artificial intelligence”. I want to apply ALBERT to many real-world applications. Stay tuned!

Cheers Toshi

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software

This is our cross-lingual intelligent system. It is smarter than I am as it recognizes 16 languages!

When I lived in Kuala Lumpur, Malaysia, I always felt that I was in a multilingual environment. Most people speak Bahasa Malaysia, but when they talked to me, they spoke English. Some of them understand Japanese. Chinese Malaysian people speak Mandarin and Cantonese. Indian people speak Hindi or other languages. Yes, I am sure Asia is a “multilingual environment”. Since then I have always wondered how we can develop a system that accepts many languages as input. Now I have found a way.

This is the first cross-lingual intelligent system by TOSHI STATS. It can accept 16 languages and perform sentiment analysis. Let me explain the details.

1. Inputs in 16 languages

Because we use the MUE (1) models from TensorFlow Hub, the system can accept 16 languages (see the list below). Usually, a Japanese system cannot accept English input and an English system cannot accept Japanese input. But this system accepts both and works well. This is amazing! The first screenshot of our system shows input in English and the second shows input in Japanese.

[Screenshot: N_Osaka2a]

[Screenshot: N_Osaka1a]

We do not need a separate system for each language. The secret is that the system maps every sentence into the same vector space, even though the sentences are written in different languages.
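
To get a feeling for what “the same space” means, here is a small sketch that compares sentence vectors with cosine similarity. The 4-dimensional vectors are hypothetical values that I made up for illustration; real vectors from the MUE model are much longer and would have to be computed by the model itself.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the same news in English and Japanese, plus an unrelated sentence
english_win = [0.9, 0.1, 0.3, 0.0]
japanese_win = [0.8, 0.2, 0.3, 0.1]
english_rain = [0.0, 0.9, 0.1, 0.7]

print(cosine_similarity(english_win, japanese_win) > cosine_similarity(english_win, english_rain))  # True
```

Sentences with a similar meaning should end up close together regardless of language, while unrelated sentences get a lower score.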

2. Transfer learning from English to other languages

I think this is the biggest breakthrough in the system. Because the languages share the same space, we can train the model in English and transfer its knowledge to other languages. For example, there is a lot of text data for training models in English but only a little in Japanese. In such a case, it is difficult to train models effectively in Japanese. But we can train a model in English and use it in Japanese. It is great! Of course, we can also train the model in another language and transfer it to others. This is extraordinary, as it enables us to transfer knowledge and expertise from one language to another.
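
The idea can be sketched with a very simple classifier. Below, a nearest-centroid classifier is “trained” only on English sentence vectors and then applied to a Japanese sentence vector. This is not the model used in our system, just an illustration; all vectors are hypothetical 2-dimensional values, and it assumes the encoder has already mapped both languages into the same space.

```python
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Average the vectors of one class, dimension by dimension."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(examples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """'Training' here is just computing one centroid per label."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def predict(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the vector."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


# Hypothetical embeddings of English training sentences
english_train = {
    "positive": [[0.9, 0.1], [0.8, 0.2]],
    "negative": [[0.1, 0.9], [0.2, 0.7]],
}
model = fit_centroids(english_train)

# Hypothetical embedding of a Japanese sentence, never seen during training
japanese_sentence = [0.85, 0.25]
print(predict(model, japanese_sentence))  # positive
```

No Japanese training data is used at any point; the transfer works only as long as the encoder places sentences with the same meaning near each other.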

3. Experiment and result

I chose one news title (2) from The Japan Times and performed sentiment analysis with the system. The title is “Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil”. I think it should be positive.

This English sentence was translated into the other 15 languages with Google Translate. Then each sentence was input to the system and we measured the “probability of positive sentiment”. Here is the result. 90% of them are over 0.8. It means that in most languages, the system recognizes the sentence as definitely “positive”. This is amazing! It works pretty well in 16 languages although the model is trained only in English.

[Figure: MUE_result]

When I developed this cross-lingual intelligent system, I thought it was already smarter than I am, as I do not know what the sentences in the 14 languages other than Japanese and English mean. With this method, we can develop many intelligent systems that were difficult to develop a year ago. I will keep you updated on the progress of our intelligent system. Stay tuned!

1. Multilingual Universal Sentence Encoder for Semantic Retrieval, Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil, Google, July 9 2019

2. Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil, The Japan Times, Sep 22, 2019.

BERT also works very well as a feature extractor in NLP!

Two years ago, I developed car classification models with ResNet. I used transfer learning because I could prepare only a small number of images. The model was already pre-trained on a huge amount of data such as ImageNet, so I could extract features from each car image and train classification models on top of them. It works very well. If you are interested, please see the article.

So I wondered how BERT (1) works as a feature extractor. If it works well, it can be applied to many downstream tasks with ease. Let us try the experiment here. BERT is one of the best Natural Language Processing (NLP) models, developed by Google. I wrote about how BERT works in an earlier article. It is amazing!

Let me explain features a little. A feature means “how a text can be represented by a vector”. Each word is converted to a number before being input to BERT, and then the whole sentence is converted to a vector of length 768 by BERT. In this experiment, feature extraction is done with the BERT module on TensorFlow Hub. Let us see its website. It says there are two kinds of outputs from BERT…

It means that when text data is input to BERT, the model returns two types of vectors. One is “one vector for each sentence”, the other is “a sequence of vectors for each sentence”. In this task, we need “one vector for each sentence” because it is a classification task and one vector is enough as input to a classification model. We can see the first 3 vectors out of 3503 samples below.
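
The difference between the two outputs is easiest to see from their shapes. The sketch below does not call BERT at all and does not reproduce the vectors of my experiment: fake_bert_outputs is a hypothetical stand-in that returns random numbers, and the key names pooled_output and sequence_output are how I remember the TensorFlow Hub module naming them, so please check its documentation before relying on them.

```python
import random
from typing import Dict, List

HIDDEN_SIZE = 768  # vector length mentioned in the text


def fake_bert_outputs(token_ids: List[int], seed: int = 0) -> Dict[str, list]:
    """Hypothetical stand-in for BERT: random numbers, but BERT-like shapes."""
    rng = random.Random(seed)
    sequence_output = [
        [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)] for _ in token_ids
    ]
    pooled_output = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
    return {"pooled_output": pooled_output, "sequence_output": sequence_output}


outputs = fake_bert_outputs([101, 2001, 2002, 2003, 102])  # made-up token ids
print(len(outputs["pooled_output"]))       # 768: one vector for the whole sentence
print(len(outputs["sequence_output"]))     # 5: one vector per token
print(len(outputs["sequence_output"][0]))  # 768
```

For classification only the single sentence vector is passed on; the per-token vectors matter for tasks such as question answering, where the answer is a span of tokens.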

This is the training result of the classification model. Accuracy is 82.99% at epoch 105. Although this is reasonable, it is worse than the 88.58% of the last article. The difference can be considered the advantage of fine-tuning. In this experiment, the weights of BERT are fixed and there is no fine-tuning. So if you need more accuracy, try fine-tuning just like the experiment in the last article.
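
“Fixed weights” means that only the small classifier on top of the extracted features is trained. Here is a minimal sketch of that setting, assuming the features are already extracted. For brevity it uses a two-class logistic regression in plain Python instead of the five-class model of the experiment, and the 2-dimensional features and labels are made-up values, not BERT vectors.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    features: List[Sequence[float]],
    labels: List[int],
    epochs: int = 200,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Full-batch gradient descent; only these weights are updated, never the encoder."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


# Made-up "frozen" features for four sentences and their class labels
frozen_features = [[2.0, 0.5], [1.5, 1.0], [0.2, 1.8], [0.5, 2.2]]
labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression(frozen_features, labels)
print([predict(weights, bias, x) for x in frozen_features])  # [1, 1, 0, 0]
```

With fine-tuning, the gradient would also flow back into BERT itself, which is where the extra accuracy of the last article comes from.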

BERT stands for “Bidirectional Encoder Representations from Transformers”, so it looks well suited as a tool for feature extraction. In particular, this is the multi-language model, so we can use it for 104 languages. It is amazing!

I will perform other experiments with BERT in future articles. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language

BERT performs very well in the classification task in Japanese, too!

As I promised in the last article, I performed an experiment on classifying news titles in Japanese. The result is very good, as I expected. Let me explain the details.

I use the “livedoor news corpus” (2) for this experiment. There are five classes of news titles in this experiment: life, movie, sports, chats, and electronics. Here are the details of the classes. I would like to classify each news title into the correct class.

Then I trained a BERT (1) model with a sample of news titles written in Japanese. Here is the result. The BERT model I used is the multi-language model. All I had to do was fine-tune it for my task. As you can see below, the accuracy is about 88%. This is very good considering that I used a very small sample (3503 for training, 876 for test). It took less than one minute on Colab with a GPU.

With 3 epochs, I confirmed that the accuracy is over 88%.

Let me take 10 samples for validation and look at each of them. These samples were not used for training, so they are new to the computer. Nine out of ten are classified correctly. It is so good, isn’t it?

The beauty is that the pre-trained model is not specific to Japanese. As it is a multi-language model, it should work in many languages with the same fine-tuning as I did for Japanese. Therefore it should work in your language, too!

How do you like this experiment? I will continue to experiment with BERT on many natural language tasks and update my articles soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language
  2. livedoor news corpus, CC BY-ND 2.1 JP

BERT performs near the state of the art in question answering! I confirm it now

Today I am writing about BERT, a new natural language model, again because it works so well on the question answering task. In my last article, I explained how BERT works, so if you are new to BERT, please read it.

For this experiment, I use the SQuAD v1.1 data, as it is very famous in the field of question answering. Here is an explanation by its authors.

“Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.” (This is from SQuAD2.0, a new version of Q&A data)

This is a very challenging task for computers to answer correctly. How does BERT perform on it? As you can see below, BERT recorded an F1 of 90.70 after one hour of training on a TPU on Colab in our experiment. It is amazing because, based on the SQuAD1.1 Leaderboard below, it would rank third or fourth among top universities and companies, although the Leaderboard settings may differ from our experiment setting. It should also be noted that BERT is as good as a human!

I tried both the Base model and the Large model with different batch sizes. The Large model is better than the Base model by around 3 points. The Large model takes around 60 minutes to complete training while the Base model takes around 30 minutes. I use a TPU on Google Colab for training. Here is the result. EM means “exact match”.
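
For reference, EM and F1 can be computed per question roughly as follows. This is a simplified sketch modeled on my understanding of the official SQuAD v1.1 evaluation (lower-casing, removing punctuation and the articles a/an/the, then comparing tokens). The function names and the example answers are my own, and the official script additionally takes the best score over several reference answers.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lower-case, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the Eiffel Tower", "Eiffel Tower"))   # True
print(exact_match("in Paris France", "Paris"))           # False
print(round(f1_score("in Paris France", "Paris"), 2))    # 0.5
```

EM is strict, so a prediction with extra words scores zero, while F1 gives partial credit. That is why the F1 number is always at least as high as EM.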

Question answering can be applied to many business tasks, such as information extraction from documents and automation in customer centers. It will be exciting when we can apply BERT to businesses in the near future.

 

Next, I would like to perform text classification of news titles in Japanese, because BERT has a multi-language model that works in 104 languages. As I live in Tokyo now, it is easy to find good data for this experiment. I will update my article soon. So stay tuned!

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}

“BERT” can be a game changer to accelerate digital transformation!

As Q1 of 2019 is coming to an end, I would like to talk about one of the biggest innovations of deep learning in Natural Language Processing (NLP). It is called “BERT” and was presented by Google AI in Oct 2018. As far as I know, it is the first model to perform very well in many language tasks, such as sentiment analysis and question answering, without any change to the model itself. It is amazing! Let us start now.

1. How does BERT work?

The secrets of BERT are its structure and its method of training. BERT uses the transformer as its main building block. I mentioned the transformer before, as it is a new structure for extracting information from sequential data. The key is the attention mechanism, which measures “how much attention we should pay to each word in the sentence”. If you want to know more, this is a good reference. Then let us move on to how BERT is trained. BERT stands for “Bidirectional Encoder Representations from Transformers”. For example, the word “bank” has different meanings in “bank account” and “bank of the river”. When the model can learn from data only in the forward direction, it is difficult to distinguish the meanings of “bank”, because the words that tell them apart come after it. But if it can learn not only in the forward direction but also in the backward direction, the model can do so. This is the secret that lets BERT achieve state-of-the-art results in many NLP tasks without modifications. This is the chart from the research paper (1).
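
The “bank” example can be made concrete with a few lines of Python. The sketch below only shows which words a model is allowed to look at in each setting; it is not BERT itself, and the helper names are hypothetical.

```python
from typing import List, Tuple


def forward_only_context(tokens: List[str], index: int) -> List[str]:
    """Words visible to a model that reads only from left to right."""
    return tokens[:index]


def bidirectional_context(tokens: List[str], index: int) -> Tuple[List[str], List[str]]:
    """Words visible on both sides of the target word."""
    return tokens[:index], tokens[index + 1:]


sentence_a = "the bank account".split()
sentence_b = "the bank of the river".split()
position = 1  # index of the word "bank" in both sentences

# Forward only: both sentences look identical up to "bank"
print(forward_only_context(sentence_a, position))  # ['the']
print(forward_only_context(sentence_b, position))  # ['the']

# Bidirectional: the words on the right tell the two meanings apart
print(bidirectional_context(sentence_a, position))  # (['the'], ['account'])
print(bidirectional_context(sentence_b, position))  # (['the'], ['of', 'the', 'river'])
```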

2. How can we apply BERT to our tasks for solutions?

BERT is so large that it needs a lot of data and computing resources such as GPUs/TPUs. Therefore training BERT from scratch takes time and money. But there is no need to worry. Google released a number of pre-trained BERT models. This is great because we can use them as base models, and all we have to do is a small amount of training to adapt them to our own tasks such as text classification. This is called “fine-tuning”. These pre-trained models are open source and available to everyone. If you want to know more, please see the blog. The beauty is that one of the pre-trained models is a multi-language model that works in 104 languages without any modifications. It is amazing! So it works in your language, too!

3. Can BERT accelerate digital transformation in our daily lives?

I think “Yes”, because we are surrounded by a massive amount of documents such as contracts, customer reports, emails, financial reports, regulatory instructions, newspapers, and so on. It is impossible to understand everything and extract the information we need in real time. With BERT, we can develop much better applications to handle large amounts of text data and extract the needed information efficiently. It is very exciting to consider how many applications can be created with BERT in the near future.

I hope you enjoyed my article. I am now researching BERT intensively and will update my articles soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language

How can we develop machine intelligence with a little data in text analysis?

Whenever you want to create a machine intelligence model, the first question to ask is “Where is my data?”. It is usually difficult to find good data to create models because collecting it is time-consuming and may be costly. Unless you work at a company such as Google or Facebook, it might be a headache for you. But fortunately, there is a good way to solve this problem: “transfer learning”. Let us find out!

1. Transfer learning

When we need to train machine intelligence models, we usually use “supervised learning”. It means that we need “teachers” who can tell which answer is right. For example, when we need to classify “which is a cat and which is a dog?”, we need to tell the computer “this is a cat and that is a dog”. It is a powerful method of learning that achieves high accuracy, so most current AI applications are developed with supervised learning. But a problem arises here: there is little data for supervised learning. While we have many images on our smartphones, each image has no information about “what it is”, so we need to add this information to each image manually. That takes time, as a massive number of images is needed for training. I explained this a little for computer vision in my blog before. We can say the same thing about text analysis or natural language processing. There are many tweets on the internet, but no one tells you which have positive and which have negative sentiment. Therefore we need to label each tweet “positive” or “negative” by ourselves. No one wants to do that. This is where “transfer learning” comes in. You do not need to train from scratch. Just transfer someone else’s results to your model, as someone has already done similar training before you! The beauty of transfer learning is that we need just a little data for our own training. No need for a massive amount of data anymore. It makes preparing data far easier for us!

2. “Transformer”

This model (1) was one of the most sophisticated models for machine translation in 2017. It was created by Google Brain. As you may know, it achieved state-of-the-art accuracy in neural machine translation at the time it was published. The key architecture of the Transformer is “self-attention”. It tells the model which words in a sentence it should pay attention to, regardless of their respective positions, by using a “Query, Key, and Value” mechanism. The research paper “Attention Is All You Need” is available here. The “self-attention mechanism” takes time to explain in detail. If you want to know more, this blog is strongly recommended. I just want to say that the “self-attention mechanism” might be a game changer for developing machine intelligence in the future.
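
For readers who want to see the “Query, Key, and Value” idea in code, here is a minimal sketch of scaled dot-product self-attention in plain Python. As a simplification it uses the input vectors directly as queries, keys and values, whereas the real Transformer first multiplies them by learned weight matrices and uses several heads. The three 2-dimensional word vectors are made up.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: Matrix, keys: Matrix, values: Matrix
) -> Tuple[Matrix, Matrix]:
    """Return (outputs, attention weights); each weight row sums to 1."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up vectors for a three-word sentence
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(words, words, words)

for row in weights:
    print([round(w, 2) for w in row])
```

Each row of weights says how much one word attends to every word in the sentence, including itself, no matter how far apart the words are.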

3.  Transfer learning based on “Transformer”

More than one year has passed since the Transformer was published, and there are now several variations based on it. I found a good model for the “transfer learning” I mentioned earlier in this article. It is the “Universal Sentence Encoder” (2). On its website, we can find a good explanation of what it is.

“The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.”

The model takes sentences, phrases or short paragraphs and outputs vectors to be fed into the next process. “The universal-sentence-encoder-large” is trained with the Transformer (-light is trained with a different model). The beauty is that the Universal Sentence Encoder is already trained by Google and the results are available, so we can perform “transfer learning” by ourselves. This is great! This chart tells you how it works.

[Chart: Sentence encoder]

The team at Google claimed that “With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task.”. So let me confirm how it works with little data. I performed a small experiment based on this awesome article. I modified the classification model and changed the number of training samples. With only 100 training samples, I could achieve 79.2% accuracy. With 300 samples, 95.8% accuracy. This is great! I believe the results come from the power of transfer learning with the Universal Sentence Encoder.

[Figure: result1red]

In this article, I introduced transfer learning and performed a small experiment with the latest model, the “Universal Sentence Encoder”. It looks very promising so far. I would like to continue transfer learning experiments and update the results here. Stay tuned!

 

When you need AI consulting, please visit the TOSHI STATS website.

  1. Attention Is All You Need,  Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Google, 12 June 2017.
  2. Universal Sentence Encoder,  Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil,  Google, 29 March 2018

 

This is my first machine intelligence model. It looks good so far!

There are many images on the internet. A lot of people upload selfies to Instagram every day. There is also a lot of text data on the internet, because not only professional writers but also many other people express their opinions in blogs and tweets. No one can see every image and text on the internet, as the volume is huge. In addition, images and texts are sometimes related to each other. For example, people upload images and add explanations to them. Therefore I have always wondered how we can analyze both images and text at once. There are several methods to do that. I chose an image-captioning model from among them, as it is easy to understand how it works.

 

1. What is an image-captioning model?

Before I started the image-captioning project, I had worked on computer vision projects and natural language projects independently. Computer vision means, for example, classifying cats and dogs, or detecting a specific type of car and distinguishing it from other types. I have also developed natural language models such as sentiment analysis of movie reviews. An image-captioning model is a kind of combination of a computer vision model and a natural language model. Let us see the chart below.

[Chart: image-captioning]

A computer takes a picture as input. Then the encoder extracts features from the picture. A “feature” means a characteristic of an object. Based on these features, the decoder generates a sentence that describes what the picture shows. This is how our “image-captioning” model works.
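
The flow from picture to sentence can be sketched as a simple loop. Everything below is a hypothetical toy: toy_encoder and toy_next_word stand in for the real neural networks, and greedy decoding (always taking the single next word the decoder proposes) is only one possible way to generate a caption.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def toy_encoder(image: Image) -> List[float]:
    """Hypothetical encoder: one feature per pixel row (its mean brightness)."""
    return [sum(row) / len(row) for row in image]


def toy_next_word(features: Sequence[float], words_so_far: List[str]) -> str:
    """Hypothetical decoder step. A real decoder would use the features;
    this one just replays a fixed script so that the loop can be run."""
    script = ["a", "cake", "on", "a", "table", "<end>"]
    return script[len(words_so_far) - 1]


def greedy_caption(
    image: Image,
    encoder: Callable[[Image], List[float]],
    next_word: Callable[[Sequence[float], List[str]], str],
    max_words: int = 20,
) -> List[str]:
    """Encode once, then ask the decoder for one word at a time."""
    features = encoder(image)
    words = ["<start>"]
    while len(words) - 1 < max_words:
        word = next_word(features, words)
        if word == "<end>":
            break
        words.append(word)
    return words[1:]  # drop the <start> marker


picture = [[0.1, 0.9], [0.4, 0.6]]  # made-up 2x2 grayscale picture
print(" ".join(greedy_caption(picture, toy_encoder, toy_next_word)))  # a cake on a table
```

The important point is the division of labor: the encoder looks at the picture once, and the decoder produces the sentence word by word, each time seeing the features and the words generated so far.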

 

2. How can we find the template of “image captioning model” and modify it?

I found a good framework for developing our image-captioning models. It is “Colab”, provided by Google. Although it is free to use, it offers many templates to start projects with, and a GPU is available for research and interactive use. It gives us the computational power required for developing image-captioning models. I found the original image-captioning template in Colab. The template is awesome, as “the attention mechanism” is implemented in it. It uses InceptionV3 as the encoder and a GRU as the decoder. But I would like to try other methods, so I modified the template a little to change InceptionV3 to DenseNet121 and the GRU to an LSTM. Let us see how it works in my experiment!

 

3. The results after 3-hour-training

Here is one of the outputs from my experiment with our image-captioning model. It says “a couple of two sugar covered in chocolate frosting are laid on top of a wooden table”. Although it is not perfect, it works very well. With more data and computation time, it should become more accurate.

[Image: DenceNet121_1-2]

This is the first step toward machine intelligence. Of course, there is a long way to go. But by combining images and texts, I believe we can develop many cool applications in the future. In addition, I found that “the attention mechanism” is very powerful for extracting relevant information. I would like to focus on this mechanism to improve our algorithms going forward. Stay tuned!

 

(1) Olah & Carter, “Attention and Augmented Recurrent Neural Networks”, Distill, 2016.

 

We are starting an AI Lab in the company to research the “attention mechanism” in deep learning

As I said before, I completed the online course “deeplearning.ai”. This is an awesome course that I want to recommend to everyone. There are many topics we can learn in the course. One of the most interesting for me is the “attention mechanism” in neural translation, so I would like to explain it in detail. Do not worry, as I do not use mathematics in this article. Let us start.

 

The definition of the attention mechanism is: “The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step”. This feels natural when we consider how we translate from one language to another. Human beings pay more attention to specific objects than to others when those objects are more interesting to them. When we are hungry, we tend to look for a “restaurant” or “food court” sign and do not care about a “library” sign, right?

We want to apply the same idea to translation by computers. Let me think again. When we translate English into our mother tongue, such as Japanese, we first look at the whole sentence and then work out which words are important. We do not translate word by word. In other words, we pay more attention to specific words than to others. So we want to introduce the same method into neural translation by computers.
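
Although this article avoids mathematics, a tiny example may help readers who prefer code. The sketch below turns relevance scores into attention weights with a softmax, so that the weights are positive and sum to 1. The scores are hypothetical numbers chosen by hand; in a real translation model they are computed by the network at every step.

```python
import math
from typing import List


def attention_weights(scores: List[float]) -> List[float]:
    """Softmax: turn raw relevance scores into weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


source_words = ["I", "am", "hungry"]
# Hypothetical scores at the step where the model translates the word "hungry"
scores = [0.2, 0.1, 2.0]

weights = attention_weights(scores)
most_attended = source_words[weights.index(max(weights))]
print(most_attended)  # hungry
```

At the next step the scores change, so the model “looks” at a different part of the source sentence, just as we do when we translate.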

 

The attention mechanism was originally introduced (1) in Sep 2014. Since then, many attention mechanisms have been introduced. One of the strongest attention models is the “Transformer” by Google Brain from June 2017. I think you use Google Translate every day. It performs very well. But the Transformer is better than the model used in Google Translate. This chart shows the difference between GNMT (Google Translate) and the Transformer (2).

Fortunately, Google provides a framework to facilitate AI research. It is called “Tensor2Tensor (T2T)”. It is open source and can be used free of charge. It means that you can do it by yourself! I decided to set up an “AI Lab” in my company and introduce this framework to research the attention mechanism. It includes many pre-trained models, including the “Transformer”. Why don’t you join us?

I used translation as the example to explain how the attention mechanism works. But it can be applied to many other fields, such as object detection, which is used in face recognition and self-driving cars. It is exciting to consider what can be achieved with the attention mechanism. I will keep you updated on the progress. So stay tuned!

 

(1) NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE.  By Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio in Sep 2014

(2) Attention Is All You Need,  By Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez, Łukasz Kaiser,Illia Polosukhin,  in June 2017

Is this a real human voice? It is amazing, as it is generated by computers

As I shared in an article this week, I found an exciting system that generates voices by computer. When I heard the voice, I was very surprised, as it sounds so real. I recommend that you listen to the samples on the website here. There are English and Mandarin versions. This was created by DeepMind, one of the best artificial intelligence research organizations in the world. What makes it possible? Let us see.

 

1. Computers learn our voices deeper and deeper

According to DeepMind’s explanation, they use “WaveNet, a deep neural network for generating raw audio waveforms”. They also mention “pixel RNN and pixel CNN”, which they invented earlier this year. (Based on that research, they received one of the best paper awards at ICML 2016, one of the biggest international conferences on machine learning.) By applying the ideas of pixel RNN and CNN to voice generation, computers can learn the waveforms of voices in far more detail than previous methods. This enables computers to generate more natural voices. This is how WaveNet was born.
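
As far as I understand the WaveNet paper, the building block that lets the network look far back into the waveform is the dilated causal convolution: each output sample depends only on the current and earlier input samples, with gaps between them. Here is a minimal sketch in plain Python. The function name, kernel and signal values are my own illustration, not DeepMind’s code.

```python
from typing import List, Sequence


def causal_dilated_conv(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """output[t] = sum over k of kernel[k] * signal[t - k * dilation].

    Samples before the start of the signal count as zero, and no future
    sample is ever used, which is what makes the convolution causal.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:
                total += weight * signal[index]
        output.append(total)
    return output


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up audio samples
kernel = [1.0, 1.0]                                 # made-up filter weights

# Stacking layers with dilation 1, 2, 4 lets the last output see 8 samples
layer1 = causal_dilated_conv(signal, kernel, dilation=1)
layer2 = causal_dilated_conv(layer1, kernel, dilation=2)
layer3 = causal_dilated_conv(layer2, kernel, dilation=4)
print(layer3[-1])  # 36.0, the sum of all eight samples
```

Doubling the dilation at each layer makes the visible history grow quickly with depth, which matters because one second of raw audio contains many thousands of samples.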

As a result of learning raw audio waveforms, computers can generate voices that sound very real. Please look at the metrics below. The score of WaveNet is not so different from the score of human speech (1). It is amazing!

[Screenshot 2016-09-14 9-29-29]

2. Computers can generate a man’s voice as well as a woman’s voice at the same time

As computers can learn the waveforms of our voices in more detail, they can create both a man’s voice and a woman’s voice. You can listen to each of them on the web. DeepMind says “Similarly, we could provide additional inputs to the model, such as emotions or accents” (2). I would like to listen to them, too!

 

3. Computers can generate not only voice but also music!

In addition, WaveNet can create music, too. I listened to the piano music generated by WaveNet and I like it very much, as it sounds so real. You can try it on the web, too. When we consider music and voice as just audio waveform data, it is natural that WaveNet can generate not only voices but also music.

If we can use WaveNet in digital marketing, it will be awesome! All promotions, instructions and guidance to customers could be delivered by the voice of WaveNet! Customers may not notice that “it is a computer’s voice”. Background music could be optimized for each customer by WaveNet, too! In my view, this algorithm could be applied to many other problems, such as detection of cyber security attacks, anomaly detection in engine vibrations, and analysis of earthquakes, as long as the data takes the form of a “wave”. I want to try many things by myself!

Please listen to the voices by WaveNet. I believe that in the near future, computers could learn how I speak and generate my voice just as I would say it. It will be exciting!

1, 2. WaveNet: A Generative Model for Raw Audio

https://deepmind.com/blog/wavenet-generative-model-raw-audio/
