BERT performs very well in the classification task in Japanese, too!

As I promised in the last article, I perform experiments about classification of news title in Japanese. The result is very good as I expected. Let me explain the details.

I use “livedoor news corpus” (2) for this experiment. These are five-class of news title in this experiment. These are about life, movie, sports, chats, and electronics. Here is the detail of the class. I would like to classify each title of news according to this class correctly.

Then I train BERT(1) model with a sample of news title written in Japanese. Here is the result. The BERT model, which I used, is the multi-language model. All I have to do is fine-tuning to apply my task. As you can see below, The accuracy ratio is about 88%. It is very good while I use very small sample data (3503 for training, 876 for test). It took less than one minute on colab with GPU.

With 3 epochs, I confirmed that the accuracy ratio is over 88%

Let me take 10 samples for validation and see each of them. These samples are not used for training so they are new to the computer. Nine out of ten are classified correctly. It is so good, isn’t it?

The beauty is that the pre-trained model is not specific for only Japanese. As it is a multi-language model, it should work in many kinds of languages with the same fine-tuning as I did in Japanese. Therefore It should work in your languages, too!

How about this experiment? I continue to do experiments of BERT in many tasks of natural language and update my article soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova, Google AI Language
  2. livedoor news corpus CC BY-ND 2.1 JP

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

Advertisements

“BERT” can be a game changer to accelerate digital transformation!

Since Q1 of 2019 is close to ending,  I would like to talk about one of the biggest innovation of deep learning in Natural Language Processing (NLP).  This is called “BERT” presented by Google AI in Oct 2018. As far as I know, it is the first model to perform very well in many language tasks such as sentimental analysis, question answering without any change of the model itself. It is amazing! Let us start now.

1. How BERT works?

The secrets of BERT are its structure and method of training.   BERT introduces transformer as the main blocks in it.  I mentioned transformer before as it is a new structure to extract information of sequential data. The key is the attention mechanism. It means to measure “how we should pay attention to each word in the sentence”. If you want to know more, it is a good reference. Then let us move on how BERT is trained. BERT means “Bidirectional Encoder Representations from Transformers”. For example, the word “bank” has different meanings in “bank account” and “bank of the river”. When the model can learn from data only forward direction, it is difficult to distinguish the difference of meaning of “bank”. But if it can learn not only forward but backward direction, the model can do so. It is the secret for BERT to perform the state of art in many NLP tasks without modifications. This is the chart from the research paper (1).

2. How can we apply BERT to our tasks for solutions?

BERT is so large that it needs a lot of data and computing resources such as GPU/TPU.  Therefore it takes time and cost if we train BERT from scratch. But no need to worry.  Google released a number of pre-trained models of BERT. It is great because we can use them as base models and all we have to do is just a small training to adjust to our own tasks such as text classification. It is called “fine-tuning”. These pre-trained models are open source and available for everyone. If you want to know more, please see the blog. The beauty is one of the pre-trained models is a multi-language model which works in 104 languages without any modifications. It is amazing! So it works in your language, too!

3. Can BERT accelerate digital transformation in our daily lives?

I think “Yes” because we are surrounding a massive amount of documentation such as contracts, customer reports, emails, financial reports, regulatory instructions, newspapers, and so on. It is impossible to understand everything and extract the information needed in a real-time manner.  With BERT, we can develop much better applications to handle many text data and extract information needed efficiently. It is very exciting when we consider how many applications can be created by using BERT in the near future.

Hope you enjoy my article. Now I research BERT intensively and update my article soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova, Google AI Language

 

 

 

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

We start AI Lab in the company and research “attention mechanism” in deep learning

As I said before. I completed the online course “deeplearning ai“. This is an awesome course I want to recommend to everyone. There are many topics we can learn in the course. One of the most interesting things for me is “attention mechanism” in neural translation.  So I would like to explain it in details. Do not worry as I do not use mathematics in this article.  Let us start.

 

The definition of attention mechanism is “The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step”. It may be natural when we consider how we translate language from one to another. Yes, human-being pays more attention to specific objects than others when they are more interesting to them. When we are hungry,  we tend to look for the sign of “restaurant” or ” food court”,  do not care the sing of “library”,  right?

We want to apply the same thing for translation by computers. Let me consider again. It is true that when we translate English to our mother tongue, such as Japanese, we look at the whole part of the sentences first, then make sure what words are important to us.  we do not perform translation one on one basis. In another word, we pay more attention to specific words than other words. So we want to introduce the same method in performing neural translation by computers.

 

Originally, attention mechanism was introduced (1) in Sep 2014. Since then there are many attention mechanisms introduced. One of the strongest attention models is “Transformer” by Google brain in  June 2017.  I think you use Google translation every day. It performs very well. But transformer is better than the model used in Google translation. This chart shows deference between  GNMT (Google translation) and Transformer(2).

Fortunately, Google prepares the framework to facilitate AI research.  It is called “Tensor2Tensor (T2T) “. It is open sourced and can be used without any fees. It means that you can do it by yourself! I decide to set up “AI Lab” in my company and introduce this framework to research attention mechanism. There are many pre-trained models including “Transformer”.  Why don’t you join us?

 

I used translations as our example to explain how attention mechanism works. But it can be applied to many other fields such as object detection which is used in face recognition and a self-driving car. It must be excited when we consider what can be achieved by attention mechanism.  I would like to update the progress.  So stay tuned!

 

 

When you need AI consulting,  do not hesitate to contact TOSHISTATS

 

(1) NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE.  By Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio in Sep 2014

(2) Attention Is All You Need,  By Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez, Łukasz Kaiser,Illia Polosukhin,  in June 2017

 

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.