BERT performs very well in the classification task in Japanese, too!

As I promised in the last article, I perform experiments about classification of news title in Japanese. The result is very good as I expected. Let me explain the details.

I use “livedoor news corpus” (2) for this experiment. These are five-class of news title in this experiment. These are about life, movie, sports, chats, and electronics. Here is the detail of the class. I would like to classify each title of news according to this class correctly.

Then I train BERT(1) model with a sample of news title written in Japanese. Here is the result. The BERT model, which I used, is the multi-language model. All I have to do is fine-tuning to apply my task. As you can see below, The accuracy ratio is about 88%. It is very good while I use very small sample data (3503 for training, 876 for test). It took less than one minute on colab with GPU.

With 3 epochs, I confirmed that the accuracy ratio is over 88%

Let me take 10 samples for validation and see each of them. These samples are not used for training so they are new to the computer. Nine out of ten are classified correctly. It is so good, isn’t it?

The beauty is that the pre-trained model is not specific for only Japanese. As it is a multi-language model, it should work in many kinds of languages with the same fine-tuning as I did in Japanese. Therefore It should work in your languages, too!

How about this experiment? I continue to do experiments of BERT in many tasks of natural language and update my article soon. Stay tuned!

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    11 Oct 2018, Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova, Google AI Language
  2. livedoor news corpus CC BY-ND 2.1 JP

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software

Advertisements

BERT performs near state of the art in question and answering! I confirm it now

Today, I write the article of BERT, which a new natural language model, again because it works so well in question and answering task. In my last article, I explained how BERT works so if you are new about BERT, could you read it?

For this experiment, I use SQuADv1.1data as it is very famous in the field of question and answering.  Here is an explanation by them.

“Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.” (This is from SQuAD2.0, a new version of Q&A data)

This is a very challenging task for computers to answer correctly. How does BERT work for this task? As you saw below, BERT recorded f1 90.70 after one-hour training on TPU on colab in our experiment. It is amazing because based on the Leaderboard of SQuAD1.1 below, it is the third or fourth among top universities and companies although the Leaderboard may be different from our experiment setting. It is also noted BERT is as good as a human is!

 

 

 

I tried both Base model and Large model with different batch size.  Large model is better than Base model with around 3 points. Large model takes around 60 minutes to complete training while Base model takes around 30 munites. I use TPU on Google colab for training. Here is the result. EM means “exact match”.

Question & answering can be applied to many tasks in businesses, such as information extraction from documents and automation for customer centers. It must be exciting when we can apply BERT to businesses in the near future.

 

Next, I would like to perform text-classification of news title in Japanese because BERT has a multi-language model which works in 104 languages globally. As I live in Tokyo now, it is easy to find good data for this experiment. I will update my article soon. So stay tuned!

 

 

 

 

 

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}

Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software