Since Q1 of 2019 is close to ending, I would like to talk about one of the biggest innovation of deep learning in Natural Language Processing (NLP). This is called “BERT” presented by Google AI in Oct 2018. As far as I know, it is the first model to perform very well in many language tasks such as sentimental analysis, question answering without any change of the model itself. It is amazing! Let us start now.
1. How BERT works?
The secrets of BERT are its structure and method of training. BERT introduces transformer as the main blocks in it. I mentioned transformer before as it is a new structure to extract information of sequential data. The key is the attention mechanism. It means to measure “how we should pay attention to each word in the sentence”. If you want to know more, it is a good reference. Then let us move on how BERT is trained. BERT means “Bidirectional Encoder Representations from Transformers”. For example, the word “bank” has different meanings in “bank account” and “bank of the river”. When the model can learn from data only forward direction, it is difficult to distinguish the difference of meaning of “bank”. But if it can learn not only forward but backward direction, the model can do so. It is the secret for BERT to perform the state of art in many NLP tasks without modifications. This is the chart from the research paper (1).
2. How can we apply BERT to our tasks for solutions?
BERT is so large that it needs a lot of data and computing resources such as GPU/TPU. Therefore it takes time and cost if we train BERT from scratch. But no need to worry. Google released a number of pre-trained models of BERT. It is great because we can use them as base models and all we have to do is just a small training to adjust to our own tasks such as text classification. It is called “fine-tuning”. These pre-trained models are open source and available for everyone. If you want to know more, please see the blog. The beauty is one of the pre-trained models is a multi-language model which works in 104 languages without any modifications. It is amazing! So it works in your language, too!
3. Can BERT accelerate digital transformation in our daily lives?
I think “Yes” because we are surrounding a massive amount of documentation such as contracts, customer reports, emails, financial reports, regulatory instructions, newspapers, and so on. It is impossible to understand everything and extract the information needed in a real-time manner. With BERT, we can develop much better applications to handle many text data and extract information needed efficiently. It is very exciting when we consider how many applications can be created by using BERT in the near future.
Hope you enjoy my article. Now I research BERT intensively and update my article soon. Stay tuned!
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
11 Oct 2018, Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova, Google AI Language
Notice: Toshi Stats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. Toshi Stats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on Toshi Stats Co., Ltd. and me to correct any errors or defects in the codes and the software