When I lived in Kuala Lumpur, Malaysia, I always felt that I was in a multilingual environment. Most people speak Bahasa Malaysia, but when they talked to me, they spoke English. Some of them understand Japanese. Chinese Malaysians speak Mandarin or Cantonese. Indian Malaysians speak Hindi or other languages. Yes, I am sure Asia is a “multilingual environment”. Since then I have always wondered how we could develop a system that accepts many languages as input. Now I have found a way.
This is the first cross-lingual intelligent system by TOSHI STATS. It accepts input in 16 languages and performs sentiment analysis. Let me explain the details.
1. Inputs in 16 languages
Because we use the Multilingual Universal Sentence Encoder (MUE) (1) models on TensorFlow Hub, the system can accept 16 languages (see the list below). Usually, a system built for Japanese cannot take English input, and a system built for English cannot take Japanese input. This system accepts both and works well. This is amazing! The first screenshot of our system shows input in English and the second shows input in Japanese.
We do not need a separate system for each language. The secret is that the system maps every sentence into the same vector space, regardless of the language it is written in.
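To make the idea of a shared space concrete, here is a minimal sketch in Python. It does not call the real encoder. The three-dimensional vectors in `toy_embeddings` are hand-written, hypothetical stand-ins for the much longer vectors a multilingual encoder would produce. The point is only to show what "the same space" buys us: a sentence and its translation should land close together, while an unrelated sentence lands far away, and cosine similarity is one common way to measure that.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (NOT output of a real model). A real multilingual
# encoder would return one vector per sentence in a shared space.
toy_embeddings = {
    "I love this movie.": [0.90, 0.10, 0.20],
    "この映画が大好きです。": [0.85, 0.15, 0.25],  # Japanese translation of the first sentence
    "The train leaves at noon.": [0.10, 0.90, 0.30],
}

english = toy_embeddings["I love this movie."]
japanese = toy_embeddings["この映画が大好きです。"]
unrelated = toy_embeddings["The train leaves at noon."]

print("English vs. Japanese translation:", round(cosine_similarity(english, japanese), 3))
print("English vs. unrelated sentence:  ", round(cosine_similarity(english, unrelated), 3))
```

With a real encoder, the vectors would come from the model rather than from a hand-written dictionary, but the comparison step would look the same.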
2. Transfer learning from English to other languages
I think this is the biggest breakthrough in the system. Because all languages share the same space, we can train the model in English and transfer its knowledge to other languages. For example, there is a lot of text data for training models in English but only a little in Japanese. In such a case, it is difficult to train models effectively in Japanese. But we can train a model in English and use it in Japanese. It is great! Of course, we can also train the model in another language and transfer it to the others. It is extraordinary, as it enables us to transfer knowledge and expertise from one language to another.
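The transfer step can be sketched as follows. This is only an illustration under stated assumptions, not the actual system: `embed` is a hypothetical stand-in that looks up hand-written two-dimensional vectors instead of calling a real multilingual encoder, and the classifier is a tiny logistic regression written in plain Python. The classifier sees English sentences only during training, then scores Japanese sentences it has never seen, which works here because translations sit near each other in the toy space.

```python
import math

# Hypothetical toy embeddings (NOT from a real model). Translations are given
# nearby vectors to imitate a shared multilingual space.
TOY_EMBEDDINGS = {
    "This is wonderful.": [0.90, 0.10],
    "I am very happy.": [0.80, 0.20],
    "This is terrible.": [0.10, 0.90],
    "I am very sad.": [0.20, 0.80],
    "これは素晴らしい。": [0.88, 0.12],  # "This is wonderful." in Japanese
    "これはひどい。": [0.12, 0.88],      # "This is terrible." in Japanese
}


def embed(sentence):
    """Stand-in for a multilingual sentence encoder (assumption)."""
    return TOY_EMBEDDINGS[sentence]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_positive_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Train on English sentences only (1 = positive, 0 = negative).
english_train = [
    ("This is wonderful.", 1),
    ("I am very happy.", 1),
    ("This is terrible.", 0),
    ("I am very sad.", 0),
]
features = [embed(sentence) for sentence, _ in english_train]
labels = [label for _, label in english_train]
weights, bias = train_logistic_regression(features, labels)

# Apply the English-trained classifier to Japanese sentences.
for sentence in ["これは素晴らしい。", "これはひどい。"]:
    probability = predict_positive_probability(weights, bias, embed(sentence))
    print(sentence, round(probability, 3))
```

The design choice worth noting is that the classifier never looks at text, only at vectors. That is why nothing in the training code needs to know which language a sentence was written in.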
3. Experiment and result
I chose one news headline (2) from The Japan Times and performed sentiment analysis on it with the system. The headline is “Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil”. I think it should be positive.
This English sentence was translated into the other 15 languages with Google Translate. Each sentence was then input to the system and we measured the “probability of positive sentiment”. Here is the result. About 90% of them are over 0.8. It means that in most languages the system recognizes the sentence as definitely “positive”. This is amazing! It works pretty well in 16 languages although the model was trained only in English.
While developing this cross-lingual intelligent system, I came to think it is already smarter than I am, as I do not know what the sentences mean in the 14 languages other than Japanese and English. Based on this method, we can develop many intelligent systems that were difficult to develop a year ago. I will keep you updated on the progress of our intelligent system. Stay tuned!
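The summary step of such an experiment, counting how many languages score above a threshold, can be sketched like this. The scores in `placeholder_scores` are made-up placeholders for illustration only and are not the results of the experiment above. The function name `share_above_threshold` and the generic language labels are also hypothetical.

```python
def share_above_threshold(scores, threshold=0.8):
    """Return the fraction of languages whose positive probability exceeds threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    above = sum(1 for probability in scores.values() if probability > threshold)
    return above / len(scores)


# Placeholder values only (NOT measured results): language -> probability of
# positive sentiment returned by the classifier for that language's sentence.
placeholder_scores = {
    "English": 0.95,
    "Japanese": 0.90,
    "language_3": 0.85,
    "language_4": 0.70,
}

share = share_above_threshold(placeholder_scores)
print(f"{share:.0%} of languages scored above 0.8")
```

In a real run, the dictionary would hold one entry per language, filled with the probabilities the system actually returned.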
1. Multilingual Universal Sentence Encoder for Semantic Retrieval, Google, Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil, July 9, 2019.
2. Naomi Osaka cruises to victory in Pan Pacific Open final to capture first title on Japanese soil, The Japan Times, Sep 22, 2019.
Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software