Hello, friends. I am Toshi. Here is this week’s update to my weekly letter, and the topic is “e-mail”. Nowadays everyone uses e-mail to communicate with customers, colleagues and family. It is useful and efficient. However, reading a massive amount of e-mail manually takes a lot of time. Recently, computers have become able to read e-mail and pick out the potentially relevant messages for us. So I am wondering how computers can do that. Let us consider it a little.
1. Our words can become “data”
When we hear the word “data”, we imagine numbers in spreadsheets. This is the “traditional” kind of data, formally called “structured data”. On the other hand, text such as the words in e-mails and in posts on Twitter and Facebook can be “data”, too. This kind of data is called “unstructured data”. Most of the data around us exists as unstructured data. However, computers can transform it into a form that can be analyzed. This is generally an automated process, so we do not need to check each message one by one. Once this new data has been created, computers can analyze it at astonishing speed. That is one of the biggest advantages of using computers to analyze e-mails.
2. Classification comes again
Actually, there are many ways for computers to understand e-mails. These methods are collectively called “natural language processing (NLP)”. The most sophisticated ones use machine learning and try to understand the meaning of sentences by looking at their structure. Here I would like to introduce one of the simplest methods so that everyone can understand how it works. It is easy to imagine that the number of times each word appears can be data. For example, take the sentence “I want to meet you next week.” In this case, (I,1), (want,1), (to,1), (meet,1), (you,1), (next,1), (week,1) are the data to be analyzed. The longer the sentences are, the more words appear as data. Now suppose we analyze e-mails from customers to assess who is satisfied with our products. If the counts of positive words such as “like”, “favorite” and “satisfy” are high, it might mean the customer is satisfied with the products, and vice versa. This is a problem of “classification”, so we can apply the same method as I explained before. The “target” is “satisfied” or “not satisfied”, and the “features” are the counts of each word.
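To make the idea of word counts as data a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the word lists `POSITIVE_WORDS` and `NEGATIVE_WORDS` and the functions `count_words` and `classify` are hypothetical names made up for this example, and the decision rule is a simple hand-written comparison of counts, not a trained machine learning model. A real classifier would learn from many labeled e-mails which words matter.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
POSITIVE_WORDS = {"like", "favorite", "satisfy", "satisfied"}
NEGATIVE_WORDS = {"dislike", "broken", "disappointed", "refund"}


def count_words(text):
    """Turn text into (word, count) data, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def classify(text):
    """Label an e-mail by comparing counts of positive and negative words.

    This is a hand-written rule standing in for a learned classifier.
    """
    counts = count_words(text)
    positive = sum(counts[word] for word in POSITIVE_WORDS)
    negative = sum(counts[word] for word in NEGATIVE_WORDS)
    if positive > negative:
        return "satisfied"
    if negative > positive:
        return "not satisfied"
    return "unclear"


if __name__ == "__main__":
    # Each word of the example sentence appears once.
    print(count_words("I want to meet you next week."))
    print(classify("I like this product, it is my favorite."))
    print(classify("The item arrived broken and I want a refund."))
```

Here the word counts play the role of the “features” and the returned label plays the role of the “target”. Note that the sketch lowercases every word, so “I” is counted as “i”.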
3. What’s the impact to businesses?
If computers understand what we say in text such as e-mails, we can make the most of it in many fields. In marketing, we can analyze the voice of the customer from a massive amount of e-mails. In legal services, computers can identify which e-mails are potentially relevant as evidence in litigation. This is called “e-discovery”. In addition, I found that the Bank of England started monitoring social networks such as Twitter and Facebook in order to research the economy. This is a kind of “new wave” of economic analysis. These are just examples. I think you can come up with many more business applications yourself, because we are surrounded by so many e-mails now.
In my view, natural language processing (NLP) will play a major role in the digital economy. Would you like to exchange e-mail with computers?