“DEEP LEARNING PROJECT for Digital marketing” starts today: here I present the probability of visiting the store


At the beginning of this year, I set up a new project at my company. It is called the “Deep Learning project” because “Deep Learning” is used as its core calculation engine. Now that I have set up a predictive system to predict customer response to a direct mailing campaign, I would like to start a sub-project called “DEEP LEARNING PROJECT for Digital marketing”. I think the results of the project can be applied across industries such as healthcare, finance, retail, travel and hotels, food and beverage, entertainment, and so on. First, I would like to explain how we obtain the probability of each customer visiting the store in our project.

 

1. What is the progress of the project so far?

We have made progress in several areas:

  • Developing the model to obtain the probability of visiting the store
  • Developing the scoring process to assign the probability to each customer
  • Implementing the predictive system, using Excel as the interface

Let me explain our predictive system. We built it on the Microsoft Azure Machine Learning Studio platform. The beauty of the platform is that Excel, which everyone already uses, can serve as the interface for inputting and outputting data. Below is the interface of our predictive system with online Excel. Logistic regression in MS Azure Machine Learning is used as our predictive model.

The second row (highlighted) is the window to input customer data.

Azure ML 1

Once customer data are input, the probability that the customer will visit the store is output (see the red characters and number below). In this case (sample data No. 1), the customer is unlikely to visit the store, as the Scored Probabilities value is very low (0.06).

Azure ML 3

 

On the other hand, in the case of sample data No. 5, the customer is likely to visit the store, as the Scored Probabilities value is relatively high (0.28). If you want to know how it works, please watch the video.

Azure ML 2

Azure ML 4
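For readers who prefer code to screenshots, here is a minimal sketch in R of the same idea: fit a logistic regression on past visit data, then score a new customer. The column names and numbers below are purely hypothetical; our actual system runs inside Azure Machine Learning, not in this script.

# Hedged sketch: logistic regression for the probability of visiting the store.
# All column names and values are made up for illustration.
history <- data.frame(
  visits_last_month = c(0, 1, 4, 2, 6, 0, 3, 5, 2, 1),
  days_since_visit  = c(90, 40, 7, 30, 3, 120, 14, 5, 21, 60),
  visited           = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 0)   # 1 = visited the store
)
model <- glm(visited ~ visits_last_month + days_since_visit,
             data = history, family = binomial)
new_customer <- data.frame(visits_last_month = 1, days_since_visit = 60)
predict(model, newdata = new_customer, type = "response")  # the scored probability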

 

2. What is next in our project?

Now that we have created the model and implemented the predictive system, we are moving on to the next stage, which covers more advanced topics:

  • More marketing cases with a variety of data
  • Higher accuracy by using many models, including Deep Learning
  • How to implement data-driven management

 

Our predictive system should become more flexible and accurate. To achieve that, we will perform many experiments going forward.

 

3. What data is used in the project?

Several kinds of data can be used for digital marketing. I would like to use the data shown here for our project.

When we are satisfied with the prediction results from this data, the next dataset can be used for our project.

 

 

Digital marketing is becoming more important to many industries, from retail to finance. I will post an update on our project on a monthly basis. Why don’t you join us and enjoy it? If you have comments or opinions, please do not hesitate to send them to us!

If you want to receive updates on the project or learn more about the predictive system, please sign up here.

 

 

 

Microsoft, Excel and AZURE are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using the materials, instructions, methods, algorithms or ideas contained herein, or from acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. TOSHI STATS SDN. BHD. and I shall have no duty to correct any errors or defects in the code or the software.

Can computers write the sentences of documents to support you in the future?


This is amazing! It is one of the most incredible applications I have seen this year! I am very excited about it. Let me share it with you so you can use it, too.

This is “Smart Reply” in Inbox, an e-mail application from Google. It was announced on 3rd November. I tried it today.

For example, I got an e-mail from Hiro. He asked me to have lunch tomorrow. On the screen, three candidate answers appear automatically: 1. Yes, what time? 2. Yes, what’s up? 3. No, sorry. These candidates are created after the computer understands what Hiro said in the e-mail, so each of them reads very naturally to me.

mail1

So all I have to do is choose the first candidate and send it to Hiro. It is easy!

mail2

According to Google, the state-of-the-art technology “Long short-term memory” (LSTM) is used in this application.

I always wonder how computers understand the meaning of words and sentences. In this application, sentences are represented as fixed-size vectors. That means each sentence is converted into a sequence of numbers. If two sentences have the same meaning, their vectors should be similar to each other, even though the original sentences look different.
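To make the idea of “sentences as vectors” concrete, here is a toy sketch in R. This is not how Smart Reply actually works (Google uses learned LSTM representations); it only illustrates that once sentences become fixed-size vectors, their similarity can be measured numerically. The word vectors here are random, purely for demonstration.

# Toy illustration of sentences as fixed-size vectors (random vectors, not learned ones).
set.seed(1)
vocab <- c("yes", "what", "time", "sure", "when", "shall", "we", "meet")
word_vecs <- matrix(rnorm(length(vocab) * 4), nrow = length(vocab),
                    dimnames = list(vocab, NULL))          # one 4-number vector per word
sentence_vec <- function(words) colMeans(word_vecs[words, , drop = FALSE])
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
s1 <- sentence_vec(c("yes", "what", "time"))
s2 <- sentence_vec(c("sure", "when", "shall", "we", "meet"))
cosine(s1, s2)   # with learned vectors, similar meanings give values close to 1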

 

This technology is a kind of machine learning. Therefore, the more people use it, the more sophisticated it becomes, because it can learn by itself. For now it applies to relatively short sentences such as e-mails, but I am sure it will be applied to longer texts, such as official business documents. I wonder when that will happen. Prof. Geoffrey Hinton is expected to research this area intensively. If it happens, computers will be able to understand what documents mean and create sentences based on that understanding. I do not know how industries will change when it happens.

This kind of technology is sometimes referred to as “natural language processing” or “NLP”. I want to focus on this area as a main research topic of my company in 2016. Progress will be shared through my weekly letter here.

 

I recommend you try Smart Reply in Inbox and enjoy it! Let me know your impressions. Cheers!

 

 

 

Note: Toshifumi Kuga’s opinions and analyses are personal views, are intended for informational purposes and general interest only, and should not be construed as individual investment advice or a solicitation to buy, sell or hold any security or to adopt any investment strategy. The information in this article is given as at the publication date, may change without notice, and is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I and TOSHI STATS SDN. BHD. accept no liability whatsoever for any loss arising from the use of this information; reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

“Community” accelerates the progress of machine learning all over the world!


When you start learning programming, it is recommended that you visit the community sites of the languages. “R” and “Python” have big communities, which have been contributing to the progress of each language. This is good for all users. H2O.ai also held its annual community conference, “H2O WORLD 2015”, this month. Videos and presentation slides are now available on the internet. I could not attend the conference, as it was held in Silicon Valley in the US, but I can follow and enjoy it just by going through the websites. I recommend you have a quick look to understand how knowledge and experience are shared at the conference. It is good for anyone who is interested in data analysis.

 

1. User communities can accelerate the progress of open source languages

When I started learning “MATLAB®” in 2001, there were few user communities in Japan as far as I knew, so I had to attend paid seminars to learn the language, and they were not cheap. Now, however, most user communities are available without any fee, and such communities have been growing bigger and bigger. One of the main reasons is that the number of “open source languages” is increasing. “R” and “Python” are open source languages, which means that when someone wants to try the language, all they have to do is download it and use it. Therefore, the user base can grow at an astonishing pace.

On the other hand, if someone wants to try “proprietary software” such as MATLAB, they must buy a license before using it. I loved MATLAB for many years and recommended it to my friends, but unfortunately none of them uses it privately, because the license fee is hard to justify personally. I imagine most users of proprietary software are in organizations such as companies and universities; in such cases, the organization pays the license fees, so individuals have little freedom to choose the languages they want to use. Generally, it is difficult to switch from one language to another when proprietary software is used; this is called “vendor lock-in”. Open source languages can avoid that, and this is one of the reasons why I love them now. The more people can use a language, the more progress can be achieved. New technologies such as “machine learning” can be developed through user communities, because more users will keep joining going forward.

 

2. Real industry experiences can be shared in communities

This is the most exciting part of a community. As many data scientists and engineers from industry join communities, their knowledge and experience are shared frequently. It is difficult to find this kind of information elsewhere. The theory of algorithms and programming methods can be found in university courses on MOOCs, but MOOCs offer little real-time industry experience. At H2O WORLD 2015, for example, there were sessions with many professionals and CEOs from industry, who shared their knowledge and experience. It is a treasure not only for experts in data analysis but also for business personnel who are interested in it. I would like to share my own experience in user communities in the future.

 

3. Big companies are supporting user communities

Recently, major IT companies have noticed the importance of user communities and are trying to support them. For example, Microsoft supports the “R Consortium” as a platinum member, and Google and Facebook support the communities of their open source frameworks, “TensorFlow” and “Torch”. New things are likely to happen and be developed among users outside the companies, so supporting user communities is also beneficial to the big IT companies themselves. Many other IT companies support communities, too; you can find many of their names among the sponsors of the big user-community conferences.

 

The next big user-community conference is “useR! – International R User Conference 2016”. It will be held in June 2016. Why don’t you join us? You may find a lot of things there. It must be exciting!

 


“Speed” is the first priority of data analysis in the age of big data


When I learned data analysis a long time ago, datasets had from 100 to 1,000 samples, because teachers had to explain the data in detail. There were only a few parameters to calculate, too. Therefore, most statistical tools could handle such data within a reasonable time; even spreadsheets worked well. Now, however, data volumes are huge, and there are more than 1,000 or 10,000 parameters to calculate. We have trouble analyzing data because it takes too long to complete the analysis and obtain the results. This is the problem in the age of big data.

This is one of the biggest reasons why a new generation of machine learning tools and frameworks is appearing on the market. Facebook open-sourced its Torch deep learning modules in January 2015, H2O 3.0 was released as open source in May 2015, and Google open-sourced TensorFlow this month. Each describes itself as “very fast”.

 

Let us consider each of these latest tools. I think each puts a premium on calculation speed. Torch uses LuaJIT+C, H2O uses Java behind the scenes, and TensorFlow uses C++. LuaJIT, Java and C++ are usually much faster than scripting languages such as Python or R. Therefore, the new generation of tools should be faster when big data must be analyzed.

Last week I mentioned deep learning with R+H2O. Now let me check how fast H2O runs models. This time I use H2O FLOW, an awesome GUI, shown below. The deep learning model runs on my MacBook Air 11 (1.4 GHz Intel Core i5, 4 GB memory, 121 GB HD) as usual. A summary of the data used follows:

  • Data: MNIST hand-written digits
  • Training set: 19,000 samples with 785 columns
  • Test set: 10,000 samples with 785 columns

Then I create a deep learning model with three hidden layers of 1024, 1024 and 2048 units respectively. You can see it in the red box here; it is a fairly complex model, as it has three layers. (A sketch of the equivalent R call follows the screenshot.)

DL MNIST1 model
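By the way, the same model can also be driven from R instead of the FLOW GUI. Below is a minimal sketch, assuming the MNIST CSV files are on disk with the label in the last (785th) column; the file names are hypothetical.

# Hedged sketch: the same deep learning model, launched from R (file names made up).
library(h2o)
h2o.init()
train <- h2o.importFile("mnist_train.csv")   # 785 columns: 784 pixels + 1 label
test  <- h2o.importFile("mnist_test.csv")
train[, 785] <- as.factor(train[, 785])      # classification needs a factor target
test[, 785]  <- as.factor(test[, 785])
model <- h2o.deeplearning(x = 1:784, y = 785,
                          training_frame = train, validation_frame = test,
                          hidden = c(1024, 1024, 2048))   # the three hidden layers
h2o.confusionMatrix(model, valid = TRUE)     # error rate on the test set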

It took just 20 minutes to complete. It is amazing! That is very fast, given that deep learning requires a huge number of calculations to develop a model. If deep learning models can be developed within 30 minutes, we can try many models with different parameter settings to understand what the data mean and obtain insights from them.

DL MNIST1 time

I did not stop the model before it had fitted the data. The confusion matrices tell us that the error rate is 2.04% on the training data (red box) and 3.19% on the test data (blue box). That looks good in terms of fit, and it means that 20 minutes is enough to create a good model in this case.

DL MNIST1 cm

 

Now it is almost impossible to understand data just by looking at it carefully, because it is too big for our eyes. Through analytic models, however, we can understand what the data mean. The faster analyses can be completed, the more insight can be obtained from the data. That is wonderful for all of us. Yes, we can even have enough time to relax over coffee and cake after our analyses are completed!

 

 


 

“H2O”: an awesome “Digital marketing” tool for everyone!


Last week I found an awesome tool for digital marketing as well as data analysis. It is called “H2O”. Although it is open source software, its performance is incredible, and it is easy to use. I would like to introduce it to sales and marketing personnel who are interested in digital marketing.

“H2O is open-source software for big-data analysis. It is produced by the start-up H2O.ai(formerly 0xdata), which launched in 2011 in Silicon Valley. The speed and flexibility of H2O allow users to fit hundreds or thousands of potential models as part of discovering patterns in data. With H2O, users can throw models at data to find usable information, allowing H2O to discover patterns. Using H2O, Cisco estimates each month 20 thousand models of its customers’ propensities to buy while Google fits different models for each client according to the time of day.” according to Wikipedia(1).

Although its performance looks very good, it is open source software, which means everyone can use this awesome tool without any fee. It is incredible! “H2O” won one of the “Bossie Awards 2015: The best open source big data tools” (2). This image shows the H2O user interface, “H2O FLOW”.

H2O Flow

Using this interface, you can apply state-of-the-art algorithms such as “deep learning” without programming. This is very important for beginners in data analysis, because they can start analyzing data without programming at all. Dr. Arno Candel, Physicist & Hacker at H2O.ai, said, “And the best thing is that the user doesn’t need to know anything about Neural Networks” (3). Once a model is developed through this user interface, a “Java” program of the model is automatically generated, so it can be used in production systems with ease.

 

 

One of the advantages of open source is that many user cases are publicly available. Because open source is public, users’ experiences of what is good and what is bad are easy to share. This image shows a collection of tutorials, “H2O University”, which is also available for free. There are many other presentations and videos about H2O on the internet, too! You may find cases from your own industry among them. So there is plenty of material for learning H2O by ourselves.

H2O Univ

 

In addition, “H2O” can be used as an extension of “R”, one of the most widely used analytical languages. “H2O” can be controlled easily from the R console, so it integrates well with R. “H2O” can also be used with Python.
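As a minimal sketch of that integration (assuming the “h2o” R package is installed; the model and data here are only illustrative), H2O can be driven from the R console like this:

# Hedged sketch: controlling H2O from the R console (illustrative only).
library(h2o)
h2o.init()                            # start or connect to a local H2O cluster
iris_hex <- as.h2o(iris)              # push a built-in R data frame into H2O
fit <- h2o.glm(x = 1:4, y = "Species",
               training_frame = iris_hex, family = "multinomial")
h2o.performance(fit)                  # metrics computed by H2O, viewed from R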

H2O has so many other features that I cannot describe everything here. I am sure it is an awesome tool for both business personnel and data scientists. I will start using “H2O” and publish my experiences with it going forward. Why don’t you join the “H2O community”?

 

 

Source

1. Wikipedia: H2O (software)

https://en.wikipedia.org/wiki/H2O_(software)

2. Bossie Awards 2015: The best open source big data tools

http://www.infoworld.com/article/2982429/open-source-tools/bossie-awards-2015-the-best-open-source-big-data-tools.html#slide4

3. Interview: Arno Candel, H2O.ai on the Basics of Deep Learning to Get You Started

http://www.kdnuggets.com/2015/01/interview-arno-candel-0xdata-deep-learning.html

 


LinkedIn bought a predictive marketing company. What does it mean?


I think many people like LinkedIn as a platform for professionals and are interested in what is going on at the company. Last week I read: “Today we are pleased to announce that we’ve acquired Fliptop, a leading provider of predictive sales and marketing software.” (1) (David Thacker, on the LinkedIn blog, 27 August 2015). LinkedIn bought a leading predictive marketing company. What does it mean? Let me consider it a little.

 

1. What does “Fliptop” do?

It is a marketing software company. Its website says “DRIVE REVENUE FASTER WITH PREDICTIVE MARKETING”, “Increase lead conversion rates and velocity.” and “Identify the companies most likely to buy”. It was established in 2009, so it is a relatively young company. It uses technology called “machine learning” to identify potential customers with a high probability of purchasing products and services. According to its website, it has expertise in standard machine learning algorithms, such as logistic regression and decision trees, which are used for classification and prediction. For example, the company can identify who is likely to buy products based on data, including each customer’s past purchase history. It hires computer science experts to develop its predictive models.

 

2. What will LinkedIn do with Fliptop?

As you know, LinkedIn has a huge customer base, so it has a massive amount of data generated by its users every day, and that data accumulates every second. Therefore, LinkedIn needs, and must keep enhancing, the ability to make the most of that data: it should analyze the data and make better business decisions to compete with other IT companies in the market. There are two ways to do that: 1. develop the technology in-house; 2. buy resources from outside the company. LinkedIn took option 2 this time. Doug Camplejohn, CEO of Fliptop, said, “We will continue to support our customers with existing contracts for some period of time, but have decided not to take on any new ones. We will also be reaching out to our customers shortly to discuss winding down their existing relationship with Fliptop.” (2) So Fliptop will not remain an independent service provider; it will be integrated into LinkedIn, and its knowledge and expertise will presumably be woven seamlessly into LinkedIn in the future. As far as I know now, I am not sure what current Fliptop users should do.

 

3. Data is “King”

This kind of purchase has been common in the IT industry recently. Google bought “DNNresearch” in 2013 and “DeepMind” in 2014, and Microsoft bought “Revolution Analytics” in 2015. These small and medium-sized companies have expertise in machine learning and data analysis. When they try to expand their businesses, they need massive data to analyze; however, they do not own massive amounts of data. The owners of massive data are usually big IT companies such as Google and Facebook, and it is often difficult for relatively small companies to obtain such data, including customer data. Big IT companies, including LinkedIn, on the other hand, usually own huge amounts of customer data, and they are now also building up the resources and expertise to analyze it. Once they have both, new services can be created and offered more quickly. The more people use these services, the more accurate and effective they become. Therefore, it is logical for big IT companies to acquire small companies with expertise in data analysis and machine learning; they definitely need that expertise.

 

 

From the standpoint of consumers, this is good, because they can enjoy many services offered by big IT companies at lower cost. From the standpoint of companies, however, competition is getting tougher, and this is happening not only in the IT industry but in many others. LinkedIn now seems ready for the competition to come.

Machine learning is sometimes thought of as an engine, with data as its fuel. When they are combined in one place, new knowledge and insights may be found and new products and services may be created, accelerating changes in the industrial landscape. Mobile, cloud, big data, IoT and artificial intelligence will all contribute to this change. It will be exciting to see what happens next.

 

 

 

Source

1. Accelerating Our Sales Solutions Efforts Through Fliptop Acquisition, David Thacker, August 27, 2015 

http://sales.linkedin.com/blog/accelerating-our-sales-solutions-efforts-through-fliptop-acquisition/

2. A New Chapter, Doug Camplejohn, August 27, 2015

http://blog.fliptop.com/blog/2015/08/27/a-new-chapter/

 

 


“Prediction” is very important in analyzing big data in business


It is a good time to revisit “Big data and digital economy”, as the LinkedIn group of that name is now four months old and has more than 100 participants. I appreciate the cooperation of all of you.

In the early 2000s, I worked in the risk management department of a Japanese consumer finance company. The company had a credit risk model that could predict who was likely to default. I studied it in detail and understood why it worked so accurately: if I collected a lot of data about customers, I could obtain accurate predictions of default for each customer.

Now, in 2015, I have researched many algorithms and statistical models, including the state-of-the-art “deep learning”. While such models have many uses and objectives, in my view the most important thing for business people is “prediction”, just as in my consumer finance experience, because they must make good business decisions to compete in their markets.

If you are in the healthcare industry, you may be interested in predicting who is likely to be cured. If you are in sales, you may be interested in predicting who is likely to come to the shop and buy the products. If you are in marketing, you may be interested in who is likely to click an advertisement on the web. Whatever you do, predictions are very important for your business, because they enable you to take the right actions. Let me explain the key points of making predictions.

 

Target

What do you want to predict? The revenue of your business? The number of customers? The satisfaction rate based on client feedback? The price of wine in the near future? It can be anything you want; we call it the “target”. So, first, the target should be defined, so that your predictions support the right business decisions.

 

Features

Secondly, let us find things related to your target. For example, if you are a salesperson interested in who is likely to buy the products, the features are attributes of each customer (such as age, sex and occupation), the behavior of each customer (such as how many times they come to the shop per month and when they last bought the products), what they clicked in the web shop, and so on. Based on the prediction, you can send coupons or tickets to customers who are “highly likely to buy” in order to increase your sales. If you are interested in the price of wine, the features may be temperature, rainfall, the locations of farms, and so on. If you can predict the price of wine, you might make good wine investments. These are just simple examples; in reality, there may be 100, 1,000 or more features, depending on the data you have. Usually, the more data you have, the more accurate your predictions are. This is why data is so important for prediction.

 

Evaluation of predictions

Finally, by feeding the features into statistical models, predictions of the target can be obtained. You can then predict who is likely to buy the products when planning marketing strategies, which is good for your business, as the strategies become more effective. Unfortunately, customer preferences may change in the long run. When situations and environments such as customer preferences change, the predictions may no longer be accurate. So it is important to evaluate predictions and update the statistical models periodically. No model stays accurate forever.
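To make the target / features / evaluation loop concrete, here is a minimal sketch in R with entirely made-up data: the target is “buy”, the features are age and number of visits, and evaluation is done on a held-out portion of the data.

# Hedged sketch of the target / features / evaluation workflow (data made up).
set.seed(42)
n <- 200
customers <- data.frame(
  age    = sample(20:70, n, replace = TRUE),   # feature
  visits = rpois(n, 3)                         # feature
)
# Hypothetical ground truth: more visits -> more likely to buy (the target).
customers$buy <- rbinom(n, 1, plogis(-2 + 0.6 * customers$visits))

train_idx <- sample(n, 150)                    # hold out 50 customers for evaluation
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]

model <- glm(buy ~ age + visits, data = train, family = binomial)
pred  <- predict(model, newdata = test, type = "response")

# Simple evaluation: accuracy at a 0.5 threshold. Re-check this periodically,
# because predictions degrade when customer preferences drift.
mean((pred > 0.5) == test$buy)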

 

Once you can obtain predictions, you can implement the prediction process as a daily activity rather than a one-off analysis, which means data-driven decisions are made on a daily basis. This is one of the biggest aspects of the “digital economy”. From retail shops to healthcare and finance, predictions are already used in many fields. Prediction methods are sometimes regarded as a “black box”, but I do not think it is good to use predictions without understanding the methods behind them. I would like to explain them in my weekly letter in the future. Hope you enjoy it!

 

 


Do you know how computers can read e-mails instead of us?


Hello, friends. I am Toshi. Today I update my weekly letter. This week’s topic is “e-mail”. Everyone now uses e-mail to communicate with customers, colleagues and family. It is useful and efficient. However, reading a massive number of e-mails manually takes a lot of time. Recently, computers have become able to read e-mails for us and separate the potentially relevant ones from the others. So how can computers do that? Let us consider it a little.

1. Our words can become “data”

When we hear the word “data”, we imagine numbers in spreadsheets. That is a kind of “traditional” data; formally, it is called “structured data”. On the other hand, text, such as the words in e-mails, Twitter and Facebook, can be “data” too. This kind of data is called “unstructured data”. Most of the data around us exists as unstructured data. However, computers can transform it into data that can be analyzed, and this is generally an automated process, so we do not need to check each item one by one. Once this new data is created, computers can analyze it at astonishing speed. This is one of the biggest advantages of using computers to analyze e-mails.

2. Classification comes again

Actually, there are many ways for computers to understand e-mails; these methods are often called “natural language processing (NLP)”. One of the most sophisticated approaches uses machine learning to understand the meaning of sentences by looking at their structure. Here, I would like to introduce one of the simplest methods, so that everyone can understand how it works. It is easy to imagine that the count of each word can be data. Take “I want to meet you next week.” In this case, (I, 1), (want, 1), (to, 1), (meet, 1), (you, 1), (next, 1), (week, 1) are the data to be analyzed. The longer the sentences, the more words appear as data. Suppose we analyze e-mails from customers to assess who is satisfied with our products. If the counts of positive words such as “like”, “favorite” and “satisfy” are high, it might mean the customer is satisfied with the product, and vice versa. This is a “classification” problem, so we can apply the same method I explained before: the target is “satisfied” or “not satisfied”, and the features are the counts of each word.
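Here is a toy sketch in R of the word-counting step described above. The e-mails and the word list are made up, and a real system would feed such counts into a classifier as features rather than simply summing them.

# Toy sketch: turning e-mails into word counts (all data made up).
emails <- c("I like this product, it is my favorite",
            "I am disappointed, it broke on day one")
tokenize <- function(s) strsplit(tolower(gsub("[[:punct:]]", "", s)), "\\s+")[[1]]
word_counts <- lapply(emails, function(e) table(tokenize(e)))   # (word, count) pairs
positive <- c("like", "favorite", "satisfy", "satisfied")
sapply(word_counts, function(tb) sum(tb[names(tb) %in% positive]))
# -> 2 for the first e-mail, 0 for the second: a crude "satisfaction" signal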

3. What is the impact on businesses?

If computers understand what we say in text such as e-mails, we can make the most of it in many fields. In marketing, we can analyze the voices of customers across massive volumes of e-mail. In legal services, computers identify which e-mails are potentially relevant as evidence for litigation; this is called “e-discovery”. In addition, I found that the Bank of England has started monitoring social networks such as Twitter and Facebook in order to research the economy, a kind of “new wave” of economic analysis. These are just examples; I think you can come up with many business applications yourself, because we are surrounded by so much e-mail now.

In my view, natural language processing (NLP) will play a major role in the digital economy. Would you like to exchange e-mails with a computer?

When will self-driving cars be available in Asia? We should reconsider the regulations around them.


Last year I took a “machine learning” course on Coursera and found that it is very useful for developing self-driving cars. The course was created in 2011, and there has been much progress in self-driving cars since then. Last week I found two articles on self-driving cars: one about Google’s self-driving cars and the other about an autonomous truck. Let us see what they are and consider their impact once they are available to us.

 

1. Self-driving cars

This is one of the most ambitious self-driving car projects, because its goal is cars that need no driver intervention. Google’s website says, “a few of the prototype vehicles we’ve created will leave the test track and hit the familiar roads of Mountain View, Calif., with our safety drivers aboard.” The car looks small and cute; however, with its computers and sensors, it can run without human intervention. I imagine machine learning is used to control self-driving cars, as I learned on Coursera. Because the machine can “learn” new things from data, the more self-driving cars run, the safer and more sophisticated they become. Therefore, collecting a lot of data from self-driving cars is critically important. I wonder when they will be able to drive without drivers.

 

2. Autonomous truck

The other article is about autonomous trucks. According to Bloomberg, “Regulatory and technological obstacles may hold back the driverless car for decades. But one of the first driverless semi-trucks is already driving, legally, on the highways of Nevada.” This truck can drive itself on highways, but for difficult tasks such as maneuvering in parking lots, a human has to take over. It is like “a truck supported by computers”. Unlike Google’s self-driving cars, this truck needs a human driver, but it must be helpful for truck drivers on long highway journeys.

 

3. What is needed to promote self-driving cars?

First, we need to consider regulations on how self-driving cars are allowed to operate in public, because the more data is available, the more sophisticated self-driving cars become. Data is like “fuel” for the computers that control the cars. Therefore, regulations that allow self-driving cars to run in the real world, so that data can be collected, are very important for accelerating their development.

 

4. What are the impacts on our society?

In aging societies such as Japan, older people sometimes find it difficult to drive to hospitals or shopping malls. The self-driving car is one solution to that problem: with self-driving cars, seniors can go anywhere they want without driving. In emerging regions such as ASEAN, many trucks are needed to build infrastructure and lifelines across these countries, so it would be very useful if self-driving trucks were permitted to cross national borders. Therefore, regulations should be considered at the regional level rather than country by country.

In the long run, we should prepare for the shift from the current situation to a digital economy, which means some jobs might be replaced by computers with machine learning. The more self-driving cars become available, the fewer truck drivers and taxi drivers are needed. Andrew Ng, the famous machine learning researcher, talked about this shift in an article: “A midrange challenge might be truck-driving. Truck drivers do very similar things day after day, so computers are trying to do that too.”

 

 

No one knows exactly when self-driving cars will be available to the public, but looking at how the technology is developing, it does not seem to be the distant future. Self-driving cars may hold a lesson for us. As Andrew Ng says in the article, “Computers enhanced by machine learning are eliminating jobs long done by humans. The trend is only accelerating.”

What do you think?

Now I am taking on a data analysis competition. Why don’t you join us?


Hi friends, I am Toshi. Today I update the weekly letter. This week’s topic is my challenge. Last Saturday and Sunday I entered a data analysis competition on the platform called “Kaggle”. Have you heard of it? Let us find out what the platform is and how good it is for us.

 

This is the welcome page of Kaggle. We can participate in many challenges without any fee, and in some competitions a prize is awarded to the winner. First, after registering for a competition, you are given data to analyze. Based on the data, you create models to predict unknown results. Once you submit your predictions, Kaggle returns your score and your ranking among all participants.

K1

In the competition I entered, I had to predict which news articles would be popular in the future. So the target is “popular” or “not popular”. You may already know this is a “classification” problem, because the target is of the “do” or “do not” type. So I decided to predict with the “logistic curve”, which I explained before. I always use “R” as my tool for data analysis.

On my first try, I created a very simple model with only one “feature”. Its performance was just average, so I needed to improve the model to predict the results more accurately. (A sketch of this kind of one-feature model in R follows the screenshot below.)

K3
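As a rough illustration of that first try (the real competition files and column names differ; “Popular” and “WordCount” here are hypothetical), a one-feature logistic regression in R looks like this:

# Hedged sketch: a one-feature logistic regression for a Kaggle-style task.
# File and column names are hypothetical, not those of the actual competition.
train <- read.csv("train.csv")
model <- glm(Popular ~ WordCount, data = train, family = binomial)

test <- read.csv("test.csv")
pred <- predict(model, newdata = test, type = "response")   # probabilities in [0, 1]

# Kaggle submissions are usually a CSV of an id column plus the prediction.
write.csv(data.frame(UniqueID = test$UniqueID, Probability1 = pred),
          "submission.csv", row.names = FALSE)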

Then I converted some variables from characters to factors and added more features as inputs. That improved performance significantly: the score rose from 0.69608 to 0.89563.

In the final assessment, the prediction data differ from the data used in the interim assessments. My final score was 0.85157; unfortunately, I could not reach 0.9. I should have tried other classification methods, such as random forests, to improve the score. But it is like a game: every time I submit a result, I get a score back. It is very exciting when the score improves!

K4

 

The list of competitions below is for beginners; everyone can try these problems after signing up. I like “Titanic”. In this challenge we predict who survived the disaster. Can we tell who is likely to survive based on data such as where passengers stayed on the ship? This is also a “classification” problem, because the target is “survived” or “did not survive”.

K2

 

You may not be interested in becoming a data scientist yourself, but these competitions are worth trying for everyone, because in the digital economy most business managers have opportunities to discuss data analysis with data scientists. If you already know how data is analyzed, you can communicate with data scientists smoothly and effectively, which helps you get what you want from data and make better business decisions. I learned a lot from this challenge. Now it’s your turn!