“H2O”: an awesome “digital marketing” tool for everyone!


Last week I found an awesome tool for digital marketing as well as data analysis.  It is called “H2O”.  Although it is open-source software, its performance is incredible and it is easy to use.  I would like to introduce it to sales and marketing personnel who are interested in digital marketing.

According to Wikipedia(1): “H2O is open-source software for big-data analysis. It is produced by the start-up H2O.ai (formerly 0xdata), which launched in 2011 in Silicon Valley. The speed and flexibility of H2O allow users to fit hundreds or thousands of potential models as part of discovering patterns in data. With H2O, users can throw models at data to find usable information, allowing H2O to discover patterns. Using H2O, Cisco estimates each month 20 thousand models of its customers’ propensities to buy while Google fits different models for each client according to the time of day.”

Despite its performance, it is open-source software, which means that everyone can use this awesome tool without any fee.  It is incredible!  “H2O” won one of the “Bossie Awards 2015: The best open source big data tools” (2).  The image below shows the H2O user interface, “H2O Flow”.

H2O Flow

By using this interface, you can use state-of-the-art algorithms such as “deep learning” without programming.  This is very important for beginners in data analysis, because they can start analyzing data without programming.  Dr. Arno Candel, Physicist & Hacker at H2O.ai, said, “And the best thing is that the user doesn’t need to know anything about Neural Networks” (3).  Once a model is developed through this user interface, a Java program of the model is automatically generated, so it can be used in production systems with ease.



One of the advantages of open source is that many users’ cases are publicly available. Because open source is public, users’ experiences of “what is good” and “what is bad” are easy to share.   The image below is a collection of tutorials, “H2O University”, which is also available for free. There are many other presentations and videos about H2O on the internet, too! You may find your industry’s cases among them. There is therefore a lot of material for learning H2O by ourselves.

H2O Univ


In addition to that, “H2O” can be used as an extension of “R”.  R is one of the most widely used analytical languages.  “H2O” can be controlled from the R console easily, so “H2O” can be integrated with R.  “H2O” can also be used with Python.

There are so many other functionalities in H2O that I cannot write everything here.  I am sure it is an awesome tool for both business personnel and data scientists.  I would like to start using “H2O” and publish my experiences with “H2O” going forward. Why don’t you join the “H2O community”?




1. Wikipedia: H2O (software)

2. Bossie Awards 2015: The best open source big data tools

3. Interview: Arno Candel, H2O.ai on the Basics of Deep Learning to Get You Started



Note: Toshifumi Kuga’s opinions and analyses are personal views, are intended for informational purposes and general interest only, and should not be construed as individual investment advice or a solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at the publication date, may change without notice, and is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I and TOSHI STATS SDN. BHD. accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

LinkedIn bought a predictive marketing company. What does it mean?


I think many people like LinkedIn as a platform for professionals and are interested in what is going on in the company.  Last week, I found this announcement: “Today we are pleased to announce that we’ve acquired Fliptop, a leading provider of predictive sales and marketing software.”(1)  (David Thacker said this on the blog on 27 August 2015.)  LinkedIn bought a leading predictive marketing company.  What does it mean? Let me consider it a little.


1. What does “Fliptop” do?

It is a marketing software company. Its website says “DRIVE REVENUE FASTER WITH PREDICTIVE MARKETING”, “Increase lead conversion rates and velocity.” and “Identify the companies most likely to buy”.  It was established in 2009, so it is a relatively young company. The company uses technologies called “machine learning” to identify potential customers with a high probability of purchasing products and services.   According to the company’s website, it has expertise in standard machine learning algorithms, such as logistic regression and decision trees. These methods are used for classification and prediction.  For example, the company can identify who is likely to buy products based on data, including each customer’s past purchase history. It hires computer science experts to develop the prediction models.


2. What will LinkedIn do with Fliptop?

As you know, LinkedIn has a huge customer base, so it has a massive amount of data generated by LinkedIn users every day.  These data accumulate every second. Therefore, LinkedIn should have, and should enhance, the ability to make the most out of the data.  LinkedIn should analyze the data and make better business decisions to compete with other IT companies in the markets. To do that, there are two options: 1. develop technologies in-house, or 2. purchase resources from outside the company.  LinkedIn took option 2 this time. Doug Camplejohn, CEO of Fliptop, said, “We will continue to support our customers with existing contracts for some period of time, but have decided not to take on any new ones. We will also be reaching out to our customers shortly to discuss winding down their existing relationship with Fliptop.”  Therefore, Fliptop will not remain independent as a service provider and will be integrated into LinkedIn’s functions. It seems that Fliptop’s knowledge and expertise will be seamlessly integrated into LinkedIn in the future.  As far as I know now, I am not sure what current users of Fliptop should do.


3.  Data is “King”

This kind of purchase has been seen in the IT industry recently. Google bought “DNNresearch” in 2013 and “DeepMind” in 2014. Microsoft bought “Revolution Analytics” in 2015.  These small or medium-sized companies have expertise in machine learning and data analysis.  When they try to expand their businesses, they need massive data to analyze. However, they do not own massive amounts of data. The owners of massive amounts of data are usually big IT companies, such as Google and Facebook.  It is sometimes difficult for relatively small companies to obtain massive amounts of data, including customer data.  On the other hand, big IT companies, including LinkedIn, usually own huge amounts of customer data, and they are now also enhancing their resources and expertise to analyze data. Once they have both, new services can be created and offered in a shorter period. The more people use these services, the more accurate and effective they can become.  Therefore, it sounds logical for big IT companies to acquire small companies with expertise in data analysis and machine learning; they definitely need that expertise.



From the standpoint of consumers, this is good, because they can enjoy many services offered by big IT companies at lower cost. But from the standpoint of companies, competition is getting tougher, as this occurs not only in the IT industry but in many other industries. Now LinkedIn seems to be ready for the competition that comes in the future.

Machine learning is sometimes considered the engine, and data the fuel.  When they are combined in one place, new knowledge and insights may be found, and new products and services may be created.  This accelerates changes in the landscape of industries. Mobile, cloud, big data, IoT and artificial intelligence will contribute a lot to this change. It will be exciting to see what happens next.





1. Accelerating Our Sales Solutions Efforts Through Fliptop Acquisition, David Thacker, August 27, 2015

2. A New Chapter, Doug Camplejohn, August 27, 2015





How can we create good movies based on big data?


Last Sunday, my son came to Kuala Lumpur, as he is on summer vacation now.  So I brought him to the movie theater.  I chose “Mission: Impossible – Rogue Nation” to entertain him.  In the movie, Tom Cruise is very active; I cannot believe he is older than I am!  My son and I enjoyed the movie very much, as the action is amazing.

Then I began wondering how we can create good movies.  Every year, many movies are created, but few of them stay in people’s minds in the long term.  Let me consider it here for a while.

1. How can we define what good movies are?

There are many measures for evaluating movies.  Critics can assess the quality of movies, but I would like to keep it simple.  The number of customers who watch the movie, or its sales revenue such as the “box office”, can be used as the measure of a “good movie”, as it is easy to collect and measure. So the more people watch a movie, the better it is, according to our definition of “good movies”.


2. Let us consider things related to the number of customers or sales revenue.

A lot of things relate to them.  As in Mission: Impossible, the actors and actresses are very important.  The director is also important.  In addition, where was it filmed?  Is it an action movie, a love story or a thriller?  And so on. These may be closely related to the number of customers or the sales revenue of the movie.  So data about things related to the number of customers or sales revenue in the past should be collected.


3. How can we obtain predictions of the number of customers or sales revenue of an unseen movie in advance?

Do you remember last week’s letter about the “target” and “features”?  The “target” is something we want to predict, and “features” are things related to the target.  Predictions of the target can be obtained by inputting the features into a statistical model.  I would like to call this unit a “module”.   I summarize it as follows.


According to our definition of “good movies”, the target is the number of customers or the sales revenue of the movie. The features are the actors and actresses, the category of the story, the locations where the movie was filmed, and so on. These features are input into statistical models to obtain predictions of the target for unseen movies.  Based on this analysis, we could predict the sales of movies before they are seen in theaters, which means that good movies could be created based on this prediction. When the prediction is accurate, film production companies might increase sales revenue because they can create good movies based on predictions of targets. But in reality, we need to prepare a lot of data to predict them accurately.  In addition, customer preferences might change suddenly, and it is very difficult to update statistical models in advance to follow such changes. Therefore, there is a risk that statistical models cannot follow changing circumstances in a timely manner.
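As a concrete illustration of this “module”, here is a minimal sketch in Python. All feature names, weights and numbers below are hypothetical, just to show how features flow through a simple linear model to produce a predicted target (here, sales revenue):

```python
# A toy "module": features -> statistical model -> predicted target.
# The feature names and weights below are made up for illustration.

def predict_revenue(features, weights, intercept):
    """Predict a movie's sales revenue (the target) as a
    weighted sum of its features plus an intercept."""
    return intercept + sum(weights[name] * value
                           for name, value in features.items())

# Hypothetical features of an unseen movie.
movie = {"lead_actor_popularity": 8.5,   # 0-10 scale
         "is_action": 1.0,               # 1 if action movie, else 0
         "n_shooting_locations": 5.0}

# Hypothetical weights learned from data about past movies.
weights = {"lead_actor_popularity": 12.0,
           "is_action": 30.0,
           "n_shooting_locations": 2.0}

print(predict_revenue(movie, weights, intercept=50.0))  # predicted revenue
```

In a real system the weights would be estimated from past movies, not chosen by hand, but the flow of features into the model is the same.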



It should be noted that more features will become available as computers learn to understand videos and movies. This technology(1) is now in progress.  It will enable computers to turn videos into text. For example, when there is a scene where a swan is on a lake, a computer understands the video and automatically generates sentences that explain the scene.  It means that the whole movie can be transformed into text without human intervention, so movies can be analyzed based on their stories, and more features can be identified from the results of this analysis. When new kinds of data become available to us, they may enable us to obtain more features and improve the accuracy of predictions.  Would you like to make your own movie in the future?



1. A picture is worth a thousand (coherent) words: building a natural description of images, 17 Nov 2014, Google Research




“Prediction” is very important in analyzing big data in business


It is a good time to reconsider “Big data and digital economy”, because this group on LinkedIn now has a four-month history and more than 100 participants. I appreciate the cooperation of all of you.

In the early 2000s, I worked in the risk management department of a Japanese consumer finance company.   The company had a credit risk model which could predict who was likely to default. I studied it in detail and understood how it worked so accurately. I found that if I collected a lot of data about customers, I could obtain accurate predictions of default events for each customer.

Now, in 2015, I have researched many algorithms and statistical models, including the state-of-the-art “deep learning”.   While there are many usages and objectives for such models, in my view the most important thing for business persons is “prediction”, just like in my experience at the consumer finance company, because they must make good business decisions to compete in markets.

If you are in the health care industry, you may be interested in predictions about who is likely to be cured. If you are in sales, you may be interested in predictions about who is likely to come to the shop and buy products. If you are in marketing, you may be interested in who is likely to click an advertisement on the web.  Whatever you do, predictions are very important for your business because they enable us to take the right actions.  Let me explain the key points about predictions.



What do you want to predict?    The revenue of your business?  The number of customers?    The satisfaction rate based on client feedback?  The price of wine in the near future? You can mention anything you want.  We call it the “target”.  So firstly, the target should be defined, so that your predictions support the right business decisions.



Secondly, let us find things related to your target; we call them “features”.  For example, if you are a salesperson interested in who is likely to buy products, the features are attributes of each customer (such as age, sex and occupation), the behavior of each customer (such as how many times he/she comes to the shop per month and when he/she last bought products), what he/she clicked in the web shop, and so on.  Based on the prediction, you can send coupons or tickets to customers who are “highly likely to buy” in order to increase your sales.  If you are interested in the price of wine, the features may be temperature, amount of rain, locations of farms, and so on.  If you can predict the price of wine, you might make good wine investments.  These are just simple examples. In reality, the number of features may be 100, 1,000 or more; it depends on the data you have.  Usually, the more data you have, the more accurate your predictions are.  This is why data is very important for obtaining predictions.
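To make “features” concrete, here is a small Python sketch of turning a raw customer record into a numeric feature vector that a model can use. The field names and the encoding are hypothetical, invented for illustration:

```python
# Turn a raw customer record into a numeric feature vector.
# The field names and encodings here are illustrative only.

def to_features(customer):
    return [
        float(customer["age"]),
        1.0 if customer["sex"] == "F" else 0.0,    # simple binary encoding
        float(customer["shop_visits_per_month"]),
        float(customer["days_since_last_purchase"]),
        float(customer["web_clicks"]),
    ]

record = {"age": 34, "sex": "F", "shop_visits_per_month": 3,
          "days_since_last_purchase": 12, "web_clicks": 7}

print(to_features(record))  # [34.0, 1.0, 3.0, 12.0, 7.0]
```

A real feature vector might have hundreds or thousands of entries, but the idea is the same: every customer becomes a row of numbers.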


Evaluation of predictions

Finally, by inputting the features into a statistical model, predictions of the target can be obtained. Then you can predict who is likely to buy the products when you design marketing strategies.  This is good for your business, as marketing strategies can become more effective.  Unfortunately, customer preferences may change in the long run.  When situations and environments such as customer preferences change, predictions may no longer be accurate.  So it is important to evaluate predictions and update statistical models periodically.  No model can work accurately forever.
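Evaluating predictions can be as simple as comparing them with what actually happened on recent data, and flagging the model for retraining when quality drops. A minimal sketch (the labels and the threshold are made up):

```python
# Compare predicted "buy"/"not buy" labels against actual outcomes
# on recent data, and flag the model for retraining if accuracy drops.

def accuracy(predicted, actual):
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

predicted = [1, 0, 1, 1, 0, 1, 0, 0]   # the model's predictions
actual    = [1, 0, 0, 1, 0, 1, 1, 0]   # what customers actually did

acc = accuracy(predicted, actual)
print(acc)                      # 0.75
if acc < 0.8:                   # illustrative threshold
    print("accuracy has dropped: retrain the model")
```

Running such a check periodically is one simple way to notice when customer preferences have shifted and the model needs updating.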


Once you can obtain predictions, you can implement the prediction process as a daily activity rather than a one-off analysis. It means that data-driven decisions are made on a daily basis.  This is one of the biggest aspects of the “digital economy”.  From retail shops to health care and the financial industry, predictions are already used in many fields.  The methods behind predictions are sometimes considered a “black box”, but I do not think it is good to use predictions without understanding the methods behind them. I would like to explain them in my weekly letters in the future.  I hope you enjoy them!




Facebook, Twitter, Google and a “new wave” of economic analysis


On Saturday, I found a report from the Bank of England.  The report is about economic analysis in central banks using big data, such as social networking services. It is good not only for economic researchers but also for business personnel to consider how big data should be used. So I would like to consider it based on this report for a while.

Before considering the usage of big data, I would like to define “big data”: data sets that are granular, generated in real time, and non-numeric as well as numeric.   These data are completely opposite in nature to the data currently analyzed in central banks, which are usually aggregated, periodic and numeric; one example is companies’ financial statements.  Big data are different from such data.  For example, tweets are generated by individuals in real time, usually as text, images and video. Then the questions come.


1. Can we build up macroeconomic models based on big data?

Central banks are responsible for the stability of the financial system in their country.  Is it possible for central banks to collect data on each loan from private banks, assess the credit risk of each, and then confirm the financial stability of the country as a whole?  The same question can be applied to private companies: is it possible for a company to collect data on each customer, forecast the amount each customer will purchase, and predict the company’s revenue for the next fiscal year?  Big data may enable us to do so, even though it will take time.


2. Is the method used “theory based” or “data driven”?

Even though they cannot be clearly distinguished in practice, there are two approaches to analyzing big data in economic analysis. Some put importance on economic theories; let us call this approach “theory based”.  Others take the approach of “let the data speak for themselves”; we may call it “data driven”.   Their conclusions sometimes contradict each other even when they analyze the same data, so we should take a well-balanced approach between them.


3. Should we change the processes to make business decisions?

Big data comes to us in real time, but the decision-making process in organizations is usually periodic. For example, board of directors meetings and executive committees in companies are generally held on a monthly basis.  Should they be held more flexibly and in a more timely manner, based on outputs from big data analysis, rather than periodically?  The bigger companies become, the more difficult it is to change the process in practice.


The FRB in the US is currently wondering when it should raise US interest rates.  The Chairwoman of the FRB has always said, “It is based on economic data”.  But I am not sure she cares about data (conversations) on social networking services in the US. What do you think?

Now I am taking on a data analysis competition. Could you join us?


Hi friends, I am Toshi.  Today I am updating the weekly letter.  This week’s topic is my challenge.  Last Saturday and Sunday, I took on a data analysis competition on the platform called “Kaggle”. Have you heard of it?   Let us find out what the platform is and how good it is for us.


This is the welcome page of Kaggle. We can participate in many challenges without any fee.  In some competitions, a prize is awarded to the winner. First, after registering for a competition, the data to be analyzed are provided.  Based on the data, we create our models to predict unknown results. Once you submit your predictions, Kaggle returns your score and your ranking among all participants.


In the competition I participated in, I had to predict what kind of news articles would be popular in the future.  So the “target” is “popular” or “not popular”. You may already know that this is a “classification” problem, because the target is of the “do” or “not do” type. So I decided to use the “logistic curve” for prediction, which I explained before.  I always use “R” as a tool for data analysis.

In the first try of my challenge, I created a very simple model with only one “feature”.  The performance was just average; I needed to improve my model to predict the results more correctly.


Then I converted some data from characters to factors and added more features as inputs.  This improved performance significantly: the score went from 0.69608 to 0.89563.

In the final assessment, the data for prediction were different from the data used in the interim assessments. My final score was 0.85157; unfortunately, I could not reach 0.9.  I should have tried other classification methods, such as random forests, to improve the score. But anyway, it is like a game: every time I submit a result, I receive a score. It is very exciting when the score improves!
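Classification competitions like this are often scored with AUC (area under the ROC curve), which would be consistent with scores such as 0.69608 and 0.89563, though I cannot confirm the exact metric used. As a sketch, AUC can be computed directly from its definition: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The scores and labels below are toy numbers:

```python
# AUC computed from its definition: the probability that a randomly
# chosen positive example receives a higher score than a randomly
# chosen negative one (ties count as half a win).

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities of "popular" and the true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1,   1,   1,    0,   0,   0  ]

print(auc(scores, labels))  # 8 of 9 positive-negative pairs ranked correctly
```

A perfect ranking gives an AUC of 1.0, and random guessing gives about 0.5, which is why moving from 0.69608 toward 0.9 is a meaningful improvement.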



The list of competitions below is for beginners. Everyone can take on these problems after signing up.  I like “Titanic”. In this challenge, we should predict who survived the disaster.  Can we know who is likely to survive based on data, such as where passengers stayed on the ship?  This is also a “classification” problem, because the target is “survive” or “not survive”.



You may not be interested in being a data scientist yourself. But these competitions are worth taking on for everyone, because most business managers have opportunities to discuss data analysis with data scientists in the digital economy. If you know in advance how data is analyzed, you can communicate with data scientists smoothly and effectively. It enables us to obtain what we want from data in order to make better business decisions.  Through this challenge I learned a lot. Now it’s your turn!

Do you want to know how banks rate you when you borrow money from them?


Hi friends, I am Toshi. This is my weekly letter. This week’s topic is how banks rate you when you borrow money from them. When we want a bank loan, it is good if we can borrow the amount of money we need at a lower interest rate.  So I wonder how banks decide who can borrow the requested amount at lower interest; in other words, how banks assess customers’ creditworthiness.  The answer is “classification”.  Let me explain in more detail. To keep the story simple, I will take the example of unsecured loans: loans without collateral.


1. A “credit risk model” makes lending judgements

Many banks now prepare their own risk models to assess the creditworthiness of customers.  Global banks in particular are required by regulators, such as the BIS, the FSA and central banks, to prepare such models. Major regional banks are also encouraged to have risk models to assess creditworthiness.  Regulations may differ from country to country and by the size of the bank, but it is generally said that banks should have their own risk models to enhance credit risk management.  When I was a credit risk manager at a Japanese consumer finance company, one of the group companies of the biggest financial group in Japan, each customer was rated by credit risk models. A good rating means you can borrow money at lower interest. On the other hand, a bad rating means you can borrow only a limited amount of money at a higher interest rate, or your application may be rejected. From the standpoint of bank management, this is good because banks can keep lending judgements consistent across all branches: the less human judgement there is, the more consistency banks keep.  Even though business models may differ according to banks’ strategies, the basic idea of assessing creditworthiness is the same.


2. The “loan application form” is the starting point of the rating process

So you understand that credit risk models play an important role. Next, you may wonder how each customer’s rating is produced.  This is where “classification” works.  When we try to borrow money, we are required to fill in application forms. Even though the details of the forms differ between banks, we are usually asked to fill in “age”, “job title”, “industry”, “company name”, “annual income”, “owned assets and liabilities” and so on.   These data are input into the risk model as “features”, so each customer has different values of the features.  For example, one person’s income is high while another’s is low.   The features of each customer can therefore explain the creditworthiness of that customer.   In other words, a credit risk model can “classify” customers with high creditworthiness and customers with low creditworthiness by using the features.


3. Each customer’s rating is based on the “probability of default”

Then let us see in more detail how models can classify customers. Each customer has values of the features from the application form. Based on these values, each customer obtains his/her own single score.  For example, by adding up the features multiplied by their weights, Tom obtains “-4.9” and Susumu obtains “0.9”.  From this score we can obtain the “probability of default” for each customer: the likelihood that the customer will default within a certain period, such as one year.  Let us look at Tom’s case. According to the graph below, Tom’s probability of default, shown on the y-axis, is close to 0.  Tom has a low probability of default, meaning he is less likely to default in the near term. In such a case, the bank gives Tom a good rating.  The curve below is called the “logistic curve”, which I explained last week; please look at my weekly letter of 23 April.


Now let us look at Susumu’s case. According to the graph below, Susumu’s probability of default, shown on the y-axis, is around 0.7, or 70%.  Susumu has a high probability of default, meaning he is likely to default in the near term. In such a case, the bank gives Susumu a bad rating. In summary, the lower the probability of default, the better the rating the customer receives.
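The mapping from a customer’s score to a probability of default can be sketched in a few lines of Python using the logistic function. The scores -4.9 and 0.9 are the ones from the text; the rating cut-off is hypothetical, chosen only for illustration:

```python
import math

def probability_of_default(score):
    """Logistic curve: maps any score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Scores from the text; the 0.05 cut-off is an illustrative assumption.
for name, score in [("Tom", -4.9), ("Susumu", 0.9)]:
    pd = probability_of_default(score)
    rating = "good" if pd < 0.05 else "bad"
    print(f"{name}: PD = {pd:.3f} -> {rating} rating")
```

Running this reproduces the two cases in the text: Tom’s probability of default is close to 0, while Susumu’s is around 0.7.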



Although there are other classification methods, the logistic curve is widely used in the financial industry, as far as I know. In theory, the probability of default can be obtained for many kinds of customers, from individuals to big companies and sovereigns, such as Greece.  In practice, however, more data are available for loans to individuals and small and medium-sized enterprises (SMEs) than for loans to big companies.  The more data are available, the more accurately banks can assess creditworthiness. If there are few data about customer defaults in the past, it is difficult to develop credit risk models effectively. Therefore, risk models for individuals and SMEs might be easier to build than risk models for big companies, as more data are usually available for loans to individuals and SMEs.

I hope you now understand the process banks use to rate customers. Data can explain our creditworthiness, maybe better than we can ourselves. Data about us is very important when we try to borrow money from banks.

Why computers may replace experts in many fields: a view from “feature” generation


Hi friends, I am Toshi. I have updated my weekly letter.  Today I explain (1) how a classification, “do” or “do not”, can be obtained as a probability, and (2) why computers may replace experts in many fields, from legal services to retail marketing.   These two things are closely related to each other. Let us start now.


1. How can classification be obtained as probabilities?

Last week, I explained that the “target” is very important and that the target is expressed by “features”.  For example, whether a customer will “buy” or “not buy” may be expressed by the customer’s age and the number of overseas trips per year.  So I can write it this way: “target” ← “features”.   This week, I will show you that the value of the target can be a probability: a number between 0 and 1.  If the target is closer to 1, the customer is highly likely to buy.   If the target is closer to 0, the customer is less likely to buy.   Here is our example of the target and features in the table below.

customer data

I want Susumu’s value of the target to be close to 1 when calculated from the features.  How can we do that?   Last week we added up the features, each multiplied by its weight.   For example, (-0.2)*30 + 0.3*3 + 6 gives 0.9.  Here “-0.2” and “0.3” are the weights of the two features, and “6” is a kind of adjustment.  Next, let us introduce the curve below. In Susumu’s case, the value from his features is 0.9, so let us put 0.9 on the x-axis. What is the value of y? According to the curve, y is around 0.7, which means Susumu’s probability of buying the products is around 0.7.  If the probability is over 0.5, the customer is generally considered likely to buy.


In the case of Tom, I want his value of the "target" to be close to 0 when calculated from his "features". Let us add up his features as follows: (-0.2)*56 + 0.3*1 + 6; the answer is -4.9. So let us put -4.9 on the x-axis; then what is the value of y? According to this curve, Tom's probability of buying the product is almost 0. Unlike Susumu, Tom is much less likely to buy.


This curve is called the "logistic curve". It is interesting that whatever value x takes, y is always between 0 and 1. By using this curve, every customer gets a value between 0 and 1, which can be interpreted as the probability of the event. The curve is so simple and useful that it is used in many fields. In short, every customer has a probability of buying the product, expressed as the value of y. It means that we can predict who is likely to buy in advance, as long as the "features" are obtained! The higher the value a customer has, the more likely they are to buy the product.
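To make this concrete, here is a minimal sketch of the logistic curve in Python. The two input values, 0.9 for Susumu and -4.9 for Tom, are taken from the weighted feature sums above:

```python
import math

def logistic(x):
    """The logistic curve: maps any number x to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Susumu's weighted feature value is 0.9
print(round(logistic(0.9), 2))   # 0.71 -> above 0.5, so likely to buy

# Tom's weighted feature value is -4.9
print(round(logistic(-4.9), 3))  # 0.007 -> very unlikely to buy
```

Whatever number you feed in, the result stays strictly between 0 and 1, which is exactly why it can be read as a probability.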



2.  Why may computers replace experts in many fields?

Now you understand what "features" are. "Features" are generally set up based on expert opinion. For example, if you want to know who will be in default in the future, the features needed would include "annual income", "age", "job", "past delinquency" and so on. I know them because I used to be a credit risk manager at a consumer finance company in Japan. Each expert can introduce the relevant features in their business and industry. That is why the expert's opinion has been valuable, so far. However, computers are also creating their own features based on data. These are sometimes so complex that no one can understand them. For example, "-age*3 - number of jobs in the past" has no meaning for us. No one knows what it means. But computers do. Sometimes computers can predict the "target", which means "do" or "do not", more precisely with their own features than we can with ours.
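As a toy illustration of a machine generating its own features, the sketch below automatically builds "interaction" features as products of feature pairs; the customer values are invented for this example, and real systems create far more complex combinations:

```python
from itertools import combinations

def interaction_features(features):
    """Automatically create combined features as products of feature pairs.
    Such machine-made features may carry no obvious human meaning."""
    extended = dict(features)
    for a, b in combinations(features, 2):
        extended[a + "*" + b] = features[a] * features[b]
    return extended

# Hypothetical customer: age 30, 2 jobs in the past
print(interaction_features({"age": 30, "jobs": 2}))
# {'age': 30, 'jobs': 2, 'age*jobs': 60}
```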


In the future, I am sure much more data will be available to us. It means computers will have more chances to create better "features" than experts do. So experts should use the results of predictions by computers and incorporate them into their insights and decisions in each field. Otherwise, we cannot compete with computers, because computers can work 24 hours a day, 365 days a year. It is very important that the results of predictions be used effectively to enhance our own expertise in the future.



Notice: TOSHI STATS SDN. BHD. and I, author of the blog,  do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

An easy way to understand how classification works, without formulas! No. 1


Hello, I am Toshi. Hope you are doing well. Last week I introduced "classification" to you and explained that it can be applied to every industry. This week and next week, I would like to explain how it works, step by step. Do not worry, no complex formulas are used today. It is easier than making pancakes with a frying pan!

I understand that each business manager has different problems and questions. For example, if you are a sales manager in retail, you would like to know who is likely to buy your products. If you are working in a bank, you want to know who will be in default. If you are in the healthcare industry, you want to know who is likely to have a disease in the future. It would be awesome for your business if we could predict what will happen, with some certainty, in advance.

These problems look different from each other. However, they can all be categorized as the same task, called "classification", because we need to classify "do" or "do not". For sales managers, it means "buy" or "not buy". For managers in banks, "in default" or "not in default". For personnel in legal services, "win the case" or "not win the case". If predictions about "do" or "do not" can be obtained in advance, they can contribute to the performance of your business. Let us see how this is possible.


1.  “target” is significantly important

We can apply this "do" or "do not" method to all industries. Therefore, you can apply it to your own business problems. I am sure you are already interested in your own "do" or "do not". Then let us move on to data analysis. "Do" or "do not" is called the "target" and has a value of 1 or 0. For example, I bought a premium product at a retail shop; in such a case, I have 1 as my target. On the other hand, my friend did not buy anything there, so she has 0 as her target. Therefore, everyone has either 1 or 0 as a target. This is very important as a starting point. I recommend considering what a good "target" would be in your business.
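As a tiny sketch, encoding the "target" as 1 or 0 can look like this in Python; the customer names and purchase records are made up for illustration:

```python
# Hypothetical purchase records: who bought the premium product and who did not
actions = {"Toshi": "buy", "Mika": "not buy"}

# Encode the "target": 1 for "buy", 0 for "not buy"
targets = {name: 1 if action == "buy" else 0 for name, action in actions.items()}
print(targets)  # {'Toshi': 1, 'Mika': 0}
```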


2.  What is closely related to the "target"?

This part is your role, because you have expertise in your business. Suppose you are a sales manager in retail fashion. Let us imagine what is closely related to a customer's "buy" or "not buy". One factor may be the customer's age, because the younger generation may buy more clothes than seniors. A second may be the number of overseas trips a year, because the more they travel overseas, the more clothes they buy. Susumu, one of my friends, is 30 years old and travels overseas three times a year. So his data looks like this: Susumu (30, 3). These values are called "features". Yes, everyone has different values of the features. Could you work out your own feature values? They must be different from (30, 3). Then, with this feature vector (30, 3), I would like to express the "target" next. (NOTE: In general, the number of features is far more than two. I keep it at two here to make the story simple and easy to understand.) Here is our customer data.

customer data

3.  How can the "target" be expressed with "features"?

Susumu has the feature values (30, 3). Then let us take the sum of 30 and 3. The answer is 33. However, I do not think this works, because it gives each feature the same impact on the "target". Some features must have more impact than others. So let us introduce a "weight" for each feature. For example, (-0.2)*30 + 0.3*3 + 6; the answer is 0.9. Here -0.2 and 0.3 are the weights for each feature, and 6 is a kind of adjustment. This time it looks better, as "age" has a different impact on the "target" from "the number of trips". So the "target", which in this case means whether Susumu will buy or not, is expressed with the features "age" and "the number of trips". Once this is done, we do not need to calculate by ourselves anymore, as computers can do it for us. All we have to know is that the "target" can be expressed with "features". Maybe I can write it this way: "target" ← "features". That is all!



Even if the number of features is more than 1,000, we can do the same thing as above. First, assign a weight to each feature; second, sum up all the weighted features. This is how a lot of data can be converted into just one value. With one value, we can easily judge whether Susumu is likely to buy or not. The higher the value he has, the more likely he is to buy clothes. It is very useful, because it enables us to know intuitively whether customers will buy or not.
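The calculation above can be written in a few lines of Python; the weights -0.2 and 0.3 and the adjustment 6 are the example values used in this post, and the same function works unchanged for hundreds of features:

```python
def score(features, weights, bias):
    """Collapse any number of features into one value:
    multiply each feature by its weight, sum everything up, add the bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias

# Susumu: age 30, 3 overseas trips a year
print(round(score([30, 3], [-0.2, 0.3], 6), 2))  # 0.9

# Tom: age 56, 1 overseas trip a year
print(round(score([56, 1], [-0.2, 0.3], 6), 2))  # -4.9
```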

Next week I would like to introduce the "logistic regression model" and explain how classification can be done quantitatively. See you next week!

“Classification” is significantly useful for our business, isn’t it?


Hello, I am Toshi. Hope you are doing well. Lately I have been considering how we can apply data analysis to our daily business, so I would like to introduce "classification" to you.

If you are working in a marketing or sales department, you want to know who is likely to buy your products and services. If you are in legal services, you would like to know who will win a case in court. If you are in the financial industry, you would like to know who among your loan customers will be in default.

These cases can all be considered the same kind of problem: "classification". It means that you can pick out a thing or an event you are interested in from the whole population you have on hand. If you have data about who bought your products and services in the past, we can apply "classification" to predict who is likely to buy and make better business decisions. Based on the results of classification, you can know who is likely to win cases and who will be in default, with a numerical measure of certainty called "probability". Of course, "classification" cannot be a fortune teller. But it can tell us who is likely to do something, or what is likely to occur, with some probability. If your customer has a probability of 90% based on "classification", it means that they are highly likely to buy your products and services.


I would like to give several examples of "classification" for each business. You may want clues to the questions below.

  • For the sales/marketing personnel

Which movies or songs will be in the Top 10 ranking in the future?

  • For personnel in the legal services

Who will win the case?

  • For personnel in the financial industries or accounting firms

Who will be in default in the future?

  • For personnel in healthcare industries

Who is likely to have a disease, or to be cured of one?

  • For personnel in asset management marketing

Who is wealthy enough to be a target for investment promotion?

  • For personnel in sports industries

Which team wins the world series in baseball?

  • For engineers

Why did the spacecraft engine explode in the air?


We can think of many more examples, as long as data is available. When we try to solve the problems above, we need data from the past, including the target variable: who bought products, who won cases, and who defaulted. Without past data, we can predict nothing. So data is critically important for "classification" and for making better business decisions. I think data is "King".


Technically, several methods are used for classification: logistic regression, decision trees, support vector machines, neural networks and so on. I recommend learning logistic regression first, as it is simple, easy to apply to real problems, and provides the basic knowledge needed to learn more complex methods such as neural networks.
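As a rough sketch of what fitting a logistic regression model involves, here is a toy version trained by plain gradient descent in pure Python. The four customers, their features (age scaled down by 10 as crude normalization, plus overseas trips a year) and the learning settings are all invented for illustration; a real project would use a library such as scikit-learn or H2O instead:

```python
import math

def logistic(z):
    """Numerically safe logistic curve."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = logistic(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                      # gradient of log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Invented customers: (age / 10, overseas trips a year) -> bought (1) or not (0)
X = [[3.0, 3], [5.6, 1], [2.5, 4], [6.0, 0]]
y = [1, 0, 1, 0]
w, b = train(X, y)

def predict_proba(x):
    return logistic(sum(wj * xj for wj, xj in zip(w, x)) + b)

print(predict_proba([3.0, 3]))  # young frequent traveller: probability well above 0.5
print(predict_proba([5.6, 1]))  # older infrequent traveller: probability near 0
```

The output of the model is exactly the probability between 0 and 1 discussed above, so "likely to buy" simply means the predicted probability exceeds 0.5.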


I  would like to explain how classification works in the coming weeks.  Do not miss it!  See you next week!