“Prediction” is very important in analyzing big data of the business

money-548948_640

It is a good timing to reconsider “Big data and digital economy” because this name of group on Linledin has four-month-history and more than 100 participants now. I would like to appreciate the cooperation of all of you.

In the beginning of 2000s, I worked in the risk management dept in the Japanese consumer finance company.   There is a credit risk model which can predict who is likely to be in a default in the company. I learned it more details and understood how it worked so accurately. I found that if I collect a lot of data about customers, I could obtain accurate predictions for events of defaults in terms of each customer.

Now in 2015,  I researched many algorithms and statistical models including the state of art “deep learning”.   While there are many usages and objectives in using such models,  in my view,  the most important thing for business persons is “prediction” just like my experience in consumer finance company because they should make good business decisions to compete in markets.

If you are in health care industry,  you may be interested in predictions about who is likely to be cured. If you are in sales, you may be interested in predictions about who is likely to come to the shop and buy the products. If you are in marketing,  you may be interested in who is likely to click the advertisement on the web.  Whatever you do,  predictions are very important for your businesses because it enables us to take the right actions.  Let me explain key points about predictions.

 

Target

What are your interests to predict?    Revenue of your business?  Number of customers?    Satisfaction rate based on client feedback?  Price of wine near futures? You can mention anything you want.  We call it “Target”.  So firstly, “Target” should be defined in predictions so that you can make right business decisions.

 

Features

Secondly,  let us find something related to your target.  For example,   If you are a sales person and interested in who is likely to buy the products,  features are “attributes of each customer such as age, sex, occupation” , “behavior of each customer such as how many times he/she come to the shop per month and when he/she bought the products last time”,  “What did he/she click in the web shop”  and so on.  Based on the prediction, you can send coupons or tickets to “highly likely to buy”customers in order to increase your sales.  If you are interested in the price of wine,  features may be temperature,  amount of rain and locations of farms,  and so on.  If you can predict the price of wine,  you might make  good investments of wine.  These are just simple examples. In reality,  a number of features may be 100,  1000  or more.  It depends on whole data you have.  Usually the more data you have, the more accurate your predictions are.  This is why data is very important to obtain predictions.

 

Evaluation of predictions

Finally by inputting features into statistical models,  predictions of the target can be obtained. Therefore, you can predict who is likely to buy the products when you think of marketing strategies.  This is good for your business as marketing strategies can be more effective.  Unfortunately customer preferences may be changed in the long run.  When situations and environments such as customer preferences are changed,  predictions may not be accurate anymore.  So it is important to evaluate predictions and update statistical models periodically.  No model can work accurately forever.

 

Once you can obtain the prediction,  you can implement processes of the predictions as a daily activity, rather than one-off analysis. It means that data driven decisions are made on a daily basis.  It is one of the biggest aspects of “digital economy”.  From retail shops to health care and financial industry,  predictions are already used in many fields.  The methods of predictions are sometimes considered as “black-box”.  But I do not think It is good to use predictions without understanding the methods behind predictions. I would like to explain them in my weekly letter in future.  Hope you enjoy it!

 

 

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region market or investment.

Data from third-party sources may have been used in the preparation of this material and I, Author of the article has not independently verified, validated such data. I accept no liability whatsoever for any loss arising from the use of this information and relies upon the comments, opinions and analyses in the material is at the sole discretion of the user. 

“Classification” is significantly useful for our business, isn’t it?

public-domain-images-free-stock-photos-high-quality-resolution-downloads-public-domain-archive-14

Hello, I am Toshi. Hope you are  doing well. Now I consider how we can apply data analysis to our daily businesses.  So I would like to introduce “classification” to you.

If you are working in marketing/sales departments, you want to know who are likely to buy your products and services. If you are in legal services, you would like to know who wins the case in a court. If you are in financial industries, you would like to know who will be in default among your loan customers.

These cases are considered as same problems as “classfication”.  It means that you can classify a thing or an event you are interested in from all populations you have on hand.  If you have data about who bought your products and services in the past, we can apply “classification” to predict who are likely to buy and make better business decisions. Based on the results of classification,  you can know who is likely to win cases and who will be in default with a numerical measure of certainty,  which is called “probability”.  Of course, “classification” can not be a fortune teller.  But “classification” can provide us who is likely to do something or what is likely to occur with some probabilities.  If your customer has 90% of probabilities based on “classification”, it means that they are highly likely to buy your products and services.

 

I would like to tell several examples of “classification” for each business. You may want to know the clues about the questions below.

  • For the sales/marketing personnel

What is the movie/music in the Top 10 ranking in the future?

  • For personnel in the legal services

Who wins the cases ?

  • For personnel in the financial industries or accounting firms

Who will be in default in future?

  • For personnel in healthcare industries

Who is likely to have a disease or cure diseases?

  • For personnel in asset management marketing

Who is rich enough to promote investments?

  • For personnel in sports industries

Which team wins the world series in baseball?

  • For engineers

Why was the spaceship engine exploded in the air?

 

We can consider a lot of  examples more as long as data is available.  When we try to solve these problems above,  we need data in the past, including the target variable, such as who bought products, who won the cases and who was default in the past.  Without data in the past, we can predict nothing. So data is critically important for “classification” to make better business decisions.   I think data is “King”.

 

Technically, several methods are used in classification.  Logistic regression,  Decision trees,  Support Vector Machine and Neural network and so on. I recommend to learn Logistic regression first as it is simple, easy to apply real problems and can be basic knowledge to learn more complex methods such as neural network.

 

I  would like to explain how classification works in the coming weeks.  Do not miss it!  See you next week!

This course is the best for beginners of data analysis. It is free, too!

shield-229112_1280

Last week, I started learning on-line course about data analysis. It is “The Analytics Edges” in edx, one of the biggest platforms of MOOCs all over the world (www.edx.org).  This course says “Through inspiring examples and stories, discover the power of data and use analytics to provide an edge to your career and your life.”   Now I completed Unit one and two out of  total nine in the course and found that it is the best course for beginners of data analysis in MOOCs. Let me tell you why it is.

 

1. There are a variety of data sets to analyze

When you start learning data analysis, data is very important to motivate yourself to continue to learn.  When you are sales personnel, sales data is the best to learn data analysis because you are interested in sales as professional.  When you are in financial industries, financial data is the best for you.   This course uses a variety of data from crime rate to automobile sales.  Therefore, you can see the data you are interested in. It is critically important for beginners of data analysis.

 

2. This course focuses on how to use analytics tools, quite than the theory behind the analysis

Many of data analysis courses take a long time to explain the theory behind the analysis.  It is required when you want to be a data scientist because theory is needed to construct an analytic method by yourself. However, most of business managers do not want to be data scientists.  All business managers need is the way to analyze data to make better business decisions. For this purpose, this course is good and well-balanced between theory and practice.  Firstly, a short summary of theory is provided, then move on to practice. Most of  the lectures focus on “how to use R for data analysis”. R is one of the famous programming languages for data analysis, which is free for everyone.  It enables beginners to use R in analyzing data step by step.

 

3. It covers major analytic methods of data analysis.

When you see the schedule of the course,  you find many analytic methods from linear regression to optimizations.  This course covers major methods that beginners must know.  I recommend to focus on linear regression and logistic regression when you do not have enough time to compete all units because both of method is applicable to many cases in the real world.

 

 

I think it is worth seeing only the video in Unit 1 and 2.  Interesting topics are used especially for people who like baseball. If you do not have enough time to learn R programming, it is OK to skip it. The story behind the analysis is very good and informative for beginners. So you may enjoy the videos about the story and skip videos of programming for the first time. If you try to obtain a certificate from edx, you should obtain 55% at least over the homework, competition and final exam.  For beginners, it may be difficult to complete the a whole course within limited time (three-month).  Do not worry.  I think this course can be learned again in time to come.  So first time,  please focus on Unit1 and Unit2, then a second time, try a whole course if  you can. In addition, most of edx courses including this are free for anyone.   You can enjoy anytime, anywhere as long as you have an internet access.  Could you try this course with me (www.toshistats.net) ?

Do you want to be “Analytic savvy manager”?

Data is and will be around us and it is increasing at an astonishing rate.  In the such business environment,  what should business managers do?    I do no think every manager should have an analytics skill at the same level as a data scientist because it is almost impossible.  However, I do think every manager should communicate with data scientists and make better decisions by using output from their data analysis.  Sometimes this kind of manager is called “analytic savvy manager”.  Let us consider what “analytic savvy manager”should know.

 

1.   What kind of data is available to us?

Business managers should know what kind of data is available for their business analysis.  Some of them are free and others are not.  Some of them are in  companies or private and others are public.  Some of them are structured and others are not.  It is noted that data which are available is increasing in terms of volume and variety.  Data is a starting point of analysis, however, data scientists may not know specific fields of business in detail. It is business managers that know what data is available to businesses.  Recently data gathering services have provided us a lot of data for free.  I recommend you to look at “Quandl” to find public data. It is easy to use and provides a lot of public data for free.  Strong recommendation!

 

2.  What kind of analysis method can be applied?

Business managers do not need to memorize formulas of each analysis method.  I recommend business managers to understand simple linear regression and logistic regression and get the big picture about how the statistical models work. Once you are familiar with two methods,  you can understand other complex statistical models with ease because fundamental structures are not so different among methods.  Statistical models enable us to understand what big data means without loss of information.  In addition to that,  I also recommend business managers to keep in touch with the progress of machine learning,  especially deep learning.  This method has great performances and is expected to be used in a lot of business field such as natural language processing.  It may change the landscape of businesses going forward.

 

3.  How can output from analysis be used to make better decisions?

This is critically important to make a better decision.  Output of data analysis should be in aligned with business needs to make decisions.  Data scientist can guarantee whether numbers of the output are accurate in terms of calculations. However, they can not guarantee whether it is relevant and useful to make better decisions.  Therefore business managers should communicate with data scientist during the process of data analysis and make the output of analysis relevant to business decisions. It is the goal of data analysis.

 

I do not think these points above are difficult to understand for business managers even though they do not have a quantitative analytic background.   If you are getting familiar with these points above, it would make you different from others at the age of big data.

Do you want to be “Analytic savvy manager”?

13943376615435