An easy way to understand how classification works, without formulas! No. 1


Hello, I am Toshi. I hope you are doing well. Last week I introduced “classification” and explained that it can be applied to every industry. This week and next week, I would like to explain how it works, step by step. Do not worry, no complex formulas are used today. It is easier than making pancakes in a frying pan!

I understand that each business manager has different problems and questions. For example, if you are a sales manager in retail, you would like to know who is likely to buy your products. If you work in a bank, you want to know who is likely to default. If you are in the healthcare industry, you want to know who is likely to develop a disease in the future. It would be great for your business if we could predict such outcomes in advance with reasonable confidence.

These problems look different from each other. However, they all belong to the same task, called “classification”, because in each of them we need to classify a case as “do” or “do not”. For sales managers, it means “buy” or “not buy”. For bank managers, “in default” or “not in default”. For people in legal services, “win the case” or “not win the case”. If predictions about “do” or “do not” can be obtained in advance, they can contribute to the performance of your business. Let us see how this is possible.

 

1.  The “target” is very important

The “do” or “do not” method can be applied to all industries, so you can apply it to your own business problems. I am sure you are already thinking about your own “do” or “do not”. So let us move on to data analysis. “Do” or “do not” is called the “target”, and it takes a value of “1” or “0”. For example, suppose I bought premium products in a retail shop. In that case, my target is “1”. My friend, on the other hand, did not buy anything there, so her target is “0”. In this way, every customer has either “1” or “0” as a target. This is a very important starting point. I recommend that you consider what a good “target” would be in your business.
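For readers who like to see things on a computer, here is a minimal sketch of how the target could be written down. The customer names and the `purchases` dictionary are made up for illustration only; they are not real data.

```python
# Hypothetical purchase records: True means the customer bought something,
# False means the customer did not buy anything.
purchases = {"Toshi": True, "Friend": False}

# Encode each outcome as the target: 1 for "buy", 0 for "not buy".
targets = {name: int(bought) for name, bought in purchases.items()}

print(targets)
```

Every customer ends up with exactly one value, 1 or 0, which is all the “target” needs to be.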

 

2.  What is closely related to the “target”?

Answering this question is your role, because you have the expertise in your business. Let us assume that you are a sales manager in retail fashion, and imagine what is closely related to a customer’s “buy” or “not buy”. One factor may be the customer’s age, because younger people may buy more clothes than older people. A second may be the number of overseas trips per year, because the more customers travel overseas, the more clothes they may buy. Susumu, one of my friends, is 30 years old and travels overseas three times a year. So his data looks like this: Susumu (30, 3). These values are called “features”. Of course, everyone has different values for the features. Could you work out your own values? They are probably different from (30, 3). Next, I would like to express the “target” using these features (30, 3). (NOTE: In general, the number of features is far more than two. I use only two here to keep the story easy to follow.) Here is our customer data.

[Figure: customer data]

3.  How can “targets” be expressed with “features”?

Susumu’s features are (30, 3). Let us first simply add 30 and 3. The answer is 33. However, I do not think this works well, because a simple sum gives every feature the same impact on the “target”. Some features must have more impact than others. So let us introduce a “weight” for each feature. For example, (-0.2)*30 + 0.3*3 + 6 gives 0.9. Here “-0.2” and “0.3” are the weights for age and the number of trips respectively, and “6” is a kind of adjustment. This looks better, because “age” now has a different impact on the “target” from “the number of trips”. So the “target”, which in this case means whether Susumu will buy or not, is expressed with the features “age” and “the number of trips”. Once this is set up, we do not need to calculate by ourselves anymore, as computers can do it for us. All we have to know is that the “target” can be expressed with “features”. Maybe I can write it this way: “target” ← “features”. That is all!
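If you would like to let the computer do Susumu’s calculation, a minimal sketch looks like this. The numbers come straight from the example above; the variable names are my own choice for illustration.

```python
# Susumu's features from the example: age and overseas trips per year.
age, trips = 30, 3

# Weights and adjustment from the example (illustrative values, not fitted).
w_age, w_trips = -0.2, 0.3
adjustment = 6

# Multiply each feature by its weight, add them up, then add the adjustment.
score = w_age * age + w_trips * trips + adjustment

# Round for display, since computers store decimals only approximately.
print(round(score, 2))  # 0.9
```

The rounding is only there for display; the calculation itself is the same weighted sum as in the text.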

 

 

Even if there are more than 1000 features, we can do the same thing as above. First, multiply each feature by its weight. Second, sum up all the weighted features and add the adjustment. This is how a lot of data can be converted into just “one value”. With this one value, we can easily judge whether Susumu is likely to buy or not. The higher his value, the more likely he is to buy clothes. This is very useful, because it lets us see at a glance which customers are more likely to buy.
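The same two steps can be sketched for any number of features. The function name `weighted_score` and the 1000 made-up feature values below are assumptions for illustration, not part of any library or real data set.

```python
def weighted_score(features, weights, adjustment):
    """Convert many feature values into one value.

    Hypothetical helper: multiplies each feature by its weight,
    sums the results, and adds the adjustment.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + adjustment


# Susumu's two features give the same result as the hand calculation.
susumu_score = weighted_score([30, 3], [-0.2, 0.3], 6)
print(round(susumu_score, 2))

# The same function works unchanged with 1000 (made-up) features.
many_features = [1.0] * 1000
many_weights = [0.001] * 1000
many_score = weighted_score(many_features, many_weights, 0)
print(round(many_score, 2))
```

Whether there are two features or a thousand, the result is still a single number per customer.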

Next week I would like to introduce the “logistic regression model” and explain how this one value can be used to classify customers quantitatively. See you next week!
