Logistic regression model or Matrix factorization?


When I was a risk manager in the financial industry, I often used logistic regression models. This model is widely used to measure the probability of default of counterparties, so it is well known in finance. In machine learning, it is regarded as a classifier, as it enables us to classify data based on the results of its calculations. Both numerical and categorical data can be used in this model. It is simple and flexible, so I want to use it as our recommender engine.

In addition, I found that the matrix factorization model is widely used in industry today. It has been popular since it performed well in the Netflix Prize competition, which concluded in 2009. Once we obtain the matrix of ratings by users and items, matrix factorization divides it into two matrices: one for users' preferences and the other for item features. By using these two matrices, we can provide recommendations to users even when they have not rated the specific items. It is simple but very powerful.
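As a minimal sketch of this decomposition (in Python rather than R, and with a made-up toy rating matrix), the ratings can be approximated by the product of a user-preference matrix and an item-feature matrix, fitted here by simple gradient descent on the observed entries only:

```python
import numpy as np

# Toy rating matrix: 4 users x 3 items, 0 means "not rated yet".
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5],
              [0, 1, 4]], dtype=float)

n_users, n_items = R.shape
k = 2  # number of latent features (an arbitrary choice for this toy example)
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # users' preferences
Q = rng.normal(scale=0.1, size=(n_items, k))   # item features

lr, reg = 0.01, 0.02
for _ in range(2000):
    for u, i in zip(*R.nonzero()):             # only the observed ratings
        err = R[u, i] - P[u] @ Q[i]
        p_u = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_u - reg * Q[i])

# Every cell of P @ Q.T is now filled, so we can "recommend"
# items a user has never rated.
print(np.round(P @ Q.T, 1))
```

Once P and Q are learned, the blanks in the original matrix are filled in automatically, which is exactly why this method can recommend unrated items.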


Having these two models gives us two advantages.

1. We can compare the results of the two models against each other.

By using the same data, we can compare how effectively each model provides recommendations. I think this is valuable because it is very difficult to evaluate how well a model works without comparing it to other models.


2. We can combine the two models into one.

In practice, several models are sometimes combined into one so that the results are more accurate than those of any single model. For example, matrix factorization provides features automatically, and these features can be used as inputs to a logistic regression model. Taking a linear combination of each model's output is another way of combining models.
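As a rough sketch of the first idea (Python used here for illustration; all the data is made up), latent features of the kind matrix factorization produces can be fed into a logistic regression classifier that predicts, say, whether a user will click an item:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these latent features came out of matrix factorization:
# one row per (user, item) pair.  All numbers here are made up.
X = rng.normal(size=(200, 4))
true_w = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ true_w > 0).astype(float)   # "clicked or not" labels

# Fit a logistic regression on top of those features by gradient descent.
w = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))       # predicted click probability
    w += 0.1 * X.T @ (y - p) / len(y)    # gradient ascent on log-likelihood

accuracy = ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

This is only a sketch of the combination idea, not the production pipeline; in practice the features would come from the fitted factor matrices rather than random numbers.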


Yes, we have two major models for our recommendation engines. So let us make them more accurate and effective going forward. The more experience we have in developing models, the more accurate and effective their recommendations become. These models are expected to be implemented in R, our primary tool for data analysis. It must be exciting! Why don't you join us? You are going to become an expert in recommender systems with this blog!

What is a utility function in recommender systems?


Let us go back to recommender systems, as I did not mention them last week. Last month I found that customers' preferences and item features are key to providing recommendations. Then I started developing the model used in recommender systems. Now I think I should explain the initial problem setting in recommender systems. This week I looked at "Mining Massive Datasets" on Coursera and found that its problem setting for recommender systems is simple and easy to understand, so I decided to follow it. If you are interested in more detail, I recommend this course, an excellent MOOC on Coursera.


Let us introduce a utility function, which tells us how satisfied customers are with the items. The term "utility function" comes from microeconomics, so some of you may have learned it before. I think it is good to use a utility function here because we can apply the methods of economics when we analyze the impact of recommender systems on our society going forward. I hope more people who are not data scientists become interested in recommender systems.

The utility function is expressed as follows:

U(θ, x) = R

where U is the utility of customers, θ is customers' preferences, x is item features, and R is the rating of an item for a customer.

This is simple and makes it easy to understand what a utility function is. I would like to use this definition going forward. Depending on the recommender system, a rating may be discrete (one, two, three, ...) or a continuous number.

When we look at simple models, such as linear regression and logistic regression, the key quantities are the explanatory variables (or features) and their weights (or parameters), represented as x and θ respectively. The product θx shows us how much impact the features have on the variable we want to predict. Therefore I would like to introduce θx as a critical part of my recommender engine. "θx" means that each x is multiplied by its corresponding weight θ and all the products are summed up. This is critically important for recommender systems. Mathematically, θx is a product of vectors/matrices. It is simple but has strong power to provide recommendations effectively. I would like to develop my recommender engine using θx next week.
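A minimal illustration of θx (Python here, with made-up numbers): each feature is multiplied by its corresponding weight and the products are summed, which is exactly the dot product of two vectors:

```python
import numpy as np

theta = np.array([0.8, -0.3, 0.5])   # customer's preferences (made-up weights)
x = np.array([1.0, 2.0, 4.0])        # item features (made-up values)

# "theta x": multiply each feature by its weight and sum the products,
# i.e. 0.8*1.0 + (-0.3)*2.0 + 0.5*4.0 = 2.2
score = sum(t * f for t, f in zip(theta, x))

# The same thing written as a vector dot product:
assert np.isclose(score, theta @ x)
print(score)
```

The higher this score, the better the item matches the customer's preferences, which is the core idea behind the engine.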


Yes, we should consider, for example, what color of shirt maximizes our utility functions. In the future, the utility function of every person might be stored in computers, and recommendations might be provided automatically in order to maximize our utility functions. Then everyone may be satisfied with everyday life. What a wonderful world that would be!

I have started a Nanodegree at Udacity this week. Yes, I will develop my website by myself!



I have started the Front-End Web Developer course of the Udacity Nanodegree program this week. I would like to obtain front-end web development skills, such as building websites and mobile services, because I would like to develop websites and mobile services backed by machine learning. So I am going to set up a prototype website on Microsoft Azure and use Visual Studio Online for writing HTML, CSS and JavaScript. As I learn coding methods in the Nanodegree, I will try to apply them to the prototype website on Microsoft Azure. I think this is good because I can learn coding methods through the Nanodegree and develop my website on Microsoft Azure at the same time.

As I said before, Nanodegrees focus on industry practices and applications for jobs. It looks like open on-the-job training. It uses a project-based method, where participants build several websites by themselves according to instructions. I hope I can develop websites by writing HTML, CSS and JavaScript by the end of this course.

Actually, it is my first paid online course; it costs 200 USD per month. I have taken more than 10 MOOCs (massive open online courses) on Coursera and edX before. Unlike Nanodegrees, those courses are free, so I did not pay any fees at all. Most courses on Coursera and edX are provided by university professors, so Nanodegrees stand in contrast to Coursera and edX, the major providers of MOOCs. I would like to explain the differences between Nanodegrees and other free courses going forward.

I want to develop websites and mobile services as a kind of parallel processing. When new methods of developing websites and mobile services are introduced in the Nanodegree, I will deploy prototype websites on Microsoft Azure at the same time. In addition, the project to develop recommender engines is going on in my company, and the prototype engine is expected to be developed within this year. This engine will be combined with the websites to enhance their services. I think it should be possible, as Microsoft Azure offers machine learning as a service.

This is my scheme for setting up a platform to develop websites and mobile services backed by machine learning. The Front-End Web Developer course of the Udacity Nanodegree program might make it possible even for beginners like me. I hope this program keeps a high standard of skills and methods for participants, so that everyone thinks it is worth paying the fee to take this course. I am sure Sebastian Thrun, CEO and co-founder of Udacity, will make it happen.


How about a recommender system for yourself? Computers know you better than you do!?


Since the beginning of September, I have been considering recommender systems intensively. Now I realize that recommender systems may know you better than you do, because they can memorize as much of your behavior as possible, such as shopping, touring and learning. That is more than your own memory, as human beings forget as time passes. The more data computers have, the more accurate the recommendations are. More and more people now have their own devices, such as smartphones and tablets, and use them every day. It means that data on our personal behavior is accumulated in computers every second, even though we do not realize it. In the future, personal devices may provide us recommendations for every choice in our lives. What are the advantages and disadvantages of such recommendations? Let me think about it for a while.



You can easily obtain what you want based on the recommendations. It may even be something you could not imagine yourself, though computers know it for you. When I go shopping at huge department stores, I sometimes get tired of searching for what I want because there are too many goods. In such cases, a recommendation is definitely a powerful tool if it is accurate. Information about products and services can be gathered from all over the world, so recommendations may include products from foreign countries. When a new kind of bread is introduced and appears in a bakery, computers can analyze the factors of this bread, such as taste, price and appearance, and calculate the metrics to provide recommendations for you. I would like to have such recommendations, as I love bread for breakfast.



You may lose opportunities to discover new tastes and preferences of your own, because computers can calculate your preferences very accurately. When recommendations by computers never miss your expectations, you may feel no need to go outside them. It means there is no challenge to go beyond your past behavior. If you like Japanese food, computers may recommend Japanese food only, so you eat only Japanese food and never try other kinds. But I think human beings need to go outside their habits to create innovations and adventures in their lives. If that is the case, I may input random numbers into my personal device so that recommendations contain some noise relative to my past behavior. I need a little challenge against my past behavior, as it makes my life more interesting, even if I do not know whether it works or not.
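The "random numbers" idea above can be sketched very simply (Python here, with made-up scores): with a small probability the system ignores its learned scores and recommends a random item instead, a so-called epsilon-greedy strategy:

```python
import random

def recommend(scores, epsilon=0.1, rng=random.Random(42)):
    """Pick the best-scoring item most of the time, but with
    probability epsilon pick a random one to add some 'noise'."""
    items = list(scores)
    if rng.random() < epsilon:
        return rng.choice(items)          # a small adventure
    return max(items, key=scores.get)     # the usual recommendation

# Made-up preference scores for cuisines.
scores = {"japanese": 0.9, "italian": 0.4, "thai": 0.3}
picks = [recommend(scores) for _ in range(1000)]
print(picks.count("japanese") / 1000)  # mostly "japanese", but not always
```

With epsilon = 0.1, roughly one recommendation in ten becomes an adventure outside my past behavior.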


I imagine that in 2040, each personal device, such as a smartphone or a smartwatch, will hold massive data and carry computing power in it. So it may calculate your preferences far better than you can. In the morning, you may find your favorite bread, which you want for breakfast, on the dining table before you even think about it. This will be based on recommendations by your smartphone. It may know everything about you. It may be perfect. Isn't it wonderful?

How can we predict the price of wine by data analysis?


Before discussing the models in detail, it is good to explain how models work in general, so that beginners in data analysis can understand them. I selected one famous piece of research on wine quality and price by Orley Ashenfelter, a professor of economics at Princeton University. You can look at the details of the analysis on this site: "BORDEAUX WINE VINTAGE QUALITY AND THE WEATHER". I calculated the results myself in order to explain how models work in data analysis.


1. Gathering data

The quality and price of wine are closely related to the quality of the grapes, so it is worth considering what factors impact grape quality. For example, temperatures, quantities of rain, the skill of the farmers and the quality of the vineyards may be candidate factors. Historical data on each factor for more than 40 years is needed in this analysis. In practice, whether data is available for long periods is sometimes very important. Here is the data used in this analysis, so you can do it yourself if you want to.


2.  Put data into models

Once the data is prepared, I input it into my model in R. This time, I use a linear regression model, which is one of the simplest models. This model can be expressed as the product of explanatory variables and parameters. According to the website, the explanatory variables are as follows:

       WRAIN      1  Winter (Oct.-March) Rain  ML                
       DEGREES    1  Average Temperature (Deg Cent.) April-Sept.   
       HRAIN      1  Harvest (August and Sept.) ML               
       TIME_SV    1  Time since Vintage (Years)

This is RStudio, a famous integrated development environment for R. In the upper left pane of RStudio, I built the model with R's linear regression function "lm", and the data from the website is input into the model.
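For readers without R, the same fit can be sketched in Python with ordinary least squares (the rows below are made up and only follow the format of the real data set, which is linked above):

```python
import numpy as np

# Made-up rows in the format of the Ashenfelter data:
# WRAIN, DEGREES, HRAIN, TIME_SV.
X = np.array([[600, 17.1, 160, 31],
              [690, 16.7,  80, 30],
              [502, 17.2, 130, 26],
              [420, 16.1, 110, 25],
              [582, 16.4, 187, 23],
              [485, 17.5, 187, 21]], dtype=float)
# Made-up target values (log relative prices).
y = np.array([-0.99, -0.45, -0.80, -1.51, -1.30, -0.85])

# Add an intercept column, then solve the least-squares problem
# (this is essentially what R's lm() does for us).
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["Intercept", "WRAIN", "DEGREES", "HRAIN", "TIME_SV"],
               np.round(coef, 4))))
```

With the real 40-plus years of data instead of these toy rows, this is the same calculation that produces the coefficients shown below.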

[Screenshot 2014-09-30: the lm model in RStudio]


3. Examine the outputs from models

In order to predict wine prices, the parameters must be obtained first. There is no need to worry: R calculates them automatically. The result is as follows ("coefficients" means the parameters here). You can see this result in the lower left pane of RStudio.


(Intercept)     WRAIN    DEGREES   HRAIN    TIME_SV
-12.145007   0.001167   0.616365   -0.003861   0.023850
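With these coefficients, a prediction can be checked by hand (Python here; the weather and age values for the example vintage are made up, and following the original study the fitted value is taken to be the log of the relative price):

```python
import math

# Coefficients from the fitted model above.
coef = {"Intercept": -12.145007, "WRAIN": 0.001167,
        "DEGREES": 0.616365, "HRAIN": -0.003861, "TIME_SV": 0.023850}

# Made-up weather and age values for one vintage.
vintage = {"WRAIN": 600, "DEGREES": 17, "HRAIN": 160, "TIME_SV": 30}

# Prediction = intercept + sum of (coefficient * variable).
log_price = coef["Intercept"] + sum(coef[k] * v for k, v in vintage.items())
rel_price = math.exp(log_price)   # back to a price relative to the 1961 average
print(round(log_price, 3), round(rel_price, 3))
```

So for this hypothetical vintage the model predicts a price of roughly 0.42 times the 1961 average, which is how the red squares in the graph below are produced.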

Finally, we can predict the wine price. You can see the predictions of wine prices in the lower right pane of RStudio.


This graph shows the comparison between the predicted and real prices of wine. The red squares show the predicted prices and the blue circles the real prices. These are relative prices against the average price of the 1961 vintage, so the real price in 1961 is 1.0. It seems that the model works well. Of course, it may not work now, as it was built more than 20 years ago, but this research is a good way to learn how models work and how predictions are made. Once you understand the linear regression model, it enables you to understand other, more complex models with ease. I hope you enjoy predicting wine prices. OK, let us move on to recommender engines again next week!


Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.