For three weeks I have been researching recommender systems across many websites. I was surprised to find how many documents, sites and videos explain how recommender systems work and which topics are currently being explored to improve them. I focused especially on the video lectures by Xavier Amatriain, who works at Netflix as a Research/Engineering Director. He covers everything from the basic methodologies of recommender systems to the latest methods for achieving business objectives. I strongly recommend watching them if you are interested in recommender systems. If the technical terms in the video lectures are difficult to understand, I suggest taking the MOOC by Dr. Andrew Ng at Stanford University beforehand. As I said before, it provides broad knowledge of machine learning, which makes the technical details much easier to follow.
In my view, there are two major methods for calculating ratings for recommendations: matrix factorization and neural networks, although many other methods can also be used in recommender engines. The objective of this project is to develop a model of a recommender engine that beginners in data analysis can build by themselves. Neural networks are therefore out of scope, as they are too complex for beginners. Matrix factorization should be a good way to learn how models in recommender systems are developed, so I decided to focus on it in this project.
In my view, the key points of modeling based on matrix factorization are as follows:
1. How are customers’ preferences represented in the models?
Some people like sweet things and some do not. Some like love stories and some do not. Some like rock ‘n’ roll and some do not. So a customer’s preferences can be represented as a vector, with one score for the strength of each preference. This is reasonable and easy to understand, even for beginners. Why don’t you try writing down a vector of your own preferences for your favorite items?
2. How are items’ features represented in the models?
It may take time to prepare the features for each item. The features of each item should line up with the customers’ preferences discussed above. Is it sweet or not? Is it a love story or not? Is it rock ‘n’ roll or not? An item’s features therefore also form a vector, with the same dimensions as the customers’ preference vectors. This is easy to understand, too.
3. How can we match customers’ preferences with items’ features?
It is reasonable to build the metric from products of customers’ preferences and items’ features. If a customer’s score for a preference is high and the item’s score for the corresponding feature is also high, then the product of the two is high as well. Summing these products over all preferences (the inner product of the two vectors) gives a single score for each customer and item pair. A high score lets us recommend the item to the customer, because the item has the features the customer likes. It makes sense!
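The three points above can be sketched in a few lines. This is only a minimal illustration in Python, not a full matrix factorization: the customers, items, scores and the function names `score` and `recommend` are all made up for this example, and the vectors are written by hand rather than learned from rating data.

```python
# Hypothetical dimensions of each vector: [sweet, love story, rock 'n' roll]
# All names and numbers below are illustrative assumptions.
customers = {
    "alice": [0.9, 0.1, 0.8],  # likes sweet things and rock 'n' roll
    "bob": [0.2, 0.9, 0.1],    # likes love stories
}

items = {
    "chocolate_cake": [1.0, 0.0, 0.0],
    "romance_movie": [0.0, 1.0, 0.0],
    "rock_album": [0.0, 0.0, 1.0],
}


def score(preferences, features):
    """Inner product: sum of preference * feature over every dimension."""
    return sum(p * f for p, f in zip(preferences, features))


def recommend(customer):
    """Return item names sorted from highest to lowest score for a customer."""
    prefs = customers[customer]
    return sorted(items, key=lambda name: score(prefs, items[name]), reverse=True)


for name in customers:
    print(name, recommend(name))
```

With these made-up numbers, the item ranked first for each customer is the one whose strongest feature matches that customer's strongest preference, which is exactly the matching idea in point 3.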
If you need more details on matrix factorization, I recommend reading this paper. It explains how the model works in recommender systems. Going forward, I would like to develop a prototype recommender engine in R so that beginners can understand how the models in recommender systems work.
For your information, Coursera will offer the course “Mining Massive Datasets” by Jure Leskovec, Anand Rajaraman and Jeff Ullman of Stanford University. It starts on 29 Sep 2014 and covers recommender systems. Do not miss it!