Last week I introduced the inner product as a simple model for recommender systems. This week I would like to introduce a more advanced model: singular value decomposition.
According to Mining Massive Datasets on Coursera, one of the best online courses about machine learning and big data, singular value decomposition (SVD) factors a matrix A into three matrices, A = U Σ V^T, where:
U : matrix of left singular vectors
Σ : diagonal matrix of singular values
V : matrix of right singular vectors
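Written out with dimensions, the definition can be sketched as follows. This assumes the compact form, where A has m rows and n columns and r is the rank of A. The symbols m, n, r and σ are my notation, not taken from the course.

```latex
A_{m \times n} = U_{m \times r} \, \Sigma_{r \times r} \, \left(V_{n \times r}\right)^{T},
\qquad U^{T} U = I_r, \quad V^{T} V = I_r,
\qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0
```

Each of the r columns of U and V corresponds to one concept, and σ_i is the strength of the i-th concept.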
The row vectors and column vectors of matrix A can be mapped into a lower-dimensional space, called the “concept” space, which has fewer dimensions than the rows and columns of A. The strength of each concept is given by the diagonal matrix Σ, whose diagonal values are positive. When SVD is applied to recommender systems, the row vectors of A can represent customers’ preferences and the column vectors can represent items’ features. For example, movies can be classified as SF movies or romance movies, and each of these categories is a “concept”. Each customer may like SF movies or romance movies. Using SVD, we can predict unknown ratings for customer and item pairs.
SVD is also used for dimensionality reduction. The advantages of SVD are as follows.
1. find hidden correlations
2. make visualization of data easier
3. reduce the amount of data
Therefore SVD can be applied not only to recommender systems but also to other kinds of business applications.
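To make "reduce the amount of data" concrete, here is a minimal sketch in R of keeping only the strongest concepts. The ratings matrix is invented for illustration, and low_rank_approx is a hypothetical helper name, not a base R function. Only svd(), diag() and matrix multiplication from base R are used.

```r
# Illustrative user-by-movie ratings (invented data, not from the post).
# Rows are users; columns are three SF movies followed by two romance movies.
A <- matrix(c(1, 1, 1, 0, 0,
              3, 3, 3, 0, 0,
              4, 4, 4, 0, 0,
              5, 5, 5, 0, 0,
              0, 2, 0, 4, 4,
              0, 0, 0, 5, 5,
              0, 1, 0, 2, 2),
            nrow = 7, byrow = TRUE)

# Hypothetical helper (not a base R function): keep only the k strongest
# concepts and multiply the truncated factors back together.
low_rank_approx <- function(A, k) {
  s <- svd(A)
  idx <- seq_len(k)
  s$u[, idx, drop = FALSE] %*%
    diag(s$d[idx], nrow = k) %*%
    t(s$v[, idx, drop = FALSE])
}

# Approximate A with two concepts only
A2 <- low_rank_approx(A, 2)

# Frobenius norm of the approximation error
err <- sqrt(sum((A - A2)^2))

# Numbers stored: full matrix vs. truncated factors (u, d, v)
full_size <- length(A)                       # 7 * 5 = 35
reduced_size <- 2 * (nrow(A) + ncol(A) + 1)  # 2 * (7 + 5 + 1) = 26

print(round(A2, 2))
print(err)
```

On a matrix this small the saving is modest, but the stored size grows with k * (m + n + 1) instead of m * n, so the gap widens quickly for large matrices with few strong concepts.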
Let us use R to analyze data with singular value decomposition. R has a built-in function for singular value decomposition, svd(), so we can run the decomposition just by passing a matrix to svd(). I used RStudio as the IDE.
In this case, matrix ss is decomposed into $d, $u and $v.
$u : matrix of left singular vectors
$d : singular values (the diagonal of Σ, returned as a vector)
$v : matrix of right singular vectors
When we look at $d, the first and second values are large, so we focus on the first and second concepts. In $u, the row vectors of ss are mapped to the concept space. In $v, the column vectors of ss are mapped to the same concept space. The red and blue rectangles mark rows and columns that are similar in terms of “concept”. I recommend trying svd() to analyze data in R, as it is very easy and effective.
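The steps above can be sketched in R as follows. The matrix here is invented for illustration and is not the ss matrix from my session. The names ratings, user_concepts, item_concepts and cosine are hypothetical; only svd() and other base R functions are used.

```r
# Illustrative ratings: 7 users x 5 movies (invented data).
# The first three movies are SF, the last two are romance.
ratings <- matrix(c(1, 1, 1, 0, 0,
                    3, 3, 3, 0, 0,
                    4, 4, 4, 0, 0,
                    5, 5, 5, 0, 0,
                    0, 2, 0, 4, 4,
                    0, 0, 0, 5, 5,
                    0, 1, 0, 2, 2),
                  nrow = 7, byrow = TRUE,
                  dimnames = list(paste0("user", 1:7),
                                  c("Matrix", "Alien", "Serenity",
                                    "Casablanca", "Amelie")))

s <- svd(ratings)

# Singular values, largest first; the leading values mark the strong concepts
print(round(s$d, 2))

# Keep the two strongest concepts
k <- 2
user_concepts <- s$u[, 1:k]  # each row: one user in concept space
item_concepts <- s$v[, 1:k]  # each row: one movie in concept space
rownames(user_concepts) <- rownames(ratings)
rownames(item_concepts) <- colnames(ratings)

# Hypothetical helper: cosine similarity between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Two SF fans point in the same direction in concept space
sim_sf_users <- cosine(user_concepts["user1", ], user_concepts["user4", ])

# An SF fan and a romance fan are close to orthogonal
sim_mixed_users <- cosine(user_concepts["user1", ], user_concepts["user6", ])

# The same comparison works for movies
sim_sf_movies <- cosine(item_concepts["Matrix", ], item_concepts["Serenity", ])
sim_mixed_movies <- cosine(item_concepts["Matrix", ], item_concepts["Casablanca", ])

print(c(sim_sf_users, sim_mixed_users, sim_sf_movies, sim_mixed_movies))
```

The signs of the columns in $u and $v can differ between platforms, but the cosine similarities are unaffected because a sign flip applies to the same coordinate of both vectors.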
SVD is a little more complicated than the inner product, but it is very useful when there is a lot of high-dimensional data. Let us get familiar with SVD, because we will use this model going forward.