Sunday, October 30, 2016

Matrix factorization techniques for recommender systems

Recommender systems recognize patterns of user preference for products and provide personalized recommendations; they are used on many online shopping websites. The two main approaches are content filtering and collaborative filtering. Content filtering builds profiles of the characteristics of users and products, whereas collaborative filtering relies only on users' past behavior. Collaborative filtering is generally more accurate than content-based methods, but it does not work well for new products and users (the cold-start problem).

Collaborative filtering itself comes in two flavors: neighborhood methods and latent factor models. Neighborhood methods compute relationships between items or between users; a user's interest in an item can then be estimated from that user's ratings of similar items. Latent factor models, in contrast, characterize both items and users by vectors of factors. Matrix factorization methods, which this paper focuses on, are the basis of latent factor models: they represent users and items by factor vectors, and a high correspondence between a user's factors and an item's factors leads to a recommendation. The input data are usually explicit feedback, arranged in a matrix whose dimensions are users and items. That matrix is typically sparse, but there are methods that can deal with this.
Matrix factorization models map both users and items to a joint latent factor space and model each rating as the inner product of the corresponding user and item vectors. Because the rating matrix is sparse, learning this model directly is difficult. Imputation methods can fill in the missing entries, but they are expensive and greatly inflate the amount of data, so methods that use only the observed ratings are preferred.
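For concreteness, here is a minimal sketch of that inner-product prediction (not code from the paper; the factor dimension and numbers are made up), with user u represented by a vector p_u and item i by a vector q_i:

```python
import numpy as np

# Toy factor vectors; the dimension (3) and values are invented.
p_u = np.array([0.8, -0.2, 0.5])   # user factors
q_i = np.array([1.1,  0.3, 0.4])   # item factors

# Predicted rating is the inner product q_i^T p_u.
r_hat = q_i @ p_u
print(r_hat)   # ~1.02
```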
The factor vectors of users and items are learned by minimizing the regularized squared error over the set of known ratings, using learning algorithms such as stochastic gradient descent or alternating least squares. The amount of regularization is determined by cross-validation.
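A toy stochastic gradient descent loop for that objective might look like the following; the ratings, learning rate `lr`, and regularization weight `reg` are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 2
lr, reg, n_epochs = 0.01, 0.1, 100   # illustrative hyperparameters

# Only the observed (user, item, rating) triples are used.
ratings = [(0, 1, 4.0), (0, 3, 2.0), (1, 0, 5.0), (2, 2, 3.0), (3, 4, 1.0)]

P = 0.1 * rng.standard_normal((n_users, n_factors))   # user factor vectors
Q = 0.1 * rng.standard_normal((n_items, n_factors))   # item factor vectors

for _ in range(n_epochs):
    for u, i, r in ratings:
        p_u, q_i = P[u].copy(), Q[i].copy()
        err = r - q_i @ p_u                     # prediction error
        P[u] += lr * (err * q_i - reg * p_u)    # regularized step on user factors
        Q[i] += lr * (err * p_u - reg * q_i)    # regularized step on item factors
```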
Matrix factorization can accommodate various data aspects and other application-specific requirements. Much of the variation in ratings is due to effects associated with either users or items, independent of any interaction; these effects are called biases or intercepts. To account for them, an observed rating is broken down into a global average, an item bias, a user bias, and the user-item interaction, and the system learns by minimizing the corresponding squared-error function.
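As a rough illustration of that decomposition (all numbers invented), the predicted rating combines the global average, the two biases, and the factor interaction:

```python
import numpy as np

mu = 3.7                              # global average rating
b_i, b_u = 0.3, -0.5                  # item bias, user bias
q_i = np.array([1.1, 0.3, 0.4])       # item factors
p_u = np.array([0.8, -0.2, 0.5])      # user factors

# r_hat = global average + item bias + user bias + user-item interaction
r_hat = mu + b_i + b_u + q_i @ p_u
print(r_hat)   # ~4.52
```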
When users have supplied few ratings, additional sources of information such as implicit feedback or user attributes can be incorporated. The matrix factorization model integrates all signal sources, and items can get a similar treatment when necessary.
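One common way to fold in implicit feedback, along the lines sketched in the paper, is to augment the user vector with a normalized sum of factors for the items the user interacted with implicitly. The values below are made up:

```python
import numpy as np

mu, b_i, b_u = 3.7, 0.3, -0.5         # invented bias terms
p_u = np.array([0.8, -0.2, 0.5])      # explicit user factors
q_i = np.array([1.1,  0.3, 0.4])      # item factors

# Factors y_j for items this user showed implicit interest in
# (e.g. browsed or rented); values are made up.
y = np.array([[0.1, 0.0, 0.2],
              [0.0, 0.3, 0.1]])

# Augment the user vector with a normalized sum of those item factors.
user_repr = p_u + y.sum(axis=0) / np.sqrt(len(y))
r_hat = mu + b_i + b_u + q_i @ user_repr
```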
In reality, product perception, popularity, and customers' inclinations evolve over time, and these temporal effects should be accounted for by the system. The matrix factorization approach lets the following terms vary over time: item biases, user biases, and user preferences. Temporal dynamics thereby also affect the interaction between users and items.
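A toy illustration of time-dependent terms is below; the linear drifts are purely illustrative and not the functional forms used in the paper:

```python
import numpy as np

mu = 3.7
q_i = np.array([1.1, 0.3, 0.4])

def b_i(t):                 # item bias changing as the item ages
    return 0.3 + 0.01 * t

def b_u(t):                 # user bias drifting over time
    return -0.5 - 0.02 * t

def p_u(t):                 # user preferences drifting over time
    return np.array([0.8, -0.2, 0.5]) + 0.005 * t

def r_hat(t):
    return mu + b_i(t) + b_u(t) + q_i @ p_u(t)

print(r_hat(0), r_hat(30))  # predictions at two points in time
```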
In other settings, ratings do not all carry the same weight or confidence. The matrix factorization model can easily accommodate varying confidence levels by giving less weight to less meaningful observations.
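A sketch of a confidence-weighted objective follows; the confidence values, factor sizes, and regularization weight are made up. Each observation's squared error is scaled by its confidence:

```python
import numpy as np

# (user, item, rating, confidence): confidences are invented and might
# reflect, for example, how much evidence backs each observation.
observations = [(0, 1, 4.0, 5.0), (0, 3, 2.0, 1.0), (1, 0, 5.0, 8.0)]

P = np.full((2, 2), 0.1)    # toy user factors
Q = np.full((4, 2), 0.1)    # toy item factors
reg = 0.1

# Confidence-weighted, regularized squared error: low-confidence
# observations pull the factors less.
loss = sum(c * (r - Q[i] @ P[u]) ** 2 for u, i, r, c in observations)
loss += reg * (np.sum(P ** 2) + np.sum(Q ** 2))
print(loss)
```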
More complex factor models, which account for more details, are more accurate. Since the data contain significant temporal effects, the temporal components are particularly important.

The experiments in this paper are done on the Netflix dataset. The results are superior to classical nearest-neighbor techniques, and the model is memory-efficient. Many crucial aspects of the data, such as implicit feedback, temporal dynamics, and confidence levels, are also integrated to make the model more accurate.
