Tuesday, November 29, 2016

Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction


Collaborative filtering (CF) in recommender systems boils down to analyzing the tabular data. These methods are based on the observed ratings in a rating matrix. the rating matrix is always extremely sparse. They consider how to alleviate the sparsity problem in collaborative filtering by transferring user-item rating knowledge from one task to other related tasks. The target task is represented as a spars rating matrix, containing few observed ratings. Then also get an auxiliary task from another domain, which is related to the target one and has a dense rating matrix. They show how to learn informative and yet compact cluster-level user-item rating patterns from the auxiliary rating matrix and transfer them to the target rating matrix and refer to this collection of patterns to be transferred as a “codebook”. By assuming the user-item rating patterns in target matrix is similar to auxiliary matrix, they can reconstruct the target rating matrix by expanding the codebook.

Monday, November 21, 2016

Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark


In this work, we propose a big data scheme for extremely imbalance problems implemented under Apache Spark, which aims at solving the lack of density problem. First, the whole training dataset is split into chunks, and the positive examples are extracted from it. Then, we broadcast the positive set, so that, all the nodes have a single in-memory copy of the positive samples. For each chunk of the negative data, we aim to obtain a balanced subset of data using a sample of the positive set. Later, EUS is applied to reduce the size of both classes and maximize the classification performance, obtaining a reduced set that is used to learn a model. Finally, the different models are combined to predict the classes of the test set.

I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, and F. Herrera. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. 2016.

Bayesian Personalized Ranking with Multi-Channel User Feedback

In many domains, users provide unary feedback via a number of different ‘channels’. In this paper, we propose an approach called Multi-Feedback Bayesian Personalized Ranking (MF-BPR). The innovation of MFBRP is a sampling method designed to simultaneously exploit unary feedback from multiple channels during training. The key to our approach is to map different feedback channels to different ‘levels’ that reflect the contribution that each type of feedback can have in the training phase.
Our MF-BPR can be considered hybrid, since it simultaneously uses different sources of feedback uses multiple feedback channels simultaneously.
BPR-MF is based on the insight that user feedback collected via various channels reflects different strengths of user preference, and the sampling method of BPR can exploit these differences. In MF-BPR we introduce a non-uniform sampler that takes into account the level (importance) of the feedback channel. We propose a non-uniform distribution for p(L) where the cardinality of feedback as well as the importance of a level is taken into account with a weight factor. In our experiments, we found that the inverse rank of positive levels are good candidates for weights.
The MF-BPR is evaluated on three datasets and its performance is compared with different methods using 4-fold cross validation. The ground truth and relevant recommendations however, are different in the three datasets depending on the problem.

Babak Loni, Roberto Pagano, Martha Larson, and Alan Hanjalic. 2016. Bayesian Personalized Ranking with Multi-Channel User Feedback. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 361-364.

Monday, November 14, 2016

A Review of Dynamic Vehicle Routing Problems


The Vehicle Routing Problem (VRP) formulation was first introduced in 1959 as a generalization of the Traveling Salesman Problem (TSP). In contrast to the classical definition of the vehicle routing problem, real-world applications often include two important dimensions: evolution and quality of information. Evolution of information relates to the fact that in some problems the information available to the planner may change during the execution of the routes. Quality of information reflects possible uncertainty on the available data. In addition, depending on the problem and the available technology, vehicle routes can either be designed statically or dynamically.
Static and stochastic problems are characterized by input partially known as random variables, which realizations are only revealed during the execution of the routes. In dynamic and deterministic problems, part or all of the input is unknown and revealed dynamically during the design or execution of the routes. Similarly, dynamic and stochastic problems have part or all of their input unknown and revealed dynamically during the execution of the routes.
In contrast to their static counterparts, dynamic routing problems involve new elements that increase the complexity of their decisions (more degrees of freedom) and introduce new challenges while judging the merit of a given route plan. In dynamic routing, the ability to redirect a moving vehicle to a new request nearby allows for additional savings.
Different problems (or instances of a same problem) can have different levels of dynamism, which
can be characterized according to two dimensions, the frequency of changes and the urgency of requests. Thus, three metrics have been proposed to measure the dynamism of a problem. The degree of dynamism is defined as the ratio between the number of dynamic requests and the total number of requests. The effective degree of dynamism can be interpreted as the normalized average of the disclosure times. The reaction time is defined as the difference between the disclosure time and the end of the corresponding time window li, highlighting that longer reaction times mean more flexibility to insert the request into the current routes.
In a category of applications, a service request is defined by a customer location and a possible time window; while vehicle routes just fulfill service requests without considering side constraints such as capacity. Approaches for dynamic and deterministic vehicle routing problems can be divided into two categories: those based on periodic reoptimization, and those based on continuous reoptimization.  Periodic reoptimization approaches start at the beginning of the day with a first optimization that produces an initial set of routes. Then, an optimization procedure periodically solves a static problem corresponding to the current state. Continuous reoptimization approaches perform the optimization throughout the day and maintain information on good solutions in an adaptive memory. Whenever the available data changes, a decision procedure aggregates the information from the memory to update the current routing.
Researchers have mainly focused on the routing aspect of the dynamic fleet management. However, in some applications there is more that can be done to improve performance and service level. So a system in which aside from giving a yes/no answer to a customer request, suggests convenient times for the company would be highly desirable in such contexts.