Collaborative
filtering (CF) in recommender systems boils down to analyzing the tabular data.
These methods are based on the observed ratings in a rating matrix. the rating
matrix is always extremely sparse. They consider how to alleviate the sparsity problem
in collaborative filtering by transferring user-item rating knowledge from one
task to other related tasks. The target task is represented as a spars rating
matrix, containing few observed ratings. Then also get an auxiliary task from
another domain, which is related to the target one and has a dense rating
matrix. They show how to learn informative and yet compact cluster-level
user-item rating patterns from the auxiliary rating matrix and transfer them to
the target rating matrix and refer to this collection of patterns to be
transferred as a “codebook”. By assuming the user-item rating patterns in
target matrix is similar to auxiliary matrix,
they can reconstruct the target rating matrix by expanding the codebook.
Tuesday, November 29, 2016
Monday, November 21, 2016
Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark
In this work, we propose a
big data scheme for extremely imbalance problems implemented under Apache Spark,
which aims at solving the lack of density problem. First, the whole training
dataset is split into chunks, and the positive examples are extracted from it.
Then, we broadcast the positive set, so that, all the nodes have a single in-memory
copy of the positive samples. For each chunk of the negative data, we aim to
obtain a balanced subset of data using a sample of the positive set. Later, EUS
is applied to reduce the size of both classes and maximize the classification
performance, obtaining a reduced set that is used to learn a model. Finally,
the different models are combined to predict the classes of the test set.
I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, and F. Herrera. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. 2016.
I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, and F. Herrera. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. 2016.
Bayesian Personalized Ranking with Multi-Channel User Feedback
In many domains, users provide unary feedback
via a number of different ‘channels’. In this paper, we propose an approach
called Multi-Feedback Bayesian Personalized Ranking (MF-BPR). The innovation of
MFBRP is a sampling method designed to simultaneously exploit unary feedback
from multiple channels during training. The key to our approach is to map
different feedback channels to different ‘levels’ that reflect the contribution
that each type of feedback can have in the training phase.
Our MF-BPR can be considered hybrid, since it
simultaneously uses different sources of feedback uses multiple feedback
channels simultaneously.
BPR-MF is based on the insight that user
feedback collected via various channels reflects different strengths of user
preference, and the sampling method of BPR can exploit these differences. In
MF-BPR we introduce a non-uniform sampler that takes into account the level (importance) of the
feedback channel. We propose a non-uniform distribution for p(L) where the
cardinality of feedback as well as the importance of a level is taken into
account with a weight factor. In our experiments, we found that the inverse
rank of positive levels are good candidates for weights.
The MF-BPR is evaluated on three datasets and its performance is compared with different methods using 4-fold cross validation. The ground truth and relevant recommendations however, are different in the three datasets depending on the problem.
Babak Loni, Roberto Pagano, Martha Larson, and Alan Hanjalic. 2016. Bayesian Personalized Ranking with Multi-Channel User Feedback. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 361-364.
The MF-BPR is evaluated on three datasets and its performance is compared with different methods using 4-fold cross validation. The ground truth and relevant recommendations however, are different in the three datasets depending on the problem.
Babak Loni, Roberto Pagano, Martha Larson, and Alan Hanjalic. 2016. Bayesian Personalized Ranking with Multi-Channel User Feedback. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 361-364.
Monday, November 14, 2016
A Review of Dynamic Vehicle Routing Problems
The
Vehicle Routing Problem (VRP) formulation was first introduced in 1959 as a
generalization of the Traveling Salesman Problem (TSP). In contrast to the
classical definition of the vehicle routing problem, real-world applications
often include two important dimensions: evolution and quality of information. Evolution
of information relates to the fact that in some problems the information
available to the planner may change during the execution of the routes. Quality
of information reflects possible uncertainty on the available data. In
addition, depending on the problem and the available technology, vehicle routes
can either be designed statically or dynamically.
Static
and stochastic problems are characterized by input partially known as random
variables, which realizations are only revealed during the execution of the
routes. In dynamic and deterministic problems, part or all of the input is
unknown and revealed dynamically during the design or execution of the routes. Similarly,
dynamic and stochastic problems have part or all of their input unknown and
revealed dynamically during the execution of the routes.
In
contrast to their static counterparts, dynamic routing problems involve new
elements that increase the complexity of their decisions (more degrees of
freedom) and introduce new challenges while judging the merit of a given route
plan. In dynamic routing, the ability to redirect a moving vehicle to a new
request nearby allows for additional savings.
Different
problems (or instances of a same problem) can have different levels of
dynamism, which
can
be characterized according to two dimensions, the frequency of changes and the
urgency of requests. Thus, three metrics have been proposed to measure the
dynamism of a problem. The degree of dynamism is defined as the ratio between
the number of dynamic requests and
the total number of requests. The effective degree of dynamism can be
interpreted as the normalized average of the disclosure times. The reaction
time is defined as the difference between the disclosure time and the end of the
corresponding time window li,
highlighting that longer reaction times mean more flexibility to insert the
request into the current routes.
In
a category of applications, a service request is defined by a customer location
and a possible time window; while vehicle routes just fulfill service requests
without considering side constraints such as capacity. Approaches for dynamic and
deterministic vehicle routing problems can be divided into two categories:
those based on periodic reoptimization, and those based on continuous
reoptimization. Periodic reoptimization
approaches start at the beginning of the day with a first optimization that
produces an initial set of routes. Then, an optimization procedure periodically
solves a static problem corresponding to the current state. Continuous
reoptimization approaches perform the optimization throughout the day and maintain
information on good solutions in an adaptive memory. Whenever the available data
changes, a decision procedure aggregates the information from the memory to
update the current routing.
Researchers
have mainly focused on the routing aspect of the dynamic fleet management. However,
in some applications there is more that can be done to improve performance and
service level. So a system in which aside from giving a yes/no answer to a
customer request, suggests convenient times for the company would be highly desirable
in such contexts.
Subscribe to:
Posts (Atom)