Tuesday, March 26, 2019

The paper is accepted!

The acceptance/rejection notification date for the paper from the previous post was March 25th. I didn't get any email or notification yesterday, but when I opened my email today after the lunch break and saw that the paper was accepted, I was so excited! The first thing I did was take a screenshot of the email and send it to my family to share the happiness with them. My advisor was not in her office, but I believe she has received the acceptance email.

The conference is AIED 2019 and will take place in Chicago on June 25th. I need to read the reviewers' comments, revise the paper by April 8th, and register for the main conference.

Thursday, February 14, 2019

Paper submitted

Early Monday morning (minutes before 3 AM) we submitted our first paper to a conference. The decision will be made in March. Hopefully the paper gets accepted and I publish more papers as part of my dissertation. Details of the paper will be revealed later.

Wednesday, January 16, 2019

Educational Data Mining

After working on recommender systems, especially cross-domain recommender systems, I decided to change my research direction, so I chose to do research on Educational Data Mining. My focus will be on finding frequent patterns in students' interactions with online education platforms and predicting their performance based on those patterns. The patterns are called behaviors and can be any interaction with the system. From now on, you will find posts related to this area and my progress on this weblog.
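To make the idea concrete, here is a minimal sketch of mining frequent behavior patterns from interaction logs. The action names and the contiguous-pair (bigram) counting are my own illustration, not a method from any specific paper:

```python
from collections import Counter

# Toy interaction logs: one list of actions per student session.
sessions = [
    ["login", "view_video", "attempt_quiz", "view_hint", "attempt_quiz"],
    ["login", "view_video", "view_hint", "attempt_quiz"],
    ["login", "attempt_quiz", "view_hint", "attempt_quiz"],
]

# Count contiguous action pairs (bigrams) across all sessions.
bigrams = Counter()
for s in sessions:
    for a, b in zip(s, s[1:]):
        bigrams[(a, b)] += 1

# Pairs occurring at least twice are treated as "frequent" behaviors here.
frequent = {p: c for p, c in bigrams.items() if c >= 2}
# frequent contains ("view_hint", "attempt_quiz"), ("attempt_quiz", "view_hint"),
# and ("login", "view_video")
```

The frequent patterns found this way could then serve as features for a downstream performance-prediction model.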

Tuesday, December 6, 2016

Machine Learning that Matters

The contributions of this work are the clear identification and description of a fundamental problem, suggested first steps toward addressing this gap, the issuance of relevant Impact Challenges to the machine learning community, and the identification of several key obstacles to machine learning impact, as an aid for focusing future research efforts. Increasingly, ML papers that describe a new algorithm follow a standard evaluation template. After presenting results on synthetic data sets to illustrate certain aspects of the algorithm's behavior, the paper reports results on a collection of standard data sets. However, in practice direct comparisons fail because we have no standard for reproducibility.

There are also problems with how we measure performance. Most often, an abstract evaluation metric (classification accuracy, root mean squared error (RMSE), or F-measure) is used.
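For reference, these three abstract metrics are easy to state in a few lines of Python (the toy labels below are my own example):

```python
import math

# Toy predictions vs. ground truth, purely illustrative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
n = len(y_true)

# Classification accuracy: fraction of matching labels.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

# RMSE: treats the labels as numeric values.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# F-measure (F1): harmonic mean of precision and recall on the positive class.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

The paper's point is precisely that such numbers, however convenient, say nothing by themselves about real-world impact.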

Authors must demonstrate a "machine learning contribution" that is often narrowly interpreted by reviewers as "the development of a new algorithm or the explication of a novel theoretical analysis." Reconnecting active research to relevant real-world problems is part of the process of maturing as a research field. The first step in evaluation is to define or select methods that enable direct measurement; the goal is to develop general methods that apply across domains.

Finally, researchers should consider potential impact when selecting which research problems to tackle, not merely how interesting or challenging the problems are from the ML perspective. The authors propose six Impact Challenges as examples of machine learning that matters; these do not focus on any single problem domain or a particular technical capability. The goal is to inspire the field of machine learning to take the steps needed to mature into a valuable contributor to the larger world.

Tuesday, November 29, 2016

Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction


Collaborative filtering (CF) in recommender systems boils down to analyzing tabular data. These methods are based on the observed ratings in a rating matrix, and the rating matrix is always extremely sparse. The authors consider how to alleviate the sparsity problem in collaborative filtering by transferring user-item rating knowledge from one task to other related tasks. The target task is represented as a sparse rating matrix containing few observed ratings. They also obtain an auxiliary task from another domain, which is related to the target one and has a dense rating matrix. They show how to learn informative yet compact cluster-level user-item rating patterns from the auxiliary rating matrix and transfer them to the target rating matrix, referring to this collection of transferred patterns as a "codebook". By assuming the user-item rating patterns in the target matrix are similar to those in the auxiliary matrix, they can reconstruct the target rating matrix by expanding the codebook.
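The expansion step can be sketched in a few lines. Here is a toy, hedged version (the matrix names and cluster assignments are my own illustration, not the paper's notation): assume a small codebook of cluster-level ratings was already learned from the dense auxiliary matrix, e.g. by co-clustering its users and items.

```python
# 2x2 codebook B: rows are user clusters, columns are item clusters.
B = [[5.0, 1.0],
     [2.0, 4.0]]

# Cluster memberships for the target matrix: each target user/item is
# assigned to one auxiliary cluster.
user_cluster = [0, 1, 0]  # target user u -> user cluster index
item_cluster = [0, 1]     # target item i -> item cluster index

# Expanding the codebook predicts a rating for every (user, item) pair,
# filling in the sparse target matrix.
R_filled = [[B[user_cluster[u]][item_cluster[i]]
             for i in range(len(item_cluster))]
            for u in range(len(user_cluster))]
```

In the actual method the memberships are chosen so that the expanded matrix best matches the few observed target ratings; the sketch only shows the reconstruction once memberships are fixed.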

Monday, November 21, 2016

Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark


In this work, we propose a big data scheme for extremely imbalanced problems implemented under Apache Spark, which aims at solving the lack-of-density problem. First, the whole training dataset is split into chunks, and the positive examples are extracted from it. Then, we broadcast the positive set so that all the nodes have a single in-memory copy of the positive samples. For each chunk of the negative data, we aim to obtain a balanced subset of data using a sample of the positive set. Later, evolutionary undersampling (EUS) is applied to reduce the size of both classes and maximize the classification performance, obtaining a reduced set that is used to learn a model. Finally, the different models are combined to predict the classes of the test set.
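The data flow can be mimicked in plain Python. This is a toy, non-Spark walkthrough: in the paper each chunk is a Spark partition and the positive set is broadcast to all nodes, EUS is an evolutionary search rather than the random subsampling used as a stand-in below, and the trivial "model" is my simplification, not the paper's classifier.

```python
import random

random.seed(0)
positives = [("pos", 1)] * 20    # small minority class, kept whole
negatives = [("neg", 0)] * 2000  # huge majority class

# Step 1: split the negative data into chunks (Spark partitions).
n_chunks = 4
chunks = [negatives[i::n_chunks] for i in range(n_chunks)]

models = []
for chunk in chunks:
    # Step 2: balance each chunk against the (broadcast) positive set.
    balanced = random.sample(chunk, len(positives)) + positives
    # Step 3: EUS would evolve a reduced, performance-maximizing subset here;
    # random subsampling is only a placeholder for it.
    reduced = random.sample(balanced, len(balanced) // 2)
    # Step 4: learn one model per chunk (here: just the positive-class prior).
    prior = sum(label for _, label in reduced) / len(reduced)
    models.append(prior)

# Step 5: combine the per-chunk models, e.g. by averaging their votes.
def predict():
    return int(sum(models) / len(models) >= 0.5)
```

The key design point survives even in this sketch: only the large negative class is partitioned, while the scarce positives are reused in full by every chunk.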

I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, and F. Herrera. Evolutionary undersampling for extremely imbalanced big data classification under apache spark. 2016.

Bayesian Personalized Ranking with Multi-Channel User Feedback

In many domains, users provide unary feedback via a number of different 'channels'. In this paper, we propose an approach called Multi-Feedback Bayesian Personalized Ranking (MF-BPR). The innovation of MF-BPR is a sampling method designed to simultaneously exploit unary feedback from multiple channels during training. The key to our approach is to map different feedback channels to different 'levels' that reflect the contribution that each type of feedback can have in the training phase.
Our MF-BPR can be considered hybrid, since it uses multiple feedback channels simultaneously.
MF-BPR is based on the insight that user feedback collected via various channels reflects different strengths of user preference, and the sampling method of BPR can exploit these differences. In MF-BPR we introduce a non-uniform sampler that takes into account the level (importance) of the feedback channel. We propose a non-uniform distribution for p(L) in which the cardinality of the feedback as well as the importance of a level is taken into account with a weight factor. In our experiments, we found that the inverse rank of the positive levels is a good candidate for the weights.
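A minimal sketch of that level-weighted sampling might look as follows. The channel names, ranks, and toy feedback counts are my assumptions for illustration; only the idea of weighting p(L) by cardinality times an inverse-rank factor comes from the paper:

```python
import random

random.seed(42)

# channel -> list of (user, item) feedback events, purely illustrative.
channels = {
    "purchase":    [(0, 1), (1, 2)],
    "add_to_cart": [(0, 3), (1, 4), (2, 1)],
    "click":       [(0, 5)] * 10,
}
ranks = {"purchase": 1, "add_to_cart": 2, "click": 3}  # 1 = strongest signal

# p(L) proportional to cardinality * inverse rank of the level.
weights = {c: len(ev) * (1.0 / ranks[c]) for c, ev in channels.items()}
total = sum(weights.values())
probs = {c: w / total for c, w in weights.items()}

def sample_level():
    """Draw a feedback level according to the non-uniform distribution p(L)."""
    r = random.random()
    acc = 0.0
    for c, p in probs.items():
        acc += p
        if r < acc:
            return c
    return c  # numerical-edge fallback

# A positive example for a BPR update is then drawn from the sampled level;
# the negative comes from a lower level or from the unobserved items.
level = sample_level()
pos_user, pos_item = random.choice(channels[level])
```

Note how a high-cardinality weak channel ("click" here) can still be sampled often; the inverse-rank factor only discounts it relative to its volume.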
MF-BPR is evaluated on three datasets, and its performance is compared with different methods using 4-fold cross-validation. The ground truth and relevant recommendations, however, differ across the three datasets depending on the problem.

Babak Loni, Roberto Pagano, Martha Larson, and Alan Hanjalic. 2016. Bayesian Personalized Ranking with Multi-Channel User Feedback. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 361-364.