Morteza Haghir Chehreghani
European Conference on Information Retrieval 2017 (ECIR 2017), Aberdeen, UK, 8-13 April, 2017.
We study the user profile completion and enrichment problem, where the goal is to estimate the unknown values of user profiles. We investigate how the type of the features (categorical or continuous) suggests the use of a specific approach for this task. In particular, we validate the hypothesis that a classification method such as K-nearest neighbor search is better suited to categorical features, whereas matrix factorization methods such as Non-negative Matrix Factorization perform better on continuous features. We study different variants of K-nearest neighbor search (with different metrics) and demonstrate how they perform in different settings. Moreover, we investigate the impact of shifting the variables on the quality of the (non-negative) factorization and on the prediction error. We validate our methods via extensive experiments on real-world datasets and, finally, based on the results and observations, we discuss a hybrid approach to this task.
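To make the categorical case concrete, the following is a minimal sketch of profile completion by K-nearest neighbor search, not the paper's implementation. It assumes a Hamming-style distance (one of many possible metrics), majority voting among the neighbors, and made-up toy profiles; the names `hamming` and `knn_fill` are hypothetical.

```python
from collections import Counter


def hamming(a, b, skip):
    """Count fields (other than index `skip`) where two profiles differ.

    Fields that are unknown (None) in either profile are ignored.
    """
    return sum(
        1
        for i, (x, y) in enumerate(zip(a, b))
        if i != skip and x is not None and y is not None and x != y
    )


def knn_fill(profiles, target, field, k=3):
    """Predict target[field] by majority vote over the k nearest profiles.

    Only profiles whose value for `field` is known can act as neighbors.
    """
    candidates = [p for p in profiles if p[field] is not None]
    # Python's sort is stable, so ties keep their original order.
    candidates.sort(key=lambda p: hamming(p, target, field))
    votes = Counter(p[field] for p in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy, invented profiles: (job, city, tool, seniority).
profiles = [
    ("engineer", "berlin", "python", "senior"),
    ("engineer", "berlin", "python", "senior"),
    ("engineer", "munich", "python", "senior"),
    ("designer", "paris", "figma", "junior"),
    ("designer", "paris", "sketch", "junior"),
]

# The seniority field (index 3) is unknown and is to be estimated.
target = ("engineer", "berlin", "python", None)
predicted = knn_fill(profiles, target, field=3, k=3)
print(predicted)
```

Swapping `hamming` for another metric changes which profiles count as neighbors, which is the kind of variation the abstract refers to when it mentions different metrics.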