A Document Frequency Constraint for Pseudo-Relevance Feedback Models
Stéphane Clinchant, Eric Gaussier
In this paper, we study the behavior of several pseudo-relevance feedback (PRF) models and describe their main characteristics. This leads us to introduce a new heuristic constraint for PRF models, referred to as the Document Frequency (DF) constraint. We then analyze, from a theoretical point of view, state-of-the-art PRF models according to their relation with this constraint. This analysis reveals that the standard mixture model for PRF in the language modeling family does not satisfy the DF constraint. We then conduct a series of experiments to test whether the DF constraint is valid. To do so, we performed tests with an oracle and with a simple family of tf-idf functions based on a parameter k controlling the convexity/concavity of the function. Both the oracle and the results obtained with this family of functions validate the DF constraint.
CORIA, Avignon, France, March 16-18, 2011.