The beta negative binomial distribution for text modeling

Stéphane Clinchant, Eric Gaussier
In this paper, we first review the phenomena of burstiness and aftereffect of future sampling, and propose a formal, operational criterion to characterize distributions according to these phenomena. We then introduce the Beta negative binomial distribution for text modeling and show its relations to several models, in particular to the Laplace law of succession and to the tf-itf model used in the Divergence from Randomness framework of (2). Finally, we illustrate the behavior of this distribution in text categorization and information retrieval experiments.
ECIR, 30th European Conference on Information Retrieval, Glasgow, Scotland, 30th March - 3rd April 2008.