XRCE Personal Language Analytics Engine for Multilingual Author Profiling
Notebook for PAN at CLEF 2015

Caroline Brun, Shachar Mirkin, Scott Nowson, Julien Perez, Claude Roux
This technical notebook describes the methodology used, and the results achieved, by the team from Xerox Research Centre Europe (XRCE) in the PAN 2015 Author Profiling Challenge. This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages: English, Spanish, Italian and Dutch. We describe a largely language-agnostic classification methodology that uses language-specific linguistic processing to generate features. We also report on experiments in which we use machine translation (MT) to compensate for languages with less training data. Results in the native languages are successful, but socio-demographic signals in language seem to be lost under MT conditions.
CLEF 2015 Conference and Labs of the Evaluation Forum, 8-11 September, Toulouse, France

