XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015
Caroline Brun, Shachar Mirkin, Scott Nowson, Julien Perez, Claude Roux
This technical notebook describes the methodology used – and results
achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox
Research Centre Europe (XRCE). This year, personality traits are introduced
alongside age and gender in a corpus of tweets in four languages – English, Spanish,
Italian and Dutch. We describe a largely language-agnostic methodology for
classification which uses language-specific linguistic processing to generate features.
We also report on experiments in which we use machine translation to
accommodate languages for which less training data is available. Native-language
results are successful, but socio-demographic signals in language seem to be lost under MT conditions.
CLEF 2015 Conference and Labs of the Evaluation Forum, 8-11 September, Toulouse, France