Publications
Authors:
  • Caroline Brun, Shachar Mirkin, Scott Nowson, Julien Perez, Claude Roux
Citation:
CLEF 2015 Conference and Labs of the Evaluation Forum, 8-11 September, Toulouse, France
Abstract:
This technical notebook describes the methodology used, and the results
achieved, by the team from Xerox Research Centre Europe (XRCE) in the PAN 2015
Author Profiling Challenge. This year, personality traits are introduced
alongside age and gender in a corpus of tweets in four languages: English, Spanish,
Italian and Dutch. We describe a largely language-agnostic classification methodology
that uses language-specific linguistic processing to generate features.
We also report on experiments in which we use machine translation (MT) to
accommodate languages with less training data. Results on native-language data
are successful, but socio-demographic signals in language seem to be lost under MT conditions.
Year:
2015
Report number:
2015/046