Publications
Authors:
  • Joern Wuebker, Hermann Ney, Adrià Martinez-Villaronga, Adrià Giménez, Alfons Juan, Christophe Servan, Marc Dymetman, Shachar Mirkin
Citation:
AMTA, Vancouver, Canada, October 22-26, 2014.
Abstract:
For the task of translating scientific video lectures from English into French, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths.
Year:
2014
Report number:
2014/014