Hybrid adaptation of Named Entity Recognition systems for Statistical Machine Translation purposes
Vassilina Nikoulina, Agnes Sandor, Marc Dymetman
Appropriate Named Entity handling is important for Statistical Machine Translation. In this
work we address the challenging issues of generalization and sparsity of NEs in the context of
SMT. Our approach uses a source-side NE Recognition (NER) system to generalize the training data
by replacing the recognized Named Entities with place-holders, thus allowing a Phrase-Based
Statistical Machine Translation (PBMT) system to learn more general patterns. At translation
time, the recognized Named Entities are handled through a specifically adapted translation
model, which improves the quality of their translation. We add a post-processing step to a
standard NER system in order to make it more suitable for integration with SMT and we also
learn a prediction model for deciding between options for translating the Named Entities,
based on their context and on their impact on the translation of the entire sentence. We report
substantial improvements in BLEU and TER scores from integrating NER into SMT alone, and larger
ones still after applying the SMT-adapted post-processing step to the NER component.
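
The place-holder generalization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `__TYPE__` place-holder format, the `(start, end, type)` span representation, and the function names `generalize` and `reinsert` are all assumptions made for illustration, and the NER system and the adapted NE translation model are taken as given (their outputs are passed in as plain data). Re-inserting entities in left-to-right order per type is also a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start index, exclusive end index, entity type).
Span = Tuple[int, int, str]
Entity = Tuple[str, List[str]]


def generalize(tokens: List[str], spans: List[Span]) -> Tuple[List[str], List[Entity]]:
    """Replace each recognized NE span with a typed place-holder token.

    Returns the generalized token sequence and the extracted entities,
    in source order. Spans overlapping an earlier span are skipped.
    """
    out: List[str] = []
    entities: List[Entity] = []
    pos = 0
    for start, end, ne_type in sorted(spans):
        if start < pos:
            continue  # overlaps a span already replaced
        out.extend(tokens[pos:start])
        out.append(f"__{ne_type}__")  # assumed place-holder format
        entities.append((ne_type, tokens[start:end]))
        pos = end
    out.extend(tokens[pos:])
    return out, entities


def reinsert(tokens: List[str], translated: List[Entity]) -> List[str]:
    """Fill place-holders in translated output with separately translated NEs.

    Entities of the same type are consumed left to right, which is a
    simplification: it ignores reordering among same-type entities.
    """
    queues: Dict[str, List[List[str]]] = {}
    for ne_type, words in translated:
        queues.setdefault(ne_type, []).append(words)
    out: List[str] = []
    for tok in tokens:
        is_placeholder = len(tok) > 4 and tok.startswith("__") and tok.endswith("__")
        if is_placeholder and queues.get(tok[2:-2]):
            out.extend(queues[tok[2:-2]].pop(0))
        else:
            out.append(tok)
    return out
```

In this sketch, `generalize` would be applied to the source side of the training data and to each input sentence, while `reinsert` would run on the PBMT output once the extracted entities have been translated by whatever NE-specific model is available.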
Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12), Mumbai, India, December 9th, 2012.