Translating with Non-contiguous Phrases

Michel Simard, Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Philippe Langlais, Kenji Yamada, Arne Mauser
This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e., phrases with gaps. A method for producing such phrases from a word-aligned corpus is proposed. A statistical translation model that handles such phrases is also presented, along with a training method that maximizes translation accuracy as measured by the NIST evaluation metric. Translations are produced by a beam-search decoder. Experimental results demonstrate that the proposed method generalizes better from the training data.
HLT/EMNLP: Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing, Vancouver, Canada, October 6-8, 2005.

