Definite Noun Phrases in Statistical Machine Translation into Danish

Sara Stymne
Danish has two ways of expressing definiteness, which is problematic for statistical machine translation (SMT) from English because the system can choose the wrong realisation. We present a method for identifying and transforming English definite NPs that would likely be expressed differently in Danish. The transformed English is then used to train a phrase-based SMT system. In two different domains, identifying and transforming definiteness yields significant improvements in translation quality, of up to 21.1% relative on Bleu, over a baseline trained on the original English.
Workshop on Extracting and Using Constructions in NLP, Odense, Denmark, 14 May 2009.
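
To make the idea in the abstract concrete: Danish marks definiteness with a noun suffix in a bare NP ("huset", the house) but with a separate prenominal article when the noun is premodified ("det store hus", the big house). The abstract does not spell out the identification rules or the transformed format, so the following is only a minimal sketch under assumptions: input is already POS-tagged with Penn Treebank style tags, a definite NP is rewritten only when "the" is immediately followed by a common noun, and the `#def` marker and the function name `transform_definites` are hypothetical, chosen purely for illustration.

```python
from typing import List, Tuple

# (token, POS tag); Penn Treebank style tags are an assumption of this sketch.
Tagged = Tuple[str, str]

# Common-noun tags that trigger the rewrite (assumed rule, not from the paper).
NOUN_TAGS = {"NN", "NNS"}


def transform_definites(sentence: List[Tagged], marker: str = "#def") -> List[str]:
    """Rewrite 'the NOUN' as 'NOUN#def'; leave all other tokens untouched.

    NPs with a premodifier ('the big house') are kept as they are, since
    Danish also uses a separate article in that case.
    """
    out: List[str] = []
    i = 0
    while i < len(sentence):
        token, tag = sentence[i]
        nxt = sentence[i + 1] if i + 1 < len(sentence) else None
        is_bare_definite = (
            tag == "DT"
            and token.lower() == "the"
            and nxt is not None
            and nxt[1] in NOUN_TAGS
        )
        if is_bare_definite:
            # Drop the article and attach the (hypothetical) marker to the noun.
            out.append(nxt[0] + marker)
            i += 2
        else:
            out.append(token)
            i += 1
    return out


if __name__ == "__main__":
    bare = [("the", "DT"), ("house", "NN"), ("is", "VBZ"), ("old", "JJ")]
    modified = [("the", "DT"), ("big", "JJ"), ("house", "NN")]
    print(" ".join(transform_definites(bare)))
    print(" ".join(transform_definites(modified)))
```

Under these assumptions, the rewritten English side of the parallel corpus would be what the phrase-based system is trained on, so that a marked noun can align with a single suffixed Danish noun. The actual rules in the paper may differ.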
