Complexity Based Phrase-Table Filtering for Statistical Machine Translation
Nadi Tomeh, Nicola Cancedda, Marc Dymetman
We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced by Moore (2004). While previous work by Johnson et al. (2007) also addressed the question of phrase-table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion and show that, when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of pruning of the phrase tables.
MT Summit 2009 (Machine Translation Summit XII), Ottawa, Ontario, Canada, August 26-30, 2009.
Full paper available on MT Summit Website