Publications
Authors:
  • Nadi Tomeh, Nicola Cancedda, Marc Dymetman
Citation:
MT Summit 2009 (Machine Translation Summit XII), Ottawa, Ontario, Canada, August 26-30, 2009.
Full paper available on the MT Summit website: http://www.mt-archive.info/MTS-2009-Tomeh.pdf
Abstract:
We describe an approach for filtering phrase tables in a Statistical Machine Translation system that relies on a statistical independence measure called Noise, first introduced by Moore (2004). While previous work by Johnson et al. (2007) also addressed phrase table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion and show that, when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of pruning of the phrase tables.
Year:
2009
Report number:
2009/038