Publications
Authors:
  • Pierre Mahé, Nicola Cancedda
Citation:
To appear in Learning Machine Translation, to be published by MIT Press in the NIPS Workshop Series.
Abstract:
This paper introduces a method for exploiting background linguistic resources in Statistical Machine Translation. Morphological, syntactic and possibly semantic properties of words are combined by means of an enriched word-sequence kernel. In contrast to alternative formulations, the linguistic resources are integrated so as to generate rich composite features defined across the various word representations. Word-sequence kernels find a natural application in discriminative language modeling, where they can help correct specific problems of the translation process. As a first step in this direction, experiments on an artificial task, the detection of word misordering, demonstrate the value of the proposed kernel construction.
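The abstract does not spell out the kernel's definition. As a rough illustration only, the sketch below assumes the standard gap-weighted word-sequence kernel (subsequences of length `n`, decay `lam` per spanned word) in which exact word matching is replaced by a weighted sum of per-factor matches over a surface form, a lemma and a part-of-speech tag. Expanding the product of these sums over a subsequence is one way features mixing several word representations can arise. The three-factor layout, the names `factored_word_kernel` and `word_sequence_kernel`, and the parameters `lam` and `weights` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Hypothetical word representation: (surface form, lemma, part-of-speech tag).
Word = Tuple[str, ...]


def factored_word_kernel(u: Word, v: Word, weights: Sequence[float]) -> float:
    """Weighted sum of per-factor exact matches (a valid kernel if weights >= 0)."""
    return sum(w for w, a, b in zip(weights, u, v) if a == b)


def word_sequence_kernel(s: Sequence[Word], t: Sequence[Word], n: int,
                         lam: float, weights: Sequence[float]) -> float:
    """Gap-weighted subsequence kernel of length n with soft (factored) word matching."""
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0
    # Word-level similarity matrix between the two sequences.
    k = [[factored_word_kernel(x, y, weights) for y in t] for x in s]
    # kp[a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 for every pair of prefixes.
    kp = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for _ in range(1, n):
        new_kp = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for a in range(1, ls + 1):
            kpp = 0.0  # K''_i(s[:a], t[:b]), accumulated along b
            for b in range(1, lt + 1):
                kpp = lam * kpp + lam ** 2 * k[a - 1][b - 1] * kp[a - 1][b - 1]
                new_kp[a][b] = lam * new_kp[a - 1][b] + kpp
        kp = new_kp
    # Close each length-n subsequence with a final matched word pair.
    return sum(lam ** 2 * k[a][b] * kp[a][b]
               for a in range(ls) for b in range(lt))


if __name__ == "__main__":
    the_cat = [("the", "the", "DT"), ("cat", "cat", "NN")]
    a_dog = [("a", "a", "DT"), ("dog", "dog", "NN")]
    # Surface forms only: no shared words, so no shared subsequence.
    print(word_sequence_kernel(the_cat, a_dog, 2, 0.5, (1, 0, 0)))
    # Part-of-speech tags only: both sequences contain the pattern DT NN.
    print(word_sequence_kernel(the_cat, a_dog, 2, 0.5, (0, 0, 1)))
```

With surface-only weights the two phrases score 0, while with POS-only weights they share the length-2 feature DT NN, which illustrates how linguistic factors let the kernel relate word sequences that have no surface words in common. A reversed word order removes shared ordered subsequences, which gives some intuition for why such a kernel can be sensitive to misordering.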
Year:
2008
Report number:
2007/058