Linguistically enriched word-sequence kernels for discriminative language modeling
Pierre Mahé, Nicola Cancedda
This paper introduces a method for taking advantage of background linguistic resources in Statistical Machine Translation. Morphological, syntactic and possibly semantic properties of words are combined by means of an enriched word-sequence kernel. In contrast to alternative formulations, linguistic resources are integrated in such a way as to generate rich composite features defined across the various word representations. Word-sequence kernels find natural applications in discriminative language modeling, where they can help correct specific problems of the translation process. As a first step in this direction, experiments on an artificial problem consisting of the detection of word misordering demonstrate the benefit of the proposed kernel construction.
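To make the idea of composite features across word representations more concrete, here is a minimal sketch of one way such a kernel could be built. It is not claimed to be the authors' exact formulation. Several assumptions apply. Each word is a tuple of representations (for example surface form, lemma and POS tag). The base kernel is a standard gap-weighted subsequence kernel of order `n` with gap penalty `lam`. The per-position similarity between two words is a weighted count of the representations on which they agree. Because the kernel multiplies these per-position sums, it expands into a sum over every way of choosing one representation per position. As a result, mixed features such as (surface form "cat", then POS "VBD") contribute to the kernel, and not only features that stay within a single representation. The names `factor_similarity`, `enriched_sequence_kernel` and `weights` are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical token type: one entry per word representation, e.g. (surface, lemma, POS).
Token = Tuple[str, ...]


def factor_similarity(a: Token, b: Token, weights: Sequence[float]) -> float:
    """Weighted count of representations on which two tokens agree (assumed choice)."""
    return sum(w for w, x, y in zip(weights, a, b) if x == y)


def enriched_sequence_kernel(s: Sequence[Token], t: Sequence[Token],
                             n: int, lam: float, weights: Sequence[float]) -> float:
    """Gap-weighted subsequence kernel of order n >= 1 with factored symbol similarity.

    Each pair of length-n index tuples (i in s, j in t) contributes
    lam ** (span_i + span_j) * prod_k sim(s[i_k], t[j_k]), computed by dynamic programming.
    """
    ls, lt = len(s), len(t)
    sim = [[factor_similarity(a, b, weights) for b in t] for a in s]
    # kp[p][q] holds K'_i(s[:p], t[:q]) for the current order i; K'_0 = 1 everywhere.
    kp = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for _ in range(1, n):
        new = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for p in range(1, ls + 1):
            kpp = 0.0  # K''_i(s[:p], t[:q]), accumulated as q grows
            for q in range(1, lt + 1):
                kpp = lam * kpp + lam * lam * sim[p - 1][q - 1] * kp[p - 1][q - 1]
                new[p][q] = lam * new[p - 1][q] + kpp
        kp = new
    # Close each subsequence with a final matched pair of positions (p, q).
    return sum(lam * lam * sim[p - 1][q - 1] * kp[p - 1][q - 1]
               for p in range(1, ls + 1) for q in range(1, lt + 1))


# Usage: "cat" and "dog" share only their POS tag, while "sat" matches on both
# representations, so the pair contributes lam**4 * 1 * 2.
s = [("cat", "NN"), ("sat", "VBD")]
t = [("dog", "NN"), ("sat", "VBD")]
print(enriched_sequence_kernel(s, t, n=2, lam=0.5, weights=[1.0, 1.0]))  # → 0.125
```

With non-negative weights, the per-position similarity is an inner product of indicator vectors. The product over positions is therefore still a valid kernel. Changing `weights` is one (assumed) way to control how much each representation contributes.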
To appear in Learning Machine Translation, to be published by MIT Press in the NIPS Workshop Series.