Internship

Discriminative Language Models for Statistical Machine Translation

Unit: Grenoble/CLT

Nicola Cancedda - nicola.cancedda@xrce.xerox.com

Duration: 3-6 months
Start Date: January 2009 or later

The main research lines within the Cross Language Technologies (CLT) area at XRCE are Statistical Machine Translation, Cross-Lingual Information Retrieval and Machine Learning Techniques for Cross-Lingual Applications. CLT is currently coordinating the European Project SMART (Statistical Multilingual Analysis for Retrieval and Translation) [http://www.smart-project.eu].

One of the core components of every Statistical Machine Translation (SMT) system is the so-called Language Model (LM). In most cases, this is a model of a probability distribution over all possible sentences in the language into which one is translating (the 'target' language), based on some conditional independence assumptions. As an alternative to, or in combination with, these LMs, which are trained only on large amounts of fluent text, one can envisage 'discriminative' LMs that are expressly trained to distinguish between positive examples of fluent text and negative examples of non-fluent text.

We are looking for an intern to work on the topic of discriminative language models for SMT under the supervision of experienced XRCE researchers. The task will consist of extending existing research based on Factored Sequence Kernels and integrating a discriminative LM with XRCE's SMT system Matrax.

The ideal candidate will be a strong Master's or Ph.D. student with a background in statistical machine translation and/or machine learning, and will be fluent in C/C++ and/or Python. Specific knowledge and practical experience of kernel methods will be a plus.

XRCE provides an informal and relaxed working environment situated in the Parc de Maupertuis in Meylan, just outside of Grenoble, France. The nearest ski slopes are a mere 30-minute drive away and are visible from the lab. For non-skiers, the area offers ample opportunities for hiking, climbing, rafting, biking, and paragliding. Grenoble is a pleasant student town with a medieval centre and some very good restaurants and bars. The official language at the centre is English.

The Xerox Research Centre Europe (XRCE) is a young, dynamic research organization that aims to create innovative document technologies to support growth in Xerox content and document management services across the different Xerox businesses.

XRCE is both a multicultural and multidisciplinary organization set in Grenoble, France. Our domains of research stretch from the social sciences to computing. We have renowned expertise in natural language applications, work practice studies, image-based document processing, distributed applications and knowledge management agents. The diversity of cultures and disciplines at XRCE makes it an interesting and stimulating environment to work in, often leading to unexpected discoveries!

XRCE is part of the Xerox Innovation group made up of 800 researchers and engineers in four world-renowned research and technology centres. Xerox is an equal opportunity employer.

The Grenoble site is set in a park in the heart of the French Alps in a stunning location only a few kilometres from the city centre. The city of Grenoble has a large scientific community made up of national research institutes (CNRS, Universities, INRIA) and private industries. Stimulated also by the presence of a large student community, Grenoble has become a resolutely modern city, with a rich heritage and a vibrant cultural scene. It is a lively and cosmopolitan place, offering a host of leisure opportunities. Winter sports resorts just half an hour from campus and three natural parks at the city limits make running, skiing, trekking, climbing and paragliding easily accessible.
Grenoble is close to both the Swiss and Italian borders.