Machine Learning for Document Access and Translation

The Machine Learning for Document Access and Translation (MLDAT) group creates effective tools for organizing, accessing and translating multilingual and multimodal document collections.

The document collections and streams that form the backbone of so many operations in modern global enterprises are increasingly large and heterogeneous. They are shared by ever-larger user communities exhibiting complex patterns of social interaction, and they evolve at a fast pace. In such a dynamic context, providing effective document categorization, clustering, and translation solutions requires challenging the state of the art on a daily basis. We choose to look for general, well-founded solutions to the underlying fundamental problems using advanced machine learning methods: letting the software learn a task from examples, instead of trying to directly program a solution too complex for a human to describe.

Our activities are organized around the following two themes:

  • Document Organization and Access. Models and algorithms for multilingual and multimodal document categorization and clustering, for social network analysis, and for novelty detection. Our most successful applications range from enterprise mailroom automation to eDiscovery in corporate litigation processes.
  • Statistical Machine Translation. Models and algorithms for automated translation and for removing language barriers through industry-specific and function-specific solutions that go beyond what general-purpose systems can provide.

For both themes, we conduct fundamental research and develop prototypes in partnership with Xerox businesses.

We also participate in European and other government-sponsored collaborative projects, currently including SYNC3, Fragrances and Organic.Lingua. In the past, we coordinated the SMART project on machine learning methods for translation and cross-language information access. Our active participation in the PASCAL 2 European Network of Excellence strengthens our links with some of the best academic research institutions in Europe.