Natural language understanding (NLU) covers our research on automatically extracting and understanding the information in text, which is a core part of our Text Analytics offering. From articles to blogs to tweets, the meaning of text in multiple languages can be converted to structured data, using information extraction and natural language understanding technologies, in order to facilitate automatic decision making.
We conduct active research in fundamental topics including dependency parsing for syntax and semantics, machine learning for structured prediction, distributional compositional models of meaning, and weak supervision for domain adaptation.
Information Extraction finds specifically targeted types of information, such as named entities (people or company names, dates, locations), numerical figures, or semantic relations (e.g. London is-the-capital-of UK). Our Information Extraction technology exploits statistical models and sophisticated lexical, syntactic and semantic analyses that normalise the large variety of ways of saying things into a consistent form with reduced ambiguity.
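To make the idea of normalising varied phrasings concrete, here is a minimal toy sketch. It is not the statistical technology described above: it uses a few hand-written regular expressions, and the pattern list, the `RELATION` label and the function name `extract_capital_relations` are all hypothetical, chosen only for illustration. It shows three different ways of stating the same fact being reduced to one consistent triple.

```python
import re

# Hypothetical relation label, borrowed from the example in the text.
RELATION = "is-the-capital-of"

# Hand-written surface patterns (an assumption for illustration only).
# Each one captures a different phrasing of the same relation.
PATTERNS = [
    # "London is the capital of the UK"
    re.compile(r"(?P<city>[A-Z]\w+) is the capital of (?:the )?(?P<country>[A-Z]\w+)"),
    # "the UK's capital, London"
    re.compile(r"(?P<country>[A-Z]\w+)'s capital,? (?P<city>[A-Z]\w+)"),
    # "London, capital of the UK"
    re.compile(r"(?P<city>[A-Z]\w+), (?:the )?capital of (?:the )?(?P<country>[A-Z]\w+)"),
]


def extract_capital_relations(text):
    """Return de-duplicated (city, relation, country) triples found in text."""
    triples = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            triple = (match.group("city"), RELATION, match.group("country"))
            if triple not in triples:
                triples.append(triple)
    return triples


if __name__ == "__main__":
    sentences = [
        "London is the capital of the UK.",
        "The UK's capital, London, hosted the talks.",
        "London, capital of the UK, is on the Thames.",
    ]
    for sentence in sentences:
        print(extract_capital_relations(sentence))
```

Each of the three sentences yields the same structured record, which is the property that makes downstream automatic processing possible. A production system would replace the fixed patterns with learned models and linguistic analysis so that it can cope with phrasings nobody wrote a rule for.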
Automated Information Access: Our research in interactive and multi-modal machine learning for documents has been deployed across Xerox in various domains, including litigation and transaction processing, saving millions of dollars per year. We work on automatic document clustering and topic extraction, document categorization and document segmentation, with a focus on scalability and multi-modality.
Multi-linguality: We have considerable expertise in machine translation and its deployment in business processes, including domain adaptation and translation quality estimation. More broadly, our information access tools can be extended to work in multilingual and cross-lingual settings by mapping several languages into a common space. We also develop computer-aided authoring tools for composing documents in a language you don't speak.