The Document Content Models research explored formalisms and techniques for specifying, manipulating, and exploiting the semantic structures of documents, viewed as global, cohesive objects. Document representations focus on high-level communicative goals and are specified through constraint mechanisms, which may involve interaction with external knowledge bases. Applications include controlled authoring, interactive generation, natural language interfaces, global document content analysis, and document normalization.
The MDA (Multilingual Document Authoring) project provides interactive tools, such as context-aware menus, that assist monolingual writers in producing multilingual documents. These tools extend conventional syntax-driven SGML or XML editors so that the author can make choices down to the word level when authoring the document content. In addition, dependencies between two distant parts of the document can be specified so that a change in one part is immediately reflected in the other.
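As a rough illustration of such a dependency, the sketch below stores one semantic choice and derives two distant sentences from it in two languages. It is not the MDA implementation: the `DrugDocument` class, the `LEXICON` table, the sentence templates, and the drug-leaflet scenario are all hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass

# Hypothetical lexicon: each language-independent choice maps to its noun in
# English and French, plus the French grammatical gender.
LEXICON = {
    "solution": {"en": "solution", "fr": "solution", "gender": "f"},
    "emulsion": {"en": "emulsion", "fr": "émulsion", "gender": "f"},
    "tablet": {"en": "tablet", "fr": "comprimé", "gender": "m"},
}


def starts_with_vowel(word: str) -> bool:
    """Crude vowel test, sufficient for the three nouns in LEXICON."""
    return word[0].lower() in "aeiouéè"


@dataclass
class DrugDocument:
    # The single semantic choice; both sections below depend on it.
    form: str

    def description(self, lang: str) -> str:
        """Sentence near the start of the (hypothetical) document."""
        entry = LEXICON[self.form]
        if lang == "fr":
            article = "une" if entry["gender"] == "f" else "un"
            return f"Ce médicament est {article} {entry['fr']}."
        article = "an" if starts_with_vowel(entry["en"]) else "a"
        return f"This drug is {article} {entry['en']}."

    def storage(self, lang: str) -> str:
        """Sentence in a distant section that reuses the same choice."""
        entry = LEXICON[self.form]
        if lang == "fr":
            noun = entry["fr"]
            if starts_with_vowel(noun):
                determiner = "l'"  # elision before a vowel
            else:
                determiner = "la " if entry["gender"] == "f" else "le "
            return f"Conserver {determiner}{noun} au frais."
        return f"Store the {entry['en']} in a cool place."


doc = DrugDocument(form="solution")
print(doc.description("en"), doc.storage("fr"))

# Changing the one choice updates both sections in both languages.
doc.form = "emulsion"
print(doc.description("en"), doc.storage("fr"))
```

Because neither sentence stores its own copy of the noun, the two sections cannot drift apart. The articles and determiners are recomputed from the choice each time the text is rendered.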
The author's choices have language-independent meanings (for example, choosing between a solution and an emulsion in a drug description document). These choices are automatically rendered in any of the languages known to the system, along with their grammatical consequences for the surrounding text. Although the author is not explicitly following standards, the text produced by the system is implicitly controlled both:
Document Normalization is the interactive process of analyzing a legacy document in terms of a well-defined, controlled document content model and then generating a corresponding normalized document.
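The analyze-then-generate loop can be sketched in a few lines, under strong simplifying assumptions. Here the content model is a single slot with a closed set of values, analysis is plain keyword matching, and "interactive" means calling a user-supplied function when the text is ambiguous. The names `TRIGGERS`, `TEMPLATE`, `candidates`, and `normalize` are hypothetical and do not describe the project's actual method.

```python
import re

# Hypothetical content model: one slot ("form") with a closed set of values,
# each associated with trigger words that may occur in legacy text.
TRIGGERS = {
    "solution": {"solution", "dissolved"},
    "emulsion": {"emulsion", "emulsified"},
}

# Hypothetical controlled wording used for the normalized output.
TEMPLATE = "Pharmaceutical form: {form}."


def candidates(legacy_text):
    """Return the model values whose trigger words occur in the legacy text."""
    words = set(re.findall(r"[a-z]+", legacy_text.lower()))
    return sorted(form for form, triggers in TRIGGERS.items() if words & triggers)


def normalize(legacy_text, choose=None):
    """Map legacy text onto the content model, then regenerate controlled text.

    `choose` stands in for the interactive step: it receives the list of
    plausible values and returns the one the user confirms.
    """
    found = candidates(legacy_text)
    if len(found) == 1:
        form = found[0]
    elif choose is not None:
        # Ambiguous or no match: offer the matches, or every value if none.
        form = choose(found or sorted(TRIGGERS))
    else:
        raise ValueError(f"ambiguous analysis, candidates: {found}")
    return TEMPLATE.format(form=form)


print(normalize("The active substance is dissolved in water."))
print(normalize("May be an emulsion or a solution.", choose=lambda options: options[0]))
```

The output never copies wording from the legacy text. It is generated only from the value selected in the content model, which is what makes the result normalized.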