With the goal of transforming documents into “meaningful spaces”, the main focus has to be semantics. Semantics is everywhere, hidden in very different types of documents (e.g. text, images, videos, programs and audio) and at different levels (e.g. document content, document structure). Because most of the “semantics” currently accessible in documents lies in text, we concentrate on the semantic content analysis of the textual parts of documents. This textual part also includes document structure (for instance, information already encoded in tags and user profiles). Our goal is not to investigate the fundamental nature of meaning, so we concentrate on linguistic meaning.
Description
A unifying theme in the ongoing research in the ParSem area is an emphasis on the role of context in determining meaning. We are particularly interested in theoretical models of communication, language, dialogue, computation and inference which take into account the context in which these activities occur.
We are also interested in applying research results to practical, real-world problems. Our general application focus is information discovery.
Our current research themes include:
- Ontology Acquisition: In philosophy the word “ontology” traditionally refers to the description of the universe. In computational linguistics, the word ontology applies to the description of knowledge. In this sense, an ontology is defined as a set of concepts and a set of relations. Each concept is described against the other concepts through one or more relations.
- Semantic Disambiguation: Semantic disambiguation aims at associating a given word in text or discourse with a definition, meaning or semantic class (sense) that is distinguishable from the other meanings potentially attributable to that word. This task involves two steps:
- Linguistic Normalization: Taking as a basis a syntactic description of an input text, normalization provides a more abstract representation of this input text to take a step towards semantic representation. Current work on normalization done by the ParSem area can be seen from two points of view:
- Co-reference: The coreference resolution task is to establish equivalence between entities that are mentioned in a text. The first phase of the project deals with pronominal coreference. It mainly focuses on personal pronouns (I, he/she, they...), which are the most frequent in texts.
- Discourse Analysis: In this project we explore representations that facilitate the recognition of non-lexicalized, non-conventional expressions for a given concept.
- Temporal Processing: The aim of temporal processing is to automatically associate a time stamp to the processes appearing in texts and to characterize aspects of those processes. The temporal dimension is a very important parameter in tasks like information extraction and question answering as events and states can be true at a certain moment and false at another moment. Taking into account this temporal dimension is the ultimate goal of temporal processing.
In order to achieve this, we first developed modules to recognise temporal expressions (TE) in text. We then normalize these temporal expressions and compare them to establish a partial temporal order of events along the timeline.
Several very different information sources are necessary: lexical semantic information, information about syntactic structures, and information about tense and aspect.
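The recognise, normalize and compare steps described above can be illustrated with a small sketch. This is not the ParSem modules: the regular expression, the `Interval` type and the function names are all hypothetical, it only handles English expressions such as "12 March 2004", "March 2004" or "2004", and it orders the temporal expressions themselves rather than the events they are attached to. Each expression is normalized to a calendar interval, and one expression precedes another only if its interval ends before the other begins, which is why the resulting order is partial.

```python
import re
from datetime import date, timedelta
from typing import NamedTuple

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical toy pattern: optional day, optional month name, 4-digit year.
TE_PATTERN = re.compile(
    r"\b(?:(?:(\d{1,2})\s+)?(" + "|".join(MONTHS) + r")\s+)?(\d{4})\b")


class Interval(NamedTuple):
    text: str    # the temporal expression as found in the text
    start: date  # first day covered by the expression
    end: date    # last day covered by the expression


def last_day(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    if month == 12:
        return date(year, 12, 31)
    return date(year, month + 1, 1) - timedelta(days=1)


def normalize(match: re.Match) -> Interval:
    """Map a recognised expression to the interval of days it denotes."""
    day, month_name, year = match.group(1), match.group(2), int(match.group(3))
    if month_name is None:                      # year only
        return Interval(match.group(0), date(year, 1, 1), date(year, 12, 31))
    month = MONTHS[month_name]
    if day is None:                             # month and year
        return Interval(match.group(0), date(year, month, 1), last_day(year, month))
    exact = date(year, month, int(day))         # full date
    return Interval(match.group(0), exact, exact)


def recognise(text: str) -> list[Interval]:
    """Recognise and normalize every temporal expression in the text."""
    return [normalize(m) for m in TE_PATTERN.finditer(text)]


def partial_order(intervals: list[Interval]) -> list[tuple[str, str]]:
    """Pairs (a, b) where a lies strictly before b; overlapping pairs are left unordered."""
    return [(a.text, b.text) for a in intervals for b in intervals if a.end < b.start]


sentence = "The treaty was signed in March 2004, ratified on 12 May 2005 and amended in 2005."
for earlier, later in partial_order(recognise(sentence)):
    print(f"{earlier} < {later}")
```

In this example "12 May 2005" and "2005" overlap, so neither is ordered before the other. Resolving such cases is where the tense, aspect and syntactic information mentioned in the text would come in.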
The work on temporal processing has been undertaken for English, French and Portuguese (the latter in collaboration with L2f-InescID in Lisbon), and modules for temporal processing for these languages are under development. In our approach we try to take into account current developments in temporal processing (TimeML, TIDES).
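The definition of an ontology given under Ontology Acquisition (a set of concepts and a set of relations, with each concept described against the others through relations) maps naturally onto a small data structure. The sketch below is only an illustration under that definition: the `Ontology` class, its method names and the example concepts are hypothetical and do not describe the representation actually used in ParSem.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # The set of concepts, identified here simply by name.
    concepts: set[str] = field(default_factory=set)
    # Each relation is a (source concept, relation name, target concept) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def relate(self, source: str, name: str, target: str) -> None:
        """Add a relation and register both concepts it connects."""
        self.concepts.update((source, target))
        self.relations.add((source, name, target))

    def describe(self, concept: str) -> list[tuple[str, str]]:
        """Relations that describe `concept` against the other concepts."""
        return sorted((name, target)
                      for source, name, target in self.relations
                      if source == concept)


onto = Ontology()
onto.relate("dog", "is_a", "mammal")
onto.relate("dog", "has_part", "tail")
print(onto.describe("dog"))
```

Storing relations as triples keeps the structure neutral about which relation types exist; an acquisition system would populate both sets from text rather than by hand.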