SEMANTICS

With the goal of transforming documents into “meaningful spaces”, the main focus has to be semantics. Semantics is everywhere, hidden in very different types of documents (e.g. text, images, videos, programs and audio) and at different levels (e.g. document content, document structure). Because most of the semantics that is accessible in documents today lies in text, we concentrate on the semantic content analysis of the textual parts of documents. These textual parts also include document structure (for instance, information already encoded in tags and user profiles). Our goal is not to investigate the fundamental nature of meaning, so we concentrate on linguistic meaning.

DESCRIPTION:

 

A unifying theme in the ongoing research in the ParSem area is an emphasis on the role of context in determining meaning. We are particularly interested in theoretical models of communication, language, dialogue, computation, and inference that take into account the context in which these activities occur.
We are also interested in applying research results to practical, real-world problems. Our general application focus is information discovery.

Our current research themes include:
 

  •  Ontology Acquisition: In philosophy, the word “ontology” traditionally refers to the description of the universe. In computational linguistics, it refers to the description of knowledge. An ontology in that sense is defined as a set of concepts and a set of relations, where each concept is described relative to the other concepts through one or more relations.

    We build tools that can discover that two concepts are related somehow, by noticing that expressions denoting those concepts are frequently linked together syntactically in a corpus. We explore the idea that the range of syntactic constructions that can be used to link two concepts may provide information about the nature of the relationship(s) that can exist between those concepts. This information could subsequently be used to enrich the representation of a document's content with entities and relations that are implied, but not explicitly stated.
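The idea of profiling concept pairs by the syntactic constructions that link them can be illustrated with a minimal sketch. It is not the ParSem tooling: the function name `link_profiles`, the construction labels, and the hand-written `LINKS` data are all assumptions made for this example, standing in for what a parser would extract from a real corpus.

```python
from collections import Counter, defaultdict


def link_profiles(links, min_count=2):
    """Group syntactic links by concept pair and keep the frequent pairs.

    links: iterable of (concept_a, construction, concept_b) tuples.
    Returns {(concept_a, concept_b): Counter of constructions} for every
    pair that is linked at least min_count times in total.
    """
    profiles = defaultdict(Counter)
    for concept_a, construction, concept_b in links:
        profiles[(concept_a, concept_b)][construction] += 1
    return {
        pair: counts
        for pair, counts in profiles.items()
        if sum(counts.values()) >= min_count
    }


# Toy, hand-written input standing in for parser output over a corpus.
# The construction labels are hypothetical, not those of a specific parser.
LINKS = [
    ("doctor", "subject-verb-object:treat", "patient"),
    ("doctor", "subject-verb-object:treat", "patient"),
    ("doctor", "subject-verb-object:examine", "patient"),
    ("doctor", "noun-of-noun", "patient"),
    ("doctor", "noun-of-noun", "hospital"),
]

profiles = link_profiles(LINKS)
for pair, counts in profiles.items():
    # The range of constructions per pair hints at the kind of relation.
    print(pair, counts.most_common())
```

Here the pair (doctor, patient) is kept because it is linked four times, and its profile shows which constructions link the two concepts; the pair (doctor, hospital) occurs only once and is filtered out. The threshold `min_count` is an arbitrary choice for the sketch.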
     
     
  •  Semantic Disambiguation: Word sense disambiguation (WSD) aims at associating a given word in a text or discourse with a definition, meaning, or semantic class (sense) that is distinguishable from the other meanings potentially attributable to that word. This task involves two steps:

  • The determination of all the different senses of every word relevant to the text or discourse under consideration. The precise definition of what a sense is remains a matter of debate, but many recent approaches rely on predefined senses such as a list of senses given in a dictionary, associated words, entries in a transfer dictionary, etc.
  • The assignment of words to senses, which is done using two sources of information:
      • The linguistic context of the word to be disambiguated (and possibly some extra-linguistic knowledge about the situation, etc.)
      • External knowledge sources, including lexical and encyclopedic resources.

    All disambiguation processes involve matching the context of an instance of the word to be disambiguated either with information from an external knowledge source (knowledge-driven WSD) or with information about the contexts of previously disambiguated instances of the word, derived from corpora (data-driven or corpus-based WSD).
    Two major types of techniques are emerging in WSD: statistical supervised systems and unsupervised knowledge-based systems. However, in the last Senseval/Romanseval competition, it was noted that several unsupervised systems used the training data to fine-tune their results, and that several supervised systems fell back on a lexical resource where the data were insufficient. A combination of methodologies therefore seems to be the trend for the future of WSD.
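Knowledge-driven WSD, in which the context of a word is matched against an external knowledge source, can be sketched with a simplified gloss-overlap (Lesk-style) procedure. This is an illustration only, not the system described here: the two-sense inventory for "bank", its glosses, the stopword list, and the function names are all assumptions written for this example.

```python
# Small stopword list, chosen by hand for this example.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
             "or", "and", "that", "for", "by", "is"}

# Hypothetical sense inventory; the glosses are written for this sketch
# and are not taken from any particular dictionary.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and makes loans",
        "bank/river": "the sloping land along the edge of a river or stream",
    }
}


def content_words(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, context, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context.

    Returns (best_sense, scores). Ties go to the sense listed first.
    """
    context_words = content_words(context) - {word}
    scores = {
        sense: len(context_words & content_words(gloss))
        for sense, gloss in senses[word].items()
    }
    return max(scores, key=scores.get), scores


river_sense, river_scores = disambiguate(
    "bank", "She sat on the bank of the river and watched the stream.")
money_sense, money_scores = disambiguate(
    "bank", "He went to the bank to ask about loans and deposits.")
print(river_sense, river_scores)
print(money_sense, money_scores)
```

The first context shares "river" and "stream" with the second gloss, and the second context shares "loans" and "deposits" with the first gloss, so each instance is assigned a different sense. A data-driven system would instead compare the context with the contexts of previously disambiguated instances.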
     

  •  Linguistic Normalization: The aim of normalization is to provide, on the basis of a syntactic description of an input text, a more abstract representation of that text, as a step towards a semantic representation. Current work on normalization in the ParSem area can be viewed from two perspectives:

      • General normalization
      • Domain specific normalization
       
  •  Co-reference: The coreference resolution task aims at identifying the expressions in a text that refer to the same entity. The first phase of the project deals with pronominal coreference. It mainly focuses on personal pronouns (I, he/she, they, ...), which were shown to be the most frequent in texts.

  •  Discourse Analysis: In this project we explore representations that facilitate the recognition of non-lexicalized, non-conventional expressions for a given concept.
