European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), September 15-19

Tomi Silander presenting a paper co-authored by Arvind Agarwal and Saurabh Kataria (XRCW): “Multitask Learning for Sequence Labeling Tasks”.

Anna Stavrianou presenting a paper co-authored by Caroline Brun, Tomi Silander and Claude Roux: “NLP based Feature Extraction for Automated Tweet Classification”.

Parsing & Semantics

Parsing & Semantics, or 'ParSem', focuses on automatically making sense of electronic documents through semantic analysis. The group pursues two main research lines in natural language processing: robust parsing and semantics.

Robust Parsing

Robust parsing provides mechanisms to identify major syntactic structures and major functional relations between words in large collections of unrestricted documents (e.g. web pages, newspapers, scientific literature, encyclopedias).


Semantics

With the goal of transforming documents into “meaningful spaces”, the main focus has to be semantics. Semantics is everywhere, hidden in completely different types of documents (e.g. text, images, videos, programs and audio) and at different levels (e.g. document content, document structure).

Ongoing Activities

  • Pre-processing components, the Xerox Incremental Parsing engine, grammar development, entity recognition, linguistic normalization
  • Ontology acquisition
  • Semantic disambiguation
  • Co-reference and discourse analysis
  • Current EU projects: GALATEAS, SCOOP, Europeana Connect, SYNC3 and EERQI