Publications
Authors:
  • Salah Ait-Mokhtar, Veronika Lux, Eva Banik
Citation:
EACL Workshop on NLP and XML, Budapest, Hungary, April 12-17, 2003.
Abstract:
This paper shows how taking document structure into account helps to improve the performance of linguistic
parsing. We restrict our study to one specific structure in a single markup language: lists in HTML
documents. First we establish a typology of lists based on a corpus study. Then, after describing a
transformation process that creates documents with uniform list markup, we show how the list tags can be
incorporated into a parsing system, and how they enhance performance on every level of parsing.
Year:
2003
Report number:
2003/012