Linguistic parsing of lists in structured documents
Salah Ait-Mokhtar, Veronika Lux, Eva Banik
This paper shows how taking document structure into account helps to improve the performance of linguistic
parsing. We restrict our study to one specific structure in a single markup language: lists in HTML
documents. First we establish a typology of lists based on a corpus study. Then, after describing a
transformation process that creates documents with uniform list markup, we show how the list tags can be
incorporated into a parsing system, and how they enhance performance at every level of parsing.
EACL Workshop on NLP and XML, Budapest, Hungary, April 12-17, 2003.
eacl2003.ps (326.96 kB)