Linguistic parsing of lists in structured documents

Salah Ait-Mokhtar, Veronika Lux, Eva Banik
This paper shows how taking document structure into account improves the performance of linguistic parsing. We restrict our study to one specific structure in a single markup language: lists in HTML documents. First, we establish a typology of lists based on a corpus study. Then, after describing a transformation process that creates documents with uniform list markup, we show how the list tags can be incorporated into a parsing system, and how they enhance performance at every level of parsing.
EACL Workshop on NLP and XML, Budapest, Hungary, April 12-17, 2003.