Authors:
  • Salah Ait-Mokhtar, Eva Banik, Veronika Lux
Citation:
Xerox Technical Report
Abstract:
The aim of this report is to show how taking document structure into account helps to improve parsing performance.
We restrict the linguistic analysis to technical documents and consider one specific structure in a single markup
language: lists in HTML documents. First, we establish a typology of lists based on a corpus study. Then, after
describing a transformation process that creates documents with uniform list markup, we show how the list tags can
be incorporated into a XIP grammar and how they improve performance at every level of parsing.
Year:
2002
Report number:
2002/054