Parsing (with) Punctuation etc.

Ted Briscoe
In this paper, I describe an approach to robust, domain-independent syntactic parsing of unrestricted, naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. I report the coverage of several corpora achieved with this grammar and describe an experiment to derive a probabilistic LR parser for the grammar from bracketed training data. I describe a systematic and declarative text grammar for English and its (modular) integration with the syntactic grammar. I evaluate the contribution of punctuation to deriving an accurate syntactic analysis through experiments with the trained parser on identical texts, either with or without naturally-occurring punctuation marks. Finally, I briefly outline how the resulting system might be used to acquire an accurate valency/argument-structure dictionary.
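The abstract does not say how the probabilistic LR parser is derived from bracketed data. One common scheme attaches probabilities to the actions in each LR parse-table cell. These are estimated from the actions the parser takes while deriving training analyses that are consistent with the bracketing. The sketch below assumes that scheme, with add-k smoothed relative frequencies normalized per (state, lookahead) cell. It is an illustration only, not the paper's implementation. The table encoding, the function names `estimate_action_probs` and `derivation_log_prob`, and the tag labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical encoding: an LR table maps (state, lookahead label) to the list
# of actions legal in that cell, e.g. {(0, "NN1"): ["shift 3", "reduce 1"]}.
# Lookahead labels are part-of-speech or punctuation tags (illustrative only).
# A trace is the list of (state, lookahead, action) steps taken while the
# parser derived one training analysis consistent with the bracketing.

def estimate_action_probs(table, traces, k=1.0):
    """Estimate P(action | state, lookahead) by add-k smoothed relative frequency."""
    counts = defaultdict(Counter)
    for trace in traces:
        for state, lookahead, action in trace:
            counts[(state, lookahead)][action] += 1
    probs = {}
    for cell, actions in table.items():
        seen = counts[cell]
        # Smooth over every action legal in the cell, so unseen actions keep
        # a small non-zero probability.
        total = sum(seen[a] for a in actions) + k * len(actions)
        probs[cell] = {a: (seen[a] + k) / total for a in actions}
    return probs

def derivation_log_prob(probs, trace):
    """Score a derivation as the sum of the log probabilities of its actions."""
    return sum(math.log(probs[(s, la)][a]) for s, la, a in trace)
```

For example, suppose two traces shift in cell `(0, "NN1")` and never reduce there. With `k=1.0`, `estimate_action_probs` then assigns 0.75 to the shift and 0.25 to the reduce. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long label sequences. A parser could use such scores to rank competing analyses of the same input.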
Festschrift for Jan Aarts, eds. Oostdijk and de Haas, Rodopi, Amsterdam