Parsing (with) Punctuation etc.
In this paper, I describe an approach to robust domain-independent syntactic parsing of unrestricted
naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and
punctuation labels using a unification-based grammar coupled with a probabilistic LR parser.
I describe the coverage of several corpora using this grammar and report an experiment to derive
a probabilistic LR parser for the grammar from bracketed training data. I describe a systematic and
declarative text grammar for English and its (modular) integration with the syntactic grammar.
I evaluate the contribution of punctuation to deriving an accurate syntactic analysis through experiments
with the trained parser on identical texts either with or without naturally-occurring punctuation marks.
I briefly outline how the resulting system might be used to acquire an accurate valency / argument structure
Festschrift for Jan Aarts, Rodopi, Amsterdam (eds.) Oostdijk and de Haas