Light Parsing as Finite-State Filtering
For a number of language processing tasks, such as information retrieval and information extraction,
pertinent information can be extracted from text without doing a full parse of the individual sentences. The most
common restriction on the parser is to adopt a non-recursive model of the language being treated, which allows the
parser to be implemented with efficient finite-state tools at the cost of some coverage. These light
parsers successively introduce symbols into the input string wherever specified regular
expressions over words and/or part-of-speech tags match. Recent advances in finite-state expression compilation
make markup transducers simpler to write, leading to quicker implementations of layered finite-state parsers.
The resulting parsers are easier to create and maintain. In this article, we describe a light parsing method
using recently created finite-state operators. Two applications of this parser are described: grouping adjacent
syntactically related units, and extracting non-adjacent n-ary grammatical relations. A system for evaluating
the parser over a large corpus is described.
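
As a rough illustration of the markup idea summarized above (inserting symbols wherever a regular expression over part-of-speech tags matches), the following minimal sketch uses Python's standard `re` module in place of a finite-state toolkit. It is not the operators or grammar of the article: the `word/TAG` input format, the tag names `DT`, `JJ`, `NN`, the `[NP ... ]` bracket symbols, and the function name `mark_noun_groups` are all assumptions made for this example.

```python
import re

# Hypothetical non-recursive noun-group pattern over "word/TAG" tokens:
# an optional determiner, any number of adjectives, then one or more nouns.
# The lookbehind keeps a match from starting in the middle of a token.
NP_PATTERN = re.compile(r"(?<!\S)(?:\S+/DT )?(?:\S+/JJ )*(?:\S+/NN )+")


def mark_noun_groups(tagged: str) -> str:
    """Insert "[NP" and "]" around every longest leftmost match.

    The input is a space-separated string of word/TAG tokens. A trailing
    space is added so that every token, including the last, ends in a space.
    """
    padded = tagged + " "
    marked = NP_PATTERN.sub(lambda m: "[NP " + m.group(0) + "] ", padded)
    return marked.strip()


if __name__ == "__main__":
    sentence = "the/DT old/JJ parser/NN reads/VBZ a/DT long/JJ sentence/NN"
    print(mark_noun_groups(sentence))
    # [NP the/DT old/JJ parser/NN ] reads/VBZ [NP a/DT long/JJ sentence/NN ]
```

Because the output is again a flat string, a further pattern could in principle be applied to it to mark larger units over the inserted brackets, which is one way to picture the layering mentioned above.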
Extended Finite State Models of Language. Editor: Andras Kornai. Cambridge University Press, 1999.