Recognizing Lexical Patterns in Text
Greg Grefenstette, Anne Schiller, Salah Ait-Mokhtar
For most natural language processing tasks, the complexity and richness of the lexicon determine the
ultimate performance of the system. In this chapter we present a number of low-level natural language
processing techniques for recognizing lexical structures in a domain-specific corpus, concentrating on
techniques that precede the manual construction of the lexicon, or that can serve as a basis for the automatic
creation of a lexicon. Recognizing things in text is easier for a computer than recognizing things in images.
But in both domains, recognizing means abstracting away surface differences in order to identify two variants as
the same object. The computational linguistics community has developed a number of techniques for
abstracting away surface differences in text: tokenization, lemmatization, part-of-speech tagging, and finite-state
pattern recognition. This chapter presents an overview of these techniques.
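As a rough illustration of what "abstracting away surface differences" means, the sketch below covers only the first two techniques named above: tokenization and a toy form of lemmatization. It is not the authors' method. The regular expression, the hand-written `LEMMAS` table, and the helper names `tokenize`, `normalize` and `same_lexical_item` are all hypothetical. A real system would use a full lexicon or morphological analyzer instead of a dictionary lookup.

```python
import re

# Hypothetical toy lemma table (illustration only). A real lemmatizer
# would rely on a full lexicon or a morphological analyzer.
LEMMAS = {
    "parsers": "parser",
    "parsed": "parse",
    "parsing": "parse",
    "parses": "parse",
}

# A word token is a run of word characters, optionally joined by hyphens.
# Any other non-space character is treated as a one-character token.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word tokens (keeping hyphenated compounds) and punctuation."""
    return TOKEN_RE.findall(text)


def normalize(token):
    """Abstract away case and inflection: lowercase the token, then look up a lemma."""
    lower = token.lower()
    return LEMMAS.get(lower, lower)


def same_lexical_item(a, b):
    """Treat two surface variants as the same item if their normalized forms match."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    tokens = tokenize("Finite-state parsers: Parsing is fast.")
    print(tokens)
    print([normalize(t) for t in tokens])
    print(same_lexical_item("Parsing", "parses"))
```

Here "Parsing" and "parses" differ in case and inflection, yet both normalize to "parse", so the sketch treats them as the same lexical item. Part-of-speech tagging and finite-state pattern recognition would then operate on these normalized tokens.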
F. Van Eynde, D. Gibbon (eds.): Lexicon Development for Speech and Language Processing. Kluwer Academic Publishers. 2000.