A Lexical Interface for Finite-State Syntax

Jean-Pierre Chanod, Pasi Tapanainen
This document describes the lexical interface for finite-state syntax as it is currently implemented and used in the development of the French constraint grammar. The system includes finite-state transducers for multiword expressions, for capitalised, misspelt or unknown words, and for accent recovery. It also encodes general multiword expressions such as dates or idioms. The tokeniser combines a finite-state automaton for simple tokens with a more complex process for possibly ambiguous multiword tokens.
MLTT Technical reports (Feb 96)
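To make the split between simple tokens and possibly ambiguous multiword tokens concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The character classes, the function names simple_tokens and segmentations, and the two-entry lexicon MWES are all hypothetical. A small hand-coded deterministic automaton splits text into simple tokens. A separate step then lists every way of grouping those tokens into multiword entries, so an ambiguous string such as "bien que" receives both a multiword reading and a two-word reading.

```python
from typing import List, Set, Tuple

# Hypothetical multiword lexicon (lowercased token sequences), for illustration only.
MWES: Set[Tuple[str, ...]] = {("bien", "que"), ("pomme", "de", "terre")}


def char_class(c: str) -> str:
    """Map a character to a class used by the toy automaton (assumed classes)."""
    if c.isalpha() or c == "-":
        return "alpha"
    if c.isdigit():
        return "digit"
    if c.isspace():
        return "space"
    return "punct"


def simple_tokens(text: str) -> List[str]:
    """Split text into simple tokens with a small deterministic automaton.

    States: START, WORD, NUM. An apostrophe closes a word and stays attached
    to it (e.g. "l'"); other punctuation becomes a one-character token.
    """
    tokens: List[str] = []
    state, current = "START", ""
    for c in text + " ":  # the trailing space flushes the last token
        cls = char_class(c)
        if state == "WORD" and cls == "alpha":
            current += c
            continue
        if state == "NUM" and cls == "digit":
            current += c
            continue
        if state == "WORD" and c == "'":
            tokens.append(current + c)
            state, current = "START", ""
            continue
        # Any other transition closes the token being built.
        if current:
            tokens.append(current)
            current = ""
        if cls == "alpha":
            state, current = "WORD", c
        elif cls == "digit":
            state, current = "NUM", c
        elif cls == "space":
            state = "START"
        else:
            tokens.append(c)
            state = "START"
    return tokens


def segmentations(tokens: List[str], mwes: Set[Tuple[str, ...]],
                  start: int = 0) -> List[List[str]]:
    """Enumerate every grouping of tokens[start:] into simple or multiword tokens.

    Lookup is case-insensitive, so a capitalised "Bien que" still matches.
    """
    if start == len(tokens):
        return [[]]
    results: List[List[str]] = []
    # Reading 1: the simple token on its own.
    for rest in segmentations(tokens, mwes, start + 1):
        results.append([tokens[start]] + rest)
    # Further readings: every lexicon entry that starts at this position.
    for length in range(2, len(tokens) - start + 1):
        span = tokens[start:start + length]
        if tuple(t.lower() for t in span) in mwes:
            for rest in segmentations(tokens, mwes, start + length):
                results.append([" ".join(span)] + rest)
    return results


if __name__ == "__main__":
    toks = simple_tokens("Il part bien que la pomme de terre soit cuite.")
    for seg in segmentations(toks, MWES):
        print(seg)
```

For the sample sentence the sketch returns four segmentations, since "bien que" and "pomme de terre" can each be read either as one multiword token or as separate words. Enumerating alternatives this way grows exponentially with the number of overlapping entries; a finite-state treatment would more plausibly keep the alternatives in a compact form such as a lattice or transducer output, which is closer in spirit to the process the abstract describes.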