A Lexical Interface for Finite-State Syntax
Jean-Pierre Chanod, Pasi Tapanainen
This document describes the lexical interface for finite-state syntax as it is currently implemented and
used for the development of the French constraint grammar. The system includes finite-state transducers for
multiword expressions, for capitalised, misspelt or unknown words, and for accent recovery. It also encodes
general multiword expressions such as dates or idioms. The tokeniser includes a finite-state automaton for
simple tokens and a more complex process for possibly ambiguous multiword tokens.
MLTT Technical reports (Feb 96)
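
The two-stage tokenisation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression standing in for the simple-token automaton, the tiny `MULTIWORDS` lexicon, and the function names are all assumptions made for illustration. Simple tokens are found with a regular expression, then a longest-match lookup proposes multiword tokens while keeping the word-by-word reading as an alternative, so that a later stage can resolve the ambiguity.

```python
import re

# Assumed simple-token pattern (hypothetical): a word with optional internal
# apostrophes or hyphens, or a single punctuation character.
SIMPLE_TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")

# Assumed toy multiword lexicon (hypothetical entries, lower-cased).
MULTIWORDS = {
    ("pomme", "de", "terre"),
    ("a", "priori"),
}
MAX_LEN = max(len(entry) for entry in MULTIWORDS)


def simple_tokens(text):
    """Stage 1: split the text into simple tokens with one regular expression."""
    return SIMPLE_TOKEN.findall(text)


def tokenise(text):
    """Stage 2: group simple tokens into possibly ambiguous multiword tokens.

    Returns a list of segments. Each segment is a list of alternative
    readings, and each reading is a list of token strings.
    """
    words = simple_tokens(text)
    segments = []
    i = 0
    while i < len(words):
        matched = None
        # Try the longest candidate first, down to two words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if candidate in MULTIWORDS:
                matched = n
                break
        if matched is None:
            segments.append([[words[i]]])
            i += 1
        else:
            span = words[i:i + matched]
            # Keep both readings: one multiword token, or separate words.
            segments.append([[" ".join(span)], span])
            i += matched
    return segments


if __name__ == "__main__":
    for segment in tokenise("Une pomme de terre, a priori."):
        print(segment)
```

Keeping both readings in each segment reflects the abstract's point that multiword tokens may be ambiguous; how the actual system represents and resolves such ambiguity is not specified here.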