Publications
Authors:
  • Jean-Pierre Chanod, Pasi Tapanainen
Citation:
MLTT Technical reports (Feb 96)
Abstract:
This document describes the lexical interface for finite state syntax as it is currently implemented and
used for the development of the French constraint grammar. The system includes finite state transducers for
multiword expressions, for capitalised, misspelt or unknown words and for accent recovery. It also encodes
general multiword expressions such as dates or idioms. The tokeniser includes a finite state automaton for
simple tokens and a more complex process for possibly ambiguous, multiword tokens.
Year:
1996
Report number:
1996-025