Publications
Authors:
  • Jean-Pierre Chanod, Pasi Tapanainen
Citation:
András Kornai (ed.): Extended Finite State Models of Language. Cambridge University Press.
Abstract:
This paper describes a robust finite-state based parser applied to French. The non-deterministic tokeniser
includes a finite-state automaton for simple tokens and a lexical transducer for encoding a wide variety of
multiword expressions. The lexicon attaches morpho-syntactic tags to each token and alternative clause
boundaries in between. The parser can parse technical manuals with high accuracy: in a test sample, 95% of
both functional and part-of-speech tags were correct. The average number of parses per sentence is low: more
than 92% of sentences produce four or fewer parses, including the correct one.
Year:
1997
Report number:
1997/308