Finite-State Based Reductionist Parsing for French.
Jean-Pierre Chanod, Pasi Tapanainen
This paper describes a robust finite-state based parser applied to French. The non-deterministic tokeniser
includes a finite-state automaton for simple tokens and a lexical transducer for encoding a wide variety of
multiword expressions. The lexicon attaches morpho-syntactic tags to each token and alternative clause
boundaries in between. The parser can parse technical manuals with high accuracy: in a test sample, 95% of
both functional and part-of-speech tags were correct. The average number of parses per sentence is low: more
than 92% of sentences produce four or fewer parses, including the correct one.
András Kornai (ed.): Extended Finite State Models of Language. Cambridge University Press.