  

 

CA: MORPHOLOGY

Morphological analysis is the basic enabling technology for many kinds of text processing. Recognition of word forms is the first step towards part-of-speech tagging, parsing, translation, and other high-level applications.

The two central problems in morphology are:

    word formation

    Words are typically composed of smaller units of meaning, called morphemes. The morphemes that make up a word must be combined in a certain order: piti-less-ness is a word of English but *piti-ness-less is not.

    morphological and orthographical alternation

    The shape of a morpheme often depends on the environment: pity is realized as piti in the context of less, die as dy in dying.

The CA work on morphology is based on the fundamental insight that both problems can be solved with the help of finite automata:

  1. the allowed combinations of morphemes can be encoded as a finite-state network;

  2. the rules that determine the form of each morpheme can be implemented as finite-state transducers;

  3. the lexicon network and the rule transducers can be composed into a single automaton, a lexical transducer, that contains all the morphological information about the language including derivation, inflection, and compounding.
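The three steps above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the CA implementation: the morpheme inventory, the state names, the two spelling rules, and the helper names (lexical_forms, apply_rules, build_transducer) are all hypothetical. It also replaces true transducer composition with plain enumeration of a finite lexicon, which only works because the example network is acyclic.

```python
# Step 1: the lexicon as a finite-state network (assumed toy inventory).
# Each state maps to the morphemes that may follow and the state they lead to.
LEXICON = {
    "root": [("pity", "noun"), ("die", "verb")],
    "noun": [("less", "adj")],
    "adj": [("ness", "end")],
    "verb": [("ing", "end")],
    "end": [],
}
FINAL = {"noun", "adj", "verb", "end"}


def lexical_forms(state="root", prefix=()):
    """Yield every morpheme sequence that ends in a final state."""
    if prefix and state in FINAL:
        yield prefix
    for morpheme, next_state in LEXICON[state]:
        yield from lexical_forms(next_state, prefix + (morpheme,))


# Step 2: alternation rules, written as a function from morphemes to a
# surface string. Both rules are deliberately oversimplified.
def apply_rules(morphemes):
    surface = morphemes[0]
    for suffix in morphemes[1:]:
        if surface.endswith("ie") and suffix == "ing":
            surface = surface[:-2] + "y"   # die + ing -> dying
        elif surface.endswith("y"):
            surface = surface[:-1] + "i"   # pity + less -> pitiless
        surface += suffix
    return surface


# Step 3: combine lexicon and rules into one relation between lexical and
# surface forms. Enumeration stands in for real transducer composition.
def build_transducer():
    analyze, generate = {}, {}
    for morphemes in lexical_forms():
        lexical = "+".join(morphemes)
        surface = apply_rules(morphemes)
        generate[lexical] = surface
        analyze.setdefault(surface, set()).add(lexical)
    return analyze, generate


ANALYZE, GENERATE = build_transducer()
```

Looking up GENERATE["pity+less+ness"] gives the surface form, and ANALYZE["dying"] gives the set of lexical analyses. The ill-formed *pitinessless has no entry because the network has no path for it. A real lexical transducer encodes the same relation as a single automaton rather than as two lookup tables, which is what makes it compact and usable with open-ended lexicons.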

Lexical transducers have many advantages. They are bidirectional (the same network serves both analysis and generation), fast (thousands of words per second), and compact. This technology is protected by several patents (e.g. US Patents 5,594,641 and 5,625,554).

We have created comprehensive morphological analyzers for many languages including English, French, Dutch, German, Hungarian, Italian, Portuguese, and Spanish.

More recent developments include Czech, Danish, Finnish, Norwegian, Polish, Romanian, Russian, Swedish, and Turkish. The lexical transducer for Arabic demonstrates that finite-state technology also applies to languages with non-concatenative morphology.

See our demos.

 
