CA: MORPHOLOGY

Morphological analysis is the basic enabling technology for many kinds of text processing. Recognition of word forms is the first step towards part-of-speech tagging, parsing, translation, and other high-level applications.

The two central problems in morphology are:

Word formation. Words are typically composed of smaller units of meaning, called morphemes. The morphemes that make up a word must be combined in a certain order: piti-less-ness is a word of English but *piti-ness-less is not.

Morphological and orthographical alternation. The shape of a morpheme often depends on the environment: pity is realized as piti in the context of less, die as dy in dying.

The CA work on morphology is based on the fundamental insight that both problems can be solved with the help of finite automata.
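
As a rough illustration of how the two problems map onto finite-state machinery, the sketch below encodes morpheme order as a small automaton and the pity/piti alternation as a rewrite rule. The state numbers, the three-morpheme lexicon, the `+` boundary symbol, and the function names are all assumptions made for this example; the regular-expression substitution merely stands in for a rule transducer and is not the CA implementation.

```python
import re

# Toy morphotactics: a finite automaton whose arcs are labelled with morphemes.
# The states and arcs are illustrative assumptions, not an actual lexicon.
TRANSITIONS = {
    (0, "pity"): 1,  # stem
    (1, "less"): 2,  # adjective-forming suffix
    (2, "ness"): 3,  # noun-forming suffix
}
FINAL_STATES = {1, 2, 3}


def is_well_formed(morphemes):
    """Return True if the morpheme sequence is accepted by the automaton."""
    state = 0
    for morpheme in morphemes:
        key = (state, morpheme)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in FINAL_STATES


def surface_form(morphemes):
    """Join morphemes with '+' and apply a toy alternation rule.

    The rule rewrites stem-final 'y' as 'i' before the suffix 'less',
    then deletes any remaining morpheme boundaries.
    """
    lexical = "+".join(morphemes)
    surface = re.sub(r"y\+(?=less)", "i", lexical)
    return surface.replace("+", "")


if __name__ == "__main__":
    print(is_well_formed(["pity", "less", "ness"]))  # accepted order
    print(is_well_formed(["pity", "ness", "less"]))  # rejected order
    print(surface_form(["pity", "less", "ness"]))
```

Keeping the ordering constraints and the spelling rule in separate components mirrors the division between the two problems described above.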
Lexical transducers have many advantages. They are bidirectional (the same network serves both analysis and generation), fast (thousands of words per second), and compact. This technology is protected by several patents (e.g. US Patent 5,594,641 and 5,625,554).

We have created comprehensive morphological analyzers for many languages, including English, French, Dutch, German, Hungarian, Italian, Portuguese, and Spanish. More recent developments include Czech, Danish, Finnish, Norwegian, Polish, Romanian, Russian, Swedish, and Turkish. The lexical transducer for Arabic demonstrates the applicability of finite-state technology to the analysis of non-concatenative languages. See our demos.
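
To make the bidirectionality claim concrete, here is a minimal sketch of a transducer whose arcs carry a lexical symbol and a surface symbol, and which can be traversed from either side. The arc table, the `+` boundary symbol, and the `transduce` function are hypothetical and cover only pity and piti-less; a real lexical transducer would be compiled from a full lexicon and rule set.

```python
EPS = ""  # empty string stands for epsilon (no symbol on that side)

# Each arc is (lexical symbol, surface symbol, next state).
# This tiny network is an assumption for illustration only.
ARCS = {
    0: [("p", "p", 1)],
    1: [("i", "i", 2)],
    2: [("t", "t", 3)],
    3: [("y", "y", 4), ("y", "i", 5)],  # y stays y word-finally, becomes i before a suffix
    5: [("+", EPS, 6)],                 # morpheme boundary has no surface realization
    6: [("l", "l", 7)],
    7: [("e", "e", 8)],
    8: [("s", "s", 9)],
    9: [("s", "s", 10)],
}
FINAL = {4, 10}


def transduce(text, input_side):
    """Return every output string paired with text.

    input_side 0 reads the lexical side (generation);
    input_side 1 reads the surface side (analysis).
    """
    results = []
    stack = [(0, 0, "")]  # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in FINAL:
            results.append(out)
        for arc in ARCS.get(state, []):
            inp, outp, nxt = arc[input_side], arc[1 - input_side], arc[2]
            if inp == EPS:
                # Epsilon on the input side: move without consuming input.
                stack.append((nxt, pos, out + outp))
            elif text.startswith(inp, pos):
                stack.append((nxt, pos + len(inp), out + outp))
    return results


if __name__ == "__main__":
    print(transduce("pity+less", 0))  # generation: lexical to surface
    print(transduce("pitiless", 1))   # analysis: surface to lexical
```

The same arc table is used in both calls; only the side treated as input changes, which is the property the text refers to as bidirectionality.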