Publications
Authors:
  • Andre Kempe
Citation:
Proc. CoNLL'99, Bergen, Norway, pp. 7-13
Abstract:
The paper presents an entropy-based approach to segmenting a corpus into words when no additional information
about the corpus or the language, and no other resources such as a lexicon or grammar, are available. To
segment the corpus, the algorithm searches for separators without knowing a priori which symbols constitute
them. Good results can be obtained with corpora containing 'clearly perceptible' separators such as
blank or new-line.
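
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea, not the method described in the paper. It assumes that a separator is the symbol whose successor is least predictable, that is, the symbol with the highest entropy over the symbols that follow it. The function names `successor_entropy`, `guess_separator`, and `segment` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(text):
    """Map each symbol to the entropy (in bits) of the symbol that follows it."""
    followers = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        followers[current][following] += 1

    entropy = {}
    for symbol, counts in followers.items():
        total = sum(counts.values())
        entropy[symbol] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropy


def guess_separator(text):
    """Assumption: the separator is the symbol with the highest successor entropy."""
    entropy = successor_entropy(text)
    return max(entropy, key=entropy.get)


def segment(text):
    """Split the text on the guessed separator, dropping empty pieces."""
    separator = guess_separator(text)
    return [piece for piece in text.split(separator) if piece]


corpus = "ab cd ef gh ab cd ef gh"
print(repr(guess_separator(corpus)))
print(segment(corpus))
```

In this toy corpus every letter is always followed by the same symbol, so only the blank has a non-zero successor entropy and it is picked as the separator. Real corpora are far less clear-cut, and a single-symbol, single-context criterion like this one is a simplification chosen for brevity.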
Year:
1999
Report number:
1999/052