Experiments in Unsupervised Entropy-Based Corpus Segmentation

Andre Kempe
The paper presents an entropy-based approach to segmenting a corpus into words when no additional information about the corpus or the language is available, and no other resources such as a lexicon or grammar can be used. To segment the corpus, the algorithm searches for separators without knowing a priori which symbols constitute them. Good results can be obtained with corpora containing 'clearly perceptible' separators such as blank or newline.
Proc. CoNLL'99, Bergen, Norway, pp. 7-13
1999
1999/052
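
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of finding separators by entropy, not the method from the paper. It assumes that a separator symbol tends to be preceded and followed by many different symbols, and therefore scores each symbol by the entropy of its left and right neighbours. The function names (neighbor_entropy, guess_separators, segment) and the scoring rule are hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(corpus):
    """Score each symbol by H(next | symbol) + H(previous | symbol), in bits.

    Assumption for this sketch: separators have highly varied neighbours,
    so they receive a high score. This is not the paper's scoring rule.
    """
    right = defaultdict(Counter)  # symbol -> counts of the symbol that follows it
    left = defaultdict(Counter)   # symbol -> counts of the symbol that precedes it
    for a, b in zip(corpus, corpus[1:]):
        right[a][b] += 1
        left[b][a] += 1

    def entropy(counts):
        total = sum(counts.values())
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    return {s: entropy(right[s]) + entropy(left[s]) for s in set(corpus)}


def guess_separators(corpus, k=1):
    """Return the k symbols with the highest neighbour entropy."""
    scores = neighbor_entropy(corpus)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def segment(corpus, separators):
    """Split the corpus at every separator symbol, dropping empty pieces."""
    words, current = [], []
    for ch in corpus:
        if ch in separators:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


toy = "ab cd ef gh"
seps = guess_separators(toy, k=1)
print(seps)                # → [' ']
print(segment(toy, seps))  # → ['ab', 'cd', 'ef', 'gh']
```

On this toy string the blank is the only symbol with more than one distinct neighbour, so it is ranked first. On real corpora frequent letters can also score highly, which is consistent with the abstract's remark that good results depend on 'clearly perceptible' separators.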

Attachments

kempe99.pdf (228.60 kB)

kempe99.ps.gz (69.03 kB)