Publications
Authors:
  • Julien Quint
Citation:
Coling 2000
Abstract:
Sumo is a formalism for universal segmentation of text. Its purpose is to provide a framework
for the creation of segmentation applications. It is called "universal" because the formalism itself is
independent both of the language of the documents to process and of the levels of segmentation
(e.g. words, sentences, paragraphs, morphemes, etc.) considered by the target application.
This framework relies on a layered structure representing the possible segmentations of
the document. The paper describes this structure and the tools to manipulate it, then gives detailed
examples highlighting some features of Sumo.
Year:
2000
Report number:
2000/038