Publications
Authors:
  • Julien Quint
Citation:
Proceedings of NLP 2000, pages 16-26, Patras, Greece, June 2000.
Abstract:
Sumo is a formalism for universal segmentation
of text. Its purpose is to provide a framework
for the creation of segmentation applications. It
is called "universal" because the formalism itself is
independent of the language of the documents
to be processed and of the levels of segmentation
(e.g. words, sentences, paragraphs,
morphemes...) considered by the target application.
This framework relies on a layered structure
representing the possible segmentations of
the document. This structure and the tools to
manipulate it are described, followed by detailed
examples highlighting some features of Sumo.
Year:
2000
Report number:
2000/008
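The abstract describes Sumo's layered structure only in outline. As a hedged illustration, the Python sketch below models one plausible reading of it. It assumes that each layer is a directed acyclic graph whose nodes are positions in the layer below and whose arcs are candidate items, so that every start-to-end path is one possible segmentation. The names `SegmentationGraph`, `character_layer` and `word_layer`, and the toy lexicon, are hypothetical. They are not Sumo's actual API or data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SegmentationGraph:
    """Hypothetical layer: a DAG whose nodes are positions in the layer below
    and whose arcs are candidate items (an assumption, not Sumo's real format)."""
    start: int
    end: int
    arcs: Dict[int, List[Tuple[int, str]]] = field(default_factory=dict)

    def add_item(self, src: int, dst: int, label: str) -> None:
        # Record a candidate item spanning positions src..dst of the lower layer.
        self.arcs.setdefault(src, []).append((dst, label))

    def segmentations(self) -> List[List[str]]:
        # Enumerate every start-to-end path; each path is one possible segmentation.
        # Arcs that lead to a dead end contribute no path.
        results: List[List[str]] = []

        def walk(node: int, path: List[str]) -> None:
            if node == self.end:
                results.append(list(path))
                return
            for dst, label in self.arcs.get(node, []):
                path.append(label)
                walk(dst, path)
                path.pop()

        walk(self.start, [])
        return results


def character_layer(text: str) -> SegmentationGraph:
    # Bottom layer: a single path with one arc per character.
    graph = SegmentationGraph(0, len(text))
    for i, ch in enumerate(text):
        graph.add_item(i, i + 1, ch)
    return graph


def word_layer(text: str, lexicon: Set[str]) -> SegmentationGraph:
    # Upper layer: one arc for every lexicon entry found in the text, so
    # competing word segmentations coexist in the same graph.
    graph = SegmentationGraph(0, len(text))
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in lexicon:
                graph.add_item(i, j, text[i:j])
    return graph


text = "nowhere"
lexicon = {"no", "now", "where", "here", "nowhere"}
layers = [character_layer(text), word_layer(text, lexicon)]
for words in layers[1].segmentations():
    print(" ".join(words))
```

In this sketch, the word layer keeps all three readings of "nowhere" ("no where", "now here" and "nowhere") side by side instead of committing to one. Choosing among them is left to whatever processing runs on top of the layer.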