Structuring documents according to their table of contents
Hervé Dejean, Jean-Luc Meunier
In this paper, we present a method for structuring a document according to the information present in its table of contents (ToC). Both the detection of the ToC and the determination of the parts it refers to in the document body rely on a series of generic properties that characterize any ToC, while its hierarchization is achieved using clustering techniques. We also report on the robustness and performance of the method before discussing it in light of related work.
DocEng 05, Bristol, UK, November 2-4, 2005.