Numbered Sequence Detection in Documents
In this work, we present a method for detecting numbered sequences in a document. The method consists of three steps. First, all potential "numbered patterns" are automatically extracted from the document. Second, coherent candidate sequences are built using the incrementality of these patterns (called the incremental relation). Finally, possibly incorrect links between items are corrected using the notion of an optimization context. An evaluation of the method is presented, and its weaknesses and possible improvements are discussed.
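The first two steps can be illustrated with a minimal sketch. It is not the paper's implementation and rests on simplifying assumptions. A "numbered pattern" is taken to be an optional opening parenthesis, an Arabic number or a single letter, and a closing "." or ")" at the start of a line. The incremental relation is reduced to "same pattern signature, value increased by one". The names `extract_items`, `is_incremental` and `build_sequences` are hypothetical. The third step (the optimization context) is not modeled. Roman numerals and multi-level numbers such as "1.2" are also left out.

```python
import re
from dataclasses import dataclass

# Assumed (simplified) numbered pattern: optional "(", then digits or a single
# letter, then "." or ")", followed by whitespace at the start of a line.
ITEM_RE = re.compile(r"^\s*(\(?)(\d+|[A-Za-z])([.)])\s+")


@dataclass
class Item:
    line_no: int      # index of the line where the pattern was found
    signature: tuple  # (prefix, kind, suffix); linked items must share it
    value: int        # numeric value of the number token (a=1, b=2, ...)


def extract_items(lines):
    """Step 1 (sketch): collect every potential numbered pattern."""
    items = []
    for i, line in enumerate(lines):
        m = ITEM_RE.match(line)
        if not m:
            continue
        prefix, token, suffix = m.groups()
        if token.isdigit():
            kind, value = "arabic", int(token)
        elif token.islower():
            kind, value = "lower", ord(token) - ord("a") + 1
        else:
            kind, value = "upper", ord(token) - ord("A") + 1
        items.append(Item(i, (prefix, kind, suffix), value))
    return items


def is_incremental(prev, nxt):
    """Hypothetical incremental relation: same signature and value + 1."""
    return prev.signature == nxt.signature and nxt.value == prev.value + 1


def build_sequences(items):
    """Step 2 (sketch): greedily chain items linked by the incremental relation.

    Each item is attached to the most recently started sequence whose last
    item it increments; otherwise it opens a new sequence. Greedy linking can
    still produce wrong links, which is what the paper's third step targets.
    """
    sequences = []
    for item in items:
        for seq in reversed(sequences):
            if is_incremental(seq[-1], item):
                seq.append(item)
                break
        else:
            sequences.append([item])
    return sequences


doc = [
    "1. Introduction",
    "a) first point",
    "b) second point",
    "2. Method",
    "a) extraction",
    "3. Evaluation",
]
seqs = build_sequences(extract_items(doc))
print([[it.line_no for it in s] for s in seqs])
```

On this toy input, the top-level items (lines 0, 3, 5) form one sequence, and each embedded lettered list forms its own sequence. Searching the most recent sequences first is a design choice. It keeps "3." from being captured by an older, unrelated list that happens to end in "2.", although it does not rule out every such error.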
DRR 2010 (Document Recognition and Retrieval), San Jose, CA, USA, 20-22 January 2010