Publications
Authors:
  • Hervé Dejean
Citation:
DRR (Document Recognition and Retrieval), San Francisco, CA, USA, 23-27 January 2011
Abstract:
We propose a method for automatically inferring the different page templates used to lay out document elements. After labeled elements are identified through logical analysis, geometrical relations are computed between these labeled elements, and page template candidates are generated using frequent related elements. A fuzzy matching operation then selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the previous steps of the document analysis: zoning, OCR, and logical analysis.
Year:
2011
Report number:
2010/039