2010/039 - Unsupervised method to generate page template
- Hervé Dejean
DRR (Document Recognition and Retrieval), San Francisco, CA, USA, 23-27 January 2011
We propose a method for automatically inferring the different page templates used to lay out the elements of a document. After labeled elements have been identified through Logical Analysis, geometrical relations are computed between these labeled elements, and page template candidates are generated from frequently related elements. A fuzzy matching operation then selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the earlier steps of the document analysis: zoning, OCR, and logical analysis.
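
The pipeline described above (labeled elements, geometrical relations, candidate templates, fuzzy matching) can be illustrated with a small Python sketch. This is not the method from the paper, whose details are not given in this abstract. It is a toy illustration under stated assumptions: each element is a hypothetical `(label, x, y)` tuple, relations are coarse left/right and above/below comparisons, each page's relation set is a candidate template, and the fuzzy match is a Jaccard similarity with an arbitrary threshold. The names `relations`, `jaccard`, and `select_templates` are invented for this example.

```python
from collections import Counter
from itertools import combinations


def relations(page):
    """Return coarse geometric relations between all pairs of labeled elements.

    Assumption: an element is a (label, x, y) tuple giving a logical label
    and the top-left corner of its zone.
    """
    rels = set()
    # Sorting makes the pair order (and so the relation tuples) deterministic.
    for (la, xa, ya), (lb, xb, yb) in combinations(sorted(page), 2):
        horiz = "left_of" if xa < xb else "right_of" if xa > xb else "aligned_x"
        vert = "above" if ya < yb else "below" if ya > yb else "aligned_y"
        rels.add((la, lb, horiz, vert))
    return frozenset(rels)


def jaccard(a, b):
    """Similarity of two relation sets, between 0.0 and 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_templates(pages, threshold=0.5, min_pages=2):
    """Map each candidate template to the number of pages it fuzzily matches.

    Assumption: every distinct page relation set is a candidate, and a
    candidate is kept when at least `min_pages` pages match it.
    """
    rel_sets = [relations(page) for page in pages]
    support = Counter()
    for candidate in set(rel_sets):
        support[candidate] = sum(
            jaccard(candidate, rs) >= threshold for rs in rel_sets
        )
    return {c: n for c, n in support.items() if n >= min_pages}


# Hypothetical document: odd and even pages place the page number on
# opposite sides, and one odd page carries an extra footnote zone.
odd = [("header", 10, 5), ("body", 10, 20), ("page_number", 90, 95)]
even = [("header", 10, 5), ("body", 10, 20), ("page_number", 5, 95)]
noisy_odd = odd + [("footnote", 10, 80)]

templates = select_templates([odd, even, odd, even, noisy_odd])
```

In this toy setting the odd-page and even-page layouts each emerge as a separate template, and the page with the extra footnote still matches the odd-page template. A page that matches a template only partially is the kind of case where the template could flag a possible zoning or labeling error for correction.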