Publications
Authors:
  • Jérôme Fuselier, Boris Chidlovskii
Citation:
To appear in Document Numérique
Abstract:
More and more companies are migrating their legacy document management systems to XML, the industry standard for data exchange. To reduce the cost of this migration, we propose an approach that automates the conversion of layout-oriented documents into semantically annotated ones. The conversion module uses a supervised machine learning technique to learn a conversion model for a document collection. The conversion is achieved by semantically annotating the document content and then structuring the annotations according to an XML schema that specifies the class of target documents.
Year:
2005
Report number:
2005/050