Traitement automatique pour la Migration de Documents Numériques vers XML

Authors: J Fuselier, Boris Chidlovskii
To appear in Document Numérique
More and more companies are migrating their legacy document management systems to XML, the industry standard for data exchange. To reduce the migration cost, we propose an approach that automates the conversion of layout-oriented documents into semantic-oriented annotations. The conversion module uses a supervised machine learning technique to learn a conversion model for a document collection. The conversion is achieved by semantically annotating the document content and then structuring the annotations according to an XML schema that specifies the class of target documents.
Year: 2005
Report number: 2005/050

Attachments

fuselier_chidlovskii.pdf (355.86 kB)