Publications
Authors:
  • Jean-Pierre Chanod, Boris Chidlovskii, Hervé Dejean, Olivier Fambon, Jérôme Fuselier, Thierry Jacquin, Jean-Luc Meunier
Citation:
9th European Conference on Research and Advanced Technology for Digital Libraries, Vienna, Austria, September 18-23, 2005.
Abstract:
We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, which aims to automate the conversion of layout annotations in layout-oriented formats such as PDF, PS and HTML into semantic-oriented annotations. A toolkit of components covers complementary techniques: logical document analysis and semantic annotation using machine learning methods. We use a real conversion project as a running example to illustrate the techniques implemented in the project.
Year:
2005
Report number:
2005/016