Document annotation by active learning techniques

Boris Chidlovskii, Loïc Lecerf
We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoc project, which aims to automate the conversion of layout annotations in layout-oriented formats such as PDF, PS and HTML into semantic-oriented annotations. A toolkit of components covers complementary techniques for logical document analysis and semantic annotation using machine learning methods. We report preliminary results of deploying active learning techniques.
ACM Document Engineering Symposium, Amsterdam, The Netherlands, 10-13 Oct. 2006.