Xerox Research Centre Europe
Principal Scientist, Document Structure group. PhD in Computer Science from Kiev State University, Ukraine (1990). My research interests cover XML data and schema management, machine learning, information extraction from the Web, data mediation and integration, the Hidden Web, and distributed information retrieval.

I moved to XRCE in 1996 and joined the Constraint Based Knowledge Brokers project on data integration from heterogeneous corporate and Web resources. There I developed techniques for query mapping and translation, a semantic cache mechanism for efficient query processing, and wrapper generation for Web resources. In 1999, the induction of Web wrappers became a separate project, called Iwrap. Iwrap combines methods of grammatical inference with active learning and offers a GUI for inferring wrapper instances from a small number of annotations. The wrapper designer is a core component of the AskOnce content integration platform, commercialized by Xerox in 2000 and acquired by Documentum in 2004. An analysis of the commercial deployment of AskOnce wrappers during the 2003-2005 period was recently published at ACM SIGMOD'2006 (the full list of wrappers is here).

In 2002-2003, we developed the Web Information Discovery prototype for the automatic discovery of Hidden Web resources. Other projects I contributed to include the Knowledge Pump personalized recommendation system, schema induction for XML collections, and the Intelligent Adviser for authoring XML documents. In the current Legacy Document Conversion project, we are developing novel methods for the semantic annotation of layout-oriented documents and their conversion to XML. We address various issues of document analysis and conversion, with a strong emphasis on machine learning and hybrid methods for real mass document conversion tasks.
Publications
Patents