Wrapping Web Information Providers by Transducer Induction

Boris Chidlovskii
Modern agent and mediator systems communicate with a multitude of Web information providers to better satisfy user requests. They use wrappers to extract relevant information from HTML responses and to annotate it with user-defined labels. A number of approaches use machine learning methods to induce instances of certain wrapper classes, by assuming a tabular structure of HTML responses and by observing the regularity of extracted fragments in the HTML structure. In this work, we propose a general approach and consider the information extraction conducted by wrappers as a special form of transduction. We make no assumption about the HTML response structure and draw on advanced methods of transducer induction to develop two powerful wrapper classes, for samples with and without ambiguous translations. We test the proposed induction methods on a set of general-purpose and bibliographic data providers and report the results of the experiments.
European Conference on Machine Learning, Freiburg, Germany, September 3-7, 2001
2001
2001/012

Attachments

transducerSubmission.pdf (400.99 kB)