Wrapper Generation by k-Reversible Grammar Induction

Authors: Boris Chidlovskii
ECAI'00 Machine Learning for Information Extraction Workshop, Berlin, August 2000
Modern agent and mediator systems communicate with a multitude of Web information providers to better satisfy user requests. They use wrappers to extract relevant information from HTML pages and annotate it with user-defined labels. A number of approaches exploit the regularity in page structures to induce instances of wrapper classes. The power of a class is crucial: a more powerful class can successfully wrap more sites. In this work, we use grammatical inference theory to develop a powerful wrapper class based on k-reversible grammars. We also address the sample labeling problem and show how label conflicts can make wrapper inference impossible. We propose a label normalization method to remove label conflicts and induce partial wrappers.
Year: 2000
Report number: 2000/205

Attachments

ecai00IE.ps (443.73 kB)