Wrapper Generation by k-Reversible Grammar Induction

Boris Chidlovskii
Modern agent and mediator systems communicate with a multitude of Web information providers to better satisfy user requests. They use wrappers to extract relevant information from HTML pages and annotate it with user-defined labels. A number of approaches exploit the regularity of page structures to induce instances of wrapper classes. The expressive power of the wrapper class is crucial: a more powerful class can successfully wrap more sites. In this work, we use grammatical inference theory to develop a powerful wrapper class based on k-reversible grammars. We also address the sample labeling problem and show how label conflicts can make wrapper inference impossible. We propose a label normalization method that removes label conflicts and allows partial wrappers to be induced.
ECAI'00 Machine Learning for Information Extraction Workshop, Berlin, August 2000
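The abstract names k-reversible grammar induction without showing the inference step. For orientation only, here is a minimal Python sketch of Angluin's classic k-RI algorithm for inferring a k-reversible automaton from positive examples: build a prefix-tree acceptor, then merge states until the automaton is deterministic and its reverse is deterministic with lookahead k. It is not the paper's wrapper induction. The function names are hypothetical, samples are assumed to be sequences of hashable symbols (plain characters here rather than labeled HTML tokens), and the fixpoint loop is deliberately naive rather than efficient.

```python
from itertools import combinations


def build_pta(samples):
    """Prefix-tree acceptor: one state per distinct prefix of the samples."""
    states = {(): 0}
    trans = {}  # (state, symbol) -> state
    finals = set()
    for s in samples:
        cur = 0
        for i, sym in enumerate(s):
            prefix = tuple(s[: i + 1])
            if prefix not in states:
                states[prefix] = len(states)
                trans[(cur, sym)] = states[prefix]
            cur = states[prefix]
        finals.add(cur)
    return len(states), trans, finals


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def quotient(parent, trans, finals):
    """Edges and final blocks of the automaton induced by the current partition."""
    edges = {(find(parent, p), a, find(parent, r)) for (p, a), r in trans.items()}
    return edges, {find(parent, q) for q in finals}


def leaders(edges, blocks, k):
    """k-leaders of each block: strings of length k that lead into it."""
    lead = {b: {()} for b in blocks}
    for _ in range(k):
        nxt = {b: set() for b in blocks}
        for p, a, r in edges:
            for u in lead[p]:
                nxt[r].add(u + (a,))
        lead = nxt
    return lead


def find_violation(parent, n, trans, finals, k):
    """Return two blocks that must be merged, or None if the quotient is k-reversible."""
    edges, fin = quotient(parent, trans, finals)
    # Determinism: at most one successor per (block, symbol).
    succ = {}
    for p, a, r in edges:
        if succ.setdefault((p, a), r) != r:
            return succ[(p, a)], r
    # Reverse determinism with lookahead k: final blocks, or blocks entering the
    # same block on the same symbol, must not share a k-leader.
    lead = leaders(edges, {find(parent, q) for q in range(n)}, k)
    preds = {}
    for p, a, r in edges:
        preds.setdefault((r, a), set()).add(p)
    for group in [fin, *preds.values()]:
        for b1, b2 in combinations(sorted(group), 2):
            if lead[b1] & lead[b2]:
                return b1, b2
    return None


def infer_k_reversible(samples, k):
    """Angluin-style k-RI: merge PTA states until no violation remains."""
    n, trans, finals = build_pta(samples)
    parent = list(range(n))
    while (pair := find_violation(parent, n, trans, finals, k)) is not None:
        parent[find(parent, pair[1])] = find(parent, pair[0])
    edges, fin = quotient(parent, trans, finals)
    delta = {(p, a): r for p, a, r in edges}
    return find(parent, 0), delta, fin


def accepts(automaton, word):
    """Run the inferred deterministic automaton on a word."""
    state, delta, finals = automaton
    for sym in word:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return state in finals
```

For example, with the samples "a" and "aa", k = 0 generalizes to a* (the empty string is accepted), while k = 1 yields the more conservative a+. Larger k trades generalization for caution.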

Attachments (443.73 kB)