Authors:
  • Boris Chidlovskii
Citation:
Proc. ACM Workshop on Web Information and Data Management (WIDM'01), Atlanta, USA, November 2001
Abstract:
We study the problem of automatically repairing wrappers for Web information providers. Most Web
wrappers use "hooks" or "landmarks" to find and extract relevant information from Web pages, so they
often become inoperable when the page structure changes. The solution we propose in this paper
extends conventional forward wrappers with two alternatives: classifiers built from content features of the
extracted information, and wrappers that process pages backward. We report preliminary results on
information extraction recovery and wrapper repair for a set of real changes at Web providers.
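To make the idea concrete, here is a minimal sketch, not the paper's method, assuming the simplest possible string-landmark wrapper. The forward wrapper scans left to right and the backward wrapper scans right to left, each relying on its own landmarks. A toy content check stands in for a content-feature classifier. All function names, landmarks, and the sample pages are hypothetical.

```python
import re
from typing import Callable, List, Optional


def forward_extract(page: str, left: str, right: str) -> Optional[str]:
    """Forward wrapper: take the text between the first `left` landmark
    and the next `right` landmark after it (None if either is missing)."""
    start = page.find(left)
    if start < 0:
        return None
    start += len(left)
    end = page.find(right, start)
    if end < 0:
        return None
    return page[start:end].strip()


def backward_extract(page: str, left: str, right: str) -> Optional[str]:
    """Backward wrapper: locate the last `right` landmark, then the closest
    `left` landmark before it, i.e. scan the page from the end."""
    end = page.rfind(right)
    if end < 0:
        return None
    start = page.rfind(left, 0, end)
    if start < 0:
        return None
    return page[start + len(left):end].strip()


def looks_like_price(value: str) -> bool:
    """Toy content-feature check (hypothetical): a dollar sign, digits,
    and optional cents."""
    return re.fullmatch(r"\$\d+(\.\d{2})?", value) is not None


def robust_extract(page: str,
                   wrappers: List[Callable[[str], Optional[str]]],
                   accept: Callable[[str], bool]) -> Optional[str]:
    """Try each wrapper in order; keep the first value the content check accepts."""
    for wrapper in wrappers:
        value = wrapper(page)
        if value is not None and accept(value):
            return value
    return None


old_page = "<tr><td>Price:</td><td>$19.99</td></tr>"
new_page = "<tr><td>Cost:</td><td>$19.99</td></tr>"  # provider renamed the label

wrappers = [
    lambda p: forward_extract(p, "<td>Price:</td><td>", "</td>"),
    lambda p: backward_extract(p, "<td>", "</td></tr>"),
]

print(forward_extract(new_page, "<td>Price:</td><td>", "</td>"))  # None
print(robust_extract(new_page, wrappers, looks_like_price))       # $19.99
```

On the changed page the forward wrapper's landmark `<td>Price:</td><td>` no longer occurs, so it returns `None`. The backward wrapper anchors on the landmarks after the item instead and still recovers the value. The content check rejects any candidate that does not look like a price.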
Year:
2001
Report number:
2001/025