Learning query languages of web interfaces
Andre Bergholz, Boris Chidlovskii
This paper studies the problem of automatic acquisition of the query languages supported by a Web
information resource. We describe a system that automatically probes the search interface of a resource with
a set of test queries and analyses the returned pages to recognize supported query operators. The automatic
acquisition assumes the availability of the number of matches the resource returns for a submitted query. The
match numbers are used to train a learning system and to generate classification rules that recognize the
query operators supported by a provider and their syntactic encodings. These classification rules are employed
during the automatic probing of new providers to determine query operators they support. We report on results
of experiments with a set of real Web resources.
ACM 2004 Symposium on Applied Computing, Nicosia, Cyprus, Mach 14-17, 2004.