Searching for relevant information on the Web is an important and time-consuming activity. The obvious way to search for information is to use one of the popular general-purpose search engines, which continuously crawl and index billions of Web pages. However, part of the Web is unavailable for central indexing. This part, often referred to as the Hidden Web or the Invisible Web, includes the content of databases and document collections that are accessible through (and hidden by) the search interfaces offered by various Web sites. We refer to such search interfaces, which allow users to find and access a site's internal information, as "gateways to the Hidden Web". The Hidden Web spans company sites, libraries, patent databases, university sites, media sites, etc. The goal of the WID project is to make the Hidden Web more visible and to let users find and explore its information through the same or similar search interfaces they use for the visible Web.
The Hidden Web is estimated to be about 500 times larger than the Visible Web. Its information is also believed to be of higher quality, because it is usually organized in structured databases and with professional use in mind. Collecting, accessing, and organizing Hidden Web resources has therefore emerged as an interesting challenge for both research and industry.
The main objective of this project is to automate the process of discovering Hidden Web resources. We divide the project into three subprojects: the discovery, the analysis, and the classification of resources. With this separation we try to imitate the way a person gathers information.
For further information about the project, please contact Boris Chidlovskii.