Thursday, June 02nd, 2016

Today, a PhD student of mine, Mohammad S. Khelghati, defended his thesis.
Deep Web Content Monitoring
In this thesis, we investigate the path towards a focused web harvesting approach that can automatically and efficiently query websites, navigate through results, download data, store it, and track data changes over time. Such an approach can also help users access a complete collection of data relevant to their topics of interest and monitor it over time. To realize such a harvester, we focus on the following obstacles: finding methods that achieve the best coverage when harvesting data for a topic; reducing the cost of harvesting a website, measured in the number of submitted requests, by estimating its actual size; monitoring data changes over time in web data repositories; and combining our experience in harvesting with studies in the literature to propose a general framework for designing and developing a web harvester. It is important to know how to configure harvesters so that they can be applied to different websites, domains, and settings. These steps further improve the data coverage and monitoring functionality of web harvesters and can help users such as journalists, business analysts, organizations, and governments reach the data they need without requiring extensive software and hardware resources. With this thesis, we hope to have contributed to the goal of focused web harvesting and of monitoring topics over time.

