2016 – Page 3 – Djoerd Hiemstra

13th SSR on Deep Web Entity Monitoring

On 2nd of June 2016, we organize the 13th Seminar on Search and Ranking on Deep Web Entity Monitoring with 3 invited spears: Gianluca Demartini (University of Sheffield, UK), Andrea Calì (Birkbeck, University of London, UK), and Pierre Senellart (Télécom ParisTech, France).

More information at: http://www.cs.utwente.nl/~hiemstra/ssr14/.

A new search engine for the university

As of this today, the university is using our Distributed Search approach as their main search engine on: http://utwente.nl/search (and also stand-alone on https://search.utwente.nl). The UT search engine offers its user not only the results from a large web crawl, but also live results from many sources that were previously invisible, such as courses, timetables, staff contact information, publications, the local photo database “Beeldbank”, vacancies, etc. The search engine combines about 30 of such sources, and learns over time which sources should be included for a query, even if it has never seen that query, nor the results for the query.

Read more in the official announcement (in Dutch).

Efficient Web Harvesting Strategies for Monitoring Deep Web Content

by Mohammadreza Khelghati, Djoerd Hiemstra, and Maurice van Keulen

Focused Web Harvesting aims at achieving a complete harvest of a set of related web data for a given topic. Whether you are a fan following your favourite artist, athlete or politician, or a journalist investigating a topic, you need to access all the information relevant to your topics of interest and keep it up-to-date over time. General search engines like Google apply different techniques to enhance the freshness of their crawled data. However, in Focused Web Harvesting, we lack an efficient approach that detects changes of the content for a given topic over time. In this paper, we focus on techniques that allow us to keep the content relevant to a given entity up-to-date. To do so, we introduce approaches to efficiently harvest all the new and changed documents matching a given entity by querying a web search engine. One of our proposed approaches outperform the baseline and other approaches in finding the changed content on the web for a given entity with at least an average of 20 percent better performance.

[download pdf]

The software for this work is available as: HaverstED.

3TU NIRICT theme Data Science

The main objective of the NIRICT research in Data Science is to study the science and technology to unlock the intelligence that is hidden inside Big Data.
The amounts of data that information systems are working with are rapidly increasing. The explosion of data happens in a pace that is unprecedented and in our networked world of today the trend is even accelerating. Companies have transactional data with trillions of bytes of information about their customers, suppliers and operations. Sensors in smart devices generate unparalleled amounts of sensor data. Social media sites and mobile phones have allowed billions of individuals globally to create their own enormous trails of data.
The driving force behind this data explosion is the networked world we live in, where information systems, organizations that employ them, people that use them, and processes that they support are connected and integrated, together with the data contained in those systems.

What happens in an internet minute in 2016?

Unlocking the Hidden Intelligence

Data alone is just a commodity, it is Data Science that converts big data into knowledge and insights. Intelligence is hidden in all sorts of data and data systems.
Data in information systems is usually created and generated for specific purposes: it is mostly designed to support operational processes within organizations. However, as a by-product, such event data provide an enormous source of hidden intelligence about what is happening, but organizations can only capitalize on that intelligence if they are able to extract it and transform the intelligence into novel services.
Analyzing the data provides opportunities for organizations to gather intelligence to capitalize historic and current performance of their processes and exploit future chances for performance improvement.
Another rich source of information and insights is data from the Social Web. Analyzing Social Web Data provides governments, society and companies with better understanding of their community and knowledge about human behavior and preferences.
Each 3TU institute has its own Data Science program, where local data science expertise is bundled and connected to real-world challenges.

Delft Data Science (DDS) – TU Delft
Scientific director: Prof. Geert-Jan Houben

Data Science Center Eindhoven (DSC/e) – TU/e
Scientific director: Prof. Wil van der Aalst

Data Science Center UTwente (DSC UT) – UT
Scientific director: Dr. Djoerd Hiemstra

More information at: https://www.3tu.nl/nirict/en/Research/data-science/.