Greetings from CuriousU Search Engine Technology

Thursday, May 11th, 2017, posted by Djoerd Hiemstra

CuriousU Search Engine Technology will explore the world of search engines. You will learn how search engines work, what challenges they deal with, and how their performance can be measured. And even better: you will be guided in building, evaluating, and improving your own search engine on a real-world dataset.

To be presented at CuriousU Summer School 2017, 13-22 August 2017, at the University of Twente.

Data Science guest lectures

Monday, September 26th, 2016, posted by Djoerd Hiemstra

On 12 October we organize another Data Science Day in the Design Lab with guest lectures by Thijs Westerveld (Chief Science Officer at WizeNoze, Amsterdam), and Iadh Ounis (Professor of Information Retrieval in the School of Computing Science at the University of Glasgow). For more information and registration, see:

Welcome to Information Retrieval

Tuesday, September 1st, 2015, posted by Djoerd Hiemstra

Welcome to the course Information Retrieval. We will introduce some exciting new things in the course: This year’s practical assignments are motivated by use cases of MyDataFactory, a company specialized in product data. The course uses the book “Introduction to Information Retrieval” by Christopher Manning, Prabhakar Raghavan and Hinrich Schütze. Have a look at the schedule on Blackboard under “Course Information” for an overview of the first quarter of the course. In the second quarter, students will research a specific topic in depth. We hope to see you at the first lecture on Wednesday 2 September at 13.45h. in RA4334.

Theo Huibers, Dolf Trieschnigg and Djoerd Hiemstra.

How to build Google in an Afternoon

Friday, May 29th, 2015, posted by Djoerd Hiemstra

How many machines do we need to search and manage an index of billions of documents? In this lecture, I will discuss basic techniques for indexing very large document collections. I will discuss inverted files, index compression, and top-k query optimization techniques, showing that a single desktop PC suffices for searching billions of documents. An important part of the lecture will be spent on estimating index sizes and processing times. At the end of the afternoon, students will have a better understanding of the scale of the web and its consequences for building large-scale web search engines, and students will be able to implement a cheap but powerful new ‘Google’.
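As a rough illustration of two of the techniques named above, the following is a minimal Python sketch of an inverted file whose posting lists are stored as variable-byte encoded gaps. It is not taken from the lecture material: the function names, the whitespace tokenizer, and the toy documents are assumptions made for this example, and a real index would also need term positions, on-disk storage, and top-k query processing.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase whitespace tokenization is good enough here.
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers, 7 payload bits per byte.

    The high bit is set on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk[0] |= 0x80  # the lowest 7 bits are written last and end the number
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(doc_ids):
    """Store a sorted posting list as gaps between consecutive document ids."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the document ids by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    docs = ["web search engine", "search index", "web index compression"]
    index = build_inverted_index(docs)
    print(index["search"])

    postings = [3, 7, 300, 100000]
    packed = compress_postings(postings)
    print(len(packed), "bytes instead of", 4 * len(postings))
```

Because gaps between document ids in a long posting list tend to be small, most of them fit in a single byte. That is the kind of saving that index-size estimates have to take into account.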

To be presented at the SIKS Course Advances in Information Retrieval on 18, 19 June in Vught, The Netherlands.

ImagePile: an Alternative for Vertical Results Lists

Tuesday, May 17th, 2011, posted by Djoerd Hiemstra

by Saskia Akkersdijk, Merel Brandon, Hanna Jochmann-Mannak, Djoerd Hiemstra, and Theo Huibers

Recent work shows that children are very well capable of searching with Google, due to their familiarity with the interface. However, children do have difficulties with the vertical list representation of the results. In this paper, we present an alternative result representation for a touch interface, the ImagePile. The ImagePile displays the results as a pile of images through which the user navigates by horizontal swiping. This representation was tested on a search engine for the Emma child hospital’s library. Using a within-subject experiment, both representations were tested with children to compare the usability of both systems. The vertical representation was perceived as easier to use, but the ImagePile system was considered more fun to use. Also, with the ImagePile system more relevant results were chosen by the children, and they were more aware of the number of results.

Guest lecture by Arjen de Vries

Monday, October 18th, 2010, posted by Djoerd Hiemstra

How search logs can help improve future searches

In the European project Vitalas, we had the opportunity to analyze the search log data from a commercial picture portal of a European news agency, which offers access to photographic images to professional users. I will discuss how these logs can be used in various ways to improve image search: to expand the image representation, to make suggestions of alternative queries, to adapt the search results to user context, and to automatically build concept detectors for content-based image retrieval. I will also present recent work on using the semantic information that has become publicly available in the form of linked data to improve the search log analysis. The results show that bringing in linked data gives insights beyond the more common term-based analysis, since queries related in the most frequent ways do not usually share terms. I conclude with a discussion of the implications of our findings for improving log analysis, image collection management, and search engine design.

The guest lecture takes place on 20 October 2010 at 13.45 h. in ZI-2126.

Guest lecture by Thijs Westerveld

Tuesday, October 5th, 2010, posted by Djoerd Hiemstra

Automatically Analyzing Word of Mouth

Thijs Westerveld from Teezir B.V., Utrecht, will give a guest lecture on 6 October 2010 in ZI-2126. Teezir uses advanced search technology to aggregate views and opinions found on review sites, in discussion groups or blogs. This way, we create statistics and interpretations about what people are saying. Querying this data allows decision makers to slice and dice the content, and learn what people say, either at the very aggregated level: “what is the share of positive versus negative views about our new product?”, or at the very detailed level: “which sources reflect this negative sentiment, and what exactly are people saying?”

In this talk I will demonstrate Teezir’s Opinion Analysis dashboards and discuss the underlying technology. For collecting content from web sites we developed advanced crawling technology that automatically identifies relevant news, blog and forum pages and extracts the relevant content and metadata. The collected content is then further analyzed to identify the main sentiments before everything is indexed to be disclosed in the online dashboards. Various sentiment analysis variants that have proven successful in an academic setting have been evaluated on our live collections. I will demonstrate that success on academic test collections does not necessarily imply the practical use of a sentiment analysis algorithm.

See also: Who rules?

New room for lectures IR

Wednesday, September 15th, 2010, posted by Djoerd Hiemstra

All following Information Retrieval lectures will be held in room ZI-2126. The lecture of 22 September is canceled to give you the opportunity to visit the Interactief Symposium Predict 2010. See you 29 September, or at Predict 2010!

Tangible Information Retrieval for Children

Sunday, May 16th, 2010, posted by Djoerd Hiemstra

by Michel Jansen, Wim Bos, Paul van der Vet, Theo Huibers and Djoerd Hiemstra

Despite several efforts to make search engines more child-friendly, children still have trouble using systems that require keyboard input. We present TeddIR: a system using a tangible interface that allows children to search for books by placing tangible figurines and books they like/dislike in a green/red box, causing relevant results to be shown on a display. This way, issues with spelling and query formulation are avoided. A fully functional prototype was built and evaluated with children aged 6-8 at a primary school. The children understood TeddIR to a large extent and enjoyed the playful interaction.

TeddIR in the set-up used during evaluation.

TeddIR will be presented at the 9th International Conference on Interaction Design and Children, Barcelona, June 9-11, 2010.

Guest lecture by Pavel Serdyukov

Friday, October 16th, 2009, posted by Djoerd Hiemstra

Pavel Serdyukov from TU Delft will give a guest lecture for the course Information Retrieval.

When: Wednesday, October 21, 2009
Where: HO-B1212
Title: Faceted and Expert Search in the Enterprise


Enterprise Search problems recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. In this lecture I will give the research perspective on two core aspects of search in the Enterprise: Faceted and Expert search. I will demonstrate typical search scenarios, visualization approaches and ranking techniques. In the first part, I will give an overview of the ways to support faceted search in typical cases, from easiest to hardest: with the availability of structured or unstructured document metadata and with no document metadata available. In the second part, I will talk about the latest developments in expert finding, namely, language model and graph-based methods. I will also show ways to acquire expertise evidence outside of the Enterprise.