Welcome to the MapReduce course

Monday, November 2nd, 2009, posted by Djoerd Hiemstra

Welcome to the course Distributed Data Processing using MapReduce! Please find the schedule of lectures and assignments on Blackboard under “Course Information” (scroll down).

This course closely follows some very exciting developments in cluster computing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, AOL, Baidu, Joost, Mylife, and Facebook. The course is not only about processing terabytes of data on large clusters. In fact, few courses in the Computer Science master’s program are as “core computer science”: we will discuss new file systems (GFS and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig), and of course many web search and data mining applications that made Google one of today’s leading IT companies.

I hope to see you at our lectures on Fridays, 3rd and 4th hour.

Indexing half a billion web pages

Tuesday, October 27th, 2009, posted by Djoerd Hiemstra

by Claudia Hauff and Djoerd Hiemstra

The University of Twente participated in three tasks of TREC 2009: the adhoc task, the diversity task and the relevance feedback task. All experiments were performed on the English part of ClueWeb09. In this draft paper, Section 3 describes our approach to tuning our retrieval system in the absence of training data. Section 4 describes the use of categories and a query log for diversifying search results. Section 5 describes preliminary results for the relevance feedback task.

[download pdf]

Dutch-Belgian Database Day in Delft

Tuesday, October 20th, 2009, posted by Djoerd Hiemstra

The Dutch-Belgian Database Day (DBDBD) is a yearly one-day workshop on database research, organized at a Belgian or Dutch university. DBDBD invites submissions (1 page abstract) on a broad range of database and database-related topics, including but not limited to data storage and management, theoretical database issues, database performance, data mining, information retrieval, data semantics, querying, ontologies, etc. Based on the submissions, the workshop will be organized in different sessions, each covering a particular topic.

At the DBDBD, junior researchers from the Netherlands and Belgium can present their recent results. It is an excellent opportunity to meet up with your Belgian/Dutch colleagues, and to get informed about the (recent) database-related research performed in Belgian/Dutch universities. The workshop is also open to non-Belgian/Dutch participants (presentations are in English).

The DBDBD 2009 is organized under the auspices of SIKS, the Dutch research school for information and knowledge systems. This year, DBDBD will be held in the Aula Congrescentre of TU Delft, located on the university campus, on Monday, November 30th, 2009. Participation is free for all SIKS members (Ph.D. students, research fellows, senior research fellows and associated members).

Visit the DBDBD 2009 home page.

Guest lecture by Pavel Serdyukov

Friday, October 16th, 2009, posted by Djoerd Hiemstra

Pavel Serdyukov from TU Delft will give a guest lecture for the course Information Retrieval.

When: Wednesday, October 21, 2009
Where: HO-B1212
Title: Faceted and Expert Search in the Enterprise


Enterprise Search problems recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. In this lecture I will give the research perspective on two core aspects of search in the Enterprise: Faceted and Expert search. I will demonstrate typical search scenarios, visualization approaches and ranking techniques. In the first part, I will give an overview of the ways to support faceted search in typical cases, from easiest to hardest: with structured document metadata, with unstructured document metadata, and with no document metadata available. In the second part, I will talk about the latest developments in expert finding, namely language-model and graph-based methods. I will also show ways to acquire expertise evidence outside of the Enterprise.

New DB group member: Sergio Duarte Torres

Monday, October 12th, 2009, posted by Djoerd Hiemstra

Today, Sergio Duarte Torres joined our group to work on PuppyIR, a European project that will develop an open source environment to construct information services for children. Welcome Sergio!

SIKS Research Methodology Course

Thursday, October 8th, 2009, posted by Djoerd Hiemstra

On 25, 26, and 27 November 2009, the School for Information and Knowledge Systems (SIKS) organizes the annual three-day course Research methods and methodology for IKS. The location will be Conference center Zonheuvel in Doorn. The course will be given in English and is part of the educational program for SIKS Ph.D. students.

Research methods and methodology for IKS is relevant for all SIKS Ph.D. students (whether working in computer science or in information science). The primary goal of this hands-on course is to enable these Ph.D. students to make a good research design for their own research project. To this end, it provides interactive training in various elements of research design, such as the conceptual design and the research planning. The course also contains a general introduction to the philosophy of science (and particularly to the philosophy of mathematics, computer science and AI), and it addresses such divergent topics as “the case-study method”, “elementary research methodology for the empirical sciences” and “empirical methods for computer science”.

Research methods and methodology for IKS is an intense and interactive course. First, all students enrolling for this course are asked to read some pre-course reading material, comprising some papers that address key problems in IKS methodology. These papers will be sent to the participants immediately after registration. Secondly, all participants are expected to give a brief characterization of their own research project/proposal by answering a set of questions, formulated by the course directors and based on the aforementioned literature. We believe that this approach results in a more efficient and effective course; it will help you prepare for the course and increase the value that you get from it. Course coordinators are Hans Weigand (UvT), Roel Wieringa (UT), John-Jules Meyer (UU), Hans Akkermans (VU) and Richard Starmans (UU).

Day 1 (25 November 2009)
09.30-10.00 Coffee / Tea
10.00-10.30 Opening (Richard Starmans, UU)
10.30-11.30 Introduction (Hans Weigand, UvT)
11.30-12.30 Conceptual design (Hans Weigand, UvT)
12.30-13.45 Lunch
13.45-15.30 Philosophy of formal sciences (John-Jules Meyer, UU)
15.30-16.00 Break
16.00-17.30 Research Methods in IR (Djoerd Hiemstra, UT)

Day 2 (26 November 2009)
09.00-12.30 Research Design I (Roel Wieringa, UT)
12.30-13.30 Lunch
13.30-15.00 Research Design II (Roel Wieringa, UT)
15.00-15.30 Break
15.30-17.30 Research methods (Hans Akkermans, VU)

Day 3 (27 November 2009)
09.00-09.45 Example research project (Inge v/d Weerd, UU)
09.45-10.00 Break
10.00-11.00 Research challenges in the Netherlands (Jaap van den Herik, UvT)
11.00-12.15 Research methods in MAS (Catholijn Jonker, TUD)
12.15-13.15 Lunch
13.15-14.45 Simulation as a research method (Jack Kleijnen, UvT)
14.45-15.00 Break
15.00-16.15 Research methods in Machine Learning (Antal van den Bosch, UvT)

Guest lecture by Thijs Westerveld

Wednesday, October 7th, 2009, posted by Djoerd Hiemstra

Thijs Westerveld from Teezir will give a guest lecture for the course Information Retrieval.

When: Wednesday, October 14, 2009
Where: HO-B1212
Title: Automatically Analyzing Word of Mouth And Focused Crawling

Teezir is a young and innovative technology company that develops and deploys comprehensive search solutions. Teezir lets companies take advantage of large and diverse amounts of documents or texts, using breakthrough search technology. Teezir’s search platform provides functionality for the entire process of disclosing data: from gathering content, analyzing documents and building indexes for efficient access, to effective querying and ranking of information. Teezir’s framework is based on full-text retrieval techniques.

Handouts for practical work

Monday, October 5th, 2009, posted by Paul van der Vet

The handout for the practical part of the course Information Retrieval has been added under Course Materials on Blackboard. You will also find two useful handouts there that help you write your report and insert citations in it.

First PuppyIR search architecture

Tuesday, September 29th, 2009, posted by Djoerd Hiemstra

PuppyIR: Designing an Open Source Framework for Interactive Information Services for Children

by Leif Azzopardi, Richard Glassey, Mounia Lalmas, Tamara Polajnar, and Ian Ruthven

One of the main aims of the PuppyIR project is to provide an open source framework for the development of Interactive Information Retrieval Services. The main focus of the project is directed towards developing such services for children, which introduces a number of novel and challenging issues to address (such as language development, security, moderation, etc.).

In this poster paper, we outline the preliminary high-level design of the open source framework. The framework uses a layered architecture to minimize dependencies between the user-side concerns of interaction and presentation, and the system-side concerns of aggregating content from multiple sources and processing information appropriately. Each layer will consist of a series of interchangeable components, which can be interconnected to form a complete service. To facilitate the construction of diverse information services, a dataflow language is proposed to enable the assembly of the components in an intuitive and visual manner. One of the design goals of the architecture, and ultimate measures of success, is to provide a “lego” style building block environment in which researchers and developers of any age can build their own information service. The poster provides the starting point for the design of the framework and aims to seek comments, feedback and suggestions from the community in order to improve and refine the architecture.

[download paper]

Deadline to form groups: 30 September

Tuesday, September 29th, 2009, posted by Djoerd Hiemstra

Deadline to form pairs for the Information Retrieval Course Project is 30 September. Please send names and email addresses to the course staff. Groups will be numbered and listed (under Email) on Blackboard.