Archive for the 'Cultural heritage' Category

Supporting the Exploration of Online Cultural Heritage Collections

Wednesday, January 10th, 2018, posted by Djoerd Hiemstra

The Case of the Dutch Folktale Database

by Iwe Muiser, Mariët Theune, Ruud de Jong, Nigel Smink, Dolf Trieschnigg, Djoerd Hiemstra, and Theo Meder

This paper demonstrates the use of a user-centred design approach for the development of generous interfaces/rich prospect browsers for an online cultural heritage collection. We determine the collection's primary user groups and design different browsing tools to cater to their specific needs. We set out to solve a set of problems faced by many online cultural heritage collections: lack of accessibility, limited functionality for exploring the collection through browsing, and the risk of lesser-known content being overlooked. The object of our study is the Dutch Folktale Database, an online collection of tens of thousands of folktales from the Netherlands. Although this collection was designed as a research commodity for folktale experts, its primary user group consists of casual users from the general public. We present the new interfaces we developed to facilitate browsing and exploration of the collection by both folktale experts and casual users. We focus on the user-centred design approach we adopted to develop interfaces that would fit the users’ needs and preferences.

Screen Shot of the Dutch Folktale Database

Published in Digital Humanities Quarterly 11(4), 2017

[Read more]

Access the Folktale Database at:

Exploring Language Identification Techniques for Dutch Folktales

Friday, April 27th, 2012, posted by Djoerd Hiemstra

by Dolf Trieschnigg, Djoerd Hiemstra, Mariët Theune, Franciska de Jong, and Theo Meder

The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination of languages, including (Middle and 17th-century) Dutch, Frisian, and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that, in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000 documents in 16 languages and dialects, is available on request for follow-up research.
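To give a feel for the task, here is a minimal sketch of one common baseline for language identification: comparing character trigram profiles by cosine similarity. It is not claimed to be one of the approaches evaluated in the paper. The function names and the two toy training sentences are made up for illustration, are not taken from the database, and may not be idiomatic Dutch or Frisian.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding the text with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_texts):
    """Build one n-gram profile per language label from (label, text) pairs."""
    profiles = {}
    for label, text in labelled_texts:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def identify(text, profiles):
    """Return the label whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(grams, profiles[label]))


# Toy training data (hypothetical sentences, for illustration only).
profiles = train([
    ("dutch", "er was eens een koning die drie dochters had"),
    ("frisian", "der wie ris in kening dy trije dochters hie"),
])
print(identify("de koning en zijn dochters", profiles))
```

With only one short sentence per language, the two profiles overlap heavily, which hints at why closely related languages with little training data are hard to tell apart.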

The paper will be presented at the LREC Workshop Adaptation of Language Resources and Tools for Processing Cultural Heritage Objects on 26 May 2012 in Istanbul, Turkey.

[download preprint]

Iwe Muiser joins Database Group

Tuesday, January 17th, 2012, posted by Djoerd Hiemstra

Iwe Muiser joined the Database Group to work on the project FACT: “Folktales as Classifiable Texts”. Welcome Iwe!

Job Vacancy: Scientific Programmer

Thursday, September 22nd, 2011, posted by Djoerd Hiemstra

Scientific programmer: folktale search and visualisation

The FACT project will investigate new possibilities for humanities researchers (folktale researchers, narratologists, documentalists, etc.) to study folktales based on annotations and relations that have been automatically assigned using data-driven methods. The Dutch Folktale Database (Nederlandse Volksverhalenbank) of the Meertens Institute is a very large and varied collection of Dutch folktales. Within FACT, software will be developed to automatically enrich the folktales in this collection with metadata such as names, keywords, genre, a summary, and type. An additional research goal is to investigate whether automatic analysis of the folktale collection can reveal relations between folktales that are difficult to discover through human inspection. The annotation and clustering methods to be developed will be integrated in a user-friendly XML-based platform for the annotation and exploration of folktales, to support research on the variability of human oral and written transmission.

The University of Twente has vacancies for a PhD student, a postdoc, and a scientific programmer, who will work together as a team to achieve the project goals. In addition, there will be close cooperation with the Tunes & Tales project (funded under the Computational Humanities programme of KNAW), which aims to investigate sequences of motifs in, and variability of, melodies and folktales in oral transmission.

The scientific programmer will work on the development of user-friendly tools for folktale researchers that incorporate the annotation and clustering techniques developed by the postdoc and the PhD student. The annotation tool should allow for (semi-)automatic annotation of folktales with language, genre, keywords, names, summary, and type. The visualisation tool should enable easy inspection of document clusters. In addition, the programmer will develop an XML-based search system that allows the general public to search for folktales in the Folktale Database based on their annotations.

Apply on-line (Deadline: 1 November 2011)

Folktales As Classifiable Texts

Friday, July 1st, 2011, posted by Djoerd Hiemstra

FACT, Folktales As Classifiable Texts, is a project funded by the NWO Catch program. In the FACT project, the HMI group and DB group of the University of Twente will cooperate with the Meertens Institute to study new possibilities for researchers from humanities disciplines (folktale and narratology researchers, documentalists, etc.) to explore folktales based on annotations and links generated by data-driven methods. To this end, FACT will develop software enabling the computer to automatically enrich a corpus of Dutch folktales with metadata such as names, genre, type, and a summary. In addition, FACT represents the first effort to systematically apply and evaluate various clustering techniques on a very large (40,000+) and diverse collection of folktales. The algorithms developed in the project will be integrated in a user-friendly platform that supports annotation as well as exploratory research into variability in oral and written transmission, using XML database technology to model all folktale data (both annotations and the text of the tale itself) in one unifying framework.

A large part of the scientific research in FACT will deal with the pros and cons of human classification and computerized clustering to investigate variation in (oral) transmission. By using document clustering, we hope to discover relationships between documents that cannot be readily identified by human annotators. The main challenge will be to make the computer decide which texts are related and which are not. This is not a black-or-white issue: folktales may be related to each other on different dimensions and to varying degrees. Will the computer be able to recognize the cultural DNA of tales, and make a distinction between different types (no kinship) and versions of the same type (kinship)?
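The point that relatedness is a matter of degree can be illustrated with a small sketch that scores every pair of tales by word overlap (Jaccard similarity) and keeps the pairs above a threshold. This is only an illustration under simple assumptions, not the clustering method of the FACT project; the function names, the threshold value, and the three toy texts are hypothetical.

```python
from itertools import combinations


def tokens(text):
    """Lowercase a text and return its set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two sets have in common, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(tales, threshold=0.3):
    """Return (id_a, id_b, score) for pairs scoring at least the threshold,
    ordered from most to least similar."""
    bags = {tale_id: tokens(text) for tale_id, text in tales.items()}
    scored = [
        (a, b, jaccard(bags[a], bags[b])) for a, b in combinations(bags, 2)
    ]
    return sorted(
        (pair for pair in scored if pair[2] >= threshold),
        key=lambda pair: -pair[2],
    )


# Toy texts (hypothetical, for illustration only).
tales = {
    "t1": "the wolf eats the goat",
    "t2": "the wolf eats the sheep",
    "t3": "a ship sails north",
}
for id_a, id_b, score in related_pairs(tales):
    print(id_a, id_b, round(score, 2))
```

Where the threshold is placed decides which tales count as "related", which is exactly the judgment the project describes as not black-or-white.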

More information at: Nederlandse Volksverhalenbank and eLab Oral Culture.

Towards Affordable Disclosure of Spoken Heritage Archives

Friday, December 11th, 2009, posted by Djoerd Hiemstra

by Roeland Ordelman, Willemijn Heeren, Franciska de Jong, Marijn Huijbregts, and Djoerd Hiemstra

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. For such collections, we want at least to provide search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken-word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is not yet satisfactory and requires additional research.

To be published in the Journal of Digital Information 10(6).

[download pdf]