Steven Verkuil graduates on Reference Extraction Techniques

Wednesday, July 6th, 2016, posted by Djoerd Hiemstra

Journal Citation Statistics for Library Collections using Document Reference Extraction Techniques

by Steven Verkuil

Providing access to journals often comes with a considerable subscription fee for universities. It is not always clear how these journal subscriptions actually contribute to ongoing research. This thesis provides a multistage process for evaluating which journals are actively referenced in publications. Our software tool for journal citation reports, CiteRep, is designed to aid decision-making processes by providing statistics about the number of times a journal is referenced in a document set. Citation reports are automatically generated from online repositories containing PDF documents. The process of extracting citations and identifying journals is user-friendly and easy to maintain. CiteRep allows users to filter generated reports by year, faculty and study, providing detailed insight into journal usage for specific user groups. Our software tool achieves an overall weighted precision and recall of 66.2% when identifying journals in a fresh set of PDF documents. While leaving open some areas for improvement, CiteRep outperforms the two most popular citation parsing libraries, ParsCit and FreeCite, with respect to journal identification accuracy. CiteRep should be considered for the creation of journal citation reports from document repositories.
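To make the idea of a filtered citation report concrete, here is a minimal sketch in Python of counting journal references over a document set. It is not CiteRep's actual code: the `Citation` record, the `journal_report` function and the sample data are all hypothetical, and it assumes the journal names have already been extracted and normalized.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Citation:
    """One journal reference found in a document (hypothetical record layout)."""
    journal: str   # normalized journal name
    year: int      # year of the citing document
    faculty: str   # faculty of the citing document


def journal_report(citations: Iterable[Citation],
                   year: Optional[int] = None,
                   faculty: Optional[str] = None) -> Counter:
    """Count references per journal, optionally restricted by year and faculty."""
    counts: Counter = Counter()
    for c in citations:
        if year is not None and c.year != year:
            continue
        if faculty is not None and c.faculty != faculty:
            continue
        counts[c.journal] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    Citation("Journal A", 2015, "Faculty X"),
    Citation("Journal A", 2016, "Faculty X"),
    Citation("Journal B", 2016, "Faculty Y"),
]
report_2016 = journal_report(sample, year=2016)
report_x = journal_report(sample, faculty="Faculty X")
```

A filter on study could be added in the same way as the year and faculty filters.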

[download pdf]

Clone CiteRep on GitHub.

A new search engine for the university

Thursday, March 24th, 2016, posted by Djoerd Hiemstra

As of today, the university is using our Distributed Search approach as its main search engine, which is also available stand-alone. The UT search engine offers its users not only the results from a large web crawl, but also live results from many sources that were previously invisible, such as courses, timetables, staff contact information, publications, the local photo database “Beeldbank”, vacancies, etc. The search engine combines about 30 such sources, and learns over time which sources should be included for a query, even if it has never seen that query or its results before.

University of Twente

Five postdoc positions for the Living Smart Campus

Monday, July 27th, 2015, posted by Djoerd Hiemstra

