<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.4" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Djoerd Hiemstra's home page</title>
	<link>http://wwwhome.cs.utwente.nl/~hiemstra</link>
	<description>A bit of teaching, some research, shake well...</description>
	<pubDate>Sun, 12 Feb 2012 23:01:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.4</generator>
	<language>en</language>
			<item>
		<title>Peer-to-Peer Information Retrieval: An Overview</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html#comments</comments>
		<pubDate>Fri, 10 Feb 2012 15:05:18 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html</guid>
		<description><![CDATA[
by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg


Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype
peer-to-peer information retrieval systems have been developed. Unfortunately, none of these
have seen widespread real-world adoption and thus, in contrast with file sharing, information
retrieval is still dominated by centralised solutions. In this paper we provide [...]]]></description>
			<content:encoded><![CDATA[
<p>by Almer Tigelaar, Djoerd Hiemstra, Dolf Trieschnigg</p>

<p>
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype
peer-to-peer information retrieval systems have been developed. Unfortunately, none of these
have seen widespread real-world adoption and thus, in contrast with file sharing, information
retrieval is still dominated by centralised solutions. In this paper we provide an overview of
the key challenges for peer-to-peer information retrieval and the work done so far. We want
to stimulate and inspire further research to overcome these challenges. This will open the door
to the development and large-scale deployment of real-world peer-to-peer information retrieval
systems that rival existing centralised client-server solutions in terms of scalability, performance,
user satisfaction and freedom.
</p>

<p>
<em>The paper will appear in <a href="http://tois.acm.org/">ACM Transactions on Information Systems</a>.</em>
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/peer-to-peer-information-retrieval-an-overview.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Treinplanner on Dutch television</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-on-dutch-television.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-on-dutch-television.html#comments</comments>
		<pubDate>Sun, 29 Jan 2012 21:25:44 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
	<category>Photos</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-on-dutch-television.html</guid>
		<description><![CDATA[





Dutch broadcaster BNN tests the intuitive train planner developed at the Database Group. Their verdict: &#8220;ingenious&#8221;, and &#8220;approved for elderly&#8221;. Kien Tjin-Kam-Jet is pictured proudly in the back (in Dutch).
See the treinplanner in action at: http://treinplanner.info


]]></description>
			<content:encoded><![CDATA[
<p>
<iframe width="373" height="240" src="http://www.youtube.com/embed/ebm2WMR1Yqk" frameborder="0" allowfullscreen></iframe>
</p>

<p>
Dutch broadcaster BNN tests the intuitive train planner developed at the Database Group. Their verdict: &#8220;ingenious&#8221;, and &#8220;approved for elderly&#8221;. Kien Tjin-Kam-Jet is pictured proudly in the back (in Dutch).
See the treinplanner in action at: <a href="http://treinplanner.info">http://treinplanner.info</a>
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner-on-dutch-television.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>CLEF 2012 in Rome</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/clef-2012-in-rome.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/clef-2012-in-rome.html#comments</comments>
		<pubDate>Mon, 23 Jan 2012 18:01:35 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Conference &amp; Workshop</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/clef-2012-in-rome.html</guid>
		<description><![CDATA[
CLEF 2012: Conference and Labs of the Evaluation Forum: First Call for Participation


CLEF 2012 is next year&#8217;s edition of the popular CLEF campaign and workshop series, which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks.
In 2010 CLEF was launched in a new [...]]]></description>
			<content:encoded><![CDATA[
<p><strong>CLEF 2012: Conference and Labs of the Evaluation Forum:</strong> First Call for Participation</p>

<p>
CLEF 2012 is next year&#8217;s edition of the popular <a href="http://www.clef-initiative.eu/">CLEF campaign</a> and workshop series, which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks.
In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops. Labs fall into two types: laboratories to conduct evaluation of information access systems, and workshops to discuss and pilot innovative evaluation activities.
In 2012, CLEF will take place on September 17-20 in Rome, and researchers and practitioners from all segments of the information access and related communities are invited to participate in the following Evaluation Labs:
</p>

<ul>

<li><a href="http://www.promise-noe.eu/chic-2012/home">CHiC</a> - Cultural Heritage in CLEF</li>

<li><a href="http://ifs.tuwien.ac.at/~clef-ip/">CLEF-IP</a> - Information Retrieval in the Intellectual Property domain</li>

<li><a href="http://www.imageclef.org/">ImageCLEF</a> - Cross Language Image Retrieval</li>

<li><a href="http://inex.mmci.uni-saarland.de/">INEX</a> - INitiative for the Evaluation of XML Retrieval</li>

<li><a href="http://pan.webis.de">PAN</a> - Uncovering Plagiarism, Authorship, and Social Software Misuse</li>

<li><a href="http://celct.fbk.eu/QA4MRE/">QA4MRE</a> - Question Answering for Machine Reading Evaluation</li>

<li><a href="">RepLab 2012</a> - Online Reputation Management</li>

<li><a href="http://www.nicta.com.au/clefehealth2012">CLEFeHealth</a> - Electronic Health</li>

</ul>

<p>More information at: <a href="http://clef2012.org/">http://clef2012.org/</a></p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/clef-2012-in-rome.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Treinplanner</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner.html#comments</comments>
		<pubDate>Mon, 23 Jan 2012 17:24:59 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Distributed Search</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner.html</guid>
		<description><![CDATA[

We released a demo today: the Treinplanner, built by Kien, which allows you to search the Dutch Railways Journey planner with a single search box. (in Dutch)


]]></description>
			<content:encoded><![CDATA[
<p>
We released a demo today: the <a href="http://snipdex.org/ns/">Treinplanner</a>, built by Kien, which allows you to search the Dutch Railways Journey planner with a single search box. (in Dutch)
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/treinplanner.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>English Wikipedia off-line as protest</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2012/english-wikipedia-off-line-as-protest.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2012/english-wikipedia-off-line-as-protest.html#comments</comments>
		<pubDate>Tue, 17 Jan 2012 13:34:14 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>UT Internet lesson</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2012/english-wikipedia-off-line-as-protest.html</guid>
		<description><![CDATA[
Yesterday, the Wikipedia community announced its decision to black out the English-language Wikipedia for 24 hours, worldwide, on Wednesday, January 18. The blackout is a protest against proposed legislation in the United States - the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA). See:
http://wikimediafoundation.org/wiki/English_Wikipedia_anti-SOPA_blackout


If I understand things right, the Stop Online Piracy [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday, the Wikipedia community announced its decision to black out the English-language Wikipedia for 24 hours, worldwide, on Wednesday, January 18. The blackout is a protest against proposed legislation in the United States - the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA). See:<br />
<a href="http://wikimediafoundation.org/wiki/English_Wikipedia_anti-SOPA_blackout">http://wikimediafoundation.org/wiki/English_Wikipedia_anti-SOPA_blackout</a>
</p>

<p>If I understand things right, the Stop Online Piracy Act will allow a U.S. court to legally demand that utwente.nl be taken off-line, just because a student or professor published a link to a tool that circumvents internet censorship on his/her University of Twente web page, for instance a link like:<br />
<a href="https://addons.mozilla.org/en-US/firefox/addon/desopa/">https://addons.mozilla.org/en-US/firefox/addon/desopa/</a>&#8230;
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2012/english-wikipedia-off-line-as-protest.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>New team member: Mohammad Khelghati</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2011/new-team-member-mohammad-khelghati.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2011/new-team-member-mohammad-khelghati.html#comments</comments>
		<pubDate>Thu, 15 Dec 2011 11:44:32 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2011/new-team-member-mohammad-khelghati.html</guid>
		<description><![CDATA[Mohammad Khelghati joined the database group to work on deep web entity monitoring. Welcome Mohammad!
]]></description>
			<content:encoded><![CDATA[Mohammad Khelghati joined the database group to work on deep web entity monitoring. Welcome Mohammad!
]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2011/new-team-member-mohammad-khelghati.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>DIR 2012 in beautiful Ghent</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2011/dir-2012-in-beautiful-ghent.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2011/dir-2012-in-beautiful-ghent.html#comments</comments>
		<pubDate>Wed, 14 Dec 2011 16:26:17 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Conference &amp; Workshop</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2011/dir-2012-in-beautiful-ghent.html</guid>
		<description><![CDATA[


On February 23rd and 24th 2012, the &#8220;12th Dutch-Belgian Information Retrieval Workshop&#8221; (DIR 2012) will be organised at the Ghent University conference center Het Pand.

One of the primary goals of this year&#8217;s edition of DIR is to create an informal meeting place between the major IR research groups and related companies in Belgium and the [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="/~hiemstra/wp-content/ghent.jpg" width="312" height="143" alt="Ghent"/></p>

<p>On <strong>February 23rd and 24th 2012</strong>, the &#8220;12th Dutch-Belgian Information Retrieval Workshop&#8221; (DIR 2012) will be organised at the Ghent University conference center Het Pand.

One of the primary goals of this year&#8217;s edition of DIR is to create an informal meeting place between the major IR research groups and related companies in Belgium and the Netherlands, to exchange information and to present innovative research developments.

If your field of expertise is situated in the broad area of Information Retrieval, you are warmly invited to submit a &#8220;short paper&#8221; (4 pages) containing new research results, or a &#8220;compressed contribution&#8221; (2 pages) from a recent high-standard conference or journal paper.
For companies active in this area, we offer the opportunity to submit demo papers, focusing on novel IR technology and applications.

All further information regarding DIR 2012 can be found on <a href="http://dir2012.intec.ugent.be/">http://dir2012.intec.ugent.be/</a>.
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2011/dir-2012-in-beautiful-ghent.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>2011’s DB colloquium</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2011/2011%e2%80%99s-db-colloquium.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2011/2011%e2%80%99s-db-colloquium.html#comments</comments>
		<pubDate>Mon, 12 Dec 2011 13:26:54 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Colloquia</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2011/2011%e2%80%99s-db-colloquium.html</guid>
		<description><![CDATA[Below you find last year’s DB colloquia, usually held on Tuesdays from 13:45h - 14:30h. in ZI-3126.





6 January 2011 (Thursday at 15.00 h.)

Andreas Wombacher: Data driven inter-model conformance checking
Sensor data document changes in the physical world, which can be understood with the help of metadata that models part of the physical world.

In the digital world, information systems are used for [...]]]></description>
			<content:encoded><![CDATA[Below you find last year’s DB colloquia, usually held on Tuesdays from 13:45h - 14:30h. in ZI-3126.

<dl>

<a name="andreas"></a>

<dt><strong>6 January 2011 (Thursday at 15.00 h.)</strong></dt>

<dd><i>Andreas Wombacher</i><br /><a href="#andreas">Data driven inter-model conformance checking</a><br />
Sensor data document changes in the physical world, which can be understood with the help of metadata that models part of the physical world.

In the digital world, information systems are used for handling exchange of information between different actors, where some information is related to physical objects. Since these objects are potentially the same as observed by sensors, the sensor model (metadata) and the information system should describe the handling of physical objects in the same way, i.e., the information system and the sensor model should conform. So far, conformance checking has been done at the model level. I propose to use observed sensor data, and potentially information system data, to check conformance.
</dd>

<p />

<a name="dwdm"></a>

<dt><strong>13 January 2011 (Thursday, 10:45-12:30h. in CR-3E)</strong></dt>

<dd><i>Data Warehousing and Data Mining guest lectures</i>
<p />
<i>Tom Jansen  (Distimo)</i><br />
<a href="#dwdm">Data warehousing for app store analytics</a>.
Distimo is an innovative app store analytics company built to solve the challenges created by a widely fragmented app store marketplace filled with equally fragmented information and statistics.
<p />
<i>Jacques Niehof and Alexandra Molenaar (SIOD)</i><br />
<a href="#dwdm">Data Mining for selection profiles for fraud and misuse of social security</a>.
The Social Intelligence and Investigation Service (SIOD) of the Ministry of Social Affairs and Employment (Ministerie van Sociale Zaken en Werkgelegenheid) fights criminality in the field of social security.
</dd>

<p />

<a name="paul"></a>

<dt><strong>8 February 2011</strong></dt>

<dd><i>Paul Stapersma</i><br />
<a href="#paul">A probabilistic XML database on top of MayBMS</a><br />
We use the probabilistic XML model proposed by Van Keulen and De Keijzer to create a prototype of a probabilistic XML database. One disadvantage of the XML data model is that queries cannot be executed as efficiently as in the relational database model. Many non-probabilistic mapping techniques have been developed to map semi-structured data into relational databases to overcome this disadvantage. In this research, we use the schema-less mapping technique ‘XPath Accelerator’ to build a probabilistic XML database (PXML-DBMS) based on a URDBMS. A working prototype can be found at <a href="http://code.google.com/p/pxmlconverter/">http://code.google.com/p/pxmlconverter/</a>.
</dd>

<p />

<a name="almer"></a>

<dt><strong>24 May 2011 (11:30h. in ZI-5126)</strong></dt>

<dd><i>Almer Tigelaar</i><br />
<a href="/~hiemstra/2011/search-result-caching-in-p2p-information-retrieval-networks.html">Search Result Caching in P2P Information Retrieval Networks</a><br />
We explore the solution potential of search result caching in large-scale peer-to-peer information retrieval networks by simulating such networks with increasing levels of realism. We find that a small bounded cache offers performance comparable to an unbounded cache. Furthermore, we explore partially centralised and fully distributed scenarios, and find that in the most realistic distributed case caching can reduce the query load by thirty-three percent. With optimisations this can be boosted to nearly seventy percent. 
</dd>

<p />

<a name="kien"></a>

<dt><strong>31 May 2011 (11:30h. in ZI-5126)</strong></dt>

<dd><i>Kien Tjin-Kam-Jet</i><br />
<a href="/~hiemstra/2011/free-text-search-over-complex-web-forms.html">Free-Text Search over Complex Web Forms</a><br />
This paper investigates the problem of using free-text queries as an alternative means for searching ‘behind’ web forms. We introduce a novel specification language for specifying free-text interfaces, and report the results of a user study where we evaluated our prototype in a travel planner scenario. Our results show that users prefer this free-text interface over the original web form and that they are about 9% faster on average at completing their search tasks.
</dd>

<p />

<dt><strong>7 June 2011</strong></dt>

<dd><a href="http://www.ctit.utwente.nl/ctit_symposium2011/">CTIT Symposium Security and Privacy - something to worry about?</a><br />
The list of invited speakers includes Prof.dr.ir. Vincent Rijmen (TU Graz, Austria and KU Leuven), Dr. George Danezis (Microsoft Research), Dr. Steven Murdoch (University of Cambridge), Prof. Bert-Jaap Koops (TILT), Prof.mr.dr. Mireille Hildebrandt (RU Nijmegen), Dr.ir. Martijn van Otterlo (KU Leuven) and Prof. Pieter Hartel (UT). 
<br />
<a href="http://www.ctit.utwente.nl/ctit_symposium2011/">Read more&#8230;</a>
</dd>

<p />

<a name="andreas2"></a>

<dt><strong>21 June 2011</strong></dt>

<dd><i>Andreas Wombacher</i><br />
<a href="#andreas2">Sensor Data Visualization &amp; Aggregation: A self-organizing approach</a><br />
In this year&#8217;s Advanced Database Systems course the students had the assignment to design and implement the database functionality to visualize sensor data based on user requests in a Web based system. To guarantee good response times of the database it is necessary to pre-aggregate data. The core of the assignment was to find good pre-aggregations which minimize the query response times while using only a limited amount of storage space. The pre-aggregation levels must adjust in case the characteristics of the user requests change. In this talk I will present the different approximation approaches of the students and present an optimal, NP-hard solution.

<p />
</dd>

<a name="marijn"></a>

<dt><strong>28 June 2011</strong></dt>

<dd><i>Marijn Koolen (University of Amsterdam)</i><br />
<a href="#marijn">Relevance, Diversity and Redundancy in Web Search Evaluation</a><br />
Diversity performance is often evaluated with a measure that combines relevance, subtopic coverage and redundancy. Although this is understandable from a user perspective, it is problematic when analysing the impact of diversifying techniques on diversity performance. Such techniques not only affect subtopic coverage, but often the underlying relevance ranking as well. An evaluation measure that conflates these aspects hampers our progress in developing systems that provide diverse search results. In this talk, I argue that to further our understanding of how system components affect diversity, we need to look at relevance, coverage and redundancy individually. Using the official runs of the TREC 2009 Diversity task, I show that differences in diversity performance are mainly due to differences in the relevance ranking, with only minimal differences in how the relevant documents are ordered amongst themselves. If we measure diversity independent of the relevance ranking, we find that some of the systems that perform badly on conflated measures have the most diverse ordering of relevant documents.

<p />
</dd>

<a name="ontwerpproject"></a>

<dt><strong>30 June 2011 (Thursday, 10:30-11:30h. in ZI-3126)</strong></dt>

<dd><i>Design Project Presentations</i>
<p />

<i>Tristan Brugman, Maarten Hoek, Mustafa Radha, Iwan Timmer, Steven van der Vegt</i><br />
<a href="#ontwerpproject">UT-Search</a>.
UT Search was built as a replacement for Google Custom Search. We set out to build a search engine that uses knowledge of the various systems within the UT to also find the large amount of information that cannot be found through custom search. We use aggregated search to query the various systems of the university. By means of faceted search, users can select the systems they want to search in order to refine their query.
The goal is to have a central search engine for the university from which all the different systems can be searched, making the information more accessible.

<p />

<i> Ralph Broenink, Rick van Galen, Jarmo van Lenthe, Bas Stottelaar,  Niek Tax</i><br />
<a href="#ontwerpproject">Alexia: the drinks management system for Stichting Borrelbeheer Zilverling</a>.
With the activities of five associations, graduation drinks and other events, both social rooms of the Educafé are heavily booked. This high occupancy has created a need for a management system that can plan drinks events, keep track of stock, manage bartenders and register consumption during the events. As an extra, it is also possible to drink on account using RFID cards.
During the presentation we show how we developed, from the requirements of these five associations, a web application that fulfils the functions above, using the latest technologies such as Django, CSS3 and HTML5.

<p />
</dd>

<a name="sergio"></a>

<dt><strong>12 July 2011</strong></dt>

<dd><i>Sergio Duarte</i><br />
Sergio will talk about his internship at Yahoo! Research, Barcelona.
</dd>

<p/>

<a name="rezwan"></a>

<dt><strong>23 August 2011</strong></dt>

<dd><i>Rezwan Huq</i><br />
<a href="#rezwan">Inferring Fine-grained Data Provenance in Stream Data Processing: Reduced Storage Cost, High Accuracy</a>
Fine-grained data provenance ensures reproducibility of results in decision making, process control and e-science applications. However, maintaining this provenance is challenging in stream data processing because of its massive storage consumption, especially with large overlapping sliding windows. In this paper, we propose an approach to infer fine-grained data provenance by using a temporal data model and coarse-grained data provenance of the processing. The approach has been evaluated on a real dataset and the result shows that our proposed inferring method provides provenance information as accurate as explicit fine-grained provenance at reduced storage consumption.
</dd>

 

<p />

<a name="maarten"></a>

<dt><strong>30 August 2011 at 14:30h.</strong></dt>

<dd><i>Maarten Fokkinga</i><br />
<a href="#maarten">Aggregation - polymorphic and polytypic</a><br />
Repeating the work of Meertens &#8220;Calculate Polytypically!&#8221; we show how
to define in a few lines a very general &#8220;aggregation&#8221; function. Our
intention is to give a self-contained exposition that is, compared to
Meertens&#8217; work, more accessible for the uninitiated reader who wants to
see the idea with a minimum of formal details.
</dd>

 

<p />

<a name="robin"></a>

<dt><strong>5 September 2011 (Monday at 13.30h.)</strong></dt>

<dd><i>Robin Aly</i><br />
<a href="#robin">Towards a Better Understanding of the Relationship Between Probabilistic Models in IR</a><br />
Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, which has a similar score function to the one of the query likelihood model. The performance of both models is strongly correlated, hence providing a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens ample new research directions which we propose as future work.
</dd>

<p/>

<a name="juan"></a>

<dt><strong>20 September 2011</strong></dt>

<dd><i>Juan Amiguet</i><br />
<a href="#juan">Annotation propagation and topology based approaches</a><br />
When making sensor data stream applications more robust to changes in
the sensing environment by introducing annotations representing the
changes, we find ourselves with the need to propagate such annotations
across processing elements.
We present here a technique for performing such propagation, exploiting
the relations amongst the inputs and outputs both from an information
theoretic and a topological perspective.
Topologies are used to describe the structure of the inputs and the
outputs separately, whilst information theory techniques are used to
model the transform as a channel, enabling the topological
transformations to be treated as optimisation problems.
We present here a framework of functions which is generic in the light
of all transforms, and which enables the maximisation of the
entropy across the transform.
</dd>

<p/>

<a name="sergio"></a>

<dt><strong>26 October 2011</strong></dt>

<dd><i>Sergio Duarte</i><br />
<a href="/~hiemstra/2011/what-and-how-children-search-on-the-web.html">What and How Children Search on the Web</a><br />
In this work we employed a large query log sample from a commercial web search engine to identify the struggles and search behavior of children of the age of 6 to young adults of the age of 18. Concretely we hypothesized that the large and complex volume of information to which children are exposed leads to ill-defined searches and to disorientation during the search process. For this purpose, we quantified their search difficulties based on query metrics (e.g. fraction of queries posed in natural language), session metrics (e.g. fraction of abandoned sessions) and click activity (e.g. fraction of ad clicks). We also used the search logs to retrace stages of child development. Concretely we looked for changes in the user interests (e.g. distribution of topics searched), language development (e.g. readability of the content accessed) and cognitive development (e.g. sentiment expressed in the queries) among children and adults. We observed that these metrics clearly demonstrate an increased level of confusion and unsuccessful search sessions among children. We also found a clear relation between the reading level of the clicked pages and the demographic characteristics of the users such as age and average educational attainment of the zone in which the user is located.<br/>
<a href="/~hiemstra/2011/what-and-how-children-search-on-the-web.html">Read more</a>
</dd>

<p/>

<a name="rezwan2"></a>

<dt><strong>29 November 2011</strong></dt>

<dd><i>Rezwan Huq</i><br />
<a href="#rezwan2">Adaptive Inference of Fine-grained Data Provenance to Achieve High Accuracy at Lower Storage Costs</a><br />
In stream data processing, data arrives continuously and is processed by decision making, process control and e-science applications. To control and monitor these applications, reproducibility of results is a vital requirement. However, it requires a massive amount of storage space to store fine-grained provenance data, especially for those transformations with overlapping sliding windows. In this paper, we propose techniques which can significantly reduce storage costs and can achieve high accuracy. Our evaluation shows that the adaptive inference technique can achieve more than 90% accurate provenance information for a given dataset at lower storage costs than the other techniques. Moreover, we present a guideline about the usage of different provenance collection techniques described in this paper based on the transformation operation and stream characteristics.
</dd>

<p/>

</dl>

<p>See also: <a href="/~hiemstra/2010/2010s-db-colloquium.html">2010&#8217;s DB colloquium</a>.</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2011/2011%e2%80%99s-db-colloquium.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Keynote talk by Stefano Ceri at DBDBD</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-talk-by-stefano-ceri-at-dbdbd.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-talk-by-stefano-ceri-at-dbdbd.html#comments</comments>
		<pubDate>Thu, 17 Nov 2011 14:22:59 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Conference &amp; Workshop</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-talk-by-stefano-ceri-at-dbdbd.html</guid>
		<description><![CDATA[
Stefano Ceri will give a keynote talk at the DBDBD on 2 December 2011. Ceri is professor of Database Systems at  Politecnico di Milano, Italy. He co-authored over 250 articles in International Journals and Conference Proceedings, and is co-author or editor of many international books, including best-selling classics like &#8220;Conceptual database design: an Entity-relationship [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="/~hiemstra/wp-content/ceria.jpg" alt="Prof. Stefano Ceri" width="120" align="left" hspace="8" /><strong>Stefano Ceri will give a keynote talk at the DBDBD on 2 December 2011.</strong> Ceri is professor of Database Systems at  Politecnico di Milano, Italy. He co-authored over 250 articles in International Journals and Conference Proceedings, and is co-author or editor of many international books, including best-selling classics like &#8220;Conceptual database design: an Entity-relationship approach&#8221; with Carlo Batini and Shamkant Navathe. His research interests cover many aspects of database management systems, including distributed databases,  deductive and active databases, streaming data, object orientation, XML query languages, as well as design methods for data-intensive web sites.
</p>

<p>
Professor Ceri was awarded the prestigious IDEAS Advanced Grant, funded by the European Research Council (ERC), on <em>Search Computing</em> (<a href="http://www.search-computing.it/">search-computing.it</a>): Search computing enables answering questions via a constellation of dynamically selected, cooperating, search services. Search computing should enable answering complex queries like: &#8220;Who are the strongest European competitors on software ideas?&#8221;, &#8220;Who is the best doctor to cure insomnia in a nearby hospital?&#8221;, or, very important for poor PhD students, &#8220;Where can I attend an interesting conference in my field closest to a sunny beach?&#8221;
</p>

<p>
More information on: <a href="http://dbdbd.nl">dbdbd.nl</a>
</p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-talk-by-stefano-ceri-at-dbdbd.html/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Keynote lecture by Jimmy Lin at Big Data tutorial</title>
		<link>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-lecture-by-jimmy-lin-at-big-data-tutorial.html</link>
		<comments>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-lecture-by-jimmy-lin-at-big-data-tutorial.html#comments</comments>
		<pubDate>Tue, 18 Oct 2011 08:30:32 +0000</pubDate>
		<dc:creator>Djoerd Hiemstra</dc:creator>
		
	<category>Conference &amp; Workshop</category>
	<category>Photos</category>
		<guid isPermaLink="false">http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-lecture-by-jimmy-lin-at-big-data-tutorial.html</guid>
		<description><![CDATA[
Jimmy Lin will give a keynote lecture at the SIKS/BigGrid Big Data tutorial that precedes the DBDBD on 30 November and 1 December 2011. Dr. Lin, who holds a PhD from MIT, is associate professor in the iSchool at the University of Maryland. He also has appointments in the Institute for Advanced Computer Studies (UMIACS) [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="/~hiemstra/wp-content/jimmylin.jpg" align="left" hspace="12" /><strong>Jimmy Lin will give a keynote lecture at the SIKS/BigGrid Big Data tutorial that precedes the DBDBD on 30 November and 1 December 2011.</strong> Dr. Lin, who holds a PhD from MIT, is associate professor in the iSchool at the University of Maryland. He also has appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science at Maryland. Lin works at the intersection of natural language processing (NLP) and information retrieval (IR), with a recent emphasis on scalable algorithm design and large-data issues. He directs the recently-formed Cloud Computing Center, an interdisciplinary group which explores the many aspects of cloud computing as it impacts technology, people, and society. He is also a member of both the Computational Linguistics and Information Processing Lab (CLIP) and the Human-Computer Interaction Lab (HCIL). Lin worked at Cloudera, which aims to bring Hadoop MapReduce to the enterprise, and is currently spending a sabbatical at Twitter.</p>

<p>See: <a href="http://dbdbd.nl/big-data-tutorial/">Big Data Tutorial</a></p>

]]></content:encoded>
			<wfw:commentRSS>http://wwwhome.cs.utwente.nl/~hiemstra/2011/keynote-lecture-by-jimmy-lin-at-big-data-tutorial.html/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>

