Consortium: University of Tübingen (Germany), CWI (The Netherlands), University of Twente (The Netherlands)
Pathfinder is a co-operation between the University of Tübingen, the University of Twente and CWI. We are working on a database-backed implementation of the W3C proposals for XQuery. The goal is to create an XQuery compiler whose architecture is designed to accommodate a number of different backends. Currently, the Pathfinder team is working on turning the main-memory database system MonetDB (developed at CWI, Amsterdam) into a full-fledged XML DBMS, MonetDB/XQuery, available under an open-source licence.
PhD student: Mena Badieh Habib; Funding: Special funding concerning ITC/UT merger to promote cooperation between ITC and other UT-departments (UT)
In this project, we develop XML-based data technology to support the channeling of large and ill-behaved neogeographic data streams. In neogeography, geographic information is derived from end-users, not from official bodies like mapping agencies, cadasters or other official, (para-)governmental organizations. The motivation is that multiple (neo)geographic information sources on the same phenomenon can be mutually enriching.
Content provision and feedback from large communities of end-users have great potential for sustaining a high level of data quality. The technology is meant to reach a substantial user community in the less-developed world through content provision and delivery via cell phone networks. Exploiting such neogeographic data requires, among other things, extracting the where and when from textual descriptions. This comes with intrinsic uncertainty in space and time, but also thematic uncertainty in terms of entity identification: which restaurant, bus stop, farm, market or forest is the one mentioned in this information source? The rise of sensor networks adds to the mix a badly needed verification mechanism for real-time neogeographic data.
We strive for a proper mix of carefully integrated techniques in geoinformation handling, approaches to spatiotemporal imprecision and incompleteness, and data augmentation through sensors, combined in a generic framework with which purpose-oriented end-user communities can be served appropriately.
The UT PhD position focuses on spatiotemporal data technology in XML databases, and on theory and support technology for storing, manipulating and reasoning with spatiotemporal and thematic uncertainty. The work is to be validated through testbed use cases, such as the H20 project with Google.org (water consumers in Zanzibar), the AGCommons project with the Gates Foundation (smallholder farmers in sub-Saharan Africa), and other projects with large user communities.
The Opaque project focuses on query optimization for relational XQuery engines. Advances in techniques for storing and querying XML with existing relational engines show that this approach has great potential for managing the ever-growing volumes of XML data. Opaque is one of the formal projects that are part of Pathfinder. Contributions on storage models, index structures, and algorithms have been made in recent years. Currently, attention is shifting towards query optimization techniques to address the remaining performance problems, to which Opaque hopes to make a contribution.
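To illustrate what storing and querying XML in a relational engine can look like, the following minimal sketch uses pre/post-order numbering, a well-known tree encoding. It is an assumption chosen for illustration, not a description of Opaque's actual storage model; the helper names `encode` and `descendants` are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Return one (pre, post, tag) row per element, in document order."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        index = len(rows)
        rows.append(None)  # reserve the slot so rows stay in document order
        for child in node:
            visit(child)
        # The post rank is only known once all children have been visited.
        rows[index] = (pre, counters["post"], node.tag)
        counters["post"] += 1

    visit(root)
    return rows


def descendants(rows, context):
    """XPath descendant axis expressed as a plain range predicate."""
    c_pre, c_post, _ = context
    return [r for r in rows if r[0] > c_pre and r[1] < c_post]


doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
table = encode(doc)
b_row = next(r for r in table if r[2] == "b")
result = [tag for _, _, tag in descendants(table, b_row)]
print(result)  # ['c', 'd']
```

Because the axis step reduces to two comparisons on integer columns, it can be evaluated by an ordinary relational engine, which is what makes relational query optimization applicable to XQuery in the first place.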
To be able to integrate XML query processing with traditional query processing, the proposed approach is based on extending existing query optimizer technology. It may involve extending relational algebra to incorporate some XML specifics not easily captured in relational terms, developing a query rewriter based on equivalence rules in the algebra, building a cost model that predicts query execution cost from statistics as accurately as possible, or any other relationally inspired technique for XQuery query optimization.
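The interplay between an equivalence rule and a cost model can be sketched as follows. This is a toy example under stated assumptions: the three-operator algebra, the selection push-down rule, the cost formula and all names (`Scan`, `Select`, `Join`, `push_selection`, `optimize`) are hypothetical and only illustrate the general idea, not the optimizer developed in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    rows: int              # cardinality statistic
    columns: frozenset


@dataclass(frozen=True)
class Select:
    column: str
    selectivity: float     # estimated fraction of rows kept
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    selectivity: float


def columns(plan):
    if isinstance(plan, Scan):
        return plan.columns
    if isinstance(plan, Select):
        return columns(plan.child)
    return columns(plan.left) | columns(plan.right)


def cardinality(plan):
    """Estimated number of output rows, derived from the statistics."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Select):
        return plan.selectivity * cardinality(plan.child)
    return plan.selectivity * cardinality(plan.left) * cardinality(plan.right)


def cost(plan):
    """Toy cost: number of tuples (or tuple pairs) each operator inspects."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Select):
        return cost(plan.child) + cardinality(plan.child)
    return (cost(plan.left) + cost(plan.right)
            + cardinality(plan.left) * cardinality(plan.right))


def push_selection(plan):
    """Rule: select_c(L join R) == select_c(L) join R when c occurs only in L."""
    if isinstance(plan, Select) and isinstance(plan.child, Join):
        join = plan.child
        in_left = plan.column in columns(join.left)
        in_right = plan.column in columns(join.right)
        if in_left and not in_right:
            pushed = Select(plan.column, plan.selectivity, join.left)
            return Join(pushed, join.right, join.selectivity)
        if in_right and not in_left:
            pushed = Select(plan.column, plan.selectivity, join.right)
            return Join(join.left, pushed, join.selectivity)
    return plan


def optimize(plan):
    """Keep the rewritten plan only if the cost model says it is cheaper."""
    candidate = push_selection(plan)
    return candidate if cost(candidate) < cost(plan) else plan


doc = Scan("doc", 1000, frozenset({"pre", "tag"}))
ctx = Scan("ctx", 10, frozenset({"item"}))
original = Select("tag", 0.01, Join(doc, ctx, 0.1))
optimized = optimize(original)
print(cost(original), cost(optimized))
```

The rewrite preserves the estimated result size while lowering the estimated cost, because the join now sees far fewer tuples from `doc`.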
Consortium: CWI, eMAXX, Philips, University of Twente; PhD students: Ander de Keijzer, Arthur van Bunningen; Funding: SenterNovem/Bsik (national)
The goal of Ambient Multimedia Databases (AmbientDB) is to ease the creation of distributed, intelligent, media-rich applications on typically mobile devices. Depending on the situation, such applications need to make optimal use of an ever-varying collection of (multimedia) sensors and data sources within reach. This creates the need for a data management middleware infrastructure that shields applications from such heterogeneity and variability. AmbientDB offers database functionality on top of an ad-hoc P2P overlay network that can unite many data sources. Apart from the creation of AmbientDB itself, the project investigates the requirements of this new middleware both from the upper level (what are the needs of intelligent context-aware applications?) and from the data sources (e.g. multimedia databases offering video retrieval-by-content). AmbientDB is a sub-project of MultimediaN, a large national project in which many scientific, commercial and societal organizations work together on the development of multimedia information technology for advanced applications.
The PhD research on schema and data integration is a subtopic of AmbientDB. It aims at advancing the state of the art in (semi-)automatic semantic schema evolution, schema integration, and context-aware querying using the AmbientDB infrastructure together with semantic web and data mining technologies. We envision a system based on (semi-)automatically generated schema mapping rules for XML and the use of a probabilistic XML DBMS to handle semantic uncertainties. Integrating data from heterogeneous sources by actually merging information related to the same real-world objects is known to be a difficult problem. Since the relationship with the real world is involved, existing approaches rely on a human to make semantic decisions. In this project, we try to develop a technique based on probability theory that does not need user interaction. The idea is to develop a probabilistic XML data model and use it to capture the uncertainties that occur as a result of integrating data. A corresponding XQuery-like query language will be developed that is able to query probabilistic XML data.
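The idea of capturing integration uncertainty in a probabilistic XML tree can be illustrated with a minimal sketch. It assumes a simple model in which a `Choice` node holds mutually exclusive alternatives with probabilities, and a query is answered by summing over all possible worlds. The classes, the address-book data and the probabilities are hypothetical and are not the data model developed in the project.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Element:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)  # Element or Choice items


@dataclass
class Choice:
    """Mutually exclusive alternatives: a list of (probability, [Element])."""
    alternatives: list


def worlds(node):
    """Yield (probability, certain Element) for every possible world."""
    per_child = []
    for child in node.children:
        if isinstance(child, Choice):
            options = []
            for p, elems in child.alternatives:
                # Combine the worlds of all elements inside this alternative.
                for combo in product(*(list(worlds(e)) for e in elems)):
                    q = p
                    for cp, _ in combo:
                        q *= cp
                    options.append((q, [e for _, e in combo]))
            per_child.append(options)
        else:
            per_child.append([(p, [e]) for p, e in worlds(child)])
    for combo in product(*per_child):
        prob, kids = 1.0, []
        for p, elems in combo:
            prob *= p
            kids.extend(elems)
        yield prob, Element(node.tag, node.text, kids)


def probability(root, predicate):
    """Probability that the predicate holds, summed over possible worlds."""
    return sum(p for p, world in worlds(root) if predicate(world))


def has_phone(number):
    def check(book):
        return any(c.tag == "phone" and c.text == number
                   for person in book.children for c in person.children)
    return check


def person(name, phone):
    return Element("person", children=[Element("name", name),
                                       Element("phone", phone)])


# Two sources each list a "John": the same person (0.8) or two people (0.2)?
merged = Element("person", children=[
    Element("name", "John"),
    Choice([(0.5, [Element("phone", "1111")]),
            (0.5, [Element("phone", "2222")])]),
])
book = Element("addressbook", children=[
    Choice([(0.8, [merged]),
            (0.2, [person("John", "1111"), person("John", "2222")])]),
])
print(round(probability(book, has_phone("2222")), 3))  # 0.6
```

Enumerating worlds is exponential in the number of choice points, so it only serves to make the semantics concrete; a practical query language would need to evaluate queries on the compact representation directly.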