Data Science – Page 2 – Djoerd Hiemstra

Data Science Day

On 20 April, we organize a Data Science Day in the DesignLab. Invited speakers at the Data Science Colloquium are Piet Daas, methodologist and big data research coordinator of the CBS (Centraal Bureau Statistiek) who will talk about big data from Twitter and Facebook as a data source for official statistics; Rolf de By and Raul Zurita Milla, professors of ITC Geo-Information Science and Earth Observation, will talk about remote sensing techniques, using satelites and drones for helping economies in poor areas in the world, a prestigious project funded by the Bill and Melinda Gates Foundation; and Jan Willem Tulp creator of interactive data visualisations for magazines like Scientific American and Popular Science, as well as companies, for instance the Tax Free Retail Analysis Tool for Schiphol Amsterdam Airport.

The Data Science colloquia are kindly sponsored by the CTIT and the Netherlands Research School for Information and Knowledge Systems (SIKS) and part of the SIKS educational program.

[more information]

Twente Data Science Center

Scientific and economic progress is increasingly powered by our capabilities to explore big datasets. Data is the driving force behind the successful innovation of Internet companies like Google, Twitter, and Yahoo, and job advertisements show an increasing need for data scientists and big data analysts. Data scientists dig for value in data by analyzing for instance texts, application usage logs, and sensor data. The need for data scientists and big data analysts is apparent in almost every sector in our society, including business, health care, and education.

The Twente Center for Data Science is a collaboration between research groups of the University of Twente to research, promote and facilitate big data analysis for all scientific disciplines. The center operates by the participants sharing their expertise, sharing their contacts, sharing their data, and sharing their research infrastructure (hardware and software) for large-scale data analysis.

The Twente Data Science Center offers a unique combination of expertise in computer science, mathematics, management, behavioral sciences and social sciences; collaborations with leading international companies such as Google, Twitter and Yahoo; and local infrastructure and support for the analysis of very large datasets.

More information

Norvig Web Data Science Award 2014

The Norvig Web Data Science Award is organized by Common Crawl and SURFsara for researchers and students in the Benelux. SURFsara provides free access to the their Hadoop cluster with a copy of the full Common Crawl web crawl from March 2014 – almost 3 billion web pages. Participants are completely free in choosing their research question. For example, last year there were submissions looking at concept association, connections between languages, readability and more. Be creative and think outside of the box!

The award is named after Peter Norvig, Director of Research at Google, who chairs the jury that will select the winning submission. The contest will run until July 31, 2014. The winning team will be announced at the award ceremony in September 2014 and will get a tablet, smart watch and Github small plan for a year.

Keynote by Ravi Kumar

We are very proud that Ravi Kumar from Google agreed to give a keynote speech at the CTIT Symposium on Big Data and the Emergence of Data Science. Kumar, who is well-known for hist work on web and data mining and algorithms for large data sets, has been a senior staff research scientist at Google since June 2012. Prior to this, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! Research. He obtained his Ph.D. in Computer Science from Cornell University in 1998.
Ravi Kumar's talk will cover two non- conventional computational models for analyzing big data. The first is data streams: in this model, data arrives in a stream and the algorithm is tasked with computing a function of the data without explicitly storing it. The second is map-reduce: in this model, data is distributed across many machines and computation is done as sequence of map and reduce operations. Kumar will present a few algorithms in these models and discuss their scalability.

The workshop takes place on Tuesday 4 June at the University of Twente. Other invited spearkers at the CTIT symposium are Maarten de Rijke (U. Amsterdam) and Milan Petkovic (Philips).

The Winners of The Norvig Web Data Science Award

by Lisa Green (Common Crawl)

We are very excited to announce that the winners of the Norvig Web Data Science Award: Lesley Wevers, Oliver Jundt, and Wanno Drijfhout from the University of Twente! The Norvig Web Data Science Award was created by Common Crawl and SURFsara to encourage research in web data science and named in honor of distinguished computer scientist Peter Norvig.

There were many excellent submissions that demonstrated how you can extract valuable insight and knowledge from web crawl data. Be sure to check out the work of the winning team, Traitor – Associating Concepts Using The World Wide Web, and the other finalists on the award website. You will find descriptions of the projects as well as links to the code that was used. We hope that these projects will serve as an inspiration for what kind of work can be done with the Common Crawl corpus. All code is open source and we are looking forward to seeing it reused and adapted for other projects.

A huge thank you to our distinguished panel of judges: Peter Norvig, Ricardo Baeza-Yates, Hilary Mason, Jimmy Lin, and Evert Lammerts!

Added on 18 March: Award winners Oliver Jundt, Wanno Drijfhout, and Lesley Wevers with their prize: a high-end Android tablet!

Participate in the Dutch Common Crawl Challenge

What can you do with 6 billion webpages?

Together with Common Crawl and SARA, we invite students and researchers studying at or employed by research institutes or universities in the Netherlands to dive into the Common Crawl web corpus using the SARA Hadoop service. The best submission will receive the The Norvig Web Data Science Award, a tablet, and 1500 Euro to spend on travel, accommodation, and conference registration fee for SIGIR 2013 to be held in Dublin, Ireland.

The award is named after Peter Norvig, Google's director of research with a resume too impressive to summarize. Peter is on the advisory board of Common Crawl, and is chair of the jury for this award. Other jury members are Ricardo Baeza-Yates (Yahoo!), Hilary Mason (bit.ly), Jimmy Lin (University of Maryland), and Evert Lammerts (SARA).

Find out more at the Norvig Award page at Github, the Common Crawl Blog, or come to the Inter-Actief Challenges Information Lunch on 22 November at 12.30h. in Absint.