Archive for the 'DataGrant' Category

#WhoAmI in 160 Characters?

Wednesday, October 5th, 2016, posted by Djoerd Hiemstra

Classifying Social Identities Based on Twitter

by Anna Priante, Djoerd Hiemstra, Tijs van den Broek, Aaqib Saeed, Michel Ehrenhard, and Ariana Need

We combine social theory and NLP methods to classify English-speaking Twitter users’ online social identity in profile descriptions. We conduct two text classification experiments. In Experiment 1 we use a 5-category online social identity classification based on identity and self-categorization theories. While we are able to automatically classify two identity categories (Relational and Occupational), automatic classification of the other three identities (Political, Ethnic/religious and Stigmatized) is challenging. In Experiment 2 we test a merger of such identities based on theoretical arguments. We find that by combining these identities we can improve the predictive performance of the classifiers in the experiment. Our study shows how social theory can be used to guide NLP methods, and how such methods provide input to revisit traditional social theory that is strongly consolidated in offline settings.

To be presented at the EMNLP Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) on November 5 in Austin, Texas, USA.

[download pdf]

Download the code book and classifier source code from github.

#SupportTheCause: Online Protest and Advocacy Symposium

Wednesday, January 6th, 2016, posted by Djoerd Hiemstra

21-22 January 2016
University of Twente

If you’re interested in social media analysis and/or computational social science, there will be interesting guest speakers, including speakers from UCLA, TNO, TU Delft, Greenpeace, Sanquin, and Twitter.

http://supportthecause.nl

The Influence of Prosocial Norms and Online Network Structure on Prosocial Behavior

Friday, August 28th, 2015, posted by Djoerd Hiemstra

The Influence of Prosocial Norms and Online Network Structure on Prosocial Behavior: An Analysis of Movember’s Twitter Campaign in 24 Countries

by Tijs van den Broek, Ariana Need, Michel Ehrenhard, Anna Priante and Djoerd Hiemstra

Sociological research points at norms and social networks as antecedents of prosocial behavior. To date, the literature remains undecided on how these factors jointly influence prosocial behavior. Furthermore, the use of social media by campaign organizations may change the need for formal networks to organize large-scale collective action. Hence, in this paper we examine the interplay of prosocial norms and the structure of online social networks on offline prosocial behavior. For this purpose we use donation data from the global Movember campaign, messages about the Movember campaign on the online social networking site Twitter, and data from the World Giving Index. A multi-level analysis of Movember’s campaigns in 24 countries finds support for the logic of connective action: larger and more decentralized networks raise more donations. Furthermore, we find that the effect of prosocial norms on donations is decreased by larger and denser campaign networks.

To be presented at Social media, Activism, and Organizations 2015 (SMAO) on 6 November in London, UK.

On the Impact of Twitter-based Health Campaigns

Friday, August 21st, 2015, posted by Djoerd Hiemstra

A Cross-Country Analysis of Movember

by Nugroho Dwi Prasetyo (TU Delft), Claudia Hauff (TU Delft), Dong Nguyen, Tijs van den Broek, Djoerd Hiemstra

Health campaigns that aim to raise awareness and subsequently raise funds for research and treatment are commonplace. While many local campaigns exist, very few attract the attention of a global audience. One of those global campaigns is Movember, an annual campaign during the month of November, that is directed at men’s health with special focus on cancer and mental health. Health campaigns routinely use social media portals to capture people’s attention. Recently, researchers began to consider to what extent social media is effective in raising the awareness of health campaigns. In this paper we expand on those works by conducting an investigation across four different countries, while not only restricting ourselves to the impact on awareness but also on fund-raising. To that end, we analyze the 2013 Movember Twitter campaigns in Canada, Australia, the United Kingdom and the United States.

To be presented at the 6th International Workshop on Health Text Mining and Information Analysis (Louhi 2015) at EMNLP 2015 on September 17 in Lisbon, Portugal.

[download pdf]

Han van der Veen graduates on composing a more complete and relevant Twitter dataset

Tuesday, August 18th, 2015, posted by Djoerd Hiemstra

Composing a more complete and relevant Twitter dataset

by Han van der Veen

Social data is widely used by many researchers. Facebook, Twitter and other social networks produce huge amounts of social data, which can be used for analyzing human behavior. Social datasets are typically created using a hashtag; however, not all relevant data includes the hashtag. A better overview can be constructed with more data. This research focuses on creating a more complete and relevant dataset, using additional keywords to find more relevant tweets and a filtering mechanism to filter out the irrelevant tweets. Three methods for selecting additional keywords are proposed and evaluated: one based on word frequency, one on the probability of a word in a dataset, and one using estimates of the volume of tweets. Two classifiers for filtering tweets are compared: a Naive Bayes classifier and a Support Vector Machine classifier. Our method increases the size of the dataset by 105%. The average precision was reduced from 95% when only using a hashtag to 76% for the resulting dataset. These evaluations were executed on two TV shows and two sports events. A tool was developed that automatically executes all parts of the program. It takes a specific hashtag of an event as input and outputs a more complete and relevant dataset than the original hashtag alone would give. This is useful for social researchers who use tweets, but also for other researchers who use tweets as their data.
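To make the word-frequency idea concrete, here is a minimal Python sketch. It is not the code from the thesis: the function names, the tiny stopword list, and the choice to count each term once per tweet are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "on", "for", "rt"}

def tokenize(text):
    """Lowercase the text and split it into words, keeping a leading # or @."""
    return re.findall(r"[#@]?\w+", text.lower())

def frequent_keywords(tweets, hashtag, k=3):
    """Return the k terms that occur in the most hashtag-matched tweets."""
    counts = Counter()
    for tweet in tweets:
        # Count each term once per tweet so a single repetitive tweet cannot dominate.
        terms = set(tokenize(tweet))
        counts.update(t for t in terms
                      if t not in STOPWORDS and t != hashtag.lower())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

seed_tweets = [
    "Watching #thevoice tonight, the voice finale!",
    "the voice finale is on #thevoice",
    "#thevoice finale was great",
]
extra_keywords = frequent_keywords(seed_tweets, "#thevoice", k=2)
print(extra_keywords)
```

Candidate keywords found this way would still pull in irrelevant tweets, which is why the abstract pairs keyword expansion with a classifier that filters the enlarged dataset.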

[download pdf]

#SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns

Monday, August 10th, 2015, posted by Djoerd Hiemstra

by Dong Nguyen, Tijs van den Broek, Claudia Hauff (TU Delft), Djoerd Hiemstra, and Michel Ehrenhard

We consider the task of automatically identifying participants’ motivations in the public health campaign Movember and investigate the impact of the different motivations on the amount of campaign donations raised. Our classification scheme is based on the Social Identity Model of Collective Action (van Zomeren et al., 2008). We find that automatic classification based on Movember profiles is fairly accurate, while automatic classification based on tweets is challenging. Using our classifier, we find a strong relation between types of motivations and donations. Our study is a first step towards scaling-up collective action research methods.

The paper will be presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP) on September 17-21, in Lisbon, Portugal.

[download pdf]

What country was this tweeted from?

Monday, August 3rd, 2015, posted by Djoerd Hiemstra

Determine the User Country of a Tweet

by Han van der Veen, Djoerd Hiemstra, Tijs van den Broek, Michel Ehrenhard, and Ariana Need

In the widely used message platform Twitter, about 2% of the tweets contain the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research looks at determining the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user’s timezone, the user’s language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and the United Kingdom. An analysis of the errors made by the classifier shows that mistakes were due to limited information and to properties shared between countries, such as a shared timezone. A feature analysis was performed in order to see the effect of different features. The features timezone and parsed user location were the most informative.
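As an illustration of the kind of model described, the following is a small, self-contained Naive Bayes classifier over categorical features, written in Python. It is a sketch under assumptions and not the paper's implementation: the class name, the feature keys, the add-one smoothing, and the four toy training rows are all invented for the example.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][feature][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # vocab[feature] -> set of values seen for that feature
        self.vocab = defaultdict(set)
        for row, label in zip(rows, labels):
            for name, value in row.items():
                self.value_counts[label][name][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus one smoothed log likelihood per feature.
            score = math.log(count / self.total)
            for name, value in row.items():
                seen = self.value_counts[label][name][value]
                # The extra +1 reserves probability mass for unseen values.
                denom = count + len(self.vocab[name]) + 1
                score += math.log((seen + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (hypothetical; real profiles are far noisier).
train_rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "enschede"},
    {"timezone": "Amsterdam", "language": "nl", "location": "utrecht"},
    {"timezone": "London", "language": "en", "location": "london"},
    {"timezone": "London", "language": "en", "location": "manchester"},
]
train_labels = ["NL", "GB", "GB", "GB"][:0] + ["NL", "NL", "GB", "GB"]
model = CategoricalNaiveBayes().fit(train_rows, train_labels)

# A profile whose parsed location was never seen during training.
print(model.predict({"timezone": "Amsterdam", "language": "nl", "location": "somewhere"}))
```

Because every feature contributes independently, two countries that share a timezone and a language can only be separated by the parsed location, which matches the error pattern the abstract reports.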

[download pdf]

Cancer Early Detection Campaigns on Twitter

Thursday, April 17th, 2014, posted by Djoerd Hiemstra

It is official! Twitter awards the University of Twente with a prestigious Twitter #DataGrant (with Tijs van den Broek, Michel Ehrenhard and Ariana Need). Twitter awarded 6 out of 1,300 proposals.

Our research project aims to study the diffusion process and effectiveness of cancer early detection campaigns. We plan to analyse popular Twitter campaigns covering different types of cancer and geographical scopes, such as #Mamming (breast cancer), #Movember (prostate cancer), #DaveDay (pancreatic cancer) and #HPVReport (cervical cancer). We aim to map the diffusion process in detail by determining key events and actors that accelerate the diffusion process. Social network analysis will reveal if and when the campaign leads to word-of-mouth discussion, promotion and responses. We also aim to assess the effectiveness of the campaigns by comparing the frequency and sentiment of mentions of a particular type of cancer (e.g. breast cancer in the case of #Mamming) before and after the campaign.
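The before/after frequency comparison could be sketched roughly as follows in Python. This is only an illustration of the idea: the function name, the campaign dates, and the sample tweets are hypothetical, and sentiment is left out entirely.

```python
from datetime import date

def mention_counts(tweets, term, start, end):
    """Count tweets mentioning `term` before `start` and after `end`.

    `tweets` is an iterable of (date, text) pairs; matching is a simple
    case-insensitive substring test.
    """
    term = term.lower()
    before = sum(1 for day, text in tweets
                 if day < start and term in text.lower())
    after = sum(1 for day, text in tweets
                if day > end and term in text.lower())
    return before, after

# Hypothetical campaign window and sample tweets, for illustration only.
campaign_start, campaign_end = date(2013, 10, 1), date(2013, 10, 31)
sample_tweets = [
    (date(2013, 9, 20), "Reading about breast cancer screening"),
    (date(2013, 10, 15), "Doing the #mamming thing for breast cancer awareness"),
    (date(2013, 11, 3), "Booked my breast cancer check-up"),
    (date(2013, 11, 9), "Breast cancer screening should be routine"),
    (date(2013, 11, 12), "Nice weather today"),
]
before, after = mention_counts(sample_tweets, "breast cancer",
                               campaign_start, campaign_end)
print(before, after)
```

For a fair comparison the two windows should cover equally long periods, and the counts are best normalized by the total tweet volume in each window.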