The rapid growth of Internet usage over the last two decades poses new challenges for understanding the informal user-generated content (UGC) on the Internet. Textual UGC refers to textual posts on social media, blogs, emails, chat conversations, instant messages, forums, reviews, or advertisements that are created by end-users of an online system. A large portion of the language used in textual UGC is informal. Informal text is a style of writing that disregards grammatical rules and uses a mixture of abbreviations and context-dependent terms. The straightforward application of state-of-the-art Natural Language Processing approaches to informal text typically results in significantly degraded performance for the following reasons: the lack of sentence structure; the lack of sufficient context; the rarely seen entities involved; the noisy, sparse content of users' contributions; and the untrusted facts contained. The aim of this workshop is to draw researchers' attention to the opportunities and challenges involved in informal text processing. In particular, we are interested in discussing informal text modeling, normalization, mining, and understanding, in addition to various application areas in which UGC is involved.


We invite submissions on topics that include, but are not limited to, the following core NLP approaches for informal UGC: language identification, classification, clustering, filtering, summarization, tokenization, segmentation, morphological analysis, POS tagging, parsing, named entity extraction, named entity disambiguation, relation/fact extraction, semantic annotation, sentiment analysis, language normalization, informality modeling and measuring, language generation, handling uncertainties, machine translation, ontology construction, dictionary construction, etc.


Authors are invited to submit original work that has not been submitted to another conference or workshop. Workshop submissions may be full papers or short papers. Paper length should not exceed 10 pages for full papers and 5 pages for short papers. All papers should be formatted according to the ACM SIG Proceedings template. Papers in PDF can be submitted via the EasyChair Conference System. Each submission will receive at least two double-blind peer reviews in addition to a meta-review. Therefore, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the authors' identity must be avoided. Each full paper will get 25 minutes of presentation time. Short papers will get 5 minutes of presentation time in addition to a poster. Besides papers, we also plan to have an invited talk by a renowned scientist on a topic relevant to the workshop. The papers accepted to the workshop will be published as part of the "Companion volume" of the WWW 2016 conference.
To contact the NLPIT 2016 organization team, please send an e-mail to: nlpit2016@easychair.org.


- Submission deadline: January 8, 2016 (extended from December 22, 2015)
- Notification deadline: February 2, 2016
- Camera-ready version: February 8, 2016
- Workshop date: April 12, 2016

Keynote Speech

Raphaël Troncy

Title: Linking Entities for Enriching and Structuring Social Media Content.

Social media platforms such as Twitter, Facebook or LinkedIn have become a reliable source of news and play a key role in keeping people aware of events around the world. Using social media to recognize, enrich or summarize events is, however, very challenging. In the first part of this talk, we will present ADEL, a novel hybrid architecture for an adaptive entity linking system that combines methods from the natural language processing, information retrieval and semantic fields. We will show how ADEL can outperform the state-of-the-art systems on the reference NEEL challenge that takes place at the yearly #Micropost workshop (2014-2016). In the second part of this talk, we will present a framework that, given a query (for example, a trending event), can collect microposts containing media items from more than 12 social platforms. We will then show how we can automatically create different visual storyboards that reflect what users have shared about this particular event.

Dr. Raphaël Troncy is an Associate Professor in the Data Science Department of EURECOM, leading the Multimedia Semantics group. He is also co-chair of the W3C Media Fragments Working Group and W3C Incubator Group on Multimedia Semantics, and contributes to the W3C Web Annotations Working Group and numerous W3C Community Groups such as Schema.org or Open Linked Education. He is working on information extraction (project ASRAEL) and in particular named entity linking (framework NERD), ontology modeling in the multimedia domain (project DOREMUS) and large scale data integration. Applications range from smart cities (project 3cixty) to social TV and second screen (projects LinkedTV and NexGen-TV) where semantic web, information extraction and multimedia analysis technologies are used together.


Organizers

- Mena B. Habib, University of Twente, The Netherlands.
- Florian Kunneman, Radboud University, The Netherlands.
- Maurice van Keulen, University of Twente, The Netherlands.

Program Committee

- Alexandra Balahur, The European Commission's Joint Research Centre (JRC), Italy
- Barbara Plank, University of Copenhagen, Denmark
- Claudia Hauff, Delft University, The Netherlands
- Diana Maynard, University of Sheffield, UK
- Djoerd Hiemstra, University of Twente, The Netherlands
- Dolf Trieschnigg, My Data Factory, The Netherlands
- Julia Kiseleva, Eindhoven University of Technology, The Netherlands
- Kevin Gimpel, Toyota Technological Institute, USA
- Natalia Konstantinova, University of Wolverhampton, UK
- Orphée De Clercq, Ghent University, Belgium
- Robert Remus, ExB Group, Germany
- Wang Ling, Carnegie Mellon University, USA
- Yannis Korkontzelos, Edge Hill University, UK
- Zhemin Zhu, Elsevier, The Netherlands

Final Program

Date: April 12, 2016.
09:00 - 09:05: Welcome and Workshop Overview.
09:05 - 09:50: Keynote Speech: Raphaël Troncy, "Linking Entities for Enriching and Structuring Social Media Content".
09:50 - 10:15: Full Paper: Tunde Adegbola, "Pattern-based Unsupervised Induction of Yoruba Morphology".
10:15 - 10:30: Short Paper: Anna Jørgensen and Anders Søgaard, "A test suite for evaluating POS taggers across varieties of English".
10:30 - 11:00: Coffee Break.
11:00 - 11:25: Full Paper: Ming Yang and William Hsu, "HDPauthor: A New Hybrid Author-Topic Model using Latent Dirichlet Allocation and Hierarchical Dirichlet Processes".
11:25 - 11:50: Full Paper: Jhon Adrián Cerón-Guzmán and Elizabeth León-Guzmán, "Lexical Normalization of Spanish Tweets".
11:50 - 12:05: Short Paper: Qian Zhang and Bruno Gonçalves, "Topical differences between Chinese language Twitter and Sina Weibo".
12:05 - 12:20: Short Paper: Pedro Dias Cardoso and Anindya Roy, "Language Identification for Social Media: short messages and transliteration".
12:20 - 12:30: Wrap Up and Closing.