The rapid growth of Internet usage over the last two decades poses new challenges for understanding the informal user-generated content (UGC) on the Internet. Textual UGC refers to textual posts on social media, blogs, emails, chat conversations, instant messages, forums, reviews, or advertisements that are created by end-users of an online system. A large portion of the language used in textual UGC is informal. Informal text is a style of writing that disregards grammatical rules and uses a mixture of abbreviations and context-dependent terms. The straightforward application of state-of-the-art Natural Language Processing approaches to informal text typically results in significantly degraded performance for the following reasons: the lack of sentence structure; the lack of sufficient context; the rarely seen entities involved; the noisy, sparse content of users' contributions; and the untrusted facts contained. The aim of this workshop is to draw researchers' attention to the opportunities and challenges involved in informal text processing. In particular, we are interested in discussing informal text modeling, normalization, mining, and understanding, in addition to the various application areas in which UGC is involved.


We invite submissions on topics that include, but are not limited to, the following core NLP approaches for informal UGC: language identification, classification, clustering, filtering, summarization, tokenization, segmentation, morphological analysis, POS tagging, parsing, named entity extraction, named entity disambiguation, relation/fact extraction, semantic annotation, sentiment analysis, language normalization, informality modeling and measuring, language generation, handling uncertainties, machine translation, ontology construction, and dictionary construction.


Authors are invited to submit original work that has not been submitted to another conference or workshop. Workshop submissions may be full papers or short papers. Full papers should present completed work and may consist of up to 12 pages of content including references. Short papers can present work in progress and may consist of up to 8 pages including references. All papers should follow the Springer LNCS format. Papers in PDF format can be submitted via the EasyChair Conference System. Each submission will receive, in addition to a meta-review, at least two double-blind peer reviews. Self-references that reveal the authors' identity must be avoided. Each full paper will get 30 minutes of presentation time; short papers will get 15 minutes. The papers accepted to the workshop will be published as part of the workshop post-proceedings of the ICWE 2017 conference.
To contact the NLPIT 2017 organization team, please send an e-mail to: nlpit2017 <AT> easychair.org.


- Submission deadline: April 8th, 2017 (extended from March 31st, 2017)
- Notification deadline: April 28th, 2017
- Camera-ready version: May 12th, 2017
- Workshop date: June 5th, 2017
- Post-proceedings: June 24th, 2017

Keynote Speech

Speaker: Roberto Navigli

Title: Overcoming the Language Barrier with BabelNet and Multilingual Disambiguation of Text

Multilinguality is a key feature of today’s Web, and it is this feature that we leverage and exploit in our research work at the Sapienza University of Rome’s Linguistic Computing Laboratory, which I am going to overview and showcase in this talk. I will start by presenting BabelNet (http://babelnet.org), the largest multilingual encyclopedic dictionary and semantic network (now also a knowledge base), which covers 271 languages and 14 million concepts and named entities. BabelNet provides coverage for all the open-class parts of speech, thanks to the seamless integration of WordNet, Wikipedia, Wiktionary, OmegaWiki, Wikidata and the Open Multilingual WordNet. Next, I will present Babelfy (http://babelfy.org), a unified approach that leverages BabelNet to jointly perform word sense disambiguation and entity linking in arbitrary languages, with performance on both tasks on a par with, or surpassing, that of task-specific state-of-the-art supervised systems. Babelfy also includes a language-agnostic setting in which languages can be mixed in arbitrary ways. Finally, I will describe the most recent developments, including deep learning approaches to latent vector representations of meaning and word sense disambiguation.

Bio: Roberto Navigli is an Associate Professor in the Department of Computer Science of the Sapienza University of Rome. He was awarded the Marco Somalvico 2013 AI*IA Prize for the best young researcher in AI. He is the first Italian recipient of an ERC Starting Grant in computer science, on multilingual word sense disambiguation (2011-2016), and one of the few computer scientists to have received two ERC Grants (a Consolidator Grant starting in 2017). He was a co-PI of a Google Focused Research Award on Natural Language Understanding. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with BabelNet, a project also highlighted in TIME magazine this year. His research lies in the field of Natural Language Processing (including multilingual word sense disambiguation and induction, multilingual entity linking, large-scale knowledge acquisition, ontology learning from scratch, gamification for NLP, open information extraction and relation extraction). Currently he is an Associate Editor of the Artificial Intelligence Journal.


- Mena B. Habib, Maastricht University, The Netherlands
- Florian Kunneman, Radboud University, The Netherlands
- Maurice van Keulen, University of Twente, The Netherlands

Program Committee

- Alexandra Balahur, The European Commission's Joint Research Centre (JRC), Italy
- Barbara Plank, University of Copenhagen, Denmark
- Chenliang Li, Wuhan University, China
- Claudia Hauff, Delft University, The Netherlands
- Dolf Trieschnigg, My Data Factory, The Netherlands
- Erik Tjong Kim Sang, Meertens Institute, The Netherlands
- Gerasimos Spanakis, Maastricht University, The Netherlands
- Heba Elfardy, Columbia University, USA
- Julia Kiseleva, Eindhoven University of Technology, The Netherlands
- Kevin Gimpel, Toyota Technological Institute, USA
- Malvina Nissim, University of Groningen, The Netherlands
- Natalia Konstantinova, University of Wolverhampton, UK
- Orphée De Clercq, Ghent University, Belgium
- Robert Remus, ExB Group, Germany
- Sabine Bergler, Concordia University, Canada
- Wang Ling, Carnegie Mellon University, USA
- Yannis Korkontzelos, Edge Hill University, UK
- Zhemin Zhu, Elsevier, The Netherlands

Final Program

Date: 5th June 2017
Session 1: Chair (Giulio Napolitano)
09:00 - 09:05 : Introduction
09:05 - 10:00 : Keynote speech : Overcoming the Language Barrier with BabelNet and Multilingual Disambiguation of Text (Roberto Navigli)
10:00 - 10:30 : Analysis and Quantitative Study of Egyptian Dialect On Twitter (Hamdy Mubarak)
10:30 - 11:00 : Coffee Break.
Session 2: Chair (Giulio Napolitano)
11:00 - 11:30 : Named Entity Recognition in Twitter using Images and Text (Diego Esteves, Rafael Peres, Jens Lehmann, and Giulio Napolitano)
11:30 - 12:00 : Online Expectation Maximization for Language Characterization of Streaming Text (Jonathan Wintrode, Nhat Bui, Jan Stepinski and Chris Reed)
12:00 - 12:10 : Wrap Up and Closing.