The rapid growth of Internet usage over the last two decades poses new challenges for understanding informal user-generated content (UGC) on the Internet. Textual UGC refers to textual posts on social media, blogs, emails, chat conversations, instant messages, forums, reviews, or advertisements that are created by end-users of an online system. A large portion of the language used in textual UGC is informal. Informal text is a style of writing that disregards grammatical rules and uses a mixture of abbreviations and context-dependent terms. The straightforward application of state-of-the-art Natural Language Processing approaches to informal text typically results in significantly degraded performance for the following reasons: the lack of sentence structure; the lack of sufficient context; the rarely mentioned entities involved; the noisy, sparse content of users' contributions; and the untrusted facts contained. It is the aim of this workshop to bring the attention of researchers to the opportunities and challenges involved in informal text processing. In particular, we are interested in discussing informal text modeling, normalization, mining, and understanding, in addition to various application areas in which UGC is involved.


We invite submissions on topics that include, but are not limited to, the following core NLP approaches for informal UGC: language identification, classification, clustering, filtering, summarization, tokenization, segmentation, morphological analysis, POS tagging, parsing, named entity extraction, named entity disambiguation, relation/fact extraction, semantic annotation, sentiment analysis, language normalization, informality modeling and measuring, language generation, handling uncertainties, machine translation, ontology construction, dictionary construction, etc.


Authors are invited to submit original work that has not been submitted to another conference or workshop. Workshop submissions may be full papers or short papers. Paper length should not exceed 12 pages for full papers and 6 pages for short papers. All papers should follow Springer's LNCS format. Papers in PDF can be submitted via the EasyChair Conference System: https://easychair.org/conferences/?conf=nlpit2015. Each submission will receive at least two double-blind peer reviews in addition to a meta-review. Therefore, papers must not include the authors' names and affiliations, and self-references that reveal the authors' identity must be avoided. Each full paper will get 25 minutes of presentation time. Short papers will get 5 minutes of presentation time in addition to a poster. Besides papers, we also plan to have an invited talk by a renowned scientist on a topic relevant to the workshop. Workshop proceedings will be published as part of the ICWE2015 workshop proceedings.
To contact the NLPIT 2015 organization team, please send an e-mail to: nlpit2015@easychair.org.

For all accepted papers, at least one author should be registered at the conference and attend the workshop to present the paper. The conference Early Bird registration (at reduced fees) is open until 28 May.


- Submission deadline: April 26, 2015 (originally April 17, 2015)
- Notification deadline: May 24, 2015 (originally May 17, 2015)
- Camera-ready version: July 18, 2015 (originally May 24, 2015)
- Workshop date: June 23, 2015

Keynote Speech

Speaker: Nathan Schneider

Title: Hacking a Way Through the Twitter Language Jungle: Syntactic Annotation, Tagging, and Parsing of English Tweets.

Diverse genres of digital text are forcing us to rethink our modus operandi in NLP: with a Web awash in natural language text from ordinary users, it is not enough to be able to analyze well-edited news sentences. To build tools that are accurate for other linguistic styles, we need new linguistic resources. Unfortunately, traditional forms of syntactic analysis are expensive and do not readily adapt to informal genres. This talk plunges into parsing in the Twitter jungle, following a trajectory that tightly integrates linguistic data preparation and statistical modeling. I argue against treating traditional linguistic representations as sacrosanct, and instead present new representations that are simpler and more flexible for conversational text. Taking advantage of these for a streamlined annotation process, I and colleagues at CMU have compiled datasets of tweets and built domain-appropriate POS taggers and parsers (Gimpel et al., ACL 2011; Owoputi et al., NAACL 2013; Schneider et al., LAW 2013; Kong et al., EMNLP 2014). Would-be Twitter trailblazers are invited to plunder these resources, which are stashed at http://www.ark.cs.cmu.edu/TweetNLP/.

Nathan Schneider is an annotation schemer and computational modeler for natural language. His 2014 dissertation (Carnegie Mellon University) introduced a coarse-grained representation for lexical semantics that facilitates rapid annotation and is practical for broad-coverage statistical NLP. He has also worked on semantic parsing for the FrameNet representation, the design of the Abstract Meaning Representation formalism, and other forms of syntactic/semantic annotation and processing for social media text. He is presently a postdoctoral researcher at the University of Edinburgh, where he continues to play with data and algorithms for linguistic meaning.


- Mena B. Habib, University of Twente, The Netherlands.
- Florian Kunneman, Radboud University, The Netherlands.
- Maurice van Keulen, University of Twente, The Netherlands.

Program Committee

- Alexandra Balahur, The European Commission's Joint Research Centre (JRC), Italy
- Barbara Plank, University of Copenhagen, Denmark
- Diana Maynard, University of Sheffield, UK
- Djoerd Hiemstra, University of Twente, The Netherlands
- Kevin Gimpel, Toyota Technological Institute, USA
- Leon Derczynski, University of Sheffield, UK
- Marieke van Erp, VU University Amsterdam, The Netherlands
- Natalia Konstantinova, University of Wolverhampton, UK
- Robert Remus, Universität Leipzig, Germany
- Wang Ling, Carnegie Mellon University, USA
- Wouter Weerkamp, 904Labs
- Zhemin Zhu, University of Twente, The Netherlands

Final Program

Date: June 23rd 2015.
9:00 - 10:00 : Keynote Speech: Nathan Schneider, "Hacking a Way Through the Twitter Language Jungle: Syntactic Annotation, Tagging, and Parsing of English Tweets".
10:00 - 10:30 : Regular Paper:
Tetsuya Suzuki, "Introduction of N-gram into a Run-Length Encoding based ASCII Art Extraction Method".
10:30 - 11:00 : Coffee break.
11:00 - 11:30 : Regular Paper:
Jihene Younes, Hadhemi Achour and Emna Souissi, "Constructing Linguistic Resources for the Tunisian Dialect using Textual User-Generated Contents on the Social Web".
11:30 - 12:00 : Regular Paper:
Amandyk Kartbayev, "SMT: A Case Study of Kazakh-English Word Alignment".
12:00 - 12:30 : Regular Paper:
Mariona Taulé, M Antonia Martí, Ann Bies, Aina Garí, Montserrat Nofre, Zhiyi Song, Stephanie Strassel and Joe Ellis, "Spanish Treebank Annotation of Informal Non-Standard Web Text".