5 June 2014 Language and dialect identification in social media analysis
Historically unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next-generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user’s social network, patterns that may correspond to socio-cultural aspects such as participants’ perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports development of automatic language and dialect identification and (iv) provides social and information network visualizations of languages identified in tweet conversations.
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Stephen Tratz, Douglas Briesch, Jamal Laoudi, Clare Voss, and V. Melissa Holland, "Language and dialect identification in social media analysis", Proc. SPIE 9122, Next-Generation Analyst II, 91220K (5 June 2014).
