5 June 2014 Language and dialect identification in social media analysis
Abstract
Historically unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next-generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user’s social network, patterns that may correspond to socio-cultural aspects such as participants’ perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports development of automatic language and dialect identification and (iv) provides social and information network visualizations of languages identified in tweet conversations.
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Stephen Tratz, Douglas Briesch, Jamal Laoudi, Clare Voss, V. Melissa Holland, "Language and dialect identification in social media analysis", Proc. SPIE 9122, Next-Generation Analyst II, 91220K (5 June 2014); doi: 10.1117/12.2059092; https://doi.org/10.1117/12.2059092
PROCEEDINGS
11 PAGES

