5 June 2014 Language and dialect identification in social media analysis
Stephen Tratz, Douglas Briesch, Jamal Laoudi, Clare Voss, V. Melissa Holland
Abstract
Historically unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next-generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user’s social network, patterns that may correspond to socio-cultural aspects such as participants’ perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports development of automatic language and dialect identification and (iv) provides social and information network visualizations of languages identified in tweet conversations.
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Stephen Tratz, Douglas Briesch, Jamal Laoudi, Clare Voss, and V. Melissa Holland "Language and dialect identification in social media analysis", Proc. SPIE 9122, Next-Generation Analyst II, 91220K (5 June 2014); https://doi.org/10.1117/12.2059092
KEYWORDS
Visualization
Data modeling
Social networks
Web 2.0 technologies
Information visualization
Gold
Machine learning
