Archive - Central European Conference on Information and Intelligent Systems, CECIIS - 2013

Mining Social Networks of Other Languages/Cultures
Ahmed Mohamed Sameh

Last modified: 2013-08-18

Abstract


The global reach of social networks allows users to post contributions in their own languages. In this paper we combine machine translation and text-mining techniques to understand cultures that use other languages of interest on social networks. In particular, we study Twitter and focus on two cultures, Hebrew-speaking and Persian-speaking, in the domain of politics. We propose a new Twitter Client that collects live tweet streams at frequent intervals from hashtags on specific topics in the target language and computes a mix of qualitative and quantitative indicators to measure opinion spread. To provide a real-time mining service, the proposed Client offers three levels of analysis (Surface, Shallow, and Deep), selected according to the analyst’s wishes and the traffic rate of the input streams. Hebrew, Persian and Arabic tweets are analyzed taking important contextual background information into consideration. The proposed Client uses the Google Translate service to translate all tweets into Arabic and then performs only “Arabic” analysis (assuming users are Arabic speakers). The new Client introduces three “Tweet Coloring” algorithms that make use of previously developed Arabic NLP tools and graph algorithms, such as “Arabic Wordnet”, the “Q-WordNet” ontology for sentiments, “Arabic Lexicons”, “Arabic Tweet Corpus”, and “Max Flow Minimum Cut”, in order to infer Tweeters’ inclinations and impressions about the subject of interest. In addition, three “Edge Coloring” algorithms are introduced in the proposed Client to infer how opinion, influence, and trust spread (cascade) through Twitter social network graphs. Each of the three algorithms has a “binary” version (+ve/-ve, Yes/No) and a “continuous” version.
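The abstract does not specify how the coloring algorithms work internally. As a rough illustration only of the difference between a “continuous” and a “binary” color, the following Python sketch scores a tokenized tweet against a toy polarity lexicon. The lexicon entries, the function names, the averaging rule, and the zero threshold are all assumptions made for this example; they are not taken from the paper or from Q-WordNet.

```python
# Hypothetical toy lexicon mapping a token to a polarity in [-1, 1].
# A real system would draw on resources such as a sentiment ontology;
# these English placeholder entries are invented for illustration.
TOY_LEXICON = {"good": 1.0, "great": 0.8, "bad": -1.0, "weak": -0.5}


def continuous_color(tokens, lexicon=TOY_LEXICON):
    """Return the mean polarity of the tokens found in the lexicon.

    Tweets with no known token are treated as neutral (0.0).
    """
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def binary_color(tokens, lexicon=TOY_LEXICON, threshold=0.0):
    """Collapse the continuous score into a +ve/-ve label.

    Mapping scores equal to the threshold to "+ve" is an arbitrary
    choice made for this sketch.
    """
    return "+ve" if continuous_color(tokens, lexicon) >= threshold else "-ve"


if __name__ == "__main__":
    tweet = ["the", "speech", "was", "good", "but", "weak"]
    print(continuous_color(tweet), binary_color(tweet))
```

In this sketch the binary version is simply a thresholded view of the continuous one, which is one plausible way the two versions could relate.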
A prototype of the proposed Client is implemented on top of “NodeXL”, an open-source template for Microsoft Excel that connects automatically to a social network server and imports (using Twitter APIs) any data stream into the usual Excel environment. The tweet translation, Tweet Coloring, and Edge Coloring algorithms are implemented as Excel macros that can be set to surface, shallow, or deep analysis parameters. Visualization graphs are provided that allow dynamic filtering, vertex grouping, adjusted appearance (zooming into areas of interest), graph metric calculations, etc. To provide a real-time mining service even in the ‘Deep’ analysis setup, the proposed Client is backed by a parallel cluster farm of duplicate servers that speeds up processing.
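How the analyst’s chosen setting interacts with the traffic rate of the input stream is not detailed here. One possible reading, sketched below in Python rather than as an Excel macro, is that the Client runs the deepest level that is both requested by the analyst and affordable at the current rate. The function name, the rate thresholds, and the capping rule are hypothetical and chosen only for illustration.

```python
# Analysis levels ordered from cheapest to most expensive.
LEVELS = ["surface", "shallow", "deep"]


def choose_level(tweets_per_second, requested="deep",
                 shallow_limit=50.0, deep_limit=10.0):
    """Pick the analysis level to run for the current stream rate.

    shallow_limit and deep_limit are assumed thresholds (tweets per
    second) above which shallow and deep analysis, respectively, are
    taken to be too slow for real-time use.
    """
    if tweets_per_second > shallow_limit:
        affordable = "surface"
    elif tweets_per_second > deep_limit:
        affordable = "shallow"
    else:
        affordable = "deep"
    # Never exceed what the analyst asked for, nor what the rate allows.
    index = min(LEVELS.index(requested), LEVELS.index(affordable))
    return LEVELS[index]


if __name__ == "__main__":
    for rate in (5, 20, 100):
        print(rate, choose_level(rate))
```

Under this reading, adding parallel servers would correspond to raising the thresholds, so that deep analysis remains affordable at higher traffic rates.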

The proposed Client can serve as a spy agent for any “Ad Hoc” queries by incorporating color-related algorithms as macros that color-code the opinions in the tweets and the edges in the graphs for the target language/culture of interest. The proposed Client is also applicable to other application domains in the target language/culture, such as “Stock Market Prediction”, “Public Opinion”, “Customer Voice”, “Service Benchmarking”, and “Blog Analysis”. Our future work will archive posts over a longer period of time to build a better history (fingerprint), and will tackle other social networks such as Facebook, Flickr, YouTube, etc.