A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

Stacey Franklin Jones, D.Sc.
ProTech Global Solutions
Annapolis, MD

Abstract

The use of Social Media as a resource to characterize historical and current occurrences in order to reasonably predict future events is rapidly emerging. Effective visualization of Predictive Social Media Analytics (PSMA) data lends itself to meaningful and relevant translation, efficient interpretation and ease in use of results. A general taxonomy for the measurement of these attributes enables comparison of visualization techniques, with the ultimate goal of identifying some of the best PSMA data representation approaches and export features for a broad range of applications and their respective complexity.

Background

The quest to identify trends in large, publicly available data realms such as those associated with Social Media, and to use them to foresee what may happen in the time to come, requires robust analytic approaches. We are seeing the emergence of various applications that use algorithmic techniques to provide this predictive capability, based on the extraction and manipulation of huge volumes of data available via websites and other online means of communication that people use to share social information. These Predictive Social Media Analytics (PSMA) applications offer various features at different complexity levels to aid in making more informed decisions about marketing products and improving financial services. PSMA applications are also beginning to enter the defense and intelligence space to assist in monitoring and forecasting world events. Their common goal is to generate output that facilitates understanding of complex data relationships over time. Since PSMA applications operate on such massive amounts of data, visualizing or otherwise representing temporally significant instances with a reasonable degree of graphical clarity is no easy feat.
Ultimately, their value will be measured not only by the speed, sophistication and accuracy of their algorithms, but also by how well they produce or provide interfaces that facilitate meaningful and relevant translations, which in turn lend themselves to efficient interpretation and ease in use of results.

Among the numerous PSMA applications are three (3) that together cover the general forecasting space: consumer product, financial climate and defense intelligence forecasting. Educational use of PSMA is of particular interest to the author(s), but at this time it is more a residual space related to the aforementioned. As such, for the purpose of a general survey of relevant PSMA visualization techniques, the following applications are reviewed: IBM's SPSS predictive analytics software, SAS Social Media Analytics and Recorded Future. Admittedly, the named PSMA applications do not claim to have equal predictive power. However, they generally incorporate output data representation patterns sufficient to develop a general taxonomy for the measurement of key attributes.
The developers of IBM's SPSS predictive analytics software state that they can predict with confidence what will happen next so that you can make smarter decisions, solve problems and improve outcomes. They invite the user to create and share what are referred to as "compelling visualizations that better communicate [your] analytics results" [1]. IBM SPSS also states that, with its IBM SPSS Visualization Designer, the user may easily develop and build new visualizations that enable new ways to portray and communicate analytics without extensive programming skills. Three (3) example visualizations available to the IBM SPSS user are the Network Graph, Scatterplot Matrix and Jungle Book Graph shown in Figures 1-3 [Reprint Courtesy of International Business Machines Corporation, SPSS, Inc., an IBM Company].

Figure 1. Network Graph
Figure 2. Scatterplot Matrix
Figure 3. Jungle Book Graph

SAS Social Media Analytics claims to provide the power to know what lies ahead around the next corner through an integrated environment for predictive and descriptive modeling, data mining, text analytics, forecasting, optimization, simulation, experimental design and more. From dynamic visualization to predictive modeling, model deployment and process optimization, SAS provides a range of techniques and processes for the collection, classification, analysis and interpretation of data to reveal patterns, anomalies, key variables and relationships, leading ultimately to new insights
and better answers faster [2]. Associated with the SAS analytics tool is an ability to illustrate trends and to drill down to the level of the actual comments that contribute to the prediction(s). To illustrate these changes over time, SAS offers, among other options, a dashboard of standard column, bar, line and pie charts and tables as a method for visualizing correlations of social media trends with the circumstances that triggered those events. An example of this approach, which can incorporate marketing activities, product changes, world events and/or market conditions, is shown in Figures 4 and 5.

Figure 4. SAS SMA Superimposition A
Figure 5. SAS SMA Superimposition B

Also offered via SAS Visual Data Discovery is an interactive data visualization suite for analytics that includes, but is not limited to, scatter, bubble and 3-D contour plots with animation.

Recorded Future strives to provide tools that assist in identifying and understanding historical developments, and that can also help formulate hypotheses about, and give clues to, likely future events. Recorded Future data may be accessed through a web services API using two export formats: JSON or CSV text. Developers are then referred to statistical or visualization software packages, such as R, Spotfire or others, that can make use of the export formats. A review of R presents a number of output formats such as box-and-whisker plots, pie charts, pairs plots, coplots, forest plots and common 3-D plots. Spotfire offers a visual and exploratory analytics experience that includes elegant and configurable visuals: Bar chart, Map chart, Line chart, Pie chart, Scatter plot, Combination Bar and Line chart, Cross Table, 3D Scatter Plot, Treemap, Heat Map and Parallel Coordinates plot capability in customizable dashboard configurations. In general, Recorded Future's PSMA visualization capability is as robust as the application programming interface and the linked packages.
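As a rough illustration of how either export format might be prepared for a downstream charting package, the following Python sketch parses JSON and CSV text into the same record shape and counts records per day. The field names (date, entity, sentiment) and the sample payloads are assumptions made for this example only; they are not taken from Recorded Future's actual export schema.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical export payloads: the field names ("date", "entity",
# "sentiment") are assumptions for illustration, not a vendor schema.
SAMPLE_JSON = '''[
  {"date": "2012-05-01", "entity": "ProductX", "sentiment": 0.4},
  {"date": "2012-05-01", "entity": "ProductX", "sentiment": -0.1},
  {"date": "2012-05-02", "entity": "ProductX", "sentiment": 0.7}
]'''

SAMPLE_CSV = """date,entity,sentiment
2012-05-01,ProductX,0.4
2012-05-01,ProductX,-0.1
2012-05-02,ProductX,0.7
"""


def load_json_records(text):
    """Parse a JSON array of event objects into a list of dicts."""
    return [
        {"date": r["date"], "entity": r["entity"], "sentiment": float(r["sentiment"])}
        for r in json.loads(text)
    ]


def load_csv_records(text):
    """Parse CSV text with a header row into the same record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"date": r["date"], "entity": r["entity"], "sentiment": float(r["sentiment"])}
        for r in reader
    ]


def mentions_per_day(records):
    """Count records per date, sorted by date, ready for a line or bar chart."""
    counts = Counter(r["date"] for r in records)
    return sorted(counts.items())


json_records = load_json_records(SAMPLE_JSON)
csv_records = load_csv_records(SAMPLE_CSV)
series = mentions_per_day(json_records)
```

Normalizing both formats to one record shape means the choice of export format does not constrain the choice of visualization package; the resulting date and count pairs could be handed to any of the chart types listed above.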
One could argue that meaningful and relevant visualizations will be the key to the successful emergence of PSMA in terms of timely and informative use. However, what is common among the visualization approaches of the reviewed PSMA applications is the use of older, primarily two (2) dimensional data representations for a new, multi-dimensional phenomenon of analytics. Furthermore, the temporal element of prediction introduces a complex dimension that may be expanded and linked to places, events, entities, locations, sentiment, behaviors and other such elements that lend themselves to forecasting future events.
Why A Taxonomy?

As visualization techniques attempt to catch up with the advancement of PSMA, we can expect a commensurate desire for comparison and, subsequently, a proliferation of best representations. A taxonomy that facilitates such a comparison is needed. The taxonomy need not be complex at this juncture, as PSMA applications are bound to morph many times over with the improvement of the underlying search and analysis engines. However, there is a need to categorically address the different levels of analysis of the PSMA application output data. Three (3) general PSMA application visualization categories can be established based on the reviewing analyst's expertise and familiarity with data representations:

1. High: expert data mining for intelligence, defense and other such datasets with high levels of complexity and many elements
2. Medium: strategic/competitive product introduction, political predictions and other such datasets with medium levels of complexity and several elements
3. Low: general consumer type for the common data observer, which may be used in the educational environment and to communicate with the general public using a few elements

A General Taxonomy

A general taxonomy that characterizes the effectiveness of visualization of PSMA data would need to cover three (3) areas:

1. Meaningful and relevant data translation
2. Efficient interpretation
3. Ease in use of results

Effectiveness measurements would be taken across each of the three (3) PSMA application categories.

Meaningful and Relevant Data Translation

PSMA applications search and operate on large volumes of content from government sites, news sites, blog posts, tweets and other such information available on the web. The predictive results of these massive processing tasks must be presented in a manner that is informative and that ultimately leads to better decisions, problem solving and improved outcomes.
As such, measuring how meaningful and relevant the translation of results is becomes paramount, and it is the first component of a general taxonomy in all three (3) of the PSMA application visualization categories.

Efficient Interpretation

The need to interpret predictive results in a timely manner is a component of each of the three (3) PSMA application visualization categories, but it varies in its sense of urgency. For the High category, the ability to reasonably forecast a potentially threatening world event and mobilize any intervention would have a desirable range of minutes or hours. For the Medium or Low categories, by contrast, several days or perhaps even weeks is sufficient for forecasting and preparing for a new product release or an education-focused event. Differing time requirements for interpretation are expected, but they are key elements for all three (3) categories. Consequently, the measurement of how efficiently the resulting predictive information can be interpreted is an additional component of the taxonomy.
Ease in Use of Results

All of the PSMA application categories would benefit from data representations that lend themselves to ease of use of the information-intensive results. Two (2) dimensional charts and diagrams are limited in domain and range of visualization and are at least one generation behind the third-generation search engine characterization of PSMA applications. Therefore, it is desirable to measure the extent to which the application offers results that are easily transferable to next-generation representations, perhaps ones that are more animated and/or incorporate three (3) or more dimensions.

The following is a simplified General Taxonomy for Visualization of Predictive Social Media Analytics Grid Figure:

                              High      Medium      Low
Meaningful/Relevant Data
Efficient Interpretation
Ease of Use of Results

Next Steps

Validation of the general taxonomy, or assessment of the PSMA applications' visualization capabilities, will need to be conducted. The metrics for the validation may have both quantitative and qualitative components and span each of the three (3) visualization categories (i.e. High, Medium, Low). This validation can initially be conducted on the three (3) PSMA applications reviewed, as they appear to represent the more evolved applications in the market. However, next steps should include a full survey of PSMA applications as they are developed and released as products. Other investigators are invited to address the first two (2) PSMA application visualization categories, specifically High and Medium, from a taxonomy perspective. The author's immediate investigative interests will be primarily in the third category, and more specifically in predictive behaviors that affect the teaching/learning/scholarship environment of college students.
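As one way to make the grid and the proposed validation concrete, the following Python sketch records a score and a note for each cell of the attribute-by-category grid and summarizes a category. The 0 to 5 scoring scale, the function names and the sample scores are all assumptions introduced for illustration; the taxonomy itself does not prescribe a scale or a scoring method.

```python
from statistics import mean

CATEGORIES = ("High", "Medium", "Low")
ATTRIBUTES = (
    "Meaningful/Relevant Data Translation",
    "Efficient Interpretation",
    "Ease of Use of Results",
)


def empty_grid():
    """Return an attribute-by-category grid with every cell unscored (None)."""
    return {attr: {cat: None for cat in CATEGORIES} for attr in ATTRIBUTES}


def record_score(grid, attribute, category, score, note=""):
    """Store a quantitative score (assumed 0-5 scale) with a qualitative note."""
    if attribute not in grid or category not in grid[attribute]:
        raise KeyError(f"unknown cell: {attribute!r} / {category!r}")
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    grid[attribute][category] = {"score": score, "note": note}


def category_summary(grid, category):
    """Average the scored cells for one category; None if nothing is scored."""
    scores = [
        row[category]["score"]
        for row in grid.values()
        if row[category] is not None
    ]
    return mean(scores) if scores else None


# Made-up scores for a hypothetical application, purely for illustration.
grid = empty_grid()
record_score(grid, "Efficient Interpretation", "Low", 4, "dashboard read quickly")
record_score(grid, "Ease of Use of Results", "Low", 2, "2-D charts only")
low_summary = category_summary(grid, "Low")
high_summary = category_summary(grid, "High")
```

Keeping the note alongside the score reflects the point that validation metrics may have both quantitative and qualitative components; unscored cells stay empty rather than defaulting to zero, so a category that has not yet been assessed is not mistaken for one that scored poorly.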
Conclusions

In conclusion, while this investigation and development of a General Taxonomy for Visualization of Predictive Social Media Analytics (PSMA) is not exhaustive in its review of applications, it is representative of the evolving applications in the market today. It provides a framework for categorizing requirements based on type of use and information, and a respective set of general measurement classes, both quantitative and qualitative, for comparing visualizations and extracting the best ones. The next steps would include validation of the general taxonomy and a broader, more detailed investigation through a full survey of PSMA applications.
References

[1] SPSS Visualization Designer: Charting your course just got easier. http://www-01.ibm.com/software/analytics/spss/products/statistics/vizdesigner/ (last referenced May 15, 2012)
[2] SAS Analytics: Analytics delivering greater insight. http://www.sas.com/technologies/analytics/ (last referenced May 15, 2012)
[3] Bollen, J., Mao, H., & Pepe, A. (2011). Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena. In Proc. ICWSM 2011.
[4] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter Mood Predicts the Stock Market. Journal of Computational Science. www.elsevier.com/locate/jocs
[5] De Choudhury, M., & Sundaram, H. (2011). Why Do We Converse on Social Media? An Analysis of Intrinsic and Extrinsic Network Factors. In Proceedings of the Third SIGMM Workshop on Social Media (WSM 2011), in conjunction with ACM Multimedia 2011 (Scottsdale, Arizona, USA, November 28 - December 1, 2011).
[6] De Choudhury, M., Lin, Y.-R., Sundaram, H., Candan, K.S., Xie, L., & Kelliher, A. (2010). How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? ICWSM '10: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media.
[7] Gruhl, D., Guha, R., Kumar, R., Novak, J., & Tomkins, A. The Predictive Power of Online Chatter. http://www.tomkinshome.com/site_media/papers/papers/ggk+05.pdf
[8] Murphy, T. Data Mining: Using Predictive Analysis And Social Network Analysis. New Tech Post, 02/09/2011. http://newtechpost.com/node/271
[9] Schmidt, J. (2007). Blogging practices: An analytical framework. Journal of Computer-Mediated Communication, 12(4), article 13. http://jcmc.indiana.edu/vol12/issue4/schmidt.html
[10] Truvé, S. Recorded Future: A White Paper on Temporal Analytics. https://www.recordedfuture.com/assets/rf-white-paper.pdf (last referenced May 15, 2012)
[11] Social Media As Predictive Analytics? http://www.marketingvox.com/social-media-as-predictive-analytics-048020/ (Nov 3, 2010) (last referenced May 15, 2012)
[12] Spotfire: About Spotfire. http://spotfire.tibco.com [On Line Demo(s)] (last referenced May 15, 2012)

Figures: Reprint courtesy of International Business Machines Corporation, SPSS, Inc., an IBM Company (SPSS was acquired by IBM in October 2009), and by permission of SAS Social Media Analytics.