A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS




Stacey Franklin Jones, D.Sc.
ProTech Global Solutions
Annapolis, MD

Abstract

The use of Social Media as a resource to characterize historical and current occurrences in order to reasonably predict future events is rapidly emerging. Effective visualization of Predictive Social Media Analytics (PSMA) data lends itself to meaningful and relevant translation, efficient interpretation and ease in use of results. A general taxonomy for the measurement of these attributes enables comparison of visualization techniques, with the ultimate goal of identifying some of the best PSMA data representation approaches and export features for a broad range of applications and their respective complexity.

Background

The quest to identify trends in large publicly available data realms, such as those associated with Social Media, and to use them to foresee what may happen in the time to come requires robust analytic approaches. We are seeing the emergence of various applications that use algorithmic techniques to provide this predictive capability, based on the extraction and manipulation of huge volumes of data available via websites and other online means of communication that people use to share social information. These Predictive Social Media Analytics (PSMA) applications offer various features at different complexity levels to aid in making more informed decisions about marketing products and improving financial services. PSMA applications are also beginning to infiltrate the defense and intelligence space to assist in monitoring and forecasting world events. Their common goal is to generate output that facilitates understanding of complex data relationships over time.

Since PSMA applications operate on such massive amounts of data, visualizing or otherwise representing temporally significant instances with a reasonable degree of graphical clarity is no easy feat. Ultimately their value will be measured not only by the speed, sophistication and accuracy of their algorithms, but also by how well they produce or provide interfaces that facilitate meaningful and relevant translations that lend themselves to efficient interpretation and ease in use of results.

Among the numerous PSMA applications are three (3) that together cover the general forecasting space: consumer product, financial climate and defense intelligence forecasting. Educational use of PSMA is of particular interest to the author(s), but at this time it is more of a residual space related to the aforementioned three. As such, for the purpose of a general survey of relevant PSMA visualization techniques, the following applications are reviewed: IBM's SPSS predictive analytics software, SAS Social Media Analytics and Recorded Future. Admittedly, the named PSMA applications do not claim equal predictive power. However, they generally incorporate output data representation patterns sufficient to develop a general taxonomy for the measurement of key attributes.

The developers of IBM's SPSS predictive analytics software state that they can predict with confidence what will happen next so that you can make smarter decisions, solve problems and improve outcomes. They invite the user to create and share what are referred to as "compelling visualizations that better communicate [your] analytics results" [1]. IBM SPSS also states that with the IBM SPSS Visualization Designer the user may easily develop and build new visualizations that enable new ways to portray and communicate analytics without extensive programming skills. Three (3) example visualizations available to the IBM SPSS user are the Network Graph, Scatterplot Matrix and Jungle Book Graph shown in Figures 1-3 [Reprint Courtesy of International Business Machines Corporation, SPSS, Inc., an IBM Company].

Figure 1 Network Graph
Figure 2 Scatterplot Matrix
Figure 3 Jungle Book Graph

SAS Social Media Analytics claims to provide the power to know what lies ahead around the next corner through an integrated environment for predictive and descriptive modeling, data mining, text analytics, forecasting, optimization, simulation, experimental design and more. From dynamic visualization to predictive modeling, model deployment and process optimization, SAS provides a range of techniques and processes for the collection, classification, analysis and interpretation of data to reveal patterns, anomalies, key variables and relationships, leading ultimately to new insights and better answers faster [2]. Associated with the SAS analytics tool is the ability to illustrate trends and to drill down to the level of the actual comments that contribute to the prediction(s). To illustrate these changes over time, SAS offers, among other options, a dashboard of standard column, bar, line and pie charts and tables as a method for visualizing correlations of social media trends with the circumstances that triggered those events. An example of this approach, which can incorporate marketing activities, product changes, world events and/or market conditions, is shown in Figures 4 and 5.

Figure 4 SAS SMA Superimposition A
Figure 5 SAS SMA Superimposition B

Also offered via SAS Visual Data Discovery is an interactive data visualization suite for analytics that includes, but is not limited to, scatter, bubble and 3-D contour plots with animation.

Recorded Future strives to provide tools which assist in identifying and understanding historical developments, and which can also help formulate hypotheses about and give clues to likely future events. Recorded Future data may be accessed through a web services API using two export formats: JSON or CSV text. Developers are then referred to statistical or visualization software packages, such as R, Spotfire or others, that can make use of the export formats. A review of R reveals a number of output formats such as box-and-whisker plots, pie charts, pairs plots, coplots, forest plots and common 3-D plots. Spotfire offers a visual and exploratory analytics experience which includes elegant and configurable visuals: Bar chart, Map chart, Line chart, Pie chart, Scatter plot, Combination Bar and Line chart, Cross Table, 3D Scatter Plot, Treemap, Heat Map and Parallel Coordinates plot capability in customizable dashboard configurations. In general, Recorded Future's PSMA visualization capability is as robust as its application programming interface and the packages linked to it.
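As a rough illustration of the hand-off from a JSON or CSV export to a downstream visualization package, the following Python sketch parses either format into records and reduces them to a per-day count that a plotting tool could consume. The paper does not describe the export schema, so the field names `date` and `entity` and the sample payloads are assumptions made purely for illustration; no actual Recorded Future API call is shown.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical field name: the real export schema is not given in the paper.
DATE_FIELD = "date"


def load_records(text, fmt):
    """Parse an exported payload ('json' or 'csv') into a list of dicts."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported export format: {fmt}")


def mentions_per_day(records):
    """Count records per calendar date, giving a simple series to plot."""
    return Counter(rec[DATE_FIELD] for rec in records)


# Made-up sample payloads standing in for real exports.
sample_csv = (
    "date,entity\n"
    "2012-05-14,ProductX\n"
    "2012-05-14,ProductY\n"
    "2012-05-15,ProductX\n"
)
sample_json = '[{"date": "2012-05-14", "entity": "ProductX"}]'

csv_counts = mentions_per_day(load_records(sample_csv, "csv"))
json_counts = mentions_per_day(load_records(sample_json, "json"))
```

Both formats end up as the same list-of-records shape, so the choice of export format need not constrain the choice of visualization package.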
One could argue that the key to the successful emergence of PSMA, in terms of timely and informative use, will be meaningful and relevant visualizations. However, what the reviewed PSMA application visualization approaches have in common is the use of old and primarily two (2) dimensional data representations for a new multi-dimensional phenomenon of analytics. Furthermore, the temporal element of prediction introduces a complex dimension that may be expanded and linked to places, events, entities, locations, sentiment, behaviors and other such elements that lend themselves to forecasting future events.

Why A Taxonomy?

As visualization techniques attempt to catch up with the advancement of PSMA, we can expect a commensurate desire for comparison and, subsequently, a proliferation of best representations. A taxonomy that facilitates such a comparison is needed. The taxonomy need not be complex at this juncture of evolution, as PSMA applications are bound to morph many times over with the improvement of the underlying search and analysis engines. However, there is a need to categorically address the different levels of analysis of the PSMA application output data. Three (3) general PSMA application visualization categories can be established based on the reviewing analyst's expertise and familiarity with data representations:

1. High: expert data mining for intelligence, defense and other such datasets with high levels of complexity and many elements
2. Medium: strategic/competitive product introduction, political predictions and other such datasets with medium levels of complexity and several elements
3. Low: general consumer type for the common data observer, which may be used in the educational environment and to communicate with the general public using a few elements

A General Taxonomy

A general taxonomy that characterizes the effectiveness of visualization of PSMA data would need to cover three (3) areas:

1. Meaningful and relevant data translation
2. Efficient interpretation
3. Ease in use of results

Effectiveness measurements would be taken across each of the three (3) PSMA application categories.

Meaningful and Relevant Data Translation

PSMA applications search and operate on large volumes of content from government sites, news sites, blog posts, tweets and other such information available on the web. The predictive results of these massive processing tasks must be presented in a manner that is informative and that ultimately leads to better decisions, problem solving and improved outcomes. As such, measuring how meaningful and relevant the translation of results is becomes paramount, and it is the first component of a general taxonomy in all three (3) of the PSMA application visualization categories.

Efficient Interpretation

The need to interpret predictive results in a timely manner is a component of each of the three (3) PSMA application visualization categories, but the sense of urgency varies. For the High category, the ability to reasonably forecast a potentially threatening world event and mobilize any intervention would have a desirable range of minutes or hours. For the Medium or Low categories, by contrast, several days or perhaps even weeks is sufficient for forecasting and preparing for a new product release or an education-focused event. Differing time requirements for interpretation are expected, but they are key elements for all three (3) categories. Consequently, the measurement of how efficiently the resulting predictive information can be interpreted is an additional component of the taxonomy.

Ease in Use of Results

All of the PSMA application categories would benefit from data representations that lend themselves to ease of use of the information-intensive results. Two (2) dimensional charts and diagrams are limited in domain and range of visualization and are at least one generation behind the third-generation search engine characterization of PSMA applications. Therefore, it is desirable to measure the extent to which the application offers results that are easily transferable to next-generation representations, perhaps ones that are more animated and/or incorporate three (3) or more dimensions.

The following is a simplified General Taxonomy for Visualization of Predictive Social Media Analytics grid.

Figure: General Taxonomy grid. Columns: High, Medium, Low. Rows: Meaningful/Relevant Data; Efficient Interpretation; Ease of Use of Results.

Next Steps

The general taxonomy will need to be validated, or the visualization capabilities of the PSMA applications assessed. The metrics for the validation may have both quantitative and qualitative components and span each of the three (3) visualization categories (i.e., High, Medium and Low). This validation can initially be conducted on the three (3) PSMA applications reviewed, as they appear to represent the more evolved applications in the market. However, next steps should include a full survey of PSMA applications as they are developed and released as products. Other investigators are invited to address the first two (2) PSMA application visualization categories, specifically High and Medium, from a taxonomy perspective. The author's immediate investigative interests will be primarily in the third category, and more specifically in predictive behaviors that affect the teaching/learning/scholarship environment of college students.
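One way the taxonomy grid might be recorded during validation is sketched below in Python: a small structure keyed by category and attribute that holds one rating per cell. The 0 to 5 rating scale, the class and method names and the sample ratings are all assumptions made for illustration; the paper does not prescribe a scoring scheme.

```python
from dataclasses import dataclass, field
from statistics import mean

# Categories and attributes as named in the taxonomy.
CATEGORIES = ("High", "Medium", "Low")
ATTRIBUTES = (
    "Meaningful/Relevant Data Translation",
    "Efficient Interpretation",
    "Ease in Use of Results",
)


@dataclass
class TaxonomyGrid:
    """Ratings for one PSMA application, one cell per (category, attribute)."""

    application: str
    scores: dict = field(default_factory=dict)

    def rate(self, category, attribute, score):
        """Store a rating on an assumed 0-5 scale (hypothetical metric)."""
        if category not in CATEGORIES or attribute not in ATTRIBUTES:
            raise KeyError((category, attribute))
        if not 0 <= score <= 5:
            raise ValueError("score must be between 0 and 5")
        self.scores[(category, attribute)] = score

    def category_mean(self, category):
        """Average the ratings recorded so far for one category, if any."""
        values = [s for (c, _), s in self.scores.items() if c == category]
        return mean(values) if values else None


# Made-up ratings for a hypothetical application in the Low category.
grid = TaxonomyGrid("Example PSMA application")
grid.rate("Low", ATTRIBUTES[0], 4)
grid.rate("Low", ATTRIBUTES[1], 3)
grid.rate("Low", ATTRIBUTES[2], 2)
low_mean = grid.category_mean("Low")
```

Keeping the cells separate rather than collapsing them into a single number preserves the per-attribute comparison that the taxonomy is meant to support; qualitative notes could be attached to each cell in the same way.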
Conclusions

In conclusion, while this investigation and development of a General Taxonomy for Visualization of Predictive Social Media Analytics (PSMA) is not exhaustive in its review of applications, it is representative of the evolving applications in the market today. It provides a framework for categorizing requirements based on type of use and information, and a corresponding set of general measurement classes, both quantitative and qualitative, for comparing and extracting the best visualizations. The next steps would include validation of the general taxonomy and a broader and more detailed investigation through a full survey of PSMA applications.

References

[1] SPSS Visualization Designer. Charting your course just got easier. http://www-01.ibm.com/software/analytics/spss/products/statistics/vizdesigner/ (last referenced May 15, 2012)
[2] SAS Analytics. Analytics delivering greater insight. http://www.sas.com/technologies/analytics/ (last referenced May 15, 2012)
[3] Bollen, J., Mao, H., & Pepe, A. (2011). Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena. In Proc. ICWSM 2011.
[4] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter Mood Predicts the Stock Market. Journal of Computational Science. www.elsevier.com/locate/jocs
[5] De Choudhury, M., & Sundaram, H. (2011). Why Do We Converse on Social Media? An Analysis of Intrinsic and Extrinsic Network Factors. In Proceedings of the Third SIGMM Workshop on Social Media (WSM 2011), in conjunction with ACM Multimedia 2011 (Scottsdale, Arizona, USA, November 28 - December 1, 2011).
[6] De Choudhury, M., Lin, Y.-R., Sundaram, H., Candan, K.S., Xie, L., & Kelliher, A. (2010). How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? ICWSM '10: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media.
[7] Gruhl, D., Guha, R., Kumar, R., Novak, J., & Tomkins, A. The Predictive Power of Online Chatter. http://www.tomkinshome.com/site_media/papers/papers/ggk+05.pdf
[8] Murphy, T. Data Mining: Using Predictive Analysis And Social Network Analysis. New Tech Post, 02/09/2011. http://newtechpost.com/node/271
[9] Schmidt, J. (2007). Blogging practices: An analytical framework. Journal of Computer-Mediated Communication, 12(4), article 13. http://jcmc.indiana.edu/vol12/issue4/schmidt.html
[10] Truvé, S. Recorded Future: A White Paper on Temporal Analytics. https://www.recordedfuture.com/assets/rf-white-paper.pdf (last referenced May 15, 2012)
[11] Social Media As Predictive Analytics? http://www.marketingvox.com/social-media-as-predictive-analytics-048020/ (Nov 3, 2010) (last referenced May 15, 2012)
[12] Spotfire. About Spotfire. http://spotfire.tibco.com [On Line Demo(s)] (last referenced May 15, 2012)

Figures: Reprint Courtesy of International Business Machines Corporation, SPSS, Inc., an IBM Company (SPSS was acquired by IBM in October, 2009); and by permission of SAS Social Media Analytics.