Web Data Visualization
  Department of Communication PhD Student Workshop
  Web Mining for Communication Research
  April 22-25, 2014
  http://weblab.com.cityu.edu.hk/blog/project/workshops
  Jie Qin & Hexin Chen
Review
  Day 1: Data collection (NodeXL, Visual Web Ripper, APIs)
  Day 2: Data preprocessing (Python, NLTK)
  Day 3: Data analysis (SPSS, SNA, R)
  Today: Data visualization (Text, Network, Spatial, Temporal)
Outline
  I. Define visualization
  II. What visualization can do
  III. Research questions and visualization options
  IV. Four types of data and related visualization tools
  V. Resources
Jonathan Zhu: Visualization is of the data, by the data, and for the data
  Data visualization differs from the general graphic design in that it is of the data, by the data, and for the data.
  Of the data: an integrated phase of the discovery rather than a post-analysis phase to decorate the findings
  By the data: guided primarily by data results rather than esthetical considerations
  For the data: to tell accurate, informative, and understandable quantitative stories
To Visualize Is to
  Highlight: What is my innovation/contribution?
  Select: What variables have been tested in extant literature?
  Present: How do I present my study to audiences (reviewers, seminar audiences, etc.)? Writings, drawings, etc.
What Visualization Can Do (Tufte 2001/1983)
  Show the data
  Induce the viewer to think about the data
  Avoid distorting what the data have to say
  Present many numbers in a small space
  Make large data sets coherent
  Encourage the eye to compare different pieces of data
  Reveal the data at several levels of detail, from overview to fine structure
  Serve a clear purpose: description, exploration, tabulation, or decoration
  Be closely integrated with the statistical and verbal descriptions of a data set
Misleading Visualization Source: http://data.heapanalytics.com/how-to-lie-with-data-visualization/
Misleading Visualization (continued) Source: http://data.heapanalytics.com/how-to-lie-with-data-visualization/
Misleading Visualization (continued) Source: http://qz.com/122921/the-chart-tim-cook-doesnt-want-you-to-see/
Misleading Visualization (continued) Source: http://qz.com/122921/the-chart-tim-cook-doesnt-want-you-to-see/
Finding the right way to view your data is as much an art as a science. Source: http://www.manyeyes.com/software/analytics/manyeyes/page/visualization_options.html
Research Questions and Visualization Options
  See relationships among data points: Scatterplot, Matrix Chart, Network Diagram
  Compare a set of values: Bar Chart, Block Histogram, Bubble Chart
  Track rises and falls over time: Line Graph, Stack Graph, Stack Graph for Categories
  See the parts of a whole: Pie Chart, Treemap, Treemap for Comparisons
  Analyze a text: Word Tree, Tag Cloud, Phrase Net
  See the world: Map
  Source: http://www.manyeyes.com/software/analytics/manyeyes/page/visualization_options.html
Love in Paris: A Google Search Story
Today we'll focus on four types of data
  Texts and discourse analysis
  Network and hyperlink network analysis
  Spatial data
  Temporal data
TEXTS AND DISCOURSE ANALYSIS
Texts in Communication Research
Wordle: How Toy Ad Vocabulary Reinforces Gender Stereotypes
  Guess which one is for boys and which one is for girls.
  Source: http://www.achilleseffect.com/2011/03/word-cloud-how-toy-ad-vocabulary-reinforcesgender-stereotypes/#
Demo 1: Word Cloud of Obama's Addresses
  State of the Union Addresses 2009-2012
  Data can be downloaded from http://weblab.com.cityu.edu.hk/blog/project/workshops/
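
A word cloud is driven by word counts, so the step behind Demo 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the workshop's own script: the stop-word list is a deliberately tiny, hypothetical one, and the sample sentence is placeholder text rather than an actual address.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our", "that", "for"}

def word_frequencies(text, top_n=10):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Placeholder text, not an actual address.
sample = "Jobs and energy. We need jobs, we need clean energy, and we need jobs now."
top_words = word_frequencies(sample, top_n=3)
print(top_words)
```

The resulting counts are what a word cloud tool maps to font size; the downloaded address files could be read with `open(...).read()` and passed in place of `sample`.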
HTML5 Word Cloud
Voyant Tools
Word Trends in Voyant Tools
  Data: Obama's addresses in 2009 and 2012
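
A word trend compares how often a term appears across documents. Because addresses differ in length, one common choice is a relative frequency (occurrences per 1,000 words) rather than a raw count. The sketch below assumes that normalization; it is not a description of how Voyant Tools computes its trends, and the two short "documents" are placeholders.

```python
import re

def relative_frequency(text, term):
    """Occurrences of term per 1,000 words of text (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000.0 * words.count(term.lower()) / len(words)

def word_trend(documents, term):
    """Map each document label to the term's relative frequency, in label order."""
    return {label: relative_frequency(text, term)
            for label, text in sorted(documents.items())}

# Placeholder documents keyed by year; replace with the real address texts.
docs = {"2009": "jobs jobs economy recovery", "2012": "jobs energy energy future"}
trend = word_trend(docs, "jobs")
print(trend)
```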
Word Trends of Three Premiers of China
  Reform, Job
  Source: http://news.qq.com/newspedia/baogao.htm
Word Net in Voyant Tools
  Data: Obama's address in 2012
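
A word net links words that occur near each other in a text. As a rough sketch of that idea (an assumption about the general technique, not Voyant's actual algorithm), the code below counts unordered word pairs that fall within a small window; the `window` parameter and the sample sentence are illustrative.

```python
import re
from collections import Counter

def collocate_edges(text, window=2):
    """Count unordered word pairs that occur within `window` words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    for i, word in enumerate(words):
        # Look only ahead so each nearby pair is counted once per occurrence.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return edges

# Placeholder sentence; with window=1 only adjacent words are linked.
edges = collocate_edges("clean energy jobs and clean energy future", window=1)
for pair, weight in edges.most_common():
    print(pair, weight)
```

Each pair can be treated as an edge and its count as the edge weight, which is the form a network tool expects.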
Word Nets and Framing Analysis
  Qin (2014). Snowden Wins on Twitter but Fails in News: The Mismatch between Social Media Frame and Mass Media Frame
Demo 2: Word Trends and Word Nets in Obama's Addresses
  State of the Union Addresses 2009-2012
  Data can be downloaded from http://weblab.com.cityu.edu.hk/blog/project/workshops/
NETWORKS AND HYPERLINK NETWORK ANALYSIS
Topology of World Wide Web based on Hyperlink Analysis
  Daisy Model (Donato et al., 2005)
  Bowtie Model (Broder et al., 2000)
    SCC: strongly connected component
    IN: unilaterally connected to SCC
    OUT: unilaterally connected by SCC
  Teapot Model (Zhu et al., 2008)
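
The SCC, IN, and OUT components of the bowtie model can be computed from a directed edge list. The sketch below is a simple illustration using breadth-first reachability (fine for small networks, not tuned for web-scale graphs); the node names are placeholders and the `OTHER` bucket is just a label chosen here for everything outside the three named parts.

```python
from collections import defaultdict, deque

def reachable(start_nodes, adjacency):
    """All nodes reachable from start_nodes by following adjacency links."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bowtie(edges):
    """Split nodes into the largest SCC, IN (can reach it), OUT (reached from it), OTHER."""
    forward, backward = defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
        nodes.update((src, dst))
    scc, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        # A node's strongly connected component: reachable both ways.
        component = reachable([node], forward) & reachable([node], backward)
        assigned |= component
        if len(component) > len(scc):
            scc = component
    in_set = reachable(scc, backward) - scc
    out_set = reachable(scc, forward) - scc
    return {"SCC": scc, "IN": in_set, "OUT": out_set,
            "OTHER": nodes - scc - in_set - out_set}

# Placeholder hyperlinks: a, b, c link in a cycle; x links in; y is linked to; p, q are apart.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "a"), ("c", "y"), ("p", "q")]
parts = bowtie(links)
print(parts)
```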
My Dissertation
Highlights in My Dissertation
  The content of hyperlinks: Inter-organizational hyperlinks are shaped by various pre-existing inter-organizational relationships, especially personal ties.
  The strength of hyperlinks: Hyperlinks are symbols of strong inter-organizational ties.
  The direction of hyperlinks: More vertical than horizontal.
Tools I Am Going to Introduce
  NodeXL
  Google Fusion Tables (you need a Google account to use this tool)
Book: Analyzing Social Media Networks with NodeXL
  I. Getting Started with Analyzing Social Media Networks
    1. Introduction to Social Media and Social Networks
    2. Social Media: New Technologies of Collaboration
    3. Social Network Analysis
  II. NodeXL Tutorial: Learning by Doing
    4. Layout, Visual Design & Labeling
    5. Calculating & Visualizing Network Metrics
    6. Preparing Data & Filtering
    7. Clustering & Grouping
  III. Social Media Network Analysis Case Studies
    8. Email
    9. Threaded Networks
    10. Twitter
    11. Facebook
    12. WWW
    13. Flickr
    14. YouTube
    15. Wiki Networks
NodeXL
  1. Import data: Edge lists.
  2. Click Show Graph.
  3. Play with filters and other options.
  Demo 3: Hyperlink networks among 14 higher education institutions in Hong Kong
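
An edge list is a two-column table with one row per link. As a hedged sketch of how collected hyperlink data might be turned into such a table before import, the code below flattens a mapping of site to linked sites into (source, target) rows and writes them as CSV. The site names are placeholders, and the "Vertex 1" / "Vertex 2" header is an assumption about the column names expected on import; adjust it to whatever the tool asks for.

```python
import csv
import io
from collections import Counter

def to_edge_list(outlinks):
    """Flatten {source: {targets}} into sorted (source, target) rows, skipping self-links."""
    return [(src, dst)
            for src in sorted(outlinks)
            for dst in sorted(outlinks[src])
            if src != dst]

def edge_list_csv(edges, header=("Vertex 1", "Vertex 2")):
    """Render the edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(edges)
    return buffer.getvalue()

def in_degree(edges):
    """Number of incoming links per target."""
    return Counter(dst for _, dst in edges)

# Placeholder sites, not the 14 institutions used in the demo.
outlinks = {"site_a": {"site_b", "site_c"}, "site_b": {"site_c"}, "site_c": set()}
edges = to_edge_list(outlinks)
csv_text = edge_list_csv(edges)
print(csv_text)
print(in_degree(edges))
```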
Google Fusion Tables
  Demo 4: Hyperlink networks among 14 higher education institutions in Hong Kong
SPATIAL DATA
Whereabouts of Ph.D. Graduates Source: http://www.cityu.edu.hk/com/programme_phd_sp_graduates.aspx
Map of Doctoral Programs in Communication in USA Source: http://weblab.com.cityu.edu.hk/blog/visualization/
Demo 5: The Map of Young Scholars
  http://www6.cityu.edu.hk/ccr/DuoWenYaJi_Scholar_All.aspx?year=2013
  Recall what you've learned on Day 1:
  How to collect data from the web pages?
  How to preprocess the data?
  How to make the map?
Temporal Data
  Temporal means of or relating to time.
  Dynamic (something that moves): Planes, Vehicles, Animals, Satellites, Storms
  Discrete (something that just happens): Crimes, Lightning, Accidents
  Stationary (stands still but records changes): Weather Stations, Traffic Sensors, Air Quality Sensors
  Change (change or growth): Population Distribution, Fire Perimeter
  Source: http://proceedings.esri.com/library/userconf/proc12/tech-workshops/tw_189.ppt
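
Discrete events (things that "just happen") usually need to be aggregated into time bins before they can be drawn as a line or bar chart. The sketch below shows one simple way to do that, counting events per calendar day. The timestamp format and the three timestamps are made-up placeholders.

```python
from collections import Counter
from datetime import datetime

def daily_counts(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count events per calendar day, returned as (ISO date, count) pairs in date order."""
    days = Counter(datetime.strptime(ts, fmt).date().isoformat() for ts in timestamps)
    return sorted(days.items())

# Made-up event times for illustration only.
events = ["2014-04-22 09:15", "2014-04-22 17:40", "2014-04-23 08:05"]
counts = daily_counts(events)
print(counts)
```

The same pattern works for hourly or monthly bins by changing what is extracted from each parsed timestamp.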
Air Traffic Source: http://www.wired.com/2014/03/plane-viz/
U.S. Unemployment: A Historical View Source: http://online.wsj.com/public/resources/documents/jobshistory09.html
Interactive Visualization in Google Charts
  Precise & Magnificent
Demo 6: Google Code Playground
  https://code.google.com/apis/ajax/playground/?type=visualization
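
Google Charts draws a chart in the browser from a small table of rows. As a hedged sketch of how results prepared in Python could be handed to it, the code below writes a standalone HTML page with a line chart. It assumes the current Google Charts loader script (`loader.js`) rather than whatever the Code Playground demo itself loads, and the header, rows, and title are placeholder values.

```python
import json

# Page skeleton; __ROWS__ and __TITLE__ are placeholders filled in below.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
  google.charts.load('current', {packages: ['corechart']});
  google.charts.setOnLoadCallback(function () {
    var data = google.visualization.arrayToDataTable(__ROWS__);
    var chart = new google.visualization.LineChart(document.getElementById('chart'));
    chart.draw(data, {title: __TITLE__});
  });
</script>
</head>
<body><div id="chart" style="width: 700px; height: 400px;"></div></body>
</html>
"""

def line_chart_page(header, rows, title):
    """Return HTML for a line chart; the first column is the x-axis label."""
    table = [list(header)] + [list(row) for row in rows]
    return (PAGE_TEMPLATE
            .replace("__ROWS__", json.dumps(table))
            .replace("__TITLE__", json.dumps(title)))

# Placeholder counts, not real results.
html = line_chart_page(("Year", "Mentions"), [("2009", 12), ("2012", 7)], "Placeholder counts")
print(html)
```

Saving the returned string to a `.html` file and opening it in a browser with internet access should render the chart.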
Resources: Texts
  Bamboo DiRT: A wiki that lists tools used by Digital Humanities researchers, including a list of text-analysis tools with brief descriptions.
  ManyEyes: A collection of data visualization tools. You can upload your own data and create web-based visualizations that are made available to the public for comments and discussions. You need to create an account to upload data.
  Voyant (Voyeur): A web-based text-analysis environment that incorporates visualization tools.
  WordSmith: A desktop text-analysis program that works with Windows. The program has been tested and works with any Unicode (UTF-8) text.
  Wordij: A semantic network tool. Wordij creates networks of collocates, or pairs of words that occur near each other in a text.
Resources: Networks
  aisee: Graph visualization
  Cytoscape: Visualizing molecular interaction networks
  Gephi: Visualization and exploration platform
  KrackPlot: Social network visualization program
  Mage: 3D vector display program (showing kinemage graphics)
  NetDraw: Program associated with UCINET
  NodeXL
  OGDF (successor of AGD): Open Graph Drawing Framework
  Otter: Tool for topology display
  SoNIA: Visualizing longitudinal network data
  Tulip: Visualization of large graphs
  udraw(graph) (successor of davinci): Graph drawing
  VOSON: Web-based software that enables the collection and analysis of online network data
  Zoomgraph: Visualizing zoomable data-driven graphs
Resources: Spatial and Temporal Data
  Google Geomap
  D3.js