Welcome to Introduction to Data Visualization. Slides can be downloaded from https://cisco.box.com/dsf-dv-7
Introduction to Data Visualization
IT Technical Education Initiative (Data Science Focus)
Michael D. Marquiz, Distinguished Engineer, IT
March 2015
This introductory-level class is focused on terminology and concepts within data visualization.
Data Visualization (aka Data Viz): the study and application of visual and artistic techniques for portraying and telling a data story, leading to insights, business value, and competitive advantage.
Cisco Technology Radar From https://techradar.cisco.com/
Cisco Technology Radar From https://techradar-internal.cisco.com/
LinkedIn Connections From http://inmaps.linkedinlabs.com/
Amplifying Cognitive Capabilities
- Increasing cognitive resources by providing visual images to expand human working memory
- Reducing search by aggregating large quantities of data into an intuitive display
- Enhancing pattern recognition by including visual elements that highlight new relationships
- Supporting the perceptual inference of otherwise hidden relationships
- Providing a medium that allows further exploration (for interactive displays)
Data Science Lifecycle
- Phases (labels as extracted from the slide diagram): Identify, Collect, Purge, Condition, Archive, Model & Explore, Evaluate & Communicate, Clean
- Data Visualization Tools & Techniques are highly applicable in these phases.
- Related disciplines: Data Analytics, Machine Learning, Natural Language Processing, Network Science
Data Visualization Lifecycle (diagram labels): Scoping, Designing, Refining, Building
Class Outline
- Visual Representations of Data
- Designing Data Visualizations
- Introduction to Visualization Technology
Visual Representations of Data
A Simple Data Visualization: provides quick insight about preparing for the elements.
Data Visualization Goals
A visually appealing, consumer-centric display that provides insight and business value to its intended audience.
- Effective
- Likeable
- Easy to Use
Effective
- All information in one place, making it easy to make relevant decisions.
- Example (from citydashboard.org): aggregates various spatial data from around the UK and displays it on a dashboard and a map. It exemplifies one of the dreams of the smart city movement: to provide a single, open overview of almost all the data streams that a modern city creates.
Likeable: engaging and easy to understand. Example: cyber attacks in real time (from http://map.ipviking.com/).
Easy to Use: straightforward to navigate and absorb. Example: Twitter conversation activity (from https://moments.twitter.com/uki/).
Pursuing Data Visualization Excellence
- Cutting-edge
- Visually appealing
- Scalable
- Presents the right information
- Intuitive user interface
- Incorporates responsive design
- Allows rapid development & deployment
Different Types of Data Visualizations
- Dashboards
- Infographics
- Visual Analytics
- Animations
- Hybrids (e.g. Interactive Dashboard)
Dashboards
Aggregate information from multiple sources to display Key Performance Indicators (KPIs), metrics, and important alerts.
Benefits
- Focus on critical measurements
- Automate many calculations
- Alert stakeholders to actions
- Increase productivity
Is a dashboard right for you?
- Stakeholder support for the long haul
- Ease of understanding
From http://linksservice.com/data-visualization-dashboard/
From http://www.pieterhendrikx.com/wp-content/uploads/screen-shot-2014-07-10-at-11.46.11.png
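The core dashboard idea (collect KPIs, compare them with targets, and surface alerts) can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the KPI names, values, targets, and the 10% amber margin are hypothetical, and the red/amber/green status scheme is an assumption chosen for the example.

```python
def rag_status(value, target, amber_margin=0.1):
    """Classify a KPI against its target, assuming higher values are better.

    amber_margin is a hypothetical tolerance: values within 10% below the
    target are 'amber'; anything lower is 'red'.
    """
    if value >= target:
        return "green"
    if value >= target * (1 - amber_margin):
        return "amber"
    return "red"


def build_dashboard(kpis):
    """Turn {name: (value, target)} into dashboard rows plus a list of alerts."""
    rows = []
    for name, (value, target) in kpis.items():
        rows.append({
            "kpi": name,
            "value": value,
            "target": target,
            "status": rag_status(value, target),
        })
    # Only 'red' KPIs are escalated as alerts in this sketch.
    alerts = [row["kpi"] for row in rows if row["status"] == "red"]
    return rows, alerts


# Hypothetical KPIs gathered from several sources.
rows, alerts = build_dashboard({
    "uptime_percent": (99.9, 99.5),
    "tickets_closed": (70, 100),
    "customer_satisfaction": (4.2, 4.5),
})
for row in rows:
    print(f"{row['kpi']:<24}{row['value']:>8} / {row['target']:<8}{row['status']}")
print("Alerts:", alerts)
```

Keeping the status rule in one small function makes the thresholds easy for stakeholders to review and change, which supports the "ease of understanding" criterion above.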
Infographics
Visually appealing, design-heavy images that are constructed to tell a specific story.
- A high degree of design work goes into the creation of an infographic.
- Less likely to have dynamic data.
- More conclusions and less data.
Examples: Case Study, Comparison, Expert Advice, Chronology, Compilation, How-To Guide
From http://visual.ly/healthy-farm-vision-us-agriculture-interactive-infographic
From http://www.pardot.com/ideas/8-tips-effective-infographics/
Visual Analytics
Data analytics paired with interactive visual interfaces. Combines elements of information visualization with computational transformation and data analytics.
Data Analytics: the science of examining/mining data and applying statistical methods for the purposes of identifying hidden patterns and unknown correlations, drawing conclusions, and predicting the likelihood of future events and trends.
From http://support.sas.com/documentation/cdl/en/vaug/65384/html/default/viewer.htm#p1hqvfkcw4eqdfn1epmsr7kz2ji7.htm
From http://www.slideshare.net/tableausoftware/visual-analytics-best-practices
Animations: incorporate motion to portray additional dynamics and insights. From http://www.gapminder.org
Animated Data Visualizations Incorporating motion to portray additional dynamics and insights. Wealth and Health of Nations (from http://www.gapminder.org).
Introduction to Designing Data Visualizations
Determining an appropriate visualization
- Who is the target audience? Important to know what will resonate.
- What key information needs to be displayed in order to illustrate the data story? Critical to select visuals that are aesthetic and match the data that you have.
- Will the visualization be static and/or dynamic?
- Will the visualization be interactive? Self-service, immediacy, exploration.
- What type of data visualization best fits the data? Don't force data into a visualization for purely artistic effect.
"Good judgment comes from experience, and experience comes from bad judgment." (Rita Mae Brown)
"If I have seen further than others, it is by standing upon the shoulders of giants." (Sir Isaac Newton)
Building Data Visualization Skills
- Educate to elevate: books, free courses, Edward R. Tufte, etc.
- Evaluate existing templates as a starting point: models designed by others; confining vs. freeing; cost savings
- Embrace best practices and guidelines: less is more; limit the number of colors and fonts; etc.
Building Data Visualization Skills [Cont'd]
- Integrate A/B (two-sample hypothesis) testing into the data visualization design and evolution process; include members of the target audience in the sample.
- Compare Data Viz A (Control) against Data Viz B (Variation).
- Identify changes in the data visualization that increase or maximize an outcome of interest.
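As a rough illustration of the A/B comparison, the sketch below runs a two-proportion z-test, one common form of two-sample hypothesis test, using only the Python standard library. The slides do not specify which test to use, so this choice is an assumption, as are the function name and the example counts (the number of viewers who completed some outcome of interest, such as answering a question correctly from the visualization).

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). A positive z means variation B has the higher
    success rate. Assumes samples large enough for the normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 200 viewers saw each version of the visualization.
z, p = two_proportion_z_test(success_a=40, n_a=200, success_b=60, n_b=200)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("The difference between A and B is statistically significant at the 5% level.")
else:
    print("No significant difference detected; keep testing or keep the control.")
```

A small p-value only says the difference is unlikely to be chance; whether the change is worth adopting still depends on the size of the improvement and on how well the sample represents the target audience.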
Subtle Impact of Data on Visualizations
- Quality: incomplete data or data with errors will elicit a useless result.
- Context: the ability to draw conclusions from the data depends on how it was obtained and how current it is.
- Biases: personal biases must be eliminated, along with any biases that were introduced in the data collection, conditioning, and cleaning processes.
Keep Your Options Open
Focus on the properties of the data, not on specific tools.
- Avoid dying by tool choices; let the data and the visualization requirements drive the decision regarding which tool(s) to use.
- Tying oneself to a tool before knowing what needs to be visualized is a recipe for failure.
- Understanding the capabilities (strengths/weaknesses) of the available tools is critical. For example: personalization, alerts, dynamic data content, animations, visual querying/exploration, multiple dimensions, etc.
Using Charts Effectively
Charts are meant for data and not vice versa.
- Selecting the right chart(s) is a key element in conveying the right information to the target audience (avoid confusion)
- Package information in an easy-to-understand format
- Focus on simplicity (should require little explanation)
- Only display the most important information
- Avoid data overload ("mobile first" mentality)
Critical Elements in a Chart
- Labels (title, legend, axes, etc.)
- Color choices
- Type (Bar, Column, Pie, Line, Small Multiple, Sparklines, etc.)
- References to data sources
Using Charts Effectively [Cont'd]
- Bar and Column Charts: best used for comparisons
- Line Charts: best used for depicting trends or movement; connecting data points over time
- Scorecards: best used for showing multiple measurements at the same time
- Pie Charts: lots of controversy regarding usefulness (number of slices; distinguishing between similar slices; lack of comparisons/trends; loathed by Tufte)
- Gauges: best used for monitoring the status of KPIs
- Radar/Spider/Star Charts: show the values of different categories along an axis that starts in the center and moves outward
Using Charts Effectively [Cont'd]
- Candlestick Charts: best used for depicting movement of stock market data (derivatives, securities, currency, etc.)
- Waterfall Charts: show positive and negative changes on a specific value over time
- Small Multiple (trellis, lattice, grid, panel): a series of similar charts using the same scale and axis, allowing comparisons to be made
- Sparklines: a condensed way of depicting trends and variations,
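To make the sparkline idea concrete, here is a minimal text-only sketch that maps a series of numbers onto eight Unicode block characters. The function name, the choice of characters, and the sample data are illustrative assumptions; charting libraries draw sparklines as small line or bar graphics instead.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Return a one-line text sparkline for a sequence of numbers."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no variation to show; use the lowest block.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value to an index between 0 (minimum) and 7 (maximum).
    return "".join(BLOCKS[round((v - lo) * top / span)] for v in values)


# Hypothetical weekly values for one metric.
print(sparkline([3, 5, 4, 8, 12, 9, 7, 10]))
```

Because each series is scaled to its own minimum and maximum, a sparkline shows the shape of a trend rather than absolute values, so it works best next to a label or number that gives the scale.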
Using Text Effectively
Text provides messaging for the visualization. Emphasize brevity; text should complement the visualization to ensure that all consumers get the same message.
General Guidelines
- Leverage simple words and avoid acronyms
- Favor single lines over multiple lines
- Minimize color (e.g. black, blue, grey, etc.)
- Strategically position title and description
- Keep all text horizontal and within the range of the visual
- Make important text larger; maintain size consistency with labels
- Stick to standard fonts and muted colors except for alerts (RAG)
- Dynamic text must fit any scenario
Rule of Thumb: 20% text, 80% visualization
Gestalt Theory
Theory dealing with how what we see is transformed into meaning. All of the actions required to make meaning of things happen in our brains when we view things based on size and position. Understanding how people digest data can aid in the development of highly effective visualizations.
- Proximity: when items are placed in close proximity, people perceive that they are in the same group because they are close to one another and apart from other groups.
- Similarity: when items look the same, people perceive them to be of the same type.
- Symmetry: the mind perceives objects as being symmetrical and forming around a central point. When two unconnected symmetrical objects are shown, people perceive them as being connected in a coherent shape.
- Closure: our eyes tend to add any missing pieces of a familiar shape. If two sections are taken out of a circle, people still perceive the whole circle.
- Continuation: if people perceive objects as moving in a certain direction, they see them as continuing to move that way.
- Figure/Ground: depending on how people look at a picture, they see either the figure (foreground) or the background (ground).
Take Advantage of Layout Patterns
People read text and images in certain specific patterns. Understanding how people initially scan/read a visualization can aid in the development of highly effective visualizations*.
- Gutenberg Design: a general pattern the eyes move through when looking at evenly distributed, homogeneous information.
- Z Pattern Layout: follows the shape of the letter Z.
- F Pattern Layout: follows the shape of the letter F.
* Layout patterns are shown for left-to-right languages and would be reversed for right-to-left languages.
Pattern Design Considerations
- Balance
- Color
- Hierarchy: size & position of elements
- Repetition: build familiarity
- White Space: allow content to breathe and avoid clutter
Introduction to Visualization Technology
Visualization Technology
- Categories: Infographics, Dashboards, Charts
- D3.js: JavaScript library; web-based data visualizations
- Flash (ActionScript): library; web-based data visualizations
- http://www.google.com/publicdata/directory#
- 39 Data Visualization Tools: http://blog.profitbricks.com/39-data-visualization-tools-for-big-data/
- http://www-01.ibm.com/software/analytics/many-eyes/
Continuing the Data Science Journey
Additional Learning Opportunities
- TechEd Data Science Focus: http://iwe.cisco.com/web/view-post/post/-/posts?postid=627500083 (datasciencefocus mailer alias; it-technical-education mailer alias)
- Advanced Data Visualization Parts I & II
- Advanced Data Science courses
- Cisco Data Science Certification: http://iwe.cisco.com/web/it/data-science-community-of-practice
- Data Science Stretch Assignments: starting in 2015 (datasciencefocus mailer alias)
- Cisco Talent Exchange: https://sami.cisco.com
- Online Courses: https://www.coursera.org/, https://www.edx.org/, https://www.udacity.com, iTunes University, TED Talks @ http://www.ted.com
- Safari Online Books: http://cisco.safaribooksonline.com/
- Data Science Workshops: starting in 2016 (datasciencefocus mailer alias; it-technical-education mailer alias)
TechEd Data Science Focus Training for the Cisco Workforce
Approach (diagram; labels as extracted)
- Education: Building Awareness, Knowledge and Skills
- Experience + Exposure: Applying Knowledge, Fostering Innovation and Building Capability
- Offerings: Introductory Data Science Classes, Advanced Data Science Classes, Data Science Workshops, Data Science Stretch Assignments
- Themes: Understanding Terminology & Concepts; Applying Frameworks & Tools; Unifying Theory & Practice; Incorporating Cutting-Edge Techniques
- Timeline: 2014, 2015, 2016, 2017
Core Competencies: Big Data, Natural Language Processing, Machine Learning, Network Science, Data Analytics, Data Science, Data Visualization
Learning Paths (diagram labels)
- Introductory Classes
- Advanced Classes
- Workshops
- Stretch Assignments
- Data Science Certification Program
Legend: Available to All; Limited Availability
Detailed Timeline (timeline figure; the layout is not recoverable from the extracted text)
- Years shown: 2014, 2015, 2016, 2017
- Introductory classes: Introduction to Data Science, Introduction to Big Data, Introduction to Data Analytics, Introduction to Network Science, Introduction to Natural Language Processing, Introduction to Machine Learning, Introduction to Data Visualization
- Advanced classes: Advanced Big Data, Advanced Machine Learning I & II, Advanced Data Analytics I & II, Advanced Data Visualization I & II, Advanced Natural Lang. Processing I & II, Advanced Network Science I & II
- Workshops 1 through 6
- Stretch Assignments by quarter (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec)
- Names shown on the timeline: Hawk, Robin, Eagle, Raven, Owl, Blue Jay, Quail, Seagull, Crow, Sandpiper
- Legend: Live Classes Completed; Recorded Classes available via EMS; In-person Classes SJC and RTP; Live Classes via WebEx; Introductory Classes; Advanced Classes; Workshop; Stretch Assignment
2015 Class Schedule
Columns: Topic; Introductory Track (Technical Breadth); Advanced Track (Technical Depth)
- Data Science Overview
- Big Data: Mar 2015
- Machine Learning: Part I: Apr/May 2015; Part II: Apr/May 2015
- Data Analytics: Part I: Jun/Jul 2015; Part II: Jun/Jul 2015
- Data Visualization: Mar 2015; Part I: Aug 2015; Part II: Aug 2015
- Natural Language Processing: Part I: Sep 2015; Part II: Sep 2015
- Network Science: Part I: Oct/Nov 2015; Part II: Oct/Nov 2015
Legend: Live Classes Completed; Recorded Classes Available via EMS; In-person Classes SJC and RTP; Live Classes via WebEx
Thank You
Michael D. Marquiz, mmarquiz@cisco.com
- Slides for this class: https://cisco.box.com/dsf-dv-7
- Data Science Classes, Workshops, and Stretch Assignments: subscribe to this mailer alias: datasciencefocus@cisco.com
- Data Science Focus Program Details: http://iwe.cisco.com/web/viewpost/post/-/posts?postid=627500083
- IT Technical Education Classes: subscribe to this mailer alias: it-technical-education@cisco.com