Data Visualization: Scientific Principles, Design Choices and Implementation in LabKey
Catherine Richards, PhD, MPH, Staff Scientist, HICOR, crichar2@fredhutch.org
Cory Nathe, Software Engineer, LabKey, cnathe@labkey.com
Outline
o Scientific Principles and Design Choices
o Implementation in LabKey
o Case Study: HICOR IQ
Scientific Principles and Design Choices
o Why use data visualizations
o Choosing the best chart type and visual attributes
o Incorporating design best practices
Why use data visualizations?
o Leverage the visual system to absorb large amounts of information very quickly
o Identify patterns or outliers
o Inspire new questions
o Help identify problems
Data viz shows patterns that tables do not
o Average X = 9
o Average Y = 7.5
o Y = 3 + 0.5X --> same linear model
o R² = 0.67 --> same R²
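The summary statistics on this slide match Anscombe's quartet (Anscombe, 1973), so the sketch below assumes that is the data behind it. It computes the mean, the least-squares line and R² for each of the four data sets. The function name `summarize` is illustrative and not part of the original slides.

```javascript
// Anscombe's quartet: four data sets with near-identical summary statistics.
// Assumption: this is the data behind the slide; the slide itself does not say.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = [
  { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] },
];

// Mean of x and y, ordinary least-squares slope/intercept, and R squared.
function summarize(xs, ys) {
  const n = xs.length;
  const mean = (values) => values.reduce((a, b) => a + b, 0) / n;
  const meanX = mean(xs);
  const meanY = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) * (xs[i] - meanX);
    syy += (ys[i] - meanY) * (ys[i] - meanY);
  }
  const slope = sxy / sxx;
  return {
    meanX: meanX,
    meanY: meanY,
    slope: slope,
    intercept: meanY - slope * meanX,
    r2: (sxy * sxy) / (sxx * syy),
  };
}

const summaries = quartet.map((d) => summarize(d.x, d.y));
summaries.forEach((s, i) => console.log("data set " + (i + 1), s));
```

The four summaries agree to about two decimal places, yet scatter plots of the same data look completely different, which is the point of the slide.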
Chart Types http://www.datavizcatalogue.com/
Visual Attributes
o Data encoding: mapping data to visual attributes
o Process
   - Choose data dimensions to graph
   - Classify data types
   - Determine which visual attributes represent data types most effectively
Data Dimensions
o Unique information
o Most common: visualizations with 3 or 4 data dimensions
o Rare: visualizations with 6, 7 or more
o The more dimensions, the more visual attributes needed
Data Types
o Nominal (labels)
   - Fruits: apples, oranges, pears
o Ordinal
   - Restaurant inspection grades: A, B, C
o Quantitative
   - Interval (location of zero arbitrary): dates, location
   - Ratio (zero fixed): physical measurements such as weight, height
Stevens. On the theory of scales of measurement. Science. 1946
Operations Permitted with Data Types
o Nominal (labels)
   - Operations: =, ≠
o Ordinal
   - Operations: =, ≠, <, >, ≤, ≥
o Interval (location of zero arbitrary)
   - Operations: =, ≠, <, >, ≤, ≥, - (subtraction)
   - Can measure distances or spans
o Ratio (zero fixed)
   - Operations: =, ≠, <, >, ≤, ≥, -, / (division), * (multiplication)
   - Can measure ratios or proportions
Stevens. On the theory of scales of measurement. Science. 1946
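The cumulative structure of the list above (each scale keeps the operations of the one before it and adds its own) can be sketched as a small lookup. This is only an illustration; the names `SCALE_LEVELS`, `permittedOperations` and `isPermitted` are hypothetical.

```javascript
// Scale types ordered from least to most permissive (Stevens, 1946).
const SCALE_LEVELS = ["nominal", "ordinal", "interval", "ratio"];

// Operations each level adds on top of the levels before it,
// following the slide's list.
const OPERATIONS_ADDED = {
  nominal: ["=", "≠"],
  ordinal: ["<", ">", "≤", "≥"],
  interval: ["-"],
  ratio: ["/", "*"],
};

// All operations permitted for a scale type: its own plus every lower level's.
function permittedOperations(scale) {
  const level = SCALE_LEVELS.indexOf(scale);
  if (level === -1) {
    throw new Error("Unknown scale type: " + scale);
  }
  return SCALE_LEVELS.slice(0, level + 1)
    .reduce((ops, name) => ops.concat(OPERATIONS_ADDED[name]), []);
}

// True when the operation is meaningful for data measured on the given scale.
function isPermitted(scale, operation) {
  return permittedOperations(scale).includes(operation);
}

console.log(permittedOperations("interval").join(" "));
console.log(isPermitted("ordinal", "-")); // subtracting inspection grades is not meaningful
```

A check like this can flag, for example, an attempt to compute a ratio of two dates, which are interval data.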
Visual Attributes
Adapted from figure 4-3 in Designing Data Visualizations by Iliinsky & Steele
Science of Data Viz
o Psychophysics
   - Branch of psychology that deals with the relationship between physical stimuli and sensory response
   - Human graphical perception
Ranking of Elementary Perceptual Tasks Cleveland & McGill. JASA. 1984. 79 (387): 531-554
Length-Position Experiment Cleveland & McGill. JASA. 1984. 79 (387): 531-554
Length-Position Experiment Most accurate Cleveland & McGill. JASA. 1984. 79 (387): 531-554
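The ranking from Cleveland & McGill (1984) can be written down as data and used to compare two candidate encodings. The sketch below is illustrative only: the tier labels paraphrase the paper, and `rankOf` and `preferEncoding` are hypothetical names.

```javascript
// Elementary perceptual tasks, ordered from most to least accurately judged
// (Cleveland & McGill, 1984). Tasks in the same inner array share a rank.
const PERCEPTUAL_RANKING = [
  ["position on a common scale"],
  ["position on nonaligned scales"],
  ["length", "direction", "angle"],
  ["area"],
  ["volume", "curvature"],
  ["shading", "color saturation"],
];

// 1-based rank of a task; lower means judged more accurately.
function rankOf(task) {
  const index = PERCEPTUAL_RANKING.findIndex((tier) => tier.includes(task));
  if (index === -1) {
    throw new Error("Unknown perceptual task: " + task);
  }
  return index + 1;
}

// Returns whichever of two tasks ranks higher, or null when they tie.
function preferEncoding(taskA, taskB) {
  const a = rankOf(taskA);
  const b = rankOf(taskB);
  if (a === b) {
    return null;
  }
  return a < b ? taskA : taskB;
}

// A bar chart on a shared axis relies on position; a pie chart relies on angle.
console.log(preferEncoding("position on a common scale", "angle"));
```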
Chart Types http://www.datavizcatalogue.com/
Incorporating Design Best Practices
o Graphic design
   - Color theory
   - Typography
o Tufte's Rules
Tufte's Rules
1. Reduce chart-junk and increase data-to-ink ratio
2. Maximize contrast
3. Use readable labels
4. Don't repeat yourself
5. Instead of legends, label data series (points) directly
6. Avoid smoothing and 3D
7. Sort for comprehension
Edward Tufte. The Visual Display of Quantitative Information. 2001
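Rules 5 and 7 translate most directly into data preparation. A minimal sketch follows, assuming hypothetical rows with `clinic` and `rate` fields; the helper names are illustrative, not from the slides.

```javascript
// Rule 7: sort categories by value so the ranking is readable at a glance.
// Returns a new array and leaves the input untouched.
function sortForComprehension(rows, valueKey) {
  return rows.slice().sort((a, b) => b[valueKey] - a[valueKey]);
}

// Rule 5: build the text for a label placed directly on the bar or point,
// instead of sending the reader to a legend.
function directLabel(row, labelKey, valueKey) {
  return row[labelKey] + ": " + Math.round(row[valueKey] * 100) + "%";
}

// Hypothetical example data.
const rates = [
  { clinic: "B", rate: 0.2 },
  { clinic: "A", rate: 0.6 },
  { clinic: "C", rate: 0.4 },
];

const sorted = sortForComprehension(rates, "rate");
sorted.forEach((row) => console.log(directLabel(row, "clinic", "rate")));
```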
Implementation in LabKey
LabKey Built-in Reports
o For non-developers
   - Plotting tools built in to LabKey Data Regions
   - Rendered using the LabKey Visualization API (built on the D3.js library)
   - Examples: box plot, scatter plot, time chart
o For developers
   - JavaScript Views
   - R Reports (Rserve/knitr)
   - Advanced View (invoke command line program)
   - Module Reports (using LABKEY.Report.execute)
o Shown in Data Views Browser
   - Customize grouping, label, thumbnail, etc.
   - Control visibility (private vs. shared)
LabKey Data API Access
o Access data from study dataset, external schema, list, etc.
o LabKey Client APIs
   - Examples: JavaScript, Java, Perl, Python, Rlabkey, SAS Macros, HTTP Interface
   - Secure, auditable, programmatic access to data and services
   - Exporting data grid as a Script
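As a rough sketch of programmatic access through the JavaScript client API, the function below calls `LABKEY.Query.selectRows` and counts the returned rows per group. The `labkey` namespace is passed in as a parameter so the function can be exercised outside a LabKey page. The function name and any schema, query or column names you pass in are illustrative assumptions, not taken from the slides.

```javascript
// Counts rows per value of groupColumn in a LabKey query.
// labkey: the global LABKEY namespace available on a LabKey page.
// onDone receives an object mapping each group value to its row count.
function countRowsByGroup(labkey, schemaName, queryName, groupColumn, onDone, onError) {
  labkey.Query.selectRows({
    schemaName: schemaName,
    queryName: queryName,
    columns: [groupColumn],
    success: function (data) {
      const counts = {};
      data.rows.forEach(function (row) {
        const key = row[groupColumn];
        counts[key] = (counts[key] || 0) + 1;
      });
      onDone(counts);
    },
    failure: onError,
  });
}
```

On a LabKey page this could be called as `countRowsByGroup(LABKEY, "study", "Demographics", "Gender", console.log, console.error)`, where the schema, query and column names are placeholders for whatever the server actually exposes.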
LabKey JavaScript Visualization API
o Shapes / Geoms:
   - Point / Bin
   - Path
   - ErrorBar
   - BoxPlot / BarPlot
o Interactions:
   - Callback function for point click
   - Callback function for mouse over/out
   - Brushing (1D, 2D)
o Plot Helpers
   - PieChart
   - LeveyJenningsPlot
   - SurvivalCurvePlot
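A scatter plot built from a Point geom might be assembled roughly as follows. This is a sketch under assumptions: it follows the general shape of the `LABKEY.vis.Plot` and `LABKEY.vis.Layer` configuration as we understand it, the `vis` namespace is passed in so the function can run without a LabKey page, and the field names `weight` and `height`, the labels and the dimensions are hypothetical.

```javascript
// Builds (but does not render) a scatter plot.
// vis: the LABKEY.vis namespace on a LabKey page.
// renderTo: id of the target element; rows: array of plain objects.
function buildScatterPlot(vis, renderTo, rows) {
  // One layer drawing each row as a point.
  const pointLayer = new vis.Layer({
    geom: new vis.Geom.Point({}),
  });

  return new vis.Plot({
    renderTo: renderTo,
    rendererType: "d3",
    width: 600,
    height: 400,
    labels: {
      main: { value: "Example Scatter Plot" },
      x: { value: "Weight" },
      yLeft: { value: "Height" },
    },
    data: rows,
    layers: [pointLayer],
    // Aesthetic mappings from row fields to axes (hypothetical field names).
    aes: { x: "weight", yLeft: "height" },
    scales: {
      x: { scaleType: "continuous", trans: "linear" },
      yLeft: { scaleType: "continuous", trans: "linear" },
    },
  });
}
```

On a LabKey page the result would be drawn with `buildScatterPlot(LABKEY.vis, "plotDiv", rows).render()`, with `plotDiv` standing in for the id of an existing element.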
LabKey Visualization - Live Demo
JavaScript based charts from LabKey Demo Study
o Data Region > Charts/Views menu
   - Generic Chart (box/scatter plot)
   - Time Chart
o JavaScript View
o Reports Webpart
[Live Demo (1 of 5)]
[Live Demo (2 of 5)]
[Live Demo (3 of 5)]
[Live Demo (4 of 5)]
[Live Demo (5 of 5)]
Examples (1 of 3)
Panorama - Levey-Jennings report, Pareto plot
Data Source: SProCoP Tutorial
Examples (2 of 3)
Dataspace - scatter with gutter plots
Data Source: CAVD DataSpace
Examples (3 of 3)
HIDRA Argos - pie chart, survival curve, bar plot, timeline report
Argos, an application developed in partnership with Fred Hutch. The Timeline report was created by the Oncoscape Core team and is maintained by Lisa McFerrin. Oncoscape is supported by Fred Hutch and STTR.
Case Study: HICOR IQ
HICOR IQ - Overview
o Regional Oncology Informatics Platform
o GOAL: to provide patients, payers, providers and health systems with transparent information to support decision-making in cancer care
o The initial launch includes a limited set of reports based on the ASCO 2012 Choosing Wisely recommendations
o The initial functionality allows users to select metrics of interest, configure plots based on regional or clinic views, and generate reports categorized by sub-groups
HICOR IQ - Live Demo
o Data Views direct link to different metrics
o Configure report (apply filters, switch chart type)
o Bar plot, Scatter plot, Time plot
o Population size, filters, exclusions
o Download PDF
[Live Demo (1 of 4)]
[Live Demo (2 of 4)]
[Live Demo (3 of 4)]
[Live Demo (4 of 4)]
HICOR IQ - Implementation
o Collaboration between HICOR and LabKey
   - Iterative layout and user experience design
   - D3 code creation for plot rendering
o Custom Java module
   - New database schema and tables
   - Use of OLAP cube for accessing measures and dimensions
   - Plots generated with the dimple JavaScript D3 library
o Additional data security
   - Data cannot be directly accessed from schema browser
   - Server only returns aggregate data, with small populations removed
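The last security point, returning only aggregates and dropping small populations, can be sketched as below. This is not the HICOR IQ server code (that lives in a custom Java module). The function name, the field names and the threshold are hypothetical and only illustrate the idea.

```javascript
// Aggregates row-level records into per-group rates and suppresses any group
// whose population is below minPopulation, so that small cells never leave
// the server.
// rows: array of records; groupKey: field to group by;
// flagKey: truthy when the record counts toward the metric.
function aggregateWithSuppression(rows, groupKey, flagKey, minPopulation) {
  const totals = new Map();
  for (const row of rows) {
    const key = row[groupKey];
    const entry = totals.get(key) || { n: 0, flagged: 0 };
    entry.n += 1;
    if (row[flagKey]) {
      entry.flagged += 1;
    }
    totals.set(key, entry);
  }

  const result = [];
  for (const [group, entry] of totals) {
    // Groups below the threshold are removed entirely, not reported as zero.
    if (entry.n >= minPopulation) {
      result.push({ group: group, n: entry.n, rate: entry.flagged / entry.n });
    }
  }
  return result;
}
```

Only the aggregated rows would be sent to the browser for plotting. The right threshold depends on the data-use agreement and is left as a parameter here.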
HICOR IQ - Code Example
    renderPlot: function () {
        ...
        // initialize the svg
        svg = dimple.newSvg("#" + this.renderId, fullWidth, fullHeight);

        // create the chart component and set margins
        chart = new dimple.chart(svg, data);
        chart.setBounds(margin.l, margin.t, plotWidth, plotHeight);

        // configure the x-axis
        x = chart.addCategoryAxis("x", "Group");
        x.floatingBarWidth = 20;

        // configure the y-axis
        y = chart.addMeasureAxis("y", "Value");
        y.showGridlines = false;
        y.ticks = 4;
        y.overrideMax = 1.0;
        y.tickFormat = "%";

        // add a bar series to the plot (must happen before chart.draw())
        s = chart.addSeries(null, dimple.plot.bar);

        // sorting the x-axis variable
        x.addOrderRule("group");

        // render the chart as an svg and remove the dimple title
        chart.draw();
        x.titleShape.remove();

        // use D3 to update some content and add titles
        this.renderTitle(svg, fullWidth, 0);
        this.styleAxis(svg, x, y, margin);

        // define the content of the bar hover tooltip
        this.overrideTooltipText(s, data, function (row) {
            return [
                "Group: " + row.group,
                "Utilization: " + row.value
            ];
        });
    }
dimple: http://dimplejs.org/
HICOR IQ - Code Example
    // y-axis built directly with D3 (v3 API), for comparison with the dimple version
    var y = d3.scale.linear()
        .range([height, 0]);
    y.domain([0, 1.00]);

    // assigned to a variable so it can be passed to .call(axis) below
    var axis = d3.svg.axis().scale(y)
        .orient("left")
        .tickValues([0, .25, .5, .75, 1])
        .tickFormat(function (d) { return d * 100 + "%"; });
    ...
    svg.append("g")
        .attr("class", "y axis")
        .call(axis)
        .style("font-weight", "bold")
        .style("font-family", "Arial")
        .append("text")
        .attr("class", "ylabel")
        .attr("y", -20)
        .attr("x", -40)
        .attr("dy", ".71em")
        .text(label);
HICOR IQ - Future
o Allow new metric definition and data loading
o Split module for security (server) vs. plotting (client)
o Identification of My Clinic for comparison in scatter plot
o Include static reports
o Clinic / Payor dashboard report
o Better organization of sub-metrics
Thank You
Any questions?
Catherine Richards, PhD, MPH, Staff Scientist, HICOR, crichar2@fredhutch.org (soon to be Director, Scientific and User Engagement at Aetion)
Cory Nathe, Software Engineer, LabKey, cnathe@labkey.com