Institut für Mathematik
|
|
|
- Baldwin Harmon
- 10 years ago
- Views:
Transcription
1 U n i v e r s i t ä t A u g s b u r g Institut für Mathematik Ali Ünlü, Anatol Sargin Interactive Visualization of Assessment Data: The Software Package Mondrian Preprint Nr. 04/ Februar 2008 Institut für Mathematik, Universitätsstraße, D Augsburg
2 Impressum: Herausgeber: Institut für Mathematik Universität Augsburg Augsburg ViSdP: Ali Ünlü Institut für Mathematik Universität Augsburg Augsburg Preprint: Sämtliche Rechte verbleiben den Autoren c 2008
3 [Submitted to: Applied Psychological Measurement] Running Head: Computer software review: Mondrian Interactive visualization of assessment data: The software package Mondrian Ali Ünlü and Anatol Sargin Department of Mathematics, University of Augsburg, Germany January 17, 2008 Correspondence to: Ali Ünlü Department of Mathematics University of Augsburg Universitätsstrasse 14 D Augsburg Germany Phone: Fax:
4 Computer software review: Mondrian 2 Abstract Mondrian is state-of-the-art statistical data visualization software featuring modern interactive visualization techniques for a wide range of data types. This paper reviews the capabilities, functionality, and interactive properties of this software package. Key features of Mondrian are illustrated with data from the Programme for International Student Assessment (PISA) and for item analysis applications. Keywords: Mondrian; R; Data visualization software; Interactive graphics; Spineplot; Item analysis; Empirical Item Response Function; Distractor Analysis; Differential Item Functioning; Programme for International Student Assessment (PISA) data
5 Computer software review: Mondrian 3 Introduction Data visualization is a vital tool in decision support. Graphics are widely used in modern applied statistics, because they are easy to create, convenient to use, and they can present information effectively (e.g., Cook & Swayne, 2007; Unwin, Theus, & Hofmann, 2006; Wilkinson, 2005; Young, Valero-Mora, & Friendly, 2006). Visualizations, however, are often static (e.g., Emerson, 1998), merely utilized for the presentation rather than the exploration of data. Interactive statistical data visualization, on the other hand, is a powerful alternative for the detection of structural patterns and regularities in the data. Interactive graphics become indispensable especially when analyzing large and complex data sets, in which case statistical modeling and methodologies, generally, fail to account for the complexity of the data satisfactorily (e.g., Unwin, Volinsky, & Winkler, 2003). This review evaluates Mondrian, a state-of-the-art, interactive data visualization software developed by Theus (2002a, 2002b). A forthcoming book by Theus and Urbanek (2008) thoroughly discusses applications of the software Mondrian to a number of real data examples. Mondrian is a stand-alone package with a wide range of graphics and interactive features, and has been developed in the programming language Java. It is platform independent and is freely available for Windows, Mac OS X, and UNIX, by download from Mondrian s web sites at or These web sites in particular contain a tutorial and sample data sets that can be loaded and tested with Mondrian. (Whereas on Mac OS X a suitable Java virtual machine will be pre-installed, Windows users may have to install a recent Java runtime environment from Sun Microsystems at if not already installed. On UNIX one needs to download the jar-file of Mondrian, and start it manually using the command mymachine> java jar Mondrian.jar.)
6 Computer software review: Mondrian 4 Data management capabilities Mondrian in principle has no practical limitations on the size of data sets that can be managed. The software can load and export data in local tab-delimited ASCII files and also connect directly to distributed or online databases via the JDBC interface. Another useful feature of Mondrian is the simplicity with which it allows to generate new variables from the initial ones of the loaded data set, (a) by calculating functions of selected variables (such as the sum or ratio of two variables, or the logarithm of a variable), or (b) by deriving variables from selections or color assignments. Plotting capabilities Mondrian implements interactive graphics for a wide range of data types including missing value plots, and plots for: 1 (a) categorical data: (weighted) barcharts and spineplots (Hummel, 1996), (weighted) mosaic plots and their variations same bin size (each cell is allocated the same amount of space, and the information is reduced to the binary case of whether a cell is or is not empty), fluctuation diagram (each cell is allocated the same amount of space, and the cell with the maximum frequency fills its space completely, thus fixing the scale for the rest of the diagram), and multiple barchart (each cell is allocated the same amount of space, and only the heights of the bars in the cells are scaled) (Friendly, 1994; Hartigan & Kleiner, 1981; Hofmann, 1998, 2000, 2007), and double decker plots (instead of alternately splitting the x and y axes as in a mosaic plot, only the x axis is used) (Hofmann, 2007; Hofmann & Wilhelm, 2001); 1 Some of the following plot types are shown in the figures that we present. There are examples of a choropleth map, mosaic plot, parallel coordinate plot, histogram, scatterplot, barchart, parallel boxplot, and spineplot in Figure 1, in respective order, top down from left to right. Examples of barcharts and spineplots can also be found in Figures 2-4. Throughout this review, special care is taken to provide a comprehensive list of references for detailed information and further reading on the concepts and issues that we mention or discuss.
7 Computer software review: Mondrian 5 (b) continuous data: (weighted) histograms and spinograms (all bars are normalized to have the same height, with proportional widths) (cf. Hofmann & Theus, under revision), scatterplots and scatterplot matrices, parallel coordinate plots (Inselberg, 1985, 1998; Wegman, 1990), parallel boxplots, and boxplots y by x; (c) geographical data: choropleth maps (maps with color-shadings to represent quantities) (Carr, Olsen, Courbois, Pierson, & Carr, 1998; Carr, Zhang, & Li, 2002; Dykes, MacEachren, & Kraak, 2005). All types of plots can handle missing values (coded as NA in the input file). Missing data can even be included as a separate group. For example, when plotting categorical data, missing values constitute an extra group, and the associated plot object is colored white. Figure 1 provides a gallery screenshot from Mondrian that illustrates some of the aforementioned plot types. 2 [Insert Fig. 1 about here] Mondrian can fully interact with the R statistical computing environment (R Development Core Team, 2006; to enhance data analysis with statistical procedures. 3 This interaction between Mondrian and R provides additional plotting capabilities including: (a) density estimation; (b) smoothing using R functions such as loess() and regression splines with confidence intervals; (c) multidimensional scaling; and (d) principal component analysis. Mondrian also includes a model navigator for the stepwise graphical building of loglinear models using mosaic plots (Theus & Lauer, 1999). Table 1 summarizes key features of Mondrian. [Insert Table 1 about here] 2 For black and white reproduction all figures in this review are presented in gray scales only. 3 Requires connection to Rserve, a freely accessible TCP/IP server.
8 Computer software review: Mondrian 6 Interactivity Mondrian replicates many of the same types of plots readily available in most statistical software packages (e.g., barcharts, histograms, and scatterplots). However, Mondrian is highly interactive and offers a wide range of query and data exploration options. All plots can handle large data sets and are fully linked. Special features include a binned mode for scatterplots (the plotting region is divided into a regular grid, and the data points falling into a particular bin are aggregated in that bin) and α-channel transparency (also called α-blending) of plotted objects (specifying the transparency of the color of a plotted object) (e.g., Theus & Urbanek, 2008; Unwin et al., 2006). These features avoid overplotting redundant or high-density information that would ordinarily make difficult or even impossible the interpretation of the plots with large data sets. Linking of plots is easily accomplished by selection, specifying selection sequences, and by highlighting. For example, a single data point or case selected in one plot is highlighted in all other plots. A special characteristic of Mondrian is the provision of selection sequences allowing for step-by-step refined selections of specific parts of the data, which becomes indispensable especially when dealing with massive data sets (Theus, Hofmann, & Wilhelm, 1998; see also Schneiderman, 1994; Wills, 1996). Selection sequences allow users to combine current selections with new selections via simple Boolean operators. By storing the sequence, it is possible to flexibly modify any individual selection in the sequence during data analysis. Creating multiple (simultaneous) views in single or different plot windows, manually and automatically sorting and reordering categories and variables, and varying point size of points in a scatterplot are further interactive options available in Mondrian. Mondrian
9 Computer software review: Mondrian 7 also allows for standard, logical, and censored zooming, and color brushing (persistent assignment of colors; for instance, to mark outliers more permanently). Mondrian s help system includes a reference card that summaries all keyboard and mouse shortcuts, for the Windows and Mac OS X operating systems; for instance, including information on how to export graphics. However, the reference card is not mature, for instance not allowing for changing the size, copying parts, or even adding new entries. Sample graphics and applications 4 Some of the key features of Mondrian are next illustrated with part of empirical data from the 2003 Programme for International Student Assessment (PISA; using two data sets with multiple-choice or open format test data (available from the authors). One data set contains item responses by 317 German students on a 12-item dichotomously scored, mathematical literacy test. This data set also included a sex variable coded as 1 = female or 2 = male. The other data set contains item responses by another set of 327 German students on another 5-item polytomously scored, mathematical literacy test. An interesting application of Mondrian might be to conduct a visual item analysis. For example, Figure 2 shows side-by-side spineplots (upper plots) for two items (Items 6 and 7 of the first data set) where the proportions of correct responses (horizontal axes) are plotted as a function of the number-correct (total) scores (vertical axes). The marginal absolute correct and incorrect values are displayed as barcharts below the spineplots. This type of depiction provides an empirical item response function plot and also suggests a positive linear relationship between the conditional correct item response proportions and the total scores. This type of plot could be supplemented with a smoothing function, if desired. 4 This section has greatly benefited from comments made by Professor Richard M. Luecht.
10 Computer software review: Mondrian 8 [Insert Fig. 2 about here] Figure 3 provides a corresponding visual distractor analysis that shows the proportions of examinees responding to each response category for two items (Items 1 and 3 of the second data set), with the proportions again conditional on number-correct (total) score. This type of visual analysis might detect particular incorrect distractors that are proportionally more attractive for examinees within particular regions of the total score scale. [Insert Fig. 3 about here] Figure 4 displays the separate proportion-correct bars for one item (Item 4 of the first data set) as a function of the total score for males and females. This type of graphic provides a crude visual analysis of differential item functioning or DIF (e.g., Holland and Wainer, 1993). In this type of plot, we do not get an aggregate measure of DIF, but perhaps a better indication of where along the total score scale, if anyplace, the two sex groups, matched on total score, differ in their apparent response to a particular item. This example does not imply that Mondrian would replace conventional DIF analyses, however, it could certainly supplement the information provided to test developers or item editors for items flagged as demonstrating significant DIF. [Insert Fig. 4 about here] Conclusions This review is intended to inform and encourage people working in such disciplines as psychometrics and, even broader, quantitative psychology, to take advantage of the most current techniques in the field of interactive graphics. As illustrated by the exemplar graphics of assessment data, the latter can provide a way of better understanding the data and
11 Computer software review: Mondrian 9 supporting the process of model building, which shall be a useful supplement to the applied work of a variety of measurement professionals and researchers. Mondrian can do quite well, in this respect. Different views demonstrated by graphics implemented in Mondrian help users to interactively and flexibly see, with relatively low investment of time to become proficient, particular aspects of their data. Besides being available for free, the software package Mondrian is constantly updated to cover recent developments in the research on interactive graphics. It is intuitive and easy to use, and features many powerful capabilities for the efficient analysis of (even very large) data. Acknowledgments We thank Professor Antony R. Unwin for introducing us to interactive graphics and the software Mondrian, for motivating us to write this review, and for his helpful comments on a first draft of the manuscript. Thanks are also given to Martin Theus for his invaluable expertise and help that he provided us, and to Waqas Ahmed Malik for fruitful discussions about Mondrian. We are grateful to Professor Mark L. Davison, Editor in Chief, for his work on this review. In particular, we are deeply indebted to Professor Richard M. Luecht, Computer Software Review Editor. His critical and valuable comments and suggestions have improved the manuscript greatly.
12 Computer software review: Mondrian 10 References Carr, D.B., Olsen, A.R., Courbois, J.-Y.P., Pierson, S.M., & Carr, D.A. (1998). Linked micromap plots: Named and described. Statistical Computing & Statistical Graphics Newsletter, 9, Carr, D.B., Zhang, Y., & Li, Y. (2002). Dynamically conditioned choropleth maps. Statistical Computing & Statistical Graphics Newsletter, 13, 2-7. Cook, D., & Swayne, D. (2007). Interactive and dynamic graphics for data analysis. New York: Springer. Dykes, J.A., MacEachren, A.M., & Kraak, M.-J. (Eds.) (2005). Exploring geovisualization. Amsterdam: Elsevier. Emerson, J.W. (1998). Mosaic displays in S-Plus: A general implementation and a case study. Statistical Computing & Statistical Graphics Newsletter, 9, Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89, Hartigan, J.A., & Kleiner, B. (1981). Mosaics for contingency tables. In W.F. Eddy (Ed.), Computer science and statistics: Proceedings of the thirteenth symposium on the interface (pp ). New York: Springer. Hofmann, H. (1998). Simpson on board the Titanic? Interactive methods for dealing with multivariate categorical data. Statistical Computing & Statistical Graphics Newsletter, 9, Hofmann, H. (2000). Exploring categorical data: Interactive mosaic plots. Metrika, 51, Hofmann, H. (2007). Mosaic plots and their variants. In C.H. Chen, W. Haerdle, & A.R. Unwin (Eds.), Handbook of data visualization. Heidelberg: Springer.
13 Computer software review: Mondrian 11 Hofmann, H., & Theus, M. (under revision). Interactive graphics for visualizing conditional densities. Journal of Computational and Graphical Statistics. Hofmann, H., & Wilhelm, A.F.X. (2001). Visual comparison of association rules. Computational Statistics, 16, Holland, P.W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates. Hummel, J. (1996). Linked bar charts: Analysing categorical data graphically. Computational Statistics, 11, Inselberg, A. (1985). The plane with parallel coordinates. The Visual Computer, 1, Inselberg, A. (1998). Visual data mining with parallel coordinates. Computational Statistics, 13, R Development Core Team (2006). R: A language and environment for statistical computing (ISBN ). Vienna, Austria: R Foundation for Statistical Computing. Schneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11, Theus, M. (2002a). Interactive data visualization using Mondrian. Statistical Computing & Statistical Graphics Newsletter, 13, Theus, M. (2002b). Interactive data visualization using Mondrian. Journal of Statistical Software, 7 (11). Theus, M., Hofmann, H., & Wilhelm, A.F.X. (1998). Selection sequences interactive analysis of massive data sets. Computing Science and Statistics, 29, Theus, M., & Lauer, S.R.W. (1999). Visualizing loglinear models. Journal of Computational and Graphical Statistics, 8, Theus, M., & Urbanek, S. (2008). Interactive graphics for data analysis. London: CRC Press.
14 Computer software review: Mondrian 12 Unwin, A.R., Theus, M., & Hofmann, H. (2006). Graphics of large datasets. New York: Springer. Unwin, A.R., Volinsky, C., Winkler, S. (2003). Parallel coordinates for exploratory modelling analysis. Computational Statistics & Data Analysis, 43, Wegman, E.J. (1990). Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85, Wilkinson, L. (2005). The grammar of graphics (2nd ed.). New York: Springer. Wills, G.J. (1996). Selection: 524,288 ways to say this is interesting. Proceedings of InfoVis 96, IEEE symposium on information visualization (pp ). IEEE Computer Society Press. Young, F.W., Valero-Mora, P.M., & Friendly, M. (2006). Visual statistics: Seeing data with dynamic interactive graphics. Hobken, NJ: Wiley.
15 Computer software review: Mondrian 13 Figure captions and notes Fig. 1. Gallery Screenshot from Mondrian Note. Data on the 2004 United States presidential election, which for instance can be downloaded from Mondrian s web sites, are used to produce the individual plots. Fig. 2. Empirical Item Response Functions: Barcharts for Items 6 and 7 (Lower Plots) and Corresponding Spineplots for the Total Score (Upper Plots) Note. In the barcharts the examinees solving that particular item are color brushed. In the spineplots the proportions of correct responses are plotted as a function of the numbercorrect (total) scores. Fig. 3. Distractor Analysis: Barcharts for Items 1 and 3 (Lower Plots) and Corresponding Spineplots for the Total Score (Upper Plots) Note. In the barcharts the examinees responding to a particular category of the items are color brushed. The proportions of examinees responding to each response category, conditional on total score, are depicted in the spineplots. Fig. 4. Differential Item Functioning: Barcharts for Item 4 (Lower Plots) and Sex (Middle Plots) and Corresponding Spineplots for the Total Score (Upper Plots) Note. In the lower barcharts the males and females solving that item are color brushed. Their fractions of the total number of males and females are depicted in the middle plots. The spineplots display the separate proportion-correct bars as a function of the total score for males and females.
16 Computer software review: Mondrian 14 Figures Fig. 1
17 Fig. 2 Computer software review: Mondrian 15
18 Fig. 3 Computer software review: Mondrian 16
19 Computer software review: Mondrian 17 Fig. 4 Tables
20 Computer software review: Mondrian 18 Table 1 Key Features of Mondrian Plot Data type R Plot variation Interactivity functionality Linking Querying Zooming α blending Binned mode Missing value plot Categorical, continuous Available Standard Standard Barchart Categorical Weighted barchart, Available Standard Standard spineplot Mosaic plot Categorical Fluctuation diagram, multiple barchart, same bin size, double decker plot Available Standard Standard, censored Histogram Continuous Density Weighted histogram, Available Orientation, Standard estimation spinogram standard Scatterplot Continuous Smoothing: Scatterplot matrix Available Orientation, Standard, Available Available least square, standard, logical loess, extended splines Parallel coordinate plot Continuous Parallel boxplot Available Standard, Extended Standard Available Note. Other plots implemented in Mondrian are boxplots y by x and choropleth maps. All plots can handle missing values and are fully linked. Linking of plots is realized via selections, selection sequences, and highlighting. Mondrian also allows for the interactive options of creating multiple (simultaneous) views in single or different plot windows, manually and automatically sorting and reordering categories and variables, color brushing, and point size variation of points in a scatterplot. Further R functionalities of Mondrian (R computations callable in Mondrian) are multidimensional scaling and principal component analysis. Mondrian includes a model navigator for the graphical building of loglinear models using mosaic plots. Eventually, new variables can be generated from the initial ones given in the loaded data set. (Orientation querying: gives the location of the mouse pointer in the coordinate system of a plot; standard querying: gives basic information associated with an object of a plot; extended querying: gives additional information, not only related to the variables used in a plot, but to other information about an object of the plot. Standard
21 Computer software review: Mondrian 19 zooming: the display is magnified; logical zooming: as magnification increases, more detail is shown; censored zooming: in ceilingcensored zooming objects are magnified up to a limit, in floor-censored zooming objects smaller than a size are not displayed.)
Interactive Data Visualization using Mondrian
Interactive Data Visualization using Mondrian Martin Theus University of Augsburg Department of Computeroriented Statistics and Data Analysis Universitätsstr. 14, 86135 Augsburg, Germany [email protected]
Visualization of missing values using the R-package VIM
Institut f. Statistik u. Wahrscheinlichkeitstheorie 040 Wien, Wiedner Hauptstr. 8-0/07 AUSTRIA http://www.statistik.tuwien.ac.at Visualization of missing values using the R-package VIM M. Templ and P.
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Information Visualization Multivariate Data Visualization Krešimir Matković
Information Visualization Multivariate Data Visualization Krešimir Matković Vienna University of Technology, VRVis Research Center, Vienna Multivariable >3D Data Tables have so many variables that orthogonal
Introduction Course in SPSS - Evening 1
ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
GeoGebra Statistics and Probability
GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,
R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol
R Graphics Cookbook Winston Chang Beijing Cambridge Farnham Koln Sebastopol O'REILLY Tokyo Table of Contents Preface ix 1. R Basics 1 1.1. Installing a Package 1 1.2. Loading a Package 2 1.3. Loading a
MetroBoston DataCommon Training
MetroBoston DataCommon Training Whether you are a data novice or an expert researcher, the MetroBoston DataCommon can help you get the information you need to learn more about your community, understand
SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS
SECTION 2-1: OVERVIEW Chapter 2 Describing, Exploring and Comparing Data 19 In this chapter, we will use the capabilities of Excel to help us look more carefully at sets of data. We can do this by re-organizing
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
The Visual Statistics System. ViSta
Visual Statistical Analysis using ViSta The Visual Statistics System Forrest W. Young & Carla M. Bann L.L. Thurstone Psychometric Laboratory University of North Carolina, Chapel Hill, NC USA 1 of 21 Outline
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R)
DSC 2003 Working Papers (Draft Versions) http://www.ci.tuwien.ac.at/conferences/dsc-2003/ An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R) Ernst
Summarizing and Displaying Categorical Data
Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency
Visualizing Categorical Data in ViSta
Visualizing Categorical Data in ViSta Pedro M. Valero-Mora Universitat de València-EG Email: [email protected] Forrest W. Young University of North Carolina Email: [email protected] Michael Friendly York University
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
Chapter 6: Constructing and Interpreting Graphic Displays of Behavioral Data
Chapter 6: Constructing and Interpreting Graphic Displays of Behavioral Data Chapter Focus Questions What are the benefits of graphic display and visual analysis of behavioral data? What are the fundamental
Exploratory Spatial Data Analysis
Exploratory Spatial Data Analysis Part II Dynamically Linked Views 1 Contents Introduction: why to use non-cartographic data displays Display linking by object highlighting Dynamic Query Object classification
Graphics - an Ace up a Statistician's Sleeve
Graphics - an Ace up a Statistician's Sleeve Heike Hofmann Bad graphics Beginning of Statistical Graphics Milestones in Graphics Interactive Graphics BAD Graphics Guidelines for a bad graphic: (Howard
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
Module 2: Introduction to Quantitative Data Analysis
Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
http://school-maths.com Gerrit Stols
For more info and downloads go to: http://school-maths.com Gerrit Stols Acknowledgements GeoGebra is dynamic mathematics open source (free) software for learning and teaching mathematics in schools. It
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
3D Interactive Information Visualization: Guidelines from experience and analysis of applications
3D Interactive Information Visualization: Guidelines from experience and analysis of applications Richard Brath Visible Decisions Inc., 200 Front St. W. #2203, Toronto, Canada, [email protected] 1. EXPERT
SPSS Manual for Introductory Applied Statistics: A Variable Approach
SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Data Analysis Methods in Knowledge Space Theory
Universität Augsburg Mathematisch-Naturwissenschaftliche Fakultät Institut für Mathematik Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse Data Analysis Methods in Knowledge Space Theory Dissertation
COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3
COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping
BIRT: A Field Guide to Reporting
BIRT: A Field Guide to Reporting x:.-. ^ 11 Diana Peh Alethea Hannemann Nola Hague AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Parts
An introduction to using Microsoft Excel for quantitative data analysis
Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing
Introduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
Appendix 2.1 Tabular and Graphical Methods Using Excel
Appendix 2.1 Tabular and Graphical Methods Using Excel 1 Appendix 2.1 Tabular and Graphical Methods Using Excel The instructions in this section begin by describing the entry of data into an Excel spreadsheet.
Getting started manual
Getting started manual XLSTAT Getting started manual Addinsoft 1 Table of Contents Install XLSTAT and register a license key... 4 Install XLSTAT on Windows... 4 Verify that your Microsoft Excel is up-to-date...
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures
Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the
CHARTS AND GRAPHS INTRODUCTION USING SPSS TO DRAW GRAPHS SPSS GRAPH OPTIONS CAG08
CHARTS AND GRAPHS INTRODUCTION SPSS and Excel each contain a number of options for producing what are sometimes known as business graphics - i.e. statistical charts and diagrams. This handout explores
C:\Users\<your_user_name>\AppData\Roaming\IEA\IDBAnalyzerV3
Installing the IDB Analyzer (Version 3.1) Installing the IDB Analyzer (Version 3.1) A current version of the IDB Analyzer is available free of charge from the IEA website (http://www.iea.nl/data.html,
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Week 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:
Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap
Demographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
Guidebook to R Graphics Using Microsoft Windows
Brochure More information from http://www.researchandmarkets.com/reports/2162901/ Guidebook to R Graphics Using Microsoft Windows Description: Introduces the graphical capabilities of R to readers new
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
MDS. Measured Data Server Online Measurement Network. Properties and Benefits »»» »»»» ProduCt information
ProduCt information MDS Measured Data Server Online Measurement Network Properties and Benefits Unlimited access to your data via Internet Measured data can be transferred online in minute intervals Economical
The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA
Paper 156-2010 The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA Abstract JMP has a rich set of visual displays that can help you see the information
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
MTH 140 Statistics Videos
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.
IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015
Principles of Data Visualization for Exploratory Data Analysis Renee M. P. Teate SYS 6023 Cognitive Systems Engineering April 28, 2015 Introduction Exploratory Data Analysis (EDA) is the phase of analysis
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)
Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades
There are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
Data Visualization. Scientific Principles, Design Choices and Implementation in LabKey. Cory Nathe Software Engineer, LabKey cnathe@labkey.
Data Visualization Scientific Principles, Design Choices and Implementation in LabKey Catherine Richards, PhD, MPH Staff Scientist, HICOR [email protected] Cory Nathe Software Engineer, LabKey [email protected]
Plotting Data with Microsoft Excel
Plotting Data with Microsoft Excel Here is an example of an attempt to plot parametric data in a scientifically meaningful way, using Microsoft Excel. This example describes an experience using the Office
Data Visualization in R
Data Visualization in R L. Torgo [email protected] Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 2014 Introduction Motivation for Data Visualization Humans are outstanding at detecting
T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these
Visualization Quick Guide
Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive
Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick Reference Guide
Open Crystal Reports From the Windows Start menu choose Programs and then Crystal Reports. Creating a Blank Report Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
IBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
Big Data Visualization for Occupational Health and Security problem in Oil and Gas Industry
Big Data Visualization for Occupational Health and Security problem in Oil and Gas Industry Daniela Gorski Trevisan 2, Nayat Sanchez-Pi 1, Luis Marti 3, and Ana Cristina Bicharra Garcia 2 1 Instituto de
Information Server Documentation SIMATIC. Information Server V8.0 Update 1 Information Server Documentation. Introduction 1. Web application basics 2
Introduction 1 Web application basics 2 SIMATIC Information Server V8.0 Update 1 System Manual Office add-ins basics 3 Time specifications 4 Report templates 5 Working with the Web application 6 Working
Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode
Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data
Chapter 4 Creating Charts and Graphs
Calc Guide Chapter 4 OpenOffice.org Copyright This document is Copyright 2006 by its contributors as listed in the section titled Authors. You can distribute it and/or modify it under the terms of either
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
Criteria for Evaluating Visual EDA Tools
Criteria for Evaluating Visual EDA Tools Stephen Few, Perceptual Edge Visual Business Intelligence Newsletter April/May/June 2012 We visualize data for various purposes. Specific purposes direct us to
STATGRAPHICS Online. Statistical Analysis and Data Visualization System. Revised 6/21/2012. Copyright 2012 by StatPoint Technologies, Inc.
STATGRAPHICS Online Statistical Analysis and Data Visualization System Revised 6/21/2012 Copyright 2012 by StatPoint Technologies, Inc. All rights reserved. Table of Contents Introduction... 1 Chapter
Exploratory Data Analysis
Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After
Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools
Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................
Statistics 151 Practice Midterm 1 Mike Kowalski
Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and
