AN INTRODUCTION TO R
|
|
|
- Darrell O’Connor’
- 9 years ago
- Views:
Transcription
1 AN INTRODUCTION TO R DEEPAYAN SARKAR Lattice graphics lattice is an add-on package that implements Trellis graphics (originally developed for S and S-PLUS) in R. It is a powerful and elegant high-level data visualization system, with an emphasis on multivariate data, that is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. This tutorial covers the basics of lattice and gives pointers to further resources. Some examples To fix ideas, we start with a few simple examples. We use the Chem97 dataset from the mlmrev package. > data(chem97, package = "mlmrev") > head(chem97) lea school student score gender age gcsescore gcsecnt The dataset records information on students appearing in the 1997 A-level chemistry examination in Britain. We are only interested in the following variables: score: point score in the A-level exam, with six possible values (0, 2,,, ). gcsescore: average score in GCSE exams. This is a continuous score that may be used as a predictor of the A-level score. gender: gender of the student. Using lattice, we can draw a histogram of all the gcsescore values using > histogram(~ gcsescore, data = Chem97) Date: October
2 2 DEEPAYAN SARKAR Percent of Total gcsescore This plot shows a reasonably symmetric unimodal distribution, but is otherwise uninteresting. A more interesting display would be one where the distribution of gcsescore is compared across different subgroups, say those defined by the A-level exam score. This can be done using > histogram(~ gcsescore factor(score), data = Chem97) Percent of Total gcsescore 0 2
3 AN INTRODUCTION TO R 3 ore effective comparison is enabled by direct superposition. This is hard to do with conventional histograms, but easier using kernel density estimates. In the following example, we use the same subgroups as before in the different panels, but additionally subdivide the gcsescore values by gender within each panel. > densityplot(~ gcsescore factor(score), Chem97, groups = gender, plot.points = ALSE, auto.key = TRUE) Density gcsescore 0 2 Exercise 1. What happens if the extra arguments plot.points and auto.key are omitted? What happens if the inline call to factor() is omitted? The plot.points argument is described in the?panel.densityplot help page, and auto.key in?xyplot. Without the call to factor(), score is considered to be a numeric variable, and converted into a shingle ; see?shingle. Exercise 2. lattice uses a system of customizable settings to derive default graphical parameters (see?trellis.device). Notice that the densities for the two genders are distinguished by differing line types. What happens when you run the same commands interactively? Which do you think is more effective?
4 DEEPAYAN SARKAR Basic ideas lattice provides a high-level system for statistical graphics that is independent of traditional R graphics. It is modeled on the Trellis suite in S-PLUS, and implements most of its features. In fact, lattice can be considered an implementation of the general principles of Trellis graphics (?). It uses the grid package (?) as the underlying implementation engine, and thus inherits many of its features by default. Trellis displays are defined by the type of graphic and the role different variables play in it. Each display type is associated with a corresponding high-level function (histogram, densityplot, etc.). Possible roles depend on the type of display, but typical ones are primary variables: those that define the primary display (e.g., gcsescore in the previous examples). conditioning variables: divides data into subgroups, each of which are presented in a different panel (e.g., score in the last two examples). grouping variables: subgroups are contrasted within panels by superposing the corresponding displays (e.g., gender in the last example). The following display types are available in lattice. unction histogram() densityplot() qqmath() qq() stripplot() bwplot() dotplot() barchart() xyplot() splom() contourplot() levelplot() wireframe() cloud() parallel() Default Display Histogram Kernel Density Plot Theoretical Quantile Plot Two-sample Quantile Plot Stripchart (Comparative 1-D Scatterplots) Comparative Box-and-Whisker Plots Cleveland Dot Plot Bar Plot Scatterplot Scatterplot atrix Contour Plot of Surfaces alse Color Level Plot of Surfaces Three-dimensional Perspective Plot of Surfaces Three-dimensional Scatterplot Parallel Coordinates Plot New high-level functions can be written to represent further visualization types; examples are ecdfplot() and mapplot() in the latticeextra package. Design goals Visualization is an art, but it can benefit greatly from a systematic, scientific approach. In particular,? has shown that it is possible to come up with general rules that can be applied to design more effective graphs. One of the primary goals of Trellis graphics is to provide tools that make it easy to apply these rules, so that the burden of compliance is shifted from the user to the software to the extent possible. Some obvious examples of such rules are: Use as much of the available space as possible orce direct comparsion by superposition (grouping) when possible Encourage comparison when juxtaposing (conditioning): use common axes, add common reference objects such as grids. These design goals have some technical drawbacks; for example, non-wastage of space requires the complete display to be known when plotting begins, so, the incremental approach common in traditional R graphics (e.g., adding a main title after the main plot is finished) doesn t fit in. lattice deals with this using an
5 AN INTRODUCTION TO R 5 object-based paradigm: plots are represented as regular R objects, incremental updates are performed by modifying such objects and re-plotting them. Although rules are useful, any serious graphics system must also be flexible. lattice is designed to be flexible, but obviously there is a trade-off between flexibility and ease of use for the more common tasks. lattice tries to achieve a balance using the following model: A display is made up of various elements The defaults are coordinated to provide meaningful results, but Each element can be controlled by the user independently of the others The main elements are: the primary (panel) display axis annotation strip annotation (describing the conditioning process) legends (typically describing the grouping process) In each case, additional arguments to the high-level calls can be used to activate common variants, and full flexibility is allowed through arbitrary user-defined functions. This is particularly useful for controlling the primary display through panel functions. ost nontrivial use of lattice involves manipulation of one or more of these elements. Not all graphical designs segregate neatly into these elements; lattice may not be a good tool for such displays. Common high-level functions Visualizing univariate distributions. Several standard statistical graphics are intended to visualize the distribution of a continuous random variable. We have already seen histograms and density plots, which are both estimates of the probability density function. Another useful display is the normal Q-Q plot, which is related to the distribution function (x) = P (X x). Normal Q-Q plots can be produced by the lattice function qqmath(). > qqmath(~ gcsescore factor(score), Chem97, groups = gender, f.value = ppoints(100), auto.key = list(columns = 2), type = c("p", "g"), aspect = "xy") gcsescore qnorm
6 DEEPAYAN SARKAR Normal Q-Q plots plot empirical quantiles of the data against quantiles of the normal distribution (or some other theoretical distribution). They can be regarded as an estimate of the distribution function, with the probability axis transformed by the normal quantile function. They are designed to detect departures from normality; for a good fit, the points lie approximate along a straight line. In the plot above, the systematic convexity suggests that the distributions are left-skewed, and the change in slopes suggests changing variance. The type argument adds a common reference grid to each panel that makes it easier to see the upward shift in gcsescore across panels. The aspect argument automatically computes an aspect ratio. Two-sample Q-Q plots compare quantiles of two samples (rather than one sample and a theoretical distribution). They can be produced by the lattice function qq(), with a formula that has two primary variables. In the formula y ~ x, y needs to be a factor with two levels, and the samples compared are the subsets of x for the two levels of y. or example, we can compare the distributions of gcsescore for males and females, conditioning on A-level score, with > qq(gender ~ gcsescore factor(score), Chem97, f.value = ppoints(100), type = c("p", "g"), aspect = 1) The plot suggests that females do better than males in the GCSE exam for a given A-level score (in other words, males tend to improve more from the GCSE exam to the A-level exam), and also have smaller variance (except in the first panel).
7 AN INTRODUCTION TO R 7 Two-sample Q-Q plots only allow comparison between two samples at a time. A well-known graphical design that allows comparison between an arbitrary number of samples is the comparative box-and-whisker plot. They are related to the Q-Q plot: the values compared are five special quantiles, the median, the first and third quartiles, and the extremes. ore commonly, the extents of the whiskers are defined differently, and values outside plotted explicitly, so that heavier-than-normal tails tend to produce many points outside the extremes. See?boxplot.stats for details. Box-and-whisker plots can be produced by the lattice function bwplot(). > bwplot(factor(score) ~ gcsescore gender, Chem97) gcsescore The decreasing lengths of the boxes and whiskers suggest decreasing variance, and the large number of outliers on one side indicate heavier left tails (characteristic of a left-skewed distribution). The same box-and-whisker plots can be displayed in a slightly different layout to emphasize a more subtle effect in the data: for example, the median gcsescore does not uniformly increase from left to right in the following plot, as one might have expected. > bwplot(gcsescore ~ gender factor(score), Chem97, layout = c(, 1))
8 DEEPAYAN SARKAR gcsescore 2 0 The layout argument controls the layout of panels in columns, rows, and pages (the default would not have been as useful in this example). Note that the box-and-whisker plots are now vertical, because of a switch in the order of variables in the formula. Exercise 3. All the plots we have seen suggest that the distribution of gcsescore is slightly skewed, and have unequal variances in the subgroups of interest. Using a Box Cox transformation often helps in such situations. The boxcox() function in the ASS package can be used to find the optimal Box Cox transformation, which in this case is approximate 2.3. Reproduce the previous plots replacing gcsescore by gcsescore^2.3. Would you say the transformation was successful? Exercise. Not all tools are useful for all problems. Box-and-whisker plots, and to a lesser extent Q-Q plots, are mostly useful when the distributions are symmetric and unimodal, and can be misleading otherwise. or example, consider the display produced by > data(gvhd10, package = "latticeextra") > bwplot(days ~ log(sc.h), data = gvhd10)
9 AN INTRODUCTION TO R 9 What would you conclude about the distribution of log(sc.h) from this plot? Now draw kernel density plots of log(sc.h) conditioning on Days. Would you reach the same conclusions as before? or small samples, summarizing is often unnecessary, and simply plotting all the data reveals interesting features of the distribution. The following example, which uses the quakes dataset, plots depths of earthquake epicenters by magnitude. > stripplot(depth ~ factor(mag), data = quakes, jitter.data = TRUE, alpha = 0., main = "Depth of earthquake epicenters by magnitude", xlab = "agnitude (Richter)", ylab = "Depth (km)") Depth of earthquake epicenters by magnitude 00 Depth (km) agnitude (Richter)
10 10 DEEPAYAN SARKAR This is known as a strip plot or a one-dimensional scatterplot. Note the use of jittering and partial transparency to alleviate potential overplotting. The arguments xlab, ylab, and main have been used to add informative labels; this is possible in all high-level lattice functions. Visualizing tabular data. Tables form an important class of statistical data. Popular visualization methods designed for tables are bar charts and Cleveland dot plots. 1 or illustration, we use the VADeaths dataset, which gives death rates in the U.S. state of Virginia in 191 among different population subgroups. VADeaths is a matrix. > VADeaths Rural ale Rural emale Urban ale Urban emale To use the lattice formula interface, we first need to convert it into a data frame. > VADeathsD <- as.data.frame.table(vadeaths, responsename = "Rate") > VADeathsD Var1 Var2 Rate Rural ale Rural ale Rural ale Rural ale Rural ale Rural emale Rural emale Rural emale Rural emale Rural emale Urban ale Urban ale Urban ale Urban ale Urban ale Urban emale Urban emale Urban emale Urban emale Urban emale 50.0 Bar charts are produced by the barchart() function, and Cleveland dot plots by dotplot(). Both allow a formula of the form y ~ x (plus additional conditioning and grouping variables), where one of x and y should be a factor. A bar chart of the VADeathsD data is produced by 1 Pie charts are also popular, but they have serious design flaws and should not be used. lattice does not have a high-level function that produces pie charts.
11 AN INTRODUCTION TO R 11 > barchart(var1 ~ Rate Var2, VADeathsD, layout = c(, 1)) Rural ale Rural emale Urban ale Urban emale Rate This plot is potentially misleading, because a strong visual effect in the plot is the comparison of the areas of the shaded bars, which do not mean anything. This problem can be addressed by making the areas proportional to the values they encode. > barchart(var1 ~ Rate Var2, VADeathsD, layout = c(, 1), origin = 0) Rural ale Rural emale Urban ale Urban emale Rate A better design is to altogether forego the bars, which distract from the primary comparison of the endpoint positions, and instead use a dot plot.
12 12 DEEPAYAN SARKAR > dotplot(var1 ~ Rate Var2, VADeathsD, layout = c(, 1)) Rural ale Rural emale Urban ale Urban emale Rate In this particular example, the display is more effective if we use Var2 as a grouping variable, and join the points within each group. > dotplot(var1 ~ Rate, data = VADeathsD, groups = Var2, type = "o", auto.key = list(space = "right", points = TRUE, lines = TRUE)) Rural ale Rural emale Urban ale Urban emale Rate
13 AN INTRODUCTION TO R 13 This plot clearly shows that the pattern of death rate by age is virtually identical for urban and rural females, with an increased rate for rural males, and a further increase for urban males. This interaction is difficult to see in the earlier plots. Generic functions and methods. High-level lattice functions are actually generic functions, with specific methods doing the actual work. All the examples we have seen so far use the formula methods; that is, the method called when the first argument is a formula. Because barchart() and dotplot() are frequently used for multiway tables stored as arrays, lattice also includes suitable methods that bypass the conversion to a data frame that would be required otherwise. or example, an alternative to the last example is > dotplot(vadeaths, type = "o", auto.key = list(points = TRUE, lines = TRUE, space = "right")) Rural ale Rural emale Urban ale Urban emale req ethods available for a particular generic can be listed using 2 > methods(generic.function = "dotplot") [1] dotplot.array* dotplot.default* dotplot.formula* dotplot.matrix* [5] dotplot.numeric* dotplot.table* Non-visible functions are asterisked The special features of the methods, if any, are described in their respective help pages; for example,?dotplot.matrix for the example above. 2 This is only true for S3 generics and methods. lattice can be extended to use the S system as well, although we will not discuss such extensions here.
14 1 DEEPAYAN SARKAR Exercise 5. Reproduce the ungrouped dot plot using the matrix method. Scatterplots and extensions. Scatterplots are commonly used for continuous bivariate data, as well as for time-series data. We use the Earthquake data, which contains measurements recorded at various seismometer locations for 23 large earthquakes in western North America between 190 and 190. Our first example plots the maximum horizontal acceleration measured against distance of the measuring station from the epicenter. > data(earthquake, package = "nlme") > xyplot(accel ~ distance, data = Earthquake) accel distance The plot shows patterns typical of a right skewed distribution, and can be improved by plotting the data on a log scale. It is common to add a reference grid and some sort of smooth; for example, > xyplot(accel ~ distance, data = Earthquake, scales = list(log = TRUE), type = c("p", "g", "smooth"), xlab = "Distance rom Epicenter (km)", ylab = "aximum Horizontal Acceleration (g)")
15 AN INTRODUCTION TO R 15 10^0.0 aximum Horizontal Acceleration (g) 10^ ^ ^ ^ ^ ^0.0 10^0.5 10^1.0 10^1.5 10^2.0 10^2.5 Distance rom Epicenter (km) Shingles. Conditioning by factors is possible with scatterplots as usual. It is also possible to condition on shingles, which are continuous analogues of factors, with levels defined by possibly overlapping intervals. Using the quakes dataset again, we can try to understand the three-dimensional distribution of earthquake epicenters by looking at a series of two-dimensional scatterplots. > Depth <- equal.count(quakes$depth, number=, overlap=.1) > summary(depth) Intervals: min max count Overlap between adjacent intervals: [1]
16 1 DEEPAYAN SARKAR > xyplot(lat ~ long Depth, data = quakes) Depth Depth Depth Depth Depth lat Depth Depth Depth long Trivariate displays. Of course, for continuous trivariate data, it may be more effective to use a three-dimensional scatterplot. > cloud(depth ~ lat * long, data = quakes, zlim = rev(range(quakes$depth)), screen = list(z = 105, x = -70), panel.aspect = 0.75, xlab = "Longitude", ylab = "Latitude", zlab = "Depth")
17 AN INTRODUCTION TO R 17 Depth Longitude Latitude Static three-dimensional scatterplots are not very useful because of the strong effect of camera direction. Unfortunately, lattice does not allow interactive manipulation of the viewing direction. Still, looking at a few such plots suggests that the epicenter locations are concentrated around two planes in three-dimensional space. Other trivariate functions are wireframe() and levelplot(), which display data in the form of a three-dimensional surface. We will not explicitly discuss these and the other high-level functions in lattice, but examples can be found in their help pages. Exercise. Try changing viewing direction in the previous plot by modifying the screen argument. The trellis object. One important feature of lattice that makes it different from traditional R graphics is that high-level functions do not actually plot anything. Instead, they return an object of class trellis, that needs to be print()-ed or plot()-ed. R s automatic printing rule means that in most cases, the user does not see any difference in behaviour. Here is one example where we use optional arguments of the plot() method for trellis objects to display two plots side by side. > dp.uspe <- dotplot(t(uspersonalexpenditure), groups = ALSE, layout = c(1, 5), xlab = "Expenditure (billion dollars)") > dp.uspe.log <- dotplot(t(uspersonalexpenditure), groups = ALSE, layout = c(1, 5), scales = list(x = list(log = 2)), xlab = "Expenditure (billion dollars)") > plot(dp.uspe, split = c(1, 1, 2, 1)) > plot(dp.uspe.log, split = c(2, 1, 2, 1), newpage = ALSE)
18 1 DEEPAYAN SARKAR Private Education Personal Care edical and Health Household Operation ood and Tobacco Private Education Personal Care edical and Health Household Operation ood and Tobacco ^0 2^2 2^ 2^ Expenditure (billion dollars) Expenditure (billion dollars) urther resources The material we have covered should be enough that the documentation accompanying the lattice package should begin to make sense. Another useful resource is the Lattice book, part of Springer s UseR! series. The book s website contains all figures and code from the book. The Trellis website from Bell Labs offers a wealth of material on the original Trellis implementation in S-PLUS that is largely applicable to lattice as well (of course, features unique to lattice are not covered)
Use R! Series Editors: Robert Gentleman Kurt Hornik Giovanni Parmigiani
Use R! Series Editors: Robert Gentleman Kurt Hornik Giovanni Parmigiani Use R! Albert:Bayesian Computation with R Cook/Swayne:Interactive and Dynamic Graphics for Data Analysis: With R and GGobi Hahne/Huber/Gentleman/Falcon:
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Gates/filters in Flow Cytometry Data Visualization
Gates/filters in Flow Cytometry Data Visualization October 3, Abstract The flowviz package provides tools for visualization of flow cytometry data. This document describes the support for visualizing gates
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Getting started with qplot
Chapter 2 Getting started with qplot 2.1 Introduction In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Gestation Period as a function of Lifespan
This document will show a number of tricks that can be done in Minitab to make attractive graphs. We work first with the file X:\SOR\24\M\ANIMALS.MTP. This first picture was obtained through Graph Plot.
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
3D Interactive Information Visualization: Guidelines from experience and analysis of applications
3D Interactive Information Visualization: Guidelines from experience and analysis of applications Richard Brath Visible Decisions Inc., 200 Front St. W. #2203, Toronto, Canada, [email protected] 1. EXPERT
CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin
My presentation is about data visualization. How to use visual graphs and charts in order to explore data, discover meaning and report findings. The goal is to show that visual displays can be very effective
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Good Graphs: Graphical Perception and Data Visualization
Good Graphs: Graphical Perception and Data Visualization Nina Zumel August 28th, 2009 1 Introduction What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more
Formulas, Functions and Charts
Formulas, Functions and Charts :: 167 8 Formulas, Functions and Charts 8.1 INTRODUCTION In this leson you can enter formula and functions and perform mathematical calcualtions. You will also be able to
R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol
R Graphics Cookbook Winston Chang Beijing Cambridge Farnham Koln Sebastopol O'REILLY Tokyo Table of Contents Preface ix 1. R Basics 1 1.1. Installing a Package 1 1.2. Loading a Package 2 1.3. Loading a
Summarizing and Displaying Categorical Data
Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency
Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs
Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)
Demographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)
Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,
Describing and presenting data
Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data
Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode
Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data
CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS
TABLES, CHARTS, AND GRAPHS / 75 CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS Tables, charts, and graphs are frequently used in statistics to visually communicate data. Such illustrations are also a frequent
Descriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
Microsoft Excel 2010 Part 3: Advanced Excel
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES Microsoft Excel 2010 Part 3: Advanced Excel Winter 2015, Version 1.0 Table of Contents Introduction...2 Sorting Data...2 Sorting
Introduction Course in SPSS - Evening 1
ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/
Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:
Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve
Data Visualization in R
Data Visualization in R L. Torgo [email protected] Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 2014 Introduction Motivation for Data Visualization Humans are outstanding at detecting
Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures
Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
Bar Graphs and Dot Plots
CONDENSED L E S S O N 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs
Excel -- Creating Charts
Excel -- Creating Charts The saying goes, A picture is worth a thousand words, and so true. Professional looking charts give visual enhancement to your statistics, fiscal reports or presentation. Excel
What Does the Normal Distribution Sound Like?
What Does the Normal Distribution Sound Like? Ananda Jayawardhana Pittsburg State University [email protected] Published: June 2013 Overview of Lesson In this activity, students conduct an investigation
ABOUT THIS DOCUMENT ABOUT CHARTS/COMMON TERMINOLOGY
A. Introduction B. Common Terminology C. Introduction to Chart Types D. Creating a Chart in FileMaker E. About Quick Charts 1. Quick Chart Behavior When Based on Sort Order F. Chart Examples 1. Charting
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
Data exploration with Microsoft Excel: univariate analysis
Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating
Scientific Graphing in Excel 2010
Scientific Graphing in Excel 2010 When you start Excel, you will see the screen below. Various parts of the display are labelled in red, with arrows, to define the terms used in the remainder of this overview.
Symbolizing your data
Symbolizing your data 6 IN THIS CHAPTER A map gallery Drawing all features with one symbol Drawing features to show categories like names or types Managing categories Ways to map quantitative data Standard
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
Describing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Visualization Quick Guide
Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
Intermediate PowerPoint
Intermediate PowerPoint Charts and Templates By: Jim Waddell Last modified: January 2002 Topics to be covered: Creating Charts 2 Creating the chart. 2 Line Charts and Scatter Plots 4 Making a Line Chart.
CHARTS AND GRAPHS INTRODUCTION USING SPSS TO DRAW GRAPHS SPSS GRAPH OPTIONS CAG08
CHARTS AND GRAPHS INTRODUCTION SPSS and Excel each contain a number of options for producing what are sometimes known as business graphics - i.e. statistical charts and diagrams. This handout explores
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
Inference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
Linear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
Foundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
AP * Statistics Review. Descriptive Statistics
AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
Lecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel [email protected] Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
The Center for Teaching, Learning, & Technology
The Center for Teaching, Learning, & Technology Instructional Technology Workshops Microsoft Excel 2010 Formulas and Charts Albert Robinson / Delwar Sayeed Faculty and Staff Development Programs Colston
Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
Updates to Graphing with Excel
Updates to Graphing with Excel NCC has recently upgraded to a new version of the Microsoft Office suite of programs. As such, many of the directions in the Biology Student Handbook for how to graph with
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:
Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
Modifying Colors and Symbols in ArcMap
Modifying Colors and Symbols in ArcMap Contents Introduction... 1 Displaying Categorical Data... 3 Creating New Categories... 5 Displaying Numeric Data... 6 Graduated Colors... 6 Graduated Symbols... 9
A Correlation of. to the. South Carolina Data Analysis and Probability Standards
A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
Figure 1. An embedded chart on a worksheet.
8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the
TEACHER NOTES MATH NSPIRED
Math Objectives Students will understand that normal distributions can be used to approximate binomial distributions whenever both np and n(1 p) are sufficiently large. Students will understand that when
TIPS FOR DOING STATISTICS IN EXCEL
TIPS FOR DOING STATISTICS IN EXCEL Before you begin, make sure that you have the DATA ANALYSIS pack running on your machine. It comes with Excel. Here s how to check if you have it, and what to do if you
Week 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
Creating an Excel XY (Scatter) Plot
Creating an Excel XY (Scatter) Plot EXCEL REVIEW 21-22 1 What is an XY or Scatter Plot? An XY or scatter plot either shows the relationships among the numeric values in several data series or plots two
The following is an overview of lessons included in the tutorial.
Chapter 2 Tutorial Tutorial Introduction This tutorial is designed to introduce you to some of Surfer's basic features. After you have completed the tutorial, you should be able to begin creating your
Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.
Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of
R Graphics II: Graphics for Exploratory Data Analysis
UCLA Department of Statistics Statistical Consulting Center Irina Kukuyeva [email protected] April 26, 2010 Outline 1 Summary Plots 2 Time Series Plots 3 Geographical Plots 4 3D Plots 5 Simulation
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,
Exploratory Data Analysis
Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary
Graphing in SAS Software
Graphing in SAS Software Prepared by International SAS Training and Consulting Destiny Corporation 100 Great Meadow Rd Suite 601 - Wethersfield, CT 06109-2379 Phone: (860) 721-1684 - 1-800-7TRAINING Fax:
Microsoft Business Intelligence Visualization Comparisons by Tool
Microsoft Business Intelligence Visualization Comparisons by Tool Version 3: 10/29/2012 Purpose: Purpose of this document is to provide a quick reference of visualization options available in each tool.
Each function call carries out a single task associated with drawing the graph.
Chapter 3 Graphics with R 3.1 Low-Level Graphics R has extensive facilities for producing graphs. There are both low- and high-level graphics facilities. The low-level graphics facilities provide basic
Visualization of missing values using the R-package VIM
Institut f. Statistik u. Wahrscheinlichkeitstheorie 040 Wien, Wiedner Hauptstr. 8-0/07 AUSTRIA http://www.statistik.tuwien.ac.at Visualization of missing values using the R-package VIM M. Templ and P.
Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
SPSS Introduction. Yi Li
SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?
Chapter 2 Statistical Foundations: Descriptive Statistics
Chapter 2 Statistical Foundations: Descriptive Statistics 20 Chapter 2 Statistical Foundations: Descriptive Statistics Presented in this chapter is a discussion of the types of data and the use of frequency
Directions for Frequency Tables, Histograms, and Frequency Bar Charts
Directions for Frequency Tables, Histograms, and Frequency Bar Charts Frequency Distribution Quantitative Ungrouped Data Dataset: Frequency_Distributions_Graphs-Quantitative.sav 1. Open the dataset containing
5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.
5. Correlation Objectives Calculate correlations Calculate correlations for subgroups using split file Create scatterplots with lines of best fit for subgroups and multiple correlations Correlation The
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
UNDERSTANDING THE TWO-WAY ANOVA
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
Creating Bar Charts and Pie Charts Excel 2010 Tutorial (small revisions 1/20/14)
Creating Bar Charts and Pie Charts Excel 2010 Tutorial (small revisions 1/20/14) Excel file for use with this tutorial GraphTutorData.xlsx File Location http://faculty.ung.edu/kmelton/data/graphtutordata.xlsx
A simple three dimensional Column bar chart can be produced from the following example spreadsheet. Note that cell A1 is left blank.
Department of Library Services Creating Charts in Excel 2007 www.library.dmu.ac.uk Using the Microsoft Excel 2007 chart creation system you can quickly produce professional looking charts. This help sheet
AP STATISTICS REVIEW (YMS Chapters 1-8)
AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
How To: Analyse & Present Data
INTRODUCTION The aim of this How To guide is to provide advice on how to analyse your data and how to present it. If you require any help with your data analysis please discuss with your divisional Clinical
Assignment #1: Spreadsheets and Basic Data Visualization Sample Solution
Assignment #1: Spreadsheets and Basic Data Visualization Sample Solution Part 1: Spreadsheet Data Analysis Problem 1. Football data: Find the average difference between game predictions and actual outcomes,
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression
