AMS-5: Statistics. Pie Charts: proportion of ice-cream flavors sold annually by a given brand.


Graphical Representations of Data, Mean, Median and Standard Deviation

In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of values and the most likely values, as well as properties like symmetry, uni- or multi-modality, tail behavior, etc. To quantify the central value of the distribution of a given sample we define the average and the median. To quantify the spread (dispersion) of the sample with respect to its central value we define the standard deviation.

Pie Charts

The pie charts correspond to the proportion of ice-cream flavors sold annually by a given brand.

[Figure: pie charts with slices labeled Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream]

Pie Charts are a bad idea!

From the R manual page for the pie function:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data. Cleveland (1985), page 264: "Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements." This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

Bar Charts

The same data are now represented with bar charts. Notice that this representation allows a very clear quantification of the differences between the ice-cream types.

[Figure: bar charts of the ice-cream data, with bars for Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream]

Bar Charts

Strictly speaking, bar charts can be used to summarize either quantitative or qualitative data. Qualitative data types, such as nominal or ordinal variables, can be visualized using bar charts. The next data-summary graphing techniques, Stem and Leaf Plots and Histograms, are used for quantitative variables.

Stem and Leaf Plots

A similar technique used to graphically represent data is the stem and leaf plot. Take a look at the Healthy Breakfast data set. This data file contains nutritional information and grocery shelf location for 77 breakfast cereals. Current research states that adults should consume no more than 30% of their calories in the form of fat, that they need about 50 grams (women) or 63 grams (men) of protein daily, and that they should provide for the remainder of their caloric intake with complex carbohydrates. One gram of fat contains 9 calories, and carbohydrates and proteins contain 4 calories per gram. A good diet should also contain grams of dietary fiber. Check out R code.

Table columns: income level in $, frequency.
Class intervals: 0 to 1,000; 1,000 to 2,000; 2,000 to 3,000; 3,000 to 4,000; 4,000 to 5,000; 5,000 to 6,000; 6,000 to 7,000; 7,000 to 10,000; 10,000 to 15,000; 15,000 to 25,000; 25,000 to 50,000; 50,000 and over.

Frequency Histograms

Consider the table of families by income in the US in 1973 and corresponding percents. In this table class intervals include the left endpoint, but not the right endpoint. It is important to specify which of the endpoints are included in each class. Notice that in this case the class intervals do not all have the same length.

We can draw a frequency histogram for each of the income level ranges specified by dividing the frequency counts by the total number of families. However, the ranges have different widths, so the area of each block is NOT proportional to the number of families with incomes in the corresponding class interval. We really WANT the area of each block to represent the proportion of families within the income class interval, so instead we use a density histogram.

Density Histograms

Density histograms are similar to frequency histograms, except that the heights of the rectangles are calculated by dividing the relative frequency by the class width:

$\text{height} = \dfrac{\text{frequency}}{\text{total number of families} \times \text{class width}}$

The resulting rectangle heights are called densities, and the vertical scale is called the density scale.

[Figure: Density Histogram for Income; horizontal axis: Income in $1000; vertical axis: Density]

NOTE: I used total subjects in income data set = 211,908,
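As a rough sketch of the height calculation described above, the snippet below computes density-scale heights for classes of unequal width. The intervals and counts are hypothetical values chosen for illustration (they are not the 1973 income data), and `density_heights` is an illustrative helper name.

```python
# Hypothetical class intervals (in $1,000s) and family counts, for illustration only.
intervals = [(0, 1), (1, 2), (2, 5), (5, 10)]
counts = [10, 20, 30, 40]


def density_heights(intervals, counts):
    """Height of each block = relative frequency / class width."""
    total = sum(counts)
    heights = []
    for (left, right), count in zip(intervals, counts):
        relative_frequency = count / total
        heights.append(relative_frequency / (right - left))
    return heights


heights = density_heights(intervals, counts)

# The area of each block (height * width) recovers the relative frequency,
# so the areas add up to 1 regardless of the class widths.
areas = [h * (right - left) for h, (left, right) in zip(heights, intervals)]

print(heights)
print(sum(areas))
```

Note that the third class holds more families than the second but has a lower height, because its count is spread over a wider interval.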

IMPORTANT! When comparing data sets with different sample sizes OR when drawing histograms with varying class interval widths, it is NOT appropriate to compare raw frequency histograms. Why? When sample sizes are different, density scale histograms are BETTER.

Drawing Density Histograms

Once the distribution table of percentages is available, the next step is to draw a horizontal axis specifying the class intervals. Then we draw the blocks, remembering that

In a density histogram the areas of the blocks represent percentages.

So it is a mistake to set the heights of the blocks equal to the percentages in the table (that would be a relative frequency histogram, which we'll talk about next). To figure out the height of a block, divide the percentage by the class width of the interval. The table needed to calculate the heights of the blocks looks like this:

Table columns: income level in $, percent, class width (in $1,000s), height.
Class intervals: 0 to 1,000; 1,000 to 2,000; 2,000 to 3,000; 3,000 to 4,000; 4,000 to 5,000; 5,000 to 6,000; 6,000 to 7,000; 7,000 to 10,000; 10,000 to 15,000; 15,000 to 25,000; 25,000 to 50,000; 50,000 and over.

[Figure: Distribution of family income in the US in 1973; horizontal axis: income in $; vertical axis: percent per $]

This is the resulting histogram. The areas of the blocks of a density histogram add up to 1 (that is, 100%). Notice that the class interval of incomes above $50,000 has been ignored.

Vertical scale

What is the meaning of the vertical scale in a histogram? Remember that the area of the blocks is proportional to the percents. A tall block implies that a large chunk of area accumulates in a small portion of the horizontal scale. This implies that the density of the data is high in the intervals where the height is large. In other words, the data are more crowded in those intervals.

Another Example of a Density Histogram

Information is available from 131 hospitals. We show a histogram of the average length of stay, measured in days, for each hospital. The area of each block is proportional to the number of hospitals in the corresponding class interval.

[Figure: Histogram of the average length of stay in hospital; horizontal axis: length of stay (days)]

In this example all the intervals have the same length, so the heights of the blocks give all the information about the number of hospitals in each class.

There are 7 class intervals, corresponding to
6 to 8 days
8 to 10 days
10 to 12 days
12 to 14 days
14 to 16 days
16 to 18 days
18 to 20 days
Note that the class that corresponds to 14 to 16 days is empty and that the class with the highest count of hospitals is the one of 8 to 10 days.

Cross tabulation

In many situations we need to perform an exploratory analysis of data to observe possible associations with a discrete variable. For example, consider measuring the blood pressure of women and dividing them into two groups: one taking the contraceptive pill and the other not taking it. We can produce a table with the distribution of one group in one column and the distribution of the other in another column. This can be used to produce two histograms in order to make a visual comparison of the two groups. The variable that is used for the cross-tabulation is usually referred to as a covariable.

Table columns: blood pressure (mm), non users (%), users (%). The classes run from an "under" class to an "over" class.

[Figure: two histograms, "women not using the pill" and "women using the pill"; horizontal axis: blood pressure (mm); vertical axis: percent per mm]

We observe that the histogram of pill users is slightly shifted to the right, suggesting an increase in blood pressure among women taking the pill.

Relative Frequency Histograms

These are relative frequency histograms, because the height of each bar is the fraction of times the value occurs, i.e.

$\text{relative frequency} = \dfrac{\text{frequency of value(s)}}{\text{number of observations in the set}}$

Relative frequency histograms are also useful for comparing two samples with different sample sizes. The sum of all relative frequencies in a data set is 1.
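A minimal sketch of the relative frequency calculation, applied to two groups of different sizes. The readings below are hypothetical class lower bounds in mm, invented for illustration (they are not the blood pressure data from the slides), and `relative_frequencies` is an illustrative helper name.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to (frequency of value) / (number of observations)."""
    counts = Counter(values)
    n = len(values)
    return {value: counts[value] / n for value in sorted(counts)}


# Hypothetical binned readings (class lower bounds in mm); the groups differ in size.
non_users = [105, 110, 110, 115, 115, 115, 120, 120]
users = [110, 115, 120, 120]

rel_non_users = relative_frequencies(non_users)
rel_users = relative_frequencies(users)

print(rel_non_users)
print(rel_users)

# Each set of relative frequencies adds up to 1, so the two groups can be
# compared even though one has twice as many observations as the other.
print(sum(rel_non_users.values()), sum(rel_users.values()))
```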

Problems

Data from the 1990 Census produce the following for houses in the New York City area that are either occupied by the owner or rented out.

1. The owner-occupied percents add up to 99.9% and the renter-occupied percents add up to 100.1%. Why?
2. The percentage of one-room units is much larger for renter-occupied housing. Is that because there is more renter-occupied housing in total?
3. Which are larger on the whole: the owner-occupied units or the renter-occupied units?

Table columns: Number of Rooms, Owner Occupied, Renter Occupied.
Total Number: 785,120 (owner occupied); 1,782,459 (renter occupied).

The answer to the first question is that there is rounding involved in the calculation of the percentages. As for the second question, the fact that we are taking percentages accounts for the difference in totals, so a larger total of renter-occupied units does not explain the difference. What seems to be happening is that units for rent tend to be smaller than units occupied by their owners. This is more clearly seen from the comparison of the two histograms.

[Figure: two histograms, "Owner-occupied" and "Renter-occupied"]
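The rounding effect mentioned in the first answer can be reproduced with a tiny example. The counts below are hypothetical (three equal categories), chosen only because they make the effect easy to see.

```python
# Hypothetical counts for three equally sized categories.
counts = [1, 1, 1]
total = sum(counts)

# Round each percentage to one decimal place, as a published table would.
percents = [round(100 * c / total, 1) for c in counts]

print(percents)
# The rounded percentages no longer add up to exactly 100.
print(round(sum(percents), 1))
```

Each exact percentage is 33.33...%, which is rounded down, so the rounded column sums to slightly less than 100%. When most entries are rounded up instead, the sum comes out slightly above 100%.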

Average and spread in a histogram

A histogram provides a graphical description of the distribution of a sample of data. If we want to summarize the properties of such a distribution we can measure the center and the spread of the histogram.

[Figure: two histograms, "Histogram of n1" (top) and "Histogram of n2" (bottom); vertical axis: Density]

These two histograms correspond to samples with the same center. The spread of the sample on top is smaller than that of the sample on the bottom.

Average and median

In addition to summarizing a variable graphically to look for patterns, it's also useful to summarize it numerically. The three most useful measures of center (or central tendency) are the mean, the median (and other quantiles, or percentiles), and the mode.

It turns out that the mean has the graphical interpretation of the center of gravity of the data. If you visualize the histogram of a variable as made of bricks that are sitting on a number line made of plywood, which in turn is put on top of a saw-horse, the mean is the place where the histogram would exactly balance.

To obtain an estimate of the center of the distribution we can calculate an average.

The average of a list of numbers equals their sum, divided by how many there are.

Thus, if 18; 18; 21; 20; 19; 20; 20; 20; 19; 20 are the ages of 10 students in this class, the average is given by

$\dfrac{18 + 18 + 21 + 20 + 19 + 20 + 20 + 20 + 19 + 20}{10} = 19.5$

In the hospital data that we considered in the previous class, the data corresponded to the average length of stay of patients in each hospital in the survey. This means that the lengths of stay of all patients in a given hospital were added and the sum divided by the number of patients in that hospital. (Remember summation notation?)

Average and median

The median of a column of numbers is found by sorting the data, from smallest to largest, and finding the middle value in the list. If the sorted list has an odd number of elements, then the median is uniquely defined. If the sorted list has an even number of elements, the median is the mean of the two middle values.
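The two definitions above can be sketched in a few lines, using the ages of the 10 students from the example. The function name `median_of` is an illustrative choice; Python's standard `statistics.median` follows the same rule.

```python
def median_of(values):
    """Sort the data and return the middle value, or the mean of the
    two middle values when the list has an even number of elements."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]

# Average: the sum of the list divided by how many entries there are.
average = sum(ages) / len(ages)

print(average)                  # 19.5, as in the example above
print(median_of(ages))          # even number of entries: mean of the two middle values
print(median_of([18, 19, 21]))  # odd number of entries: the middle value
```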

[Figure: histogram of rainfall in Guarico, Venezuela; horizontal axis: mm; vertical axis: Density; the median and the mean are marked]

This histogram corresponds to the rainfall over periods of 10 days in an area of the central plains of Venezuela. The average or mean rainfall is mm. We observe that only about 30% of the observations are above the average. Notice that this histogram is not symmetric with respect to the average.

The median of a histogram is the value with half the area to the left and half to the right. The median and average of a non-symmetric histogram are DIFFERENT.

[Figure: Histogram of dat; vertical axis: Density]

A symmetric histogram will look like this. In this case 50% of the data are above the average. In a symmetric histogram the median and the average coincide.

By definition the median is the 50th percentile, although it's also useful sometimes to look at other percentiles. For example, the 25th percentile (also called the first quartile) is the place where 1/4 of the data is to the left of that place.

Average bigger than median: long right tail
Average about the same as median: symmetry
Average smaller than median: long left tail

The average is very sensitive to extreme observations, so when dealing with variables like income or rainfall, which exhibit very long tails, it is preferable to use the median as a measure of centrality. The relationship between the average and the median reflects the shape of the tails of a histogram.
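A small sketch of how a single extreme observation moves the average but not the median. The three lists are hypothetical and were made up only to illustrate the three cases above.

```python
from statistics import mean, median

# Hypothetical samples chosen to illustrate the three cases.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 50]   # one large value: long right tail
left_tailed = [-44, 2, 3, 4, 5]   # one very small value: long left tail

for name, sample in [("symmetric", symmetric),
                     ("right tail", right_tailed),
                     ("left tail", left_tailed)]:
    print(name, "average =", mean(sample), "median =", median(sample))
```

In all three samples the median is 3, while the average is pulled toward the extreme value: above the median for the long right tail and below it for the long left tail.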

Problem 1: According to the Department of Commerce, the mean and median price of new houses sold in the United States in mid 1988 were 141,200 and 117,800. Which of these numbers is the mean and which is the median? Explain your answer.

Problem 2: The number of deaths from cancer in the US has risen steadily over time. In 1985, about 462,000 people died of cancer, up from deaths in A member of Congress says that these numbers show that no progress has been made in treating cancer. Explain how the number of people dying of cancer could increase even if treatment of the disease were improving. Then describe at least one variable that would be a more appropriate measure of the effectiveness of medical treatment for a potentially fatal disease.

A measure of size

Consider the sample

0, 5, -8, 7, -3

How big are these five numbers? If we consider the average as a measure of size then we obtain 0.2, which is a fairly small value compared to 7. The trouble is that in the average large negative quantities cancel large positive ones. To avoid this problem we need a measure of size that disregards signs. We proceed as follows:

1. Square all values.
2. Calculate the average of the resulting numbers.
3. Take the square root of the resulting mean.

This is called the root mean square size of the sample.
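The three steps can be sketched as a short function. The sample is taken to be 0, 5, -8, 7, -3, which is the choice of signs that gives the stated average of 0.2; `rms_size` is an illustrative name.

```python
import math


def rms_size(values):
    """Root mean square size: square, average, then take the square root."""
    squares = [v * v for v in values]          # step 1: square all values
    mean_square = sum(squares) / len(squares)  # step 2: average the squares
    return math.sqrt(mean_square)              # step 3: take the square root


sample = [0, 5, -8, 7, -3]

print(sum(sample) / len(sample))                  # plain average: signs cancel
print(rms_size(sample))                           # r.m.s. size: signs are disregarded
print(sum(abs(v) for v in sample) / len(sample))  # average of the absolute values
```

The plain average is 0.2, while the r.m.s. size is about 5.4, much closer to the typical magnitude of the entries.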

For the previous data set we have

$\text{r.m.s. size} = \sqrt{\dfrac{0^2 + 5^2 + (-8)^2 + 7^2 + (-3)^2}{5}} = \sqrt{29.4} \approx 5.4$

We could have also considered the average disregarding the signs, which amounts to

$\dfrac{0 + 5 + 8 + 7 + 3}{5} = 4.6$

Unfortunately the mathematical properties of this way of measuring size are not as appealing as the ones of r.m.s.

Spread

As we saw at the beginning of the lecture, two samples can have the same center and be scattered along their ranges in different ways. To measure the way a sample is spread around its average we can use the standard deviation, or SD.

The SD of a list of numbers measures how far away they are from their average.

Thus a large SD implies that many observations are far from the overall average. Most observations will be within one SD of the average. Very few will be more than two SDs away.

Empirical Rule

SDs are a pain to compute by hand or with a calculator, and it's easy to make mistakes when doing so, so it's good to have a simple way to roughly approximate the SD of a list of numbers by looking at its histogram.

If you start at the mean and go one SD either way, you'll capture about 2/3 of the data.
Roughly 95% of the observations are within two SDs of the average.
Roughly 99% of the observations are within three SDs of the average.

These statements are more accurate when the distribution is symmetric.

Generally, more data is better than less data, because more data means less uncertainty (or a smaller give or take).
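One way to get a feel for the rule is to check it on simulated data. The sketch below assumes a symmetric, bell-shaped sample generated with Python's `random.gauss`; the sample is synthetic and `fraction_within` is an illustrative helper, not part of the lecture.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic, roughly bell-shaped sample (an assumption made for illustration).
data = [random.gauss(0, 1) for _ in range(10_000)]


def fraction_within(values, k):
    """Fraction of the values that lie within k SDs of the average."""
    center = mean(values)
    sd = pstdev(values)
    inside = sum(1 for x in values if abs(x - center) <= k * sd)
    return inside / len(values)


for k in (1, 2, 3):
    print(k, "SD:", fraction_within(data, k))
```

For a sample like this, the printed fractions should come out close to 2/3, 95% and 99%. For strongly skewed data the agreement is usually worse.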

Empirical Rule and the Cereal Data Example

[Figure: Density Histogram; horizontal axis: milligrams of sodium; vertical axis: Density]

Let's look back at the cereal data example, which has a mean of about 160 mg of sodium. Using the empirical rule, what guess would you make for the SD?

Goldilocks and the Cereal Data Example

If you guessed 20 mg, that would be too small, because 140-180 mg ought to be about 2/3 of the data.
If you guessed 100 mg, that would be too large, because 60-260 mg is more than 2/3 of the data.
If you guessed 80 mg, then 80-240 mg ought to be about 2/3 of the data, and 0-320 mg would be about 95% of the data, which looks about just right!
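The intervals used in this guessing game follow directly from the mean of about 160 mg and each candidate SD. A minimal sketch, where `sd_intervals` is an illustrative helper name:

```python
def sd_intervals(mean, sd):
    """Return the (mean - SD, mean + SD) and (mean - 2 SD, mean + 2 SD) intervals."""
    one_sd = (mean - sd, mean + sd)
    two_sd = (mean - 2 * sd, mean + 2 * sd)
    return one_sd, two_sd


cereal_mean = 160  # approximate mean sodium content in mg, from the example

for guess in (20, 100, 80):
    one_sd, two_sd = sd_intervals(cereal_mean, guess)
    print(f"SD guess {guess} mg: about 2/3 of data in {one_sd}, about 95% in {two_sd}")
```

Each guess is then judged by eye against the histogram: the right SD is the one whose one-SD interval covers roughly 2/3 of the area.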

Calculating the SD

To calculate the standard deviation of a sample follow these steps:

Calculate the average.
Calculate the list of deviations from the average by taking the difference between each datum and the average.
Calculate the r.m.s. size of the resulting list.

SD = r.m.s. deviation from average.

Consider the list 20, 10, 15, 15. Then

$\text{average} = \dfrac{20 + 10 + 15 + 15}{4} = 15$

The list of deviations is 5, -5, 0, 0. Then

$\mathrm{SD} = \sqrt{\dfrac{5^2 + (-5)^2 + 0^2 + 0^2}{4}} = \sqrt{12.5} \approx 3.5$

Using a calculator

Most scientific calculators will have a function to calculate the average and the SD of a sample. The steps needed to obtain those values vary from model to model. The important fact is that most calculators do not produce the SD as we have defined it here. They divide the sum of the squares of the deviations by the total number of data minus one. So, if you obtain the SD from your calculator (or spreadsheet), say $\mathrm{SD}^{+}$, then

$\mathrm{SD} = \sqrt{\dfrac{\text{number of entries} - 1}{\text{number of entries}}} \times \mathrm{SD}^{+}$

Some calculators have both, SD and $\mathrm{SD}^{+}$. Please read the manual of your calculator regarding this fact.

Notice that the units of the SD are the same as those of the original data. So if the data were measured in years, the SD is also in years.
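The steps above, and the conversion from the calculator-style value, can be sketched as follows. The function name `sd` and the variable `sd_plus` are illustrative; in Python's standard library, `statistics.pstdev` divides by the number of entries and `statistics.stdev` divides by the number of entries minus one.

```python
import math
from statistics import pstdev, stdev


def sd(values):
    """SD as defined here: the r.m.s. size of the deviations from the average."""
    average = sum(values) / len(values)
    deviations = [v - average for v in values]
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


values = [20, 10, 15, 15]
n = len(values)

sd_value = sd(values)   # divides by n, as in the lecture
sd_plus = stdev(values) # divides by n - 1, as most calculators do

# Convert the calculator-style value back to the SD defined here.
converted = math.sqrt((n - 1) / n) * sd_plus

print(sd_value)         # sqrt(12.5), about 3.54
print(sd_plus)          # larger, because of the n - 1 divisor
print(converted)        # agrees with sd_value up to rounding
print(pstdev(values))   # the library function that divides by n
```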

Problems

Problem 1: Both the following lists have the same average of 50. Which one has the smaller SD and why? (Do no computations.)

1. 50, 40, 60, 30, 70, 25, 75
2. 50, 40, 60, 30, 70, 25, 75, 50, 50, 50

The second list has more entries at the average, so the SD is smaller.

Repeat for the following two lists:

1. 50, 40, 60, 30, 70, 25, 75
2. 50, 40, 60, 30, 70, 25, 75, 99, 1

The second list has two wild observations, 99 and 1, which are far away from the average, so the SD is larger.

Problem 2: Consider the list of numbers

Without doing any arithmetic, guess whether the average is around 1, 5 or 10.
Only three of the numbers are smaller than 1, none are bigger than 10, so the average is around 5.

Without doing any arithmetic, guess whether the SD is around 1, 3 or 6.
If the SD is 1, then the entries 0.6 and 9.9 are too far away from the average. The entries are too concentrated around 5 for the SD to be 6. So 3 is the most likely value.
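Once Problem 1 has been answered by reasoning, the answers can be checked numerically. The sketch below assumes the lists read 50, 40, 60, 30, 70, 25, 75, then the same list with 50, 50, 50 appended, then the same list with 99, 1 appended; the variable names are illustrative.

```python
from statistics import mean, pstdev

base = [50, 40, 60, 30, 70, 25, 75]
more_at_average = base + [50, 50, 50]  # extra entries sitting at the average
with_wild_values = base + [99, 1]      # two entries far from the average

for name, values in [("base list", base),
                     ("more entries at the average", more_at_average),
                     ("two wild observations", with_wild_values)]:
    # pstdev divides by the number of entries, matching the SD defined in the lecture.
    print(name, "average =", mean(values), "SD =", round(pstdev(values), 2))
```

All three lists have an average of 50. Adding entries at the average lowers the SD, and adding entries far from the average raises it.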

Problem 3: The usual method for determining heart rate is to take the pulse and count the number of beats in a given time period. The results are generally reported as beats per minute; for instance, if the time period is 15 seconds, the count is multiplied by four. Take your pulse for two 15-second periods, two 30-second periods, and two 1-minute periods. Convert the counts to beats per minute and report the results. Which procedure do you think gives the best results? Why?
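The conversion to beats per minute can be sketched as below. The counts are hypothetical readings made up for illustration, and `beats_per_minute` is an illustrative helper name.

```python
def beats_per_minute(count, seconds):
    """Scale a pulse count taken over `seconds` seconds to a one-minute rate."""
    return count * 60 / seconds


# Hypothetical counts for one 15-second, one 30-second and one 1-minute period.
readings = [(18, 15), (37, 30), (73, 60)]

for count, seconds in readings:
    rate = beats_per_minute(count, seconds)
    # Miscounting by a single beat shifts the reported rate by 60 / seconds.
    step = 60 / seconds
    print(f"{count} beats in {seconds} s -> {rate} bpm (one beat changes it by {step} bpm)")
```

One thing the sketch makes visible is that a one-beat counting error is multiplied by the same factor as the count itself, which is worth keeping in mind when comparing the three procedures.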

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab 1 Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab I m sure you ve wondered about the absorbency of paper towel brands as you ve quickly tried to mop up spilled soda from

More information

AMS 7L LAB #2 Spring, 2009. Exploratory Data Analysis

AMS 7L LAB #2 Spring, 2009. Exploratory Data Analysis AMS 7L LAB #2 Spring, 2009 Exploratory Data Analysis Name: Lab Section: Instructions: The TAs/lab assistants are available to help you if you have any questions about this lab exercise. If you have any

More information

Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Statistics Chapter 2

Statistics Chapter 2 Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

CHAPTER THREE. Key Concepts

CHAPTER THREE. Key Concepts CHAPTER THREE Key Concepts interval, ordinal, and nominal scale quantitative, qualitative continuous data, categorical or discrete data table, frequency distribution histogram, bar graph, frequency polygon,

More information

Descriptive statistics parameters: Measures of centrality

Descriptive statistics parameters: Measures of centrality Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between

More information

Means, standard deviations and. and standard errors

Means, standard deviations and. and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

More information

Foundation of Quantitative Data Analysis
