Handout 2: Data Exploration

Reading Assignment: Sections 1.3, 1.4, 1.5, 1.6, and Chapter 3

We previously looked at methods of sampling and the measurement levels of data. Suppose that you are a project development manager for an energy project. Your company, a new wind power energy company in Michigan, wants to help minimize emissions while producing optimal energy levels, and it wishes to compare its emissions with those of the rest of the nation as well as with those in Michigan.

Now we will begin to discuss analyzing data; however, before doing in-depth analyses, it is important to summarize what information is present in your data. Please note that we will be using data from the annual U.S. Electric Power Industry Estimated Emissions Report for this handout. Below is a sample of ten of the observations.

From the sample of data, we see that there are seven variables: Year, State, Type of Producer, Energy Source, CO2, SO2, and NOX. However, just looking at a sample of the data does not give us all the information. It is known that the data is collected from all 50 states between the years of 1990 and … . Also, the carbon dioxide, sulfur dioxide, and nitrogen oxide measurements (all in metric tons) are taken from all eight energy sources with all seven types of producers. (Knowing this information, can you speculate whether the entire dataset is a sample or a population?)

Looking at the sample of the data and the variable descriptions, state the level of measurement for each variable in the table below. Also, are the numerical variables continuous, taking on any value in an interval (e.g., height, blood pressure), or discrete, taking on only values from a countable list of distinct values (e.g., the number of roommates living with you)?

Variable            Description
Year
State
Type of Producer
Energy Source
CO2
SO2
NOX

Categorical Data

Recall that categorical data consists of group or category names, which may or may not have a logical ordering. To summarize a categorical variable, we count how many subjects fall within each possible category. Percentages are often reported instead of counts because they are usually more informative, especially when comparing groups of different sizes. This method can also be used for summarizing two or more categorical variables, which we will discuss at a later time.

A relative frequency table is a listing of all possible categories along with their relative frequencies, typically given as a proportion or percent. Counts and percentages are commonly given together (see the table below).

[Table: Relative Frequency Table of Energy Source, giving the frequency, relative frequency, and percentage for Natural Gas, Petroleum, Coal, Other, Other Biomass, Wood & Wood Derived Fuels, Other Gases, Geothermal, and the Grand Total]

A bar chart is useful for summarizing one (see the figure below) or two categorical variables. Bar charts can be very helpful when comparing two categorical variables, as will be shown later.

[Figure: Bar Chart of Energy Source (percent of observations, 0% to 30%, for Natural Gas, Petroleum, Coal, Other, Other Biomass, Wood & Wood Derived Fuels, Other Gases, and Geothermal)]
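The counting behind a relative frequency table, and a bar chart drawn from it, can be sketched in a few lines of Python. This is only an illustration, not how Excel builds its table: the function names `relative_frequency_table` and `text_bar_chart` are hypothetical, and the ten `sample` labels are made up rather than taken from the emissions report.

```python
from collections import Counter

def relative_frequency_table(labels):
    """Return (category, count, proportion, percent) rows, most common first."""
    counts = Counter(labels)
    n = len(labels)
    return [(cat, c, c / n, 100 * c / n) for cat, c in counts.most_common()]

def text_bar_chart(rows, width=40):
    """Print one bar of '#' characters per category, scaled to the largest percent."""
    top = max(pct for _, _, _, pct in rows)
    for cat, _, _, pct in rows:
        bar = "#" * round(width * pct / top)
        print(f"{cat:<26}{bar} {pct:5.1f}%")

# Hypothetical observations; the handout's real Energy Source column is not reproduced here.
sample = ["Coal", "Natural Gas", "Petroleum", "Natural Gas", "Coal",
          "Geothermal", "Natural Gas", "Petroleum", "Other", "Natural Gas"]

rows = relative_frequency_table(sample)
for cat, count, prop, pct in rows:
    print(f"{cat:<26}{count:>3}{prop:>7.2f}{pct:>7.1f}%")
print(f"{'Grand Total':<26}{len(sample):>3}{1:>7.2f}{100:>7.1f}%")
text_bar_chart(rows)
```

The proportions always add to 1 (and the percentages to 100%), which is a quick check that no category was missed.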

A pie chart is another useful way to summarize a single categorical variable (if there are not too many categories). See the figure below.

[Figure: Pie Chart of Energy Source: Natural Gas 27%, Petroleum 25%, Coal 17%, Other 11%, Other Biomass 9%, Wood & Wood Derived Fuels 6%, Other Gases 4%, Geothermal 1%]

All three figures for categorical data tell the same story, just in different ways. What do you notice in the data? Completely describe what the data is showing. Which method for presenting categorical data do you like best?
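A pie chart turns each category's share into a slice of a 360-degree circle. The sketch below uses the rounded percentages shown in the pie chart of energy source. `pie_angles` is a hypothetical helper, and dividing by the total (rather than by 100) is a choice made here so that rounded percentages which do not add up to exactly 100 still fill the circle.

```python
# Percentages read from the pie chart of energy source (rounded to whole percents).
shares = {
    "Natural Gas": 27, "Petroleum": 25, "Coal": 17, "Other": 11,
    "Other Biomass": 9, "Wood & Wood Derived Fuels": 6,
    "Other Gases": 4, "Geothermal": 1,
}

def pie_angles(percentages):
    """Convert category percentages to wedge angles in degrees."""
    total = sum(percentages.values())
    return {name: 360 * pct / total for name, pct in percentages.items()}

angles = pie_angles(shares)
start = 0.0
for name, angle in angles.items():
    print(f"{name:<26} {start:6.1f} to {start + angle:6.1f} degrees")
    start += angle
```

A slice with a very small angle (Geothermal is only a few degrees) is hard to read, which is one reason pie charts work best with few categories.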

Numerical Data

Recall that numerical data measures a quantity of something. Looking at a long list of disorganized values that seem unrelated can be daunting, so to make the data more informative we need to organize it using visual displays and numerical summaries. When describing a visual display of numerical data, we focus on the distribution, the overall pattern of the data. Three summary characteristics tend to be of interest: location, spread, and shape. We are also interested in whether there are any outliers, data values that are unusual when compared to the rest of the data. We will discuss these characteristics in more depth later in this handout. Here we will use data from the Emissions Report, but only data on Michigan's CO2 emissions from other energy sources.

A stem-and-leaf plot is a quick way to summarize small data sets and is also useful for ordering data from lowest to highest. In the basic design of the plot, the stem contains all but the last digit of a number and the leaf is the last digit of the number, regardless of whether it falls before or after a decimal point. Sometimes data values are truncated or rounded to make the work easier. The example data was rounded to the ten-thousands place, and the stem units are the hundred-thousands place.

[Figure: Stem-and-Leaf Display of Michigan's CO2 Emissions (Metric Tons) for Other Energy Sources (stem = 100,000s)]

Since these plots can be difficult to construct for larger datasets, and since Excel and your calculators do not easily create stem-and-leaf plots (if at all, in the case of your calculator), you will not be required to construct them. However, it is important to be able to interpret them. More information about stem-and-leaf plots can be found in the course pack.
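Although you will not be asked to build stem-and-leaf plots, the construction rule is mechanical enough to sketch in Python. The function `stem_and_leaf` and its `L`/`H` row labels are hypothetical conventions chosen for this example. It rounds each value half up to the ten-thousands place, so the leaf is the ten-thousands digit and the stem is the hundred-thousands part, and it can optionally split each stem into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9). The `emissions` values are invented, not the handout's Michigan data.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=10_000, split=True):
    """Build the rows of a stem-and-leaf display as (label, leaves) pairs.

    Each value is rounded half up to the nearest leaf_unit; the leaf is the
    last digit of the rounded number and the stem is all the digits before it.
    With split=True each stem gets a lower row ('L', leaves 0-4) and an upper
    row ('H', leaves 5-9). Assumes the values are non-negative.
    """
    rows = defaultdict(list)
    for v in values:
        units = int((v + leaf_unit / 2) // leaf_unit)  # round half up to leaf_unit
        stem, leaf = divmod(units, 10)
        half = int(leaf >= 5) if split else 0
        rows[(stem, half)].append(leaf)
    if not rows:
        return []
    first, last = min(rows), max(rows)
    halves = (0, 1) if split else (0,)
    display = []
    for stem in range(first[0], last[0] + 1):
        for half in halves:
            key = (stem, half)
            if first <= key <= last:  # keep empty rows inside the range to show gaps
                label = f"{stem}{'LH'[half]}" if split else str(stem)
                display.append((label, sorted(rows.get(key, []))))
    return display

# Hypothetical emissions in metric tons (not the handout's actual data).
emissions = [300_400, 298_700, 301_200, 320_100, 318_500, 322_000,
             330_900, 329_800, 331_100, 340_300, 120_000, 360_000]
for label, leaves in stem_and_leaf(emissions):
    print(f"{label:>4} | {''.join(map(str, leaves))}")
```

Empty rows are kept on purpose: a gap between stems is part of what the plot shows about the distribution.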
When interpreting the plot above, note that each hundred-thousand stem is split into a lower and an upper half (split at each 50,000 metric tons) and that each leaf digit represents a single observation. This is not the only way to construct a stem-and-leaf plot; the best layout depends on the data. So, in the first half of the 300,000s, we see that there are ten total observations: three of about 300,000, three of about 320,000, three of about 330,000, and one of about 340,000. We will come back to this plot to discuss the summary characteristics in a little while.

A histogram is similar to a bar chart, but for numerical variables. It shows how many values fall in various intervals of the data. When constructing a histogram, we typically decide how many intervals we want, but we will just let our calculators and Excel choose these intervals for us. Once the number of intervals is decided, the range of the data is divided into intervals of equal width, and then the number of values within each interval is counted (Excel does this in a frequency table). You can use frequencies or relative frequencies when constructing the table and histogram. Both the frequency table and the histogram are below. Note that there are no gaps between the bars unless one of the intervals has a frequency of zero.
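The equal-width counting step described above can be sketched as follows. `frequency_table` is a hypothetical helper and the `data` list is made up. Each interval is treated here as [lower, upper), with the maximum placed in the last interval; as we understand it, Excel's FREQUENCY function instead counts values up to and including each bin's upper limit, so counts for values that land exactly on a boundary can differ.

```python
def frequency_table(values, num_bins):
    """Equal-width frequency table: (lower, upper, count, relative frequency) rows.

    Intervals are treated as [lower, upper); the maximum value is placed in
    the last interval so that every observation is counted exactly once.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, num_bins - 1)] += 1
    n = len(values)
    return [(lo + k * width, lo + (k + 1) * width, c, c / n)
            for k, c in enumerate(counts)]

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical values
table = frequency_table(data, num_bins=5)
for lower, upper, count, rel in table:
    # The '#' column is a sideways text histogram: bars touch because intervals touch.
    print(f"{lower:5.1f} to {upper:5.1f}  {count:2d}  {rel:5.2f}  {'#' * count}")
```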

[Table: Frequency Table of Michigan's CO2 Emissions (Metric Tons) for Other Energy Sources]

[Figure: Histogram of Michigan's CO2 Emissions (Metric Tons) for Other Energy Sources]

What are some of the similarities and differences between the stem-and-leaf plot and the histogram?

A box-and-whisker plot is a simple way to picture the information in one or more five-number summaries. This plot is useful for comparing two or more groups and is also useful in identifying outliers. The five-number summary consists of five descriptive values from the data: the lowest value; the cut-off points for 1/4, 1/2, and 3/4 of the data; and the highest value. The middle three values of the summary (the cut-off points) are called the lower quartile (Q1), the median, and the upper quartile (Q3), respectively. The box spans from the first quartile to the third quartile, with a line inside it marking the median, and the whiskers extend from the box to the minimum and maximum, except for possible outliers. Possible outliers are marked with an asterisk; they are values that fall far outside the box. We will discuss this idea in a bit, and our calculators can do this automatically (Excel takes a little bit of work).
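A five-number summary can be computed in a few lines, but quartile conventions differ between tools, which is one reason your calculator and Excel may not agree. The hypothetical `five_number_summary` below takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd (a common textbook rule, and to our understanding the one TI-83/84 calculators use). For comparison, Python's `statistics.quantiles` with `method="inclusive"` interpolates between data values, in what we believe is the same way as Excel's QUARTILE.INC.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the overall median is left out of both halves.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower, upper = data[:half], data[half + n % 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(five_number_summary(values))                  # Q1 = 2.5 and Q3 = 7.5 here
print(quantiles(values, n=4, method="inclusive"))   # interpolated: Q1 = 3.0, Q3 = 7.0
```

Both answers are legitimate quartiles; the difference shrinks as the data set gets larger.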

[Table: Five Number Summary of Michigan's CO2 Emissions (Metric Tons) for Other Energy Sources]

[Figure: Box and Whisker Plot of Michigan's CO2 Emissions (Metric Tons) for Other Energy Sources]

Looking at the box-and-whisker plot (and the five-number summary), you can instantly read off percentages of the data; for instance, 25% of the other energy sources in Michigan emitted 214,039 metric tons or more of CO2.

Things to Look for in Plots: A Summary of Graphical Features

Location

One of the first ideas to look for when summarizing numerical data is the location, or center, of the distribution of values. Here we are looking for what a typical or average value of the data might be. For this class, we will mainly look at the average, or mean, which is the arithmetic average of the data values. This measure of center, however, does not describe the CO2 data well, because the mean is pulled toward the long tail of a skewed distribution.

The median is the middle value of the ordered data (or the average of the two middle values when the number of observations is even). This measure is useful for skewed distributions (like Michigan's CO2 emissions for other energy sources). The median is also a special type of percentile. In general, the kth percentile is a number that has k% of the data values at or below it and (100 - k)% of the data values at or above it. Knowing the definition of percentiles, we see that the median is the 50th percentile. Recall that the box-and-whisker plot uses the five-number summary to create the plot. Those five numbers are the 0th, 25th, 50th, 75th, and 100th percentiles, which we label as quartiles.

Other measures of center that are used are the following:

Midrange: $$ MR = \frac{x_{\min} + x_{\max}}{2} $$

Midhinge: $$ MH = \frac{Q_1 + Q_3}{2} $$

Mode: the most frequently occurring value(s); we will look more at this shortly.
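The measures of center above can be compared side by side on a small right-skewed data set. This is a sketch, not the handout's Excel output: `q1_q3` and `centers` are hypothetical helpers, the quartiles use the median-of-halves rule, and the five values are made up so that one large value creates a long right tail.

```python
from statistics import fmean, median, multimode

def q1_q3(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])

def centers(values):
    """Mean, median, midrange, midhinge, and mode(s) of a list of numbers."""
    q1, q3 = q1_q3(values)
    return {
        "mean": fmean(values),
        "median": median(values),
        "midrange": (min(values) + max(values)) / 2,
        "midhinge": (q1 + q3) / 2,
        "mode": multimode(values),  # a list, since there can be several modes
    }

skewed = [1, 2, 2, 3, 10]  # hypothetical, right-skewed by the value 10
print(centers(skewed))
```

The mean and midrange are pulled toward the single large value, while the median stays with the bulk of the data.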

Spread

A large part of statistics is studying variability, or spread, among individual measurements and the variability among different samples from the same population (we will discuss the latter point in a couple of handouts). Spread tells us how much variation exists in the values: whether they are about the same, or whether there is a grouping of values with a few unusual data values. Recalling the five-number summary, we can assess spread by looking at the range, the difference between the maximum and minimum values, or the interquartile range (IQR), the difference between the third and first quartiles (the spread of the middle 50% of the data).

The standard deviation is another important measure of spread; it measures the typical size of a deviation, or departure, from the mean. As we see in the course pack, and below, the formula can appear daunting and difficult to calculate, but the important aspect of the standard deviation is its interpretation. For the most part, we will let the calculator and Excel handle the grunt work. Please note that when calculating by hand, the variance ($s^2$) is calculated first, and then the square root of the variance is taken to obtain the standard deviation:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

One last measure of spread that we will look at is the coefficient of variation, which is the standard deviation divided by the mean:

$$ CV = \frac{s}{\bar{x}} \times 100\% $$

This measure expresses the variation around the mean as a percentage of the mean. Why is this measure useful for comparing the variation among different variables (think of the units)?

Shape

The easiest feature to tell from a visual display of numerical data is the shape of the distribution. By looking at the graphical representations, we can tell whether most of the values are clumped together with values tailing off at each end, whether the values are more spread out in one direction, or whether there are two distinct groupings of values.
When looking at shape, data is usually described as symmetric, similar on both sides of the center, or skewed, with values more spread out on one side of the center than the other. Symmetric data may also be bell-shaped, while skewed data can be right (positively) or left (negatively) skewed. How would the data of Michigan's CO2 emissions from other energy sources be described? Recall that the mode of a dataset is the most frequent value. The shape of a histogram can be called unimodal when there is a single noticeable peak, bimodal if there are two noticeable peaks, and so on. Some data can be described using a combination of these terms.
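The by-hand route for the spread measures (variance first, then its square root, then the coefficient of variation) can be checked in Python. The data below are the nine monthly salaries, in thousands of dollars, from the constant-adjustment example later in this handout; the variable names are chosen for this sketch only, and the final line compares the hand computation with the standard library's `statistics.stdev`.

```python
import math
from statistics import fmean, stdev

# Monthly salaries in thousands of dollars (the example used later in this handout).
salaries = [1.2, 2.6, 3.5, 2.2, 1.4, 1.9, 4.4, 1.8, 3.8]

n = len(salaries)
mean = fmean(salaries)
variance = sum((x - mean) ** 2 for x in salaries) / (n - 1)  # s^2: divide by n - 1
s = math.sqrt(variance)                                      # standard deviation
cv = 100 * s / mean                                          # coefficient of variation, in percent
data_range = max(salaries) - min(salaries)

print(f"mean = {mean:.2f}, s^2 = {variance:.4f}, s = {s:.2f}, CV = {cv:.1f}%, range = {data_range:.1f}")
print(math.isclose(s, stdev(salaries)))  # the hand computation agrees with the library routine
```

Because the CV is a percentage, it has no units, so it can compare, say, the variability of salaries with the variability of emissions.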

Other

One last interesting feature to consider when analyzing data is whether any values are outliers (data points that are not consistent with the majority of the data), or whether there are any other noticeable patterns (we will look more into patterns later in this course). Outliers can have a major influence on analyses and thus need special consideration; if they are not handled carefully, they can lead to inaccurate conclusions. These inconsistent values can also cause complications in statistical procedures, which leads some researchers to wrongly discard them rather than treat them as legitimate data. Outliers should never be disregarded unless there is proper justification to do so. Some possible reasons for an outlier are that:

- it is a legitimate data value and represents natural variability for the group and variable(s) measured;
- a mistake was made while taking a measurement or entering the data;
- the individual belongs to a different group than the bulk of the individuals measured.

Recall that outliers can be represented with asterisks (or other marks) in box-and-whisker plots. A value is flagged as a possible outlier if it lies more than one and a half times the IQR above the third quartile or below the first quartile:

$$ \text{lower fence} = Q_1 - 1.5 \times IQR $$

$$ \text{upper fence} = Q_3 + 1.5 \times IQR $$

Are there any outliers present in the CO2 emissions for other energy sources in Michigan? Calculate the IQR and find the upper and lower fences.

A resistant statistic is a numerical summary of the data that is not strongly affected by extreme observations or outliers. In other words, an outlier is not likely to have a major influence on its numerical value. The resistant summary measures are the median, mode, midhinge, and IQR, while the other summary measures discussed are non-resistant, or affected by outliers (mean, midrange, standard deviation and variance, range, and coefficient of variation).
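The 1.5 × IQR fence rule, and the idea of a resistant statistic, can both be seen on a small example. `fences` and `possible_outliers` are hypothetical helper names; Q1 and Q3 are passed in directly (here, medians of the lower and upper halves of the made-up `data`) so the sketch does not depend on any one quartile convention. A value is flagged only if it is strictly beyond a fence, matching the "distance greater than" wording above.

```python
from statistics import fmean, median

def fences(q1, q3):
    """Lower and upper fences: 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def possible_outliers(values, q1, q3):
    """Values that lie strictly outside the fences."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 6, 7, 8, 30]  # hypothetical data
q1, q3 = 4, 7.5                   # medians of [2, 4, 4, 5] and [6, 7, 8, 30]
print(fences(q1, q3))
print(possible_outliers(data, q1, q3))

# Resistance: removing the outlier moves the mean much more than the median.
trimmed = [v for v in data if v != 30]
print(fmean(data), fmean(trimmed))
print(median(data), median(trimmed))
```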
iclicker Question
Are there any outliers present in the CO2 emissions data for other energy sources in Michigan? Given: Xmin = 0, Q1 = 18,013, Q2 = 94,661, Q3 = …, Xmax = …
(a) Yes; since the upper fence is …
(b) Yes; since the lower fence is …
(c) No; since the upper fence is …
(d) No; since the lower fence is …
(e) No; since the upper fence is …

Numerical Descriptive Statistics

The table below summarizes the CO2 emissions of other energy sources in Michigan. This table was obtained in Excel, but it has been edited to remove some of the statistics produced that are not discussed in this course and to add some additional ones that are.

Adding and Multiplying by a Constant

Sometimes it makes sense to add a constant to, or multiply by a constant, every value in a list of data (think of switching between Celsius and Fahrenheit, or adding bonus points for an entire STAT 2160 class). The rules that apply to these two situations follow.

Rules for Adding a Constant
Adding a constant, positive or negative, to a list of data adds the same constant to the mean, but the standard deviation remains unchanged.

Rules for Multiplying by a Constant
If you multiply a list of data by a constant, positive or negative, the mean is multiplied by the same constant, while the standard deviation is multiplied by the absolute value of the constant.

Suppose the VP of sales of a mid-sized firm has decided to give a one-time bonus to her colleagues. Here are the monthly salaries of 9 sales associates, in thousands of dollars: 1.2, 2.6, 3.5, 2.2, 1.4, 1.9, 4.4, 1.8, 3.8 (x̄ ≈ 2.53; s ≈ 1.13).

Suppose the VP decides to add $500 to each sales associate's salary. What would the new mean and standard deviation be? What if the VP instead decided to add 10% to each sales associate's salary? What would the new mean and standard deviation be?
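The two rules can be checked numerically with the Celsius-to-Fahrenheit conversion mentioned above (F = 1.8C + 32), which multiplies by a constant and then adds one. The Celsius readings are made up for this sketch, and a negative multiplier is included to show why the standard deviation uses the absolute value of the constant.

```python
from statistics import fmean, stdev

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]      # hypothetical daily highs
fahrenheit = [1.8 * c + 32 for c in celsius]  # multiply by 1.8, then add 32

# Adding 32 shifts the mean but not the spread; multiplying by 1.8 scales both.
print(fmean(celsius), stdev(celsius))
print(fmean(fahrenheit), stdev(fahrenheit))

flipped = [-2 * c for c in celsius]    # a negative multiplier
print(fmean(flipped), stdev(flipped))  # mean is multiplied by -2, standard deviation by |-2| = 2
```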

iclicker Question
If the data were symmetric, what would the relationship between the median and the mean be (where would they be located on the histogram)?
(a) The median would be higher than the mean.
(b) The median would be lower than the mean.
(c) They would be relatively equal.
(d) It is difficult to tell for this data.
(e) There is not enough information to decide.

Empirical Rule

For bell-shaped data, once you know the mean and standard deviation, you can determine approximate proportions of the data that will fall into any specified interval. We will discuss this more in depth later, but the Empirical Rule gives some approximate benchmarks:

- 68% of the values fall within one standard deviation of the mean in either direction;
- 95% of the values fall within two standard deviations of the mean in either direction;
- 99.7% of the values fall within three standard deviations of the mean in either direction.

The Empirical Rule is also summarized in the figure below. Please note that population notation (Greek letters) is used.

Source:
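The Empirical Rule benchmarks can be checked against simulated bell-shaped data. In this sketch `empirical_rule_intervals` and `share_within` are hypothetical helpers, the simulated values come from Python's `random.Random.gauss` with a fixed seed (so the run is repeatable), and the mean of 100 and standard deviation of 15 in the last line are arbitrary example numbers.

```python
import random

def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68.0), (2, 95.0), (3, 99.7))}

def share_within(values, mean, sd, k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

rng = random.Random(2160)                         # fixed seed so the run is repeatable
sim = [rng.gauss(0, 1) for _ in range(100_000)]   # simulated bell-shaped data (mean 0, sd 1)
for k in (1, 2, 3):
    print(k, round(share_within(sim, 0, 1, k), 3))

print(empirical_rule_intervals(100, 15))
```

The simulated shares land close to 0.68, 0.95, and 0.997; for strongly skewed data, such as Michigan's CO2 emissions from other energy sources, these benchmarks should not be expected to hold.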


consider the number of math classes taken by math 150 students. how can we represent the results in one number? ch 3: numerically summarizing data - center, spread, shape 3.1 measure of central tendency or, give me one number that represents all the data consider the number of math classes taken by math 150 students.

More information

Mathematical goals. Starting points. Materials required. Time needed

Mathematical goals. Starting points. Materials required. Time needed Level S6 of challenge: B/C S6 Interpreting frequency graphs, cumulative cumulative frequency frequency graphs, graphs, box and box whisker and plots whisker plots Mathematical goals Starting points Materials

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous Chapter 2 Overview Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classify as categorical or qualitative data. 1) A survey of autos parked in

More information

Topic 9 ~ Measures of Spread

Topic 9 ~ Measures of Spread AP Statistics Topic 9 ~ Measures of Spread Activity 9 : Baseball Lineups The table to the right contains data on the ages of the two teams involved in game of the 200 National League Division Series. Is

More information

2 Describing, Exploring, and

2 Describing, Exploring, and 2 Describing, Exploring, and Comparing Data This chapter introduces the graphical plotting and summary statistics capabilities of the TI- 83 Plus. First row keys like \ R (67$73/276 are used to obtain

More information

Mean = (sum of the values / the number of the value) if probabilities are equal

Mean = (sum of the values / the number of the value) if probabilities are equal Population Mean Mean = (sum of the values / the number of the value) if probabilities are equal Compute the population mean Population/Sample mean: 1. Collect the data 2. sum all the values in the population/sample.

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Chapter 2 Data Exploration

Chapter 2 Data Exploration Chapter 2 Data Exploration 2.1 Data Visualization and Summary Statistics After clearly defining the scientific question we try to answer, selecting a set of representative members from the population of

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Box-and-Whisker Plots

Box-and-Whisker Plots Mathematics Box-and-Whisker Plots About this Lesson This is a foundational lesson for box-and-whisker plots (boxplots), a graphical tool used throughout statistics for displaying data. During the lesson,

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

Statistics Chapter 2

Statistics Chapter 2 Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

More information

3.2 Measures of Spread

3.2 Measures of Spread 3.2 Measures of Spread In some data sets the observations are close together, while in others they are more spread out. In addition to measures of the center, it's often important to measure the spread

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

Sampling and Descriptive Statistics

Sampling and Descriptive Statistics Sampling and Descriptive Statistics Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University Reference: 1. W. Navidi. Statistics for Engineering and Scientists.

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

Box Plots. Objectives To create, read, and interpret box plots; and to find the interquartile range of a data set. Family Letters

Box Plots. Objectives To create, read, and interpret box plots; and to find the interquartile range of a data set. Family Letters Bo Plots Objectives To create, read, and interpret bo plots; and to find the interquartile range of a data set. www.everydaymathonline.com epresentations etoolkit Algorithms Practice EM Facts Workshop

More information

Measures of Central Tendency and Variability: Summarizing your Data for Others

Measures of Central Tendency and Variability: Summarizing your Data for Others Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :

More information

Data exploration with Microsoft Excel: univariate analysis

Data exploration with Microsoft Excel: univariate analysis Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize

More information

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability RIT Score Range: Below 171 Below 171 Data Analysis and Statistics Solves simple problems based on data from tables* Compares

More information

Definition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality.

Definition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality. 8 Inequalities Concepts: Equivalent Inequalities Linear and Nonlinear Inequalities Absolute Value Inequalities (Sections 4.6 and 1.1) 8.1 Equivalent Inequalities Definition 8.1 Two inequalities are equivalent

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log

Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log Instructor: Eakta Jain CIS 6930, Research Methods for Human-centered Computing Scribe: Chris(Yunhao) Wan, UFID: 1677-3116

More information

430 Statistics and Financial Mathematics for Business

430 Statistics and Financial Mathematics for Business Prescription: 430 Statistics and Financial Mathematics for Business Elective prescription Level 4 Credit 20 Version 2 Aim Students will be able to summarise, analyse, interpret and present data, make predictions

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

Scope and Sequence KA KB 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B

Scope and Sequence KA KB 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B Scope and Sequence Earlybird Kindergarten, Standards Edition Primary Mathematics, Standards Edition Copyright 2008 [SingaporeMath.com Inc.] The check mark indicates where the topic is first introduced

More information

determining relationships among the explanatory variables, and

determining relationships among the explanatory variables, and Chapter 4 Exploratory Data Analysis A first look at the data. As mentioned in Chapter 1, exploratory data analysis or EDA is a critical first step in analyzing the data from an experiment. Here are the

More information

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. MATH 3/GRACEY PRACTICE EXAM/CHAPTERS 2-3 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) The frequency distribution

More information

2: Frequency Distributions

2: Frequency Distributions 2: Frequency Distributions Stem-and-Leaf Plots (Stemplots) The stem-and-leaf plot (stemplot) is an excellent way to begin an analysis. Consider this small data set: 218 426 53 116 309 504 281 270 246 523

More information

Ch. 3.1 # 3, 4, 7, 30, 31, 32

Ch. 3.1 # 3, 4, 7, 30, 31, 32 Math Elementary Statistics: A Brief Version, 5/e Bluman Ch. 3. # 3, 4,, 30, 3, 3 Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange. 3) High Temperatures The reported high temperatures

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information