1 Organizing and Graphing Data

1.1 Organizing and Graphing Categorical Data

After categorical data have been sampled, they should be summarized to answer two questions:
1. Which values have been observed? (for example red, green, blue, brown, orange, yellow)
2. How often did each value occur?

Categorical data are usually summarized in a table that gives:
- the categories observed,
- the frequency, or number of measurements, in each category,
- the relative frequency, or proportion of measurements, in each category,
- the percentage of measurements in each category.

Definition: The relative frequency of a category is the fraction (proportion) of all observations in the data set that fall in that category. It is calculated as

$$\text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}}, \qquad \text{percentage} = 100 \times \text{relative frequency}.$$

Example: sum of all frequencies = sample size = number of observations = n = 200.

category | frequency | relative frequency | percentage (%)
wood | | |
tiles | | |
linoleum | | |
carpet | | |
total | 200 | 1.00 | 100

Such a table is called the frequency distribution table for categorical data. Once the data are summarized in a frequency distribution table, they can be displayed in a bar chart or a pie chart. A bar chart (bar graph) effectively shows the frequencies of the different categories, whereas a pie chart shows the relationship between the parts and the whole.
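A minimal Python sketch of the table-building step, assuming the raw observations are available as a list of category labels. The function name `frequency_table` and the sample list are hypothetical, made up for illustration; they are not the flooring data.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(observations)
    n = sum(counts.values())  # sum of all frequencies = sample size
    table = {}
    for category, freq in counts.items():
        rel = freq / n  # relative frequency = frequency / sum of all frequencies
        table[category] = (freq, rel, 100 * rel)  # percentage = 100 * rel. freq.
    return table


# Hypothetical observations, for illustration only.
sample = ["wood", "carpet", "wood", "tiles", "carpet", "wood", "linoleum", "wood"]
for category, (freq, rel, pct) in frequency_table(sample).items():
    print(f"{category:<10}{freq:>4}{rel:>8.3f}{pct:>7.1f}%")
```

The relative frequencies of all categories always add up to 1 (and the percentages to 100), which is a quick check on any hand-built table.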

1.1.1 Bar Graph

Definition 1: A graph made of bars whose heights represent the frequencies of the respective categories is called a bar graph. Instead of frequencies, a bar graph may display the relative frequencies or percentages of the categories.
- Mark a tick on the x-axis for every category.
- Represent each category by a bar whose AREA is proportional to the corresponding frequency (relative frequency).
- Label the y-axis.

Remark: All bars should have the same width, so that the height of each bar is also proportional to the corresponding frequency.

Example 1: Suppose the frequency distribution of the main flooring products used is:

category | frequency | relative freq
wood | |
tiles | |
linoleum | |
carpet | |
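A bar graph can be sketched without any plotting library. The hypothetical helper `text_bar_chart` below draws one text row per category, rotated so that the bars run horizontally. Because every bar has the same thickness (one row), bar length alone carries the frequency, as the remark above requires. The frequencies in the usage lines are made up.

```python
def text_bar_chart(freqs, max_width=40):
    """Return one line per category; bar length is proportional to frequency (up to rounding)."""
    largest = max(freqs.values())
    lines = []
    for category, freq in freqs.items():
        length = round(max_width * freq / largest)  # the largest bar gets max_width characters
        lines.append(f"{category:<10}|{'#' * length} {freq}")
    return lines


# Hypothetical frequencies, for illustration only.
for line in text_bar_chart({"wood": 40, "tiles": 20, "carpet": 10}):
    print(line)
```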

1.1.2 Pie Charts

Pie charts provide an alternative kind of graph for categorical data.

Definition 2: A circle divided into portions that represent the relative frequencies or percentages of a population or sample belonging to different categories is called a pie chart. The size of the slice representing a particular category is proportional to the frequency (relative frequency) of that category.

How to create a pie chart:
1. Draw a circle.
2. Calculate the slice size (angle), that is, the fraction of the circle for each category:
   slice angle = (relative frequency of the category) × 360°.
3. Use a protractor to mark the angles.

category | frequency | relative freq | angle (°)
wood | | |
tiles | | |
linoleum | | |
carpet | | |
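The angle rule above is one line of arithmetic per category. A sketch, assuming the frequencies are given as a dictionary; the helper name `pie_angles` and the numbers in the usage lines are hypothetical.

```python
def pie_angles(freqs):
    """Slice angle in degrees = (frequency / sum of all frequencies) * 360."""
    n = sum(freqs.values())
    return {category: 360 * f / n for category, f in freqs.items()}


# Hypothetical frequencies, for illustration only.
angles = pie_angles({"wood": 50, "tiles": 25, "carpet": 25})
print(angles)  # {'wood': 180.0, 'tiles': 90.0, 'carpet': 90.0}
```

Whatever the frequencies, the angles add up to 360° (up to floating-point rounding), so the slices exactly fill the circle.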

M&M's example: The M&M's web page gives the following distribution of colors in peanut M&M's:

color | brown | yellow | red | blue | orange | green
percent | 12% | 15% | 12% | 23% | 23% | 15%

To check whether this distribution is a true description of what is in a bag, someone bought a bag of 200 peanut M&M's and wants to describe the colors of its contents. Color is a categorical variable, so a relative frequency table is obtained:

color | count | rel. freq. | percentage (%)
brown | | |
yellow | | |
red | | |
blue | | |
orange | | |
green | | |
Total | 200 | 1.00 | 100

A bar chart of these counts:
[Figure: bar chart of the M&M's color counts]

For the pie chart, the angles of the slices have to be determined:

color | count | rel. freq. | angle (°)
brown | | |
yellow | | |
red | | |
blue | | |
orange | | |
green | | |
Total | 200 | 1.00 | 360

This results in the following pie chart:
[Figure: pie chart of the M&M's color counts]
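The observed counts from the bag belong in the tables above. As a complementary sketch, the snippet below applies the same arithmetic to the advertised percentages: it computes how many candies of each color a bag of 200 would contain if it matched the web page exactly, and the pie-chart angle each advertised percentage implies. Comparing these expected counts with the observed ones is the check the example describes. The helper name `expected_counts_and_angles` is hypothetical.

```python
# Advertised color distribution of peanut M&M's (percent), from the text.
ADVERTISED = {"brown": 12, "yellow": 15, "red": 12, "blue": 23, "orange": 23, "green": 15}


def expected_counts_and_angles(percentages, bag_size):
    """Return {color: (expected count in the bag, pie angle in degrees)}."""
    result = {}
    for color, pct in percentages.items():
        rel = pct / 100  # relative frequency
        result[color] = (bag_size * rel, 360 * rel)
    return result


for color, (count, angle) in expected_counts_and_angles(ADVERTISED, 200).items():
    print(f"{color:<7}{count:>6.1f}{angle:>7.1f}")
```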

1.2 Organizing and Graphing Quantitative Data

The graphs in this section display the data for a quantitative variable so that the distribution of the data becomes apparent.

Stem and Leaf Plots

Another way of displaying numerical data is the stem and leaf plot. Each observed number is broken into two pieces, called the stem and the leaf.

How to make a stem and leaf plot:
1. Divide each measurement into two parts: the first digit(s) of the number form the stem, and the last digit(s) form the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf in the same row as its stem.
4. Order the leaves from lowest to highest within each stem.
5. Provide a key to your stem and leaf coding so that the reader can recreate the actual measurements.

Example 2: Acceptance rates at some business schools: 16.3, 12.0, 25.1, 20.3, 31.9, 20.7, 30.1, 19.5, 36.2, 46.9, 25.8, 36.7, 33.8, 24.2, 21.5, 35.1, 37.6, 23.9, 17.0, 38.4, 31.2, 43.8, 28.9, 31.4, 48.9
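The five steps can be sketched in Python for the acceptance rates above. The digit split is an assumption of this sketch: the stem is the tens digit and the leaf is the units digit, with the tenths truncated, so the key reads "1 | 6 means 16". Any other split works the same way as long as the key states it. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Acceptance rates from Example 2.
rates = [16.3, 12.0, 25.1, 20.3, 31.9, 20.7, 30.1, 19.5, 36.2, 46.9,
         25.8, 36.7, 33.8, 24.2, 21.5, 35.1, 37.6, 23.9, 17.0, 38.4,
         31.2, 43.8, 28.9, 31.4, 48.9]


def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit (tenths truncated)."""
    plot = defaultdict(list)
    for x in data:
        whole = int(x)                      # step 1: drop the tenths, split the digits
        plot[whole // 10].append(whole % 10)
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # step 2: list every stem, even empty ones
        leaves = "".join(str(leaf) for leaf in sorted(plot[stem]))  # steps 3 and 4
        lines.append(f"{stem} | {leaves}")
    return lines


for line in stem_and_leaf(rates):
    print(line)
print("Key: 1 | 6 means 16 (tenths truncated)")   # step 5
```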

[Figure: stem and leaf plot of the acceptance rates; key: stem = tens, leaf = tenth]

A stem and leaf plot shows the center, the range, the concentration of the data, the nature of the distribution (unimodal, bimodal, multimodal), unusual values, and whether the data are skewed to the right or to the left.

Sometimes the available stem choices produce a plot with too few stems and a large number of leaves on each stem. In this situation you can stretch the stems by dividing each one into several lines. The two common choices are:
- two lines per stem, with leaves 0 to 4 and 5 to 9;
- five lines per stem, with leaves 0-1, 2-3, 4-5, 6-7, 8-9.

Example (acceptance rates):
[Figure: stem and leaf plot of the acceptance rates with split stems]

You can also use stem and leaf plots to compare the distributions of two groups:
[Figure: stem and leaf plots comparing two groups]

Relative Frequency Histograms

The most common graph for describing continuous numerical data is the histogram. It visualizes the distribution of the underlying variable, that is, how many measurements fall where on the measurement scale. What a histogram looks like:
[Figure: example histogram]

Definition: A relative frequency histogram for a quantitative data set is a bar graph in which the height of each bar shows how often (measured as a relative frequency) measurements fall in a particular interval. The classes, or intervals, are plotted along the horizontal axis.

The first step in creating a histogram is finding the frequency distribution of the variable of interest.

Definition 3: A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

How to obtain a frequency distribution:
1. Decide which class intervals (preferably of equal width) to use. Each class is given by its lower boundary and its upper boundary; the class width is upper boundary − lower boundary. The number of class intervals should be approximately the square root of the sample size, but not lower than 4 and not larger than 20. Use sensible interval boundaries: if possible, the intervals should have the same width and the boundaries should be round numbers (whole numbers, tenths, or multiples of these).
2. Create a frequency table for the class intervals using the method of left inclusion: each interval includes its lower boundary but not its upper boundary. List the class intervals and the number of values falling within each interval, and also give the relative frequency of each class interval.

These relative frequencies can now be displayed in a histogram. To obtain the histogram from the frequency distribution:
1. Mark the boundaries of the class intervals on a horizontal axis.
2. Use the relative frequency on the vertical axis.

3. Draw a bar for each class interval, with height equal to the relative frequency of the corresponding class interval.

Example 3: Histogram for the acceptance rates.
1. The sample size is 25 and its square root is 5, but we use 4 class intervals because the data range from about 10 to 50, which divides easily into the intervals [10, 20), [20, 30), [30, 40), [40, 50).
2. Frequency distribution:

class interval | frequency | relative frequency
[10, 20) | 4 | 0.16
[20, 30) | 8 | 0.32
[30, 40) | 10 | 0.40
[40, 50) | 3 | 0.12
total | 25 | 1.00

[Figure: histogram of the acceptance rates]

This graph uses the frequency on the vertical axis. Relative frequency is the better choice, but because all intervals have the same width, both give a histogram of the same shape.

A histogram shows the center, the range, the concentration of the data, the nature of the distribution (unimodal, bimodal, multimodal), unusual values, and whether the data are skewed to the right or to the left.
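Steps 1 and 2 of the frequency-distribution recipe can be sketched as follows, assuming the class boundaries are chosen by hand as in Example 3. The helpers `suggested_class_count` and `frequency_distribution` are hypothetical names; the class-count rule (about the square root of n, but between 4 and 20) and left inclusion, where each class contains its lower boundary but not its upper one, follow the text.

```python
import math

# Acceptance rates from Example 2.
rates = [16.3, 12.0, 25.1, 20.3, 31.9, 20.7, 30.1, 19.5, 36.2, 46.9,
         25.8, 36.7, 33.8, 24.2, 21.5, 35.1, 37.6, 23.9, 17.0, 38.4,
         31.2, 43.8, 28.9, 31.4, 48.9]


def suggested_class_count(n):
    """About sqrt(n) classes, but at least 4 and at most 20."""
    return min(20, max(4, round(math.sqrt(n))))


def frequency_distribution(data, boundaries):
    """Count values in [lower, upper) for consecutive boundaries (left inclusion).

    Values below the first or at/above the last boundary are not counted.
    """
    n = len(data)
    table = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        freq = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, freq, freq / n))  # relative frequency = freq / n
    return table


print("suggested number of classes:", suggested_class_count(len(rates)))
for lower, upper, freq, rel in frequency_distribution(rates, [10, 20, 30, 40, 50]):
    print(f"[{lower}, {upper})  {freq:>3}  {rel:.2f}")
```

For the acceptance rates the counts come out as 4, 8, 10 and 3. Left inclusion matters only for values that sit exactly on a boundary, such as 20.0, which would be counted in [20, 30) rather than [10, 20).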

Features to check for in a histogram:
1. Center: where is the middle of the data?
2. Range: between which values do the data fall (in the example histogram: 40 and 100)?
3. Number of peaks: unimodal (one peak), bimodal (two peaks; this often occurs when the observations come from two groups, such as men and women), or multimodal (more than two peaks).
4. Symmetry: if you can draw a vertical line so that the part to the left is a mirror image of the part to the right, the histogram is symmetric.
5. Nonsymmetric histograms are skewed. If the upper tail stretches out farther than the lower tail, the histogram is positively skewed, or skewed to the right.
6. If the lower tail is longer than the upper tail, the histogram is negatively skewed, or skewed to the left.
7. Check for outliers.

2 Numerical Descriptive Measures

The methods in this chapter describe data for NUMERICAL VARIABLES ONLY.

2.1 Measures of Central Tendency

The mean of a set of numerical observations is the familiar arithmetic average. To write the formula for the mean mathematically, we need some notation:
- x = the variable for which we have sample data
- n = sample size = number of observations
- $x_1$ = the first sample observation
- $x_2$ = the second sample observation
- ...
- $x_n$ = the nth sample observation

For example, we might have a sample of n = 4 observations on x = battery lifetime (hr): $x_1 = 5.9$, $x_2 = 7.3$, $x_3 = 6.6$, $x_4 = 5.7$.

The sum of $x_1, x_2, \ldots, x_n$ can be written as $x_1 + x_2 + \cdots + x_n$, but this is cumbersome. The Greek letter $\Sigma$ is traditionally used in mathematics to denote summation; in particular, $\sum_{i=1}^{n} x_i$ denotes the sum of $x_1, \ldots, x_n$. The book uses the abbreviation $\Sigma x$. For the example above,

$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 5.9 + 7.3 + 6.6 + 5.7 = 25.5.$$

Definition: The sample mean of a numerical sample $x_1, x_2, \ldots, x_n$, denoted by $\bar{x}$, is

$$\bar{x} = \frac{\text{sum of all observations}}{\text{number of observations}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

The mean battery life is $\bar{x} = 25.5/4 = 6.375$ hr.

Another number that describes the center of a sample is the median. The median is the value that divides the ordered sample into two sets of the same size, so that 50% of the data are less than this number (and 50% are greater).

Definition: The sample median M is determined by first ordering the n observations from smallest to largest. Then

$$M = \text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd},\\ \text{the average of the two middle values} & \text{if } n \text{ is even}. \end{cases}$$

Example: Suppose you have the following ordered sample of size 10: … The median is then the mean of the fifth and sixth observations, (6 + 7)/2 = 6.5, and the sample mean is $\bar{x} = \ldots$. For an ordered sample of size 5, the median is the third observation; in the example it is 8, while the sample mean is $\bar{x} = 7.4$.

Comparing mean and median

The mean is the balance point of the distribution: to balance a histogram on a pin, you would have to place the pin at the mean. The median is the point that cuts the distribution into two parts of equal area. In a symmetric distribution, the mean and the median are equal. In a positively skewed distribution, the mean is greater than the median; in a negatively skewed distribution, the mean is smaller than the median.

2.2 Measures of Dispersion for Numerical Data

It is not enough to report a number that describes the center of a sample. The spread, or variability, of a sample is also an important characteristic.

[Figure: example graphs of samples with different spreads]

Definition: The range of a sample is the difference between the largest and the smallest value in the sample:
Range = largest value − smallest value.

Usually, the greater the range, the larger the variability. However, variability depends on more than just the distance between the two most extreme values: it is a characteristic of the whole data set, and every observation contributes to it.
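The definitions of the sample mean, the sample median, and the range translate directly into code. A sketch using the battery lifetimes from the text; the function names are hypothetical, and Python's standard `statistics.mean` and `statistics.median` compute the same two quantities.

```python
def sample_mean(xs):
    """Sum of all observations divided by the number of observations."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the ordered sample (average of the two middle values if n is even)."""
    s = sorted(xs)          # order the observations first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # single middle value
    return (s[mid - 1] + s[mid]) / 2   # average of the two middle values


battery = [5.9, 7.3, 6.6, 5.7]  # battery lifetimes (hr) from the text
print(sample_mean(battery))             # about 6.375
print(sample_median(battery))           # about 6.25
print(max(battery) - min(battery))      # range, about 1.6
```

The printed values can differ from the exact decimals in the last digits because of floating-point rounding.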

Sample 1: * * * * * o * * * * *
Sample 2: * ****O**** *

Definition: The n deviations from the sample mean are the differences

$$x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x}.$$

A specific deviation is positive if the value is greater than $\bar{x}$ and negative if it is less than $\bar{x}$. The set of deviations describes the variability of the data set, but the deviations always sum to zero, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so their sum cannot measure variability. If you square every deviation before summing, you obtain a number that characterizes the variability in the data set.

Definition: The sample variance, denoted by $s^2$, is the sum of squared deviations from the mean divided by n − 1:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

The sample standard deviation is the positive square root of the sample variance and is denoted by s:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}.$$

For calculating the sample variance of a given sample, the following formula is easier to compute:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}.$$
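Both variance formulas can be checked against each other in a few lines. A sketch with hypothetical function names, applied to the battery lifetimes; the divisor is n − 1 as in the definition. One caution: when the data are large relative to their spread, the shortcut formula subtracts two large, nearly equal numbers, so in floating-point arithmetic the definitional form is the safer choice.

```python
import math


def sample_variance(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Computational formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


battery = [5.9, 7.3, 6.6, 5.7]  # battery lifetimes (hr) from the text
s2 = sample_variance(battery)
s = math.sqrt(s2)  # sample standard deviation
print(f"s^2 = {s2:.4f}, shortcut = {sample_variance_shortcut(battery):.4f}, s = {s:.3f}")
# s^2 = 0.5292, shortcut = 0.5292, s = 0.727
```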

Example: Calculate the standard deviation of the 4 battery lives.

i | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $x_i^2$
1 | 5.9 | −0.475 | 0.225625 | 34.81
2 | 7.3 | 0.925 | 0.855625 | 53.29
3 | 6.6 | 0.225 | 0.050625 | 43.56
4 | 5.7 | −0.675 | 0.455625 | 32.49
Σ | 25.5 | 0 | 1.5875 | 164.15

The sample variance is $s^2 = 1.5875/3 \approx 0.529$ and the sample standard deviation is $s = \sqrt{0.529} \approx 0.727$ hr.

Using the other formula, first calculate

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{164.15 - \frac{25.5^2}{4}}{3} = \frac{164.15 - 162.5625}{3} = \frac{1.5875}{3} \approx 0.529,$$

which again gives $s \approx 0.727$.

Measures of Position

The concept of the median can be generalized by asking for the number below which k% (instead of 50%) of the data fall.

Definition: For any particular number k between 0 and 100, the kth percentile is a value such that k percent of the observations in the data set fall at or below that value. With this definition, the median is the 50th percentile: 50% of the data fall at or below the median.

An alternative measure of variability is the interquartile range. Like the mean, the standard deviation is strongly affected by outliers; the interquartile range, like the median, is resistant to outliers. It is based on quantities called quartiles.

Definition:
- The lower quartile $Q_1$ is the 25th percentile: 25% of the data fall below it.
- The median $Q_2$ is the 50th percentile: 50% of the data fall below it.
- The upper quartile $Q_3$ is the 75th percentile: 75% of the data fall below it (and 25% above).

The middle 50% of the measurements fall between the lower and the upper quartile.

The quartiles of a sample are obtained as follows:

1. Divide the n ordered observations into a lower and an upper half; if n is odd, the median is excluded from both halves.
2. The lower quartile $Q_1$ is the median of the lower half.
3. The upper quartile $Q_3$ is the median of the upper half.

Example:
[Figure: ordered sample with $Q_1$, the median, and $Q_3$ marked]

Definition: The interquartile range (IQR) is given by

$$\text{IQR} = \text{upper quartile} - \text{lower quartile} = Q_3 - Q_1.$$

The IQR in the example is IQR = 8 − 5 = 3: the middle 50% of the data points in this sample are captured in an interval no longer than 3.

Summarizing a Data Set with a Boxplot

The boxplot is a powerful graphical tool for summarizing data: it shows the center, the spread, and the symmetry or skewness at the same time. It is based on the median, the IQR, and the minimum and maximum of the observations.

Construction of a boxplot:
1. Draw a horizontal or vertical measurement scale.
2. Draw a rectangular box whose lower edge is at the lower quartile and whose upper edge is at the upper quartile.
3. Draw a line segment inside the box at the location of the median.
4. Add line segments (whiskers) from each end of the box to the smallest and the largest observation in the data set.

Example: a sample of pulse rates after exercise, of size 92: mean = 80, median = 76.0, min = 50, max = 140, $Q_1$ = 68, $Q_3$ = 87.
[Figure: boxplot of pulse after exercise]
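The quartile rule above (medians of the lower and upper halves, leaving out the overall median when n is odd) and the five numbers a boxplot is built from can be sketched as below, applied to the acceptance rates from Example 2. The function names are hypothetical. Other conventions exist: for example, NumPy's `percentile` interpolates linearly by default, so software may report slightly different quartiles for the same data.

```python
def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def quartiles(xs):
    """(Q1, median, Q3), with Q1 and Q3 the medians of the lower and upper halves.

    If n is odd, the median itself is left out of both halves.
    """
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)


def five_number_summary(xs):
    """Minimum, Q1, median, Q3, maximum: the values a basic boxplot is drawn from."""
    q1, med, q3 = quartiles(xs)
    return min(xs), q1, med, q3, max(xs)


# Acceptance rates from Example 2.
rates = [16.3, 12.0, 25.1, 20.3, 31.9, 20.7, 30.1, 19.5, 36.2, 46.9,
         25.8, 36.7, 33.8, 24.2, 21.5, 35.1, 37.6, 23.9, 17.0, 38.4,
         31.2, 43.8, 28.9, 31.4, 48.9]
mn, q1, med, q3, mx = five_number_summary(rates)
print(mn, q1, med, q3, mx)
print("IQR =", q3 - q1)
```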

A boxplot can carry even more information. Sometimes a star (*) is added for the mean, which gives a visual comparison between mean and median. In addition, outliers may be identified in the boxplot. To do this, we first have to define what an outlier is.

Definition: An observation is called an outlier if it is more than 1.5 IQR away from the closest quartile.

To determine whether there is an outlier in the data set, calculate
- upper fence = upper quartile + 1.5 IQR; every measurement above the upper fence is called an upper outlier;
- lower fence = lower quartile − 1.5 IQR; every measurement below the lower fence is called a lower outlier.

Example: The IQR in the pulse example is 87 − 68 = 19, so 1.5 × 19 = 28.5.
upper fence = 87 + 1.5 × 19 = 115.5. The maximum equals 140, so the data contain at least one upper outlier.
lower fence = 68 − 1.5 × 19 = 39.5. The minimum equals 50, so there is no lower outlier.

Outliers may be marked by a circle or a star in a boxplot. In this case the whiskers extend only to the smallest and largest observations that are not outliers.

Comparative boxplots are created by drawing several boxes in one graph. They are a good tool for comparing a continuous variable across different categories.

Example: Resting pulse and pulse after exercise, as boxplots in one graph.
[Figure: side-by-side boxplots of resting pulse and pulse after exercise]
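The fence rule can be written as two small functions, shown here with the quartiles of the pulse-after-exercise example. The names `fences` and `outliers` are hypothetical, and since the raw pulse data are not listed, the sketch only checks the reported minimum and maximum.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fence: k * IQR below Q1 and above Q3 (k = 1.5 in the text)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Return the values lying strictly below the lower fence or above the upper fence."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Pulse after exercise: Q1 = 68, Q3 = 87, min = 50, max = 140 (from the example).
print(fences(68, 87))               # (39.5, 115.5)
print(outliers([50, 140], 68, 87))  # [140]: an upper outlier, no lower outlier
```

A value exactly on a fence is not flagged, matching the definition's "more than 1.5 IQR away".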


3 A Four-Step Process
1. STATE: What is the practical question in the context of the discipline?
2. PLAN: Which statistical tool(s) have to be employed to find an answer?
3. SOLVE: Make the necessary graphs and calculations.
4. CONCLUDE: Answer the question STATEd above in the context of the discipline.

Example (Logging in the Rainforest, pg. 57):
1. STATE: Does logging the tropical rain forest result in its destruction? To answer this question, we have data on the number of trees per acre on plots that had never been logged (Group 1), plots that had been logged 1 year earlier (Group 2), and plots that had been logged 8 years earlier (Group 3).
2. PLAN: Draw side-by-side boxplots and compute descriptive statistics for the data from the 3 groups.
3. SOLVE:
[Table: descriptive statistics by GROUP with columns N, Mean, Median, StDev, Minimum, Maximum, Q1, Q3]
[Figure: side-by-side boxplots of trees per acre for the 3 groups]
4. CONCLUDE: Both the numerical summary and the boxplots suggest that logging results, on average, in a smaller number of trees per acre, whereas the standard deviation seems almost unchanged.
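The SOLVE step amounts to computing the same summary for every group. A sketch using Python's standard `statistics` module; the function name `describe` is hypothetical, the quartiles follow the median-of-halves rule from this document, and the two groups in the usage lines are toy numbers, not the rainforest data. Each group needs at least 2 values, since the standard deviation is undefined for a single value.

```python
import statistics


def describe(values):
    """Summary columns used in the SOLVE step for one group."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]                              # lower half
    upper = s[half + 1:] if n % 2 else s[half:]   # upper half; median left out if n is odd
    return {
        "N": n,
        "Mean": statistics.mean(s),
        "Median": statistics.median(s),
        "StDev": statistics.stdev(s),             # sample standard deviation (divisor n - 1)
        "Minimum": s[0],
        "Maximum": s[-1],
        "Q1": statistics.median(lower),
        "Q3": statistics.median(upper),
    }


# Toy data, made up for illustration only; NOT the rainforest measurements.
groups = {"A": [10, 12, 14, 16, 18], "B": [4, 6, 8, 10]}
for name, values in groups.items():
    print(name, describe(values))
```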


More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

AP * Statistics Review. Descriptive Statistics

AP * Statistics Review. Descriptive Statistics AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information

Topic 9 ~ Measures of Spread

Topic 9 ~ Measures of Spread AP Statistics Topic 9 ~ Measures of Spread Activity 9 : Baseball Lineups The table to the right contains data on the ages of the two teams involved in game of the 200 National League Division Series. Is

More information

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize

More information

Data exploration with Microsoft Excel: univariate analysis

Data exploration with Microsoft Excel: univariate analysis Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating

More information

Descriptive statistics parameters: Measures of centrality

Descriptive statistics parameters: Measures of centrality Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between

More information

determining relationships among the explanatory variables, and

determining relationships among the explanatory variables, and Chapter 4 Exploratory Data Analysis A first look at the data. As mentioned in Chapter 1, exploratory data analysis or EDA is a critical first step in analyzing the data from an experiment. Here are the

More information

a. mean b. interquartile range c. range d. median

a. mean b. interquartile range c. range d. median 3. Since 4. The HOMEWORK 3 Due: Feb.3 1. A set of data are put in numerical order, and a statistic is calculated that divides the data set into two equal parts with one part below it and the other part

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

Shape of Data Distributions

Shape of Data Distributions Lesson 13 Main Idea Describe a data distribution by its center, spread, and overall shape. Relate the choice of center and spread to the shape of the distribution. New Vocabulary distribution symmetric

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

More information

Sampling and Descriptive Statistics

Sampling and Descriptive Statistics Sampling and Descriptive Statistics Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University Reference: 1. W. Navidi. Statistics for Engineering and Scientists.

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Intro to Statistics 8 Curriculum

Intro to Statistics 8 Curriculum Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs

More information

Practice#1(chapter1,2) Name

Practice#1(chapter1,2) Name Practice#1(chapter1,2) Name Solve the problem. 1) The average age of the students in a statistics class is 22 years. Does this statement describe descriptive or inferential statistics? A) inferential statistics

More information

SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS

SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS SECTION 2-1: OVERVIEW Chapter 2 Describing, Exploring and Comparing Data 19 In this chapter, we will use the capabilities of Excel to help us look more carefully at sets of data. We can do this by re-organizing

More information

Section 1.1 Exercises (Solutions)

Section 1.1 Exercises (Solutions) Section 1.1 Exercises (Solutions) HW: 1.14, 1.16, 1.19, 1.21, 1.24, 1.25*, 1.31*, 1.33, 1.34, 1.35, 1.38*, 1.39, 1.41* 1.14 Employee application data. The personnel department keeps records on all employees

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Numeracy Targets. I can count at least 20 objects

Numeracy Targets. I can count at least 20 objects Targets 1c I can read numbers up to 10 I can count up to 10 objects I can say the number names in order up to 20 I can write at least 4 numbers up to 10. When someone gives me a small number of objects

More information

SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

More information

+ Chapter 1 Exploring Data

+ Chapter 1 Exploring Data Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1 Analyzing Categorical Data 1.2 Displaying Quantitative Data with Graphs 1.3 Describing Quantitative Data with Numbers Introduction

More information

EXPLORING SPATIAL PATTERNS IN YOUR DATA

EXPLORING SPATIAL PATTERNS IN YOUR DATA EXPLORING SPATIAL PATTERNS IN YOUR DATA OBJECTIVES Learn how to examine your data using the Geostatistical Analysis tools in ArcMap. Learn how to use descriptive statistics in ArcMap and Geoda to analyze

More information

Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log

Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log Week 11 Lecture 2: Analyze your data: Descriptive Statistics, Correct by Taking Log Instructor: Eakta Jain CIS 6930, Research Methods for Human-centered Computing Scribe: Chris(Yunhao) Wan, UFID: 1677-3116

More information

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

consider the number of math classes taken by math 150 students. how can we represent the results in one number? ch 3: numerically summarizing data - center, spread, shape 3.1 measure of central tendency or, give me one number that represents all the data consider the number of math classes taken by math 150 students.

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Mathematics Content: Pie Charts; Area as Probability; Probabilities as Percents, Decimals & Fractions

Mathematics Content: Pie Charts; Area as Probability; Probabilities as Percents, Decimals & Fractions Title: Using the Area on a Pie Chart to Calculate Probabilities Mathematics Content: Pie Charts; Area as Probability; Probabilities as Percents, Decimals & Fractions Objectives: To calculate probability

More information

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

DESCRIPTIVE STATISTICS & DATA PRESENTATION* Level 1 Level 2 Level 3 Level 4 0 0 0 0 evel 1 evel 2 evel 3 Level 4 DESCRIPTIVE STATISTICS & DATA PRESENTATION* Created for Psychology 41, Research Methods by Barbara Sommer, PhD Psychology Department

More information

Mean = (sum of the values / the number of the value) if probabilities are equal

Mean = (sum of the values / the number of the value) if probabilities are equal Population Mean Mean = (sum of the values / the number of the value) if probabilities are equal Compute the population mean Population/Sample mean: 1. Collect the data 2. sum all the values in the population/sample.

More information

CHAPTER THREE. Key Concepts

CHAPTER THREE. Key Concepts CHAPTER THREE Key Concepts interval, ordinal, and nominal scale quantitative, qualitative continuous data, categorical or discrete data table, frequency distribution histogram, bar graph, frequency polygon,

More information

Scope and Sequence KA KB 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B

Scope and Sequence KA KB 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B Scope and Sequence Earlybird Kindergarten, Standards Edition Primary Mathematics, Standards Edition Copyright 2008 [SingaporeMath.com Inc.] The check mark indicates where the topic is first introduced

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Statistics Revision Sheet Question 6 of Paper 2

Statistics Revision Sheet Question 6 of Paper 2 Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

More information

Module 2: Introduction to Quantitative Data Analysis

Module 2: Introduction to Quantitative Data Analysis Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics

Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics Why visualize data? The human eye is extremely sensitive to differences in: Pattern Colors

More information

Sta 309 (Statistics And Probability for Engineers)

Sta 309 (Statistics And Probability for Engineers) Instructor: Prof. Mike Nasab Sta 309 (Statistics And Probability for Engineers) Chapter 2 Organizing and Summarizing Data Raw Data: When data are collected in original form, they are called raw data. The

More information

Bridging Documents for Mathematics

Bridging Documents for Mathematics Bridging Documents for Mathematics 5 th /6 th Class, Primary Junior Cycle, Post-Primary Primary Post-Primary Card # Strand(s): Number, Measure Number (Strand 3) 2-5 Strand: Shape and Space Geometry and

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

More information

Mathematical Conventions Large Print (18 point) Edition

Mathematical Conventions Large Print (18 point) Edition GRADUATE RECORD EXAMINATIONS Mathematical Conventions Large Print (18 point) Edition Copyright 2010 by Educational Testing Service. All rights reserved. ETS, the ETS logo, GRADUATE RECORD EXAMINATIONS,

More information