sample median Sample quartiles sample deciles sample quantiles sample percentiles Exercise 1 five number summary # Create and view a sorted

Similar documents
Section 1.3 Exercises (Solutions)

A Percentile is a number in a SORTED LIST that has a given percentage of the data below it.

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Each function call carries out a single task associated with drawing the graph.

Chapter 3 RANDOM VARIATE GENERATION

CALCULATIONS & STATISTICS

Estimating the Average Value of a Function

Graphics in R. Biostatistics 615/815

6.4 Normal Distribution

TEST 2 STUDY GUIDE. 1. Consider the data shown below.

Probability Distributions

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Statistics Revision Sheet Question 6 of Paper 2

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

The Normal Distribution

Mathematics Instructional Cycle Guide

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Exploratory Data Analysis

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

6 3 The Standard Normal Distribution

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Chapter 4 - Lecture 1 Probability Density Functions and Cumul. Distribution Functions

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

2 Describing, Exploring, and

Descriptive Statistics

Chapter 2: Frequency Distributions and Graphs

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Copyright 2011 Casa Software Ltd. Centre of Mass

Fourth Grade Math Standards and "I Can Statements"

1.2 GRAPHS OF EQUATIONS. Copyright Cengage Learning. All rights reserved.

Variables. Exploratory Data Analysis

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Answer Key for California State Standards: Algebra I

AP Statistics Solutions to Packet 2

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Mathematical Conventions Large Print (18 point) Edition

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Using SPSS, Chapter 2: Descriptive Statistics

WEEK #22: PDFs and CDFs, Measures of Center and Spread

Lecture 7: Continuous Random Variables

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

COMMON CORE STATE STANDARDS FOR MATHEMATICS 3-5 DOMAIN PROGRESSIONS

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

LINEAR EQUATIONS IN TWO VARIABLES

Getting started with qplot

4. Continuous Random Variables, the Pareto and Normal Distributions

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Minnesota Academic Standards

Descriptive Statistics and Measurement Scales

Key Concept. Density Curve

Mathematical Conventions. for the Quantitative Reasoning Measure of the GRE revised General Test

An Introduction to Point Pattern Analysis using CrimeStat

Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions

Lesson 20. Probability and Cumulative Distribution Functions

Biggar High School Mathematics Department. National 5 Learning Intentions & Success Criteria: Assessing My Progress

Exploratory data analysis (Chapter 2) Fall 2011

Solutions to Homework 6 Statistics 302 Professor Larget

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11}

MA107 Precalculus Algebra Exam 2 Review Solutions

BPS Math Year at a Glance (Adapted from A Story Of Units Curriculum Maps in Mathematics K-5) 1

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Math 0980 Chapter Objectives. Chapter 1: Introduction to Algebra: The Integers.

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, cm

Common Tools for Displaying and Communicating Data for Process Improvement

Descriptive Statistics

Continuous Random Variables

ICASL - Business School Programme

Lesson 17: Margin of Error When Estimating a Population Proportion

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:

Graphical Integration Exercises Part Four: Reverse Graphical Integration

7. Normal Distributions

CAMI Education linked to CAPS: Mathematics

Normal distribution. ) 2 /2σ. 2π σ

Geostatistics Exploratory Analysis

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

1 J (Gr 6): Summarize and describe distributions.

Measurement with Ratios

Viewing Ecological data using R graphics

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Scope and Sequence KA KB 1A 1B 2A 2B 3A 3B 4A 4B 5A 5B 6A 6B

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Chapter 4. Probability and Probability Distributions

Lecture 5 : The Poisson Distribution

3: Summary Statistics

Exploratory Spatial Data Analysis

Selected practice exam solutions (part 5, item 2) (MAT 360)

Grade 6 Mathematics Assessment. Eligible Texas Essential Knowledge and Skills

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

How To Check For Differences In The One Way Anova

Chapter 3. The Normal Distribution

Mathematics. What to expect Resources Study Strategies Helpful Preparation Tips Problem Solving Strategies and Hints Test taking strategies

Possible Stage Two Mathematics Test Topics

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

CURVE FITTING LEAST SQUARES APPROXIMATION

Maths Workshop for Parents 2. Fractions and Algebra

Module 5: Measuring (step 3) Inequality Measures

Means, standard deviations and. and standard errors

Transcription:

Sample uartiles We have seen that the sample median of a data set {x 1, x, x,, x n }, sorted in increasing order, is a value that divides it in such a way, that exactly half (i.e., 50%) of the sample observations fall below the median, and exactly half (50%) are above it. If the sample size n is odd, then precisely one of the data values will lie at the exact center; this value is located at position (n + 1)/ in the data set, and corresponds to the median. If the sample size n is even however, then the exact middle of the data set will fall between two values, located at positions n/ and n/ + 1. In this case, it is customary to define the median as the average of the two values, which lies midway between them. Sample quartiles are defined similarly: they divide the data set into quarters (i.e., 5%). The first quartile, designated, 1, marks the cutoff below which lies the lowest 5% of the sample. Likewise, the second quartile marks the cutoff between the second lowest 5% and second highest 5% of the sample; note that this coincides with the sample median! Finally, the third quartile marks the cutoff above which lies the highest 5% of the sample. This procedure of ranking data is not limited to quartiles. For example, if we wanted to divide a sample into ten intervals of 10% each, the cutoff points would be known as sample deciles. In general, the cutoff values that divide a data set into any given proportion(s) are known as sample quantiles or sample percentiles. For example, receiving an exam score in the 90 th percentile means that 90% of the scores are below it, and 10% are above. For technical reasons, the strict definitions of quartiles and other percentiles follow rigorous mathematical formulas; however, these formulas can differ slightly from one reference to another. As a consequence, different statistical computing packages frequently output slightly different values from one another. On the other hand, these differences are usually very minor, especially for very large data sets. Exercise 1 (not required): Using the R code given below, generate and view a random sample of n = 40 positive values, and find the quartiles via the so-called five number summary that is output. # Create and view a sorted sample, rounded to decimal places. x = round(sort(rchisq(40, 1)), ) print(x) y = rep(0, 40) # Plot it along the real number line. plot.new() plot(x, y, pch = 19, cex =.5, xlim = range(0, 1 + max(x)), ylim = range(0, 0.01), ylab = "", axes = F) axis(1) # Identify the quartiles. summary(x) # Plot the median (with a filled red circle). = summary(x)[] points(, 0, col = "red", pch = 19) # Plot the first quartile 1 (with a filled blue circle). 1 = summary(x)[] points(1, 0, col = "blue", pch = 19) # Plot the third quartile (with a filled green circle). = summary(x)[5] points(, 0, col = "green", pch = 19)

Exercise (not required): Using the same sample, sketch and interpret boxplot(x, pch = 19) identifying all relevant features. From the results of these two exercises, what can you conclude about the general shape of this distribution? NOTE: Finding the approximate quartiles (or other percentiles) of grouped data can be a little more challenging. Refer to the Lecture Notes, pages.-4 to.-6, and especially.-11. CONTINUED

To illustrate the idea of estimating quartiles from the density histogram of grouped data, let us consider a previous, posted exam problem (Fall 01)..0.0.5 First, we find the median, i.e., the value on the X-axis that divides the total area of 1 into.50 area on either side of it. By inspection, the cumulative area below the left endpoint 4 is equal to +.0 =.0, too small. Likewise, the cumulative area below the right endpoint 1 is +.0 +.0 =.60, too big. Therefore, in order to have.50 area both below and above it, must lie in the third interval [4, 1), in such a way that its corresponding rectangle of.0 area is split into left and right sub-areas of.0 +, respectively:.0 Total Area below = +.0 +.0 =.50 Total Area above = +.5 + =.50.0.5

Now just focus on this particular rectangle Density =.075 A =.0 B = a = 4 b = 1 and use any of the three boxed formulas on page.-5 of the Lecture Notes, with the quantities labeled above. For example, the third formula (which I think is easiest) yields Ab + Ba (.)(1) + (.1)(4).8 = = = = A+ B. +.1. 9.. The other quartiles are computed similarly. For example, the first quartile 1 is the cutoff for the lowest 5% of the area. By the same logic, this value must lie in the second interval [, 4), and split its corresponding rectangle of.0 area into left and right sub-areas of +.05, respectively: A =.05 = B sum =.5 Therefore, Density = 0.1 Ab + Ba ()(4) + (.05)().7 1 = = = =.5. A+ B +.05. a = 1 4 = b

Likewise, the third quartile is the cutoff for the highest 5% of the area. By the same logic as before, this value must lie in the fourth interval [1, ), and split its corresponding rectangle of.5 area into left and right sub-areas of +, respectively: sum =.5 Therefore, Density =.05 A = B = a = 1 b = Ab + Ba ()() + ()(1) 4.5 = = = = A+ B +.5 18. Estimating a sample proportion between two known quantile values is done pretty much the same way, except in reverse, using the formulas on the bottom of the same page,.-5. For example, the same problem asks to estimate the sample proportion in the interval [9, 0). This interval consists of the disjoint union of the subintervals [9, 1), [1, ), and [, 0). The first subinterval [9, 1) splits the corresponding rectangle of area.0 over the class interval [4, 1) into unknown left and right subareas A and B, respectively, as shown below. Since it is the right subarea B we want, we use the formula B = (b ) Density = (1 9).075 =.115. The next subinterval [1, ) contains the entire corresponding rectangular area of.5. The last subinterval [, 0) splits the corresponding rectangle of area over the class interval [, 4) into unknown left and right subareas A and B, respectively, as shown below. In this case, it is the left subarea A that we want, so we use A = ( a) Density = (0 ).015 =. Adding these three areas together yields our answer,.115 +.5 + =.465. Density =.075 A B =? a = 4 = 9 b = 1 Density =.015 A =? a = = 0 b = 4 B Page.-6 gives a way to calculate quartiles, etc., from the cumulative distribution function (cdf) table, without using the density histogram.