Descriptive Statistics




We learned to describe data sets graphically. We can also describe a data set numerically.

Measures of Location

Definition. The sample mean is the arithmetic average of n values. We denote the sample mean by

\[ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

Example (Exercise 2.15). Y = weight gain (lb) of lambs on a special diet for n = 6 lambs. Compute the sample mean for the resulting data set.

11 13 19 2 10 1

Figure 2.27 illustrates that the sample mean can be viewed as a balancing point in the data.

The sample median is the value of the data nearest their middle. We call the median $Q_2$.

To find the median of a data set:
- Put the data in order.
- The median is
  - the middle value if n is odd;
  - the average of the two middle values if n is even.

Example (Exercise 2.17), lambs on a special diet again. The ordered values are

$y_{(1)} = 1$, $y_{(2)} = 2$, $y_{(3)} = 10$, $y_{(4)} = 11$, $y_{(5)} = 13$, $y_{(6)} = 19$

Find $Q_2$.
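As a check on the two lamb exercises above, the mean and median can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean, median

# Weight gains (lb) of the n = 6 lambs from Exercise 2.15.
weight_gain = [11, 13, 19, 2, 10, 1]

# Sample mean: sum of the values divided by n.
sample_mean = mean(weight_gain)

# Sample median: n is even, so this averages the two middle
# values of the ordered data (10 and 11).
sample_median = median(weight_gain)

print(f"ordered values = {sorted(weight_gain)}")
print(f"sample mean    = {sample_mean:.2f} lb")
print(f"sample median  = {sample_median} lb")
```

The mean works out to 56/6, about 9.33 lb, and the median to 10.5 lb.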

Notice the new notation. A lowercase letter denoting the outcome of a random variable, with parentheses in the subscript, $y_{(i)}$, denotes the i-th observation in order from smallest to largest. i = 1 denotes the smallest observation (minimum) and i = n denotes the largest (maximum).

Question: Which measure of location (or measure of center) do we report? Mean or median?

To answer this, explore what happens on certain data sets to the relation between the mean and median. Consider two data sets. Find the mean and the median for both.

1 2 3 4 5
1 2 3 4 20

What happened and why?

So the mean and median can indicate skewness.
- Data skewed right: mean > median
- Data skewed left: mean < median
- Data symmetric: mean ≈ median

Measures of Dispersion

The quartiles of a data set are points that separate the data into quarters (or fourths).
- $Q_1$ separates the lower quarter (25%) from the upper three quarters (75%).
- $Q_2$ separates the lower two quarters (50%) from the upper two quarters (50%).
- $Q_3$ separates the lower three quarters (75%) from the upper quarter (25%).

Notice the median is the second quartile.

One way to report the dispersion (or spread) of a data set is to report the interquartile range.

Definition. The interquartile range is $\mathrm{IQR} = Q_3 - Q_1$.

Definition. The sample range is $y_{(n)} - y_{(1)} = \max - \min$.

Definition. The five number summary is $\{ y_{(1)}, Q_1, Q_2, Q_3, y_{(n)} \}$.
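Returning to the two small data sets used to compare the mean and the median, a short sketch shows how a single large value pulls the mean but not the median. The function name `describe` and the exact-equality test for "symmetric" are simplifying assumptions made for this illustration; with real data the comparison would be approximate.

```python
from statistics import mean, median

def describe(data):
    """Return (mean, median, hint) where hint compares the two.

    Illustrative helper, not from the original notes.
    """
    m, med = mean(data), median(data)
    if m > med:
        hint = "mean > median: consistent with right skew"
    elif m < med:
        hint = "mean < median: consistent with left skew"
    else:
        hint = "mean = median: consistent with symmetry"
    return m, med, hint

for data in ([1, 2, 3, 4, 5], [1, 2, 3, 4, 20]):
    m, med, hint = describe(data)
    print(f"{data}: mean = {m}, median = {med} ({hint})")
```

Replacing 5 by 20 raises the mean from 3 to 6 while the median stays at 3, which is why the median is often preferred for skewed data.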

Example (from Example 2.22). In a common biology experiment, radishes were grown in total darkness and the length (mm) of each radish shoot was measured at the end of three days. Find the five number summary for these data.

8 10 11 15 15 15 20 20 22 25 29 30 33 35 37

A boxplot (a.k.a. box and whisker plot) is a graphical display of the five number summary. The box spans the quartiles and the whiskers extend from the quartiles to the min / max.

Boxplots are often used for comparative purposes, as in Figure 2.32.

[Figure: Radish Length at Three Days Grown Under Three Conditions]
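The five number summary for the radish data grown in darkness can be sketched as follows. The quartile convention here is an assumption: $Q_1$ and $Q_3$ are taken as the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd. Other conventions (and other software defaults) can give slightly different quartiles. The function name is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data; when n is odd the middle
    value belongs to neither half.
    """
    y = sorted(data)
    n = len(y)
    half = n // 2
    lower = y[:half]
    upper = y[half + 1:] if n % 2 else y[half:]
    return y[0], median(lower), median(y), median(upper), y[-1]

# Radish shoot lengths (mm) after three days in total darkness.
dark = [8, 10, 11, 15, 15, 15, 20, 20, 22, 25, 29, 30, 33, 35, 37]

y_min, q1, q2, q3, y_max = five_number_summary(dark)
print("five number summary:", (y_min, q1, q2, q3, y_max))
print("IQR   =", q3 - q1)
print("range =", y_max - y_min)
```

Under this convention the summary is {8, 15, 20, 30, 37}, so the IQR is 15 mm and the range is 29 mm.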

Definition. An outlier is an observation that differs dramatically from the rest of the data. Formally, $y_i$ is an outlier if

\[ y_i < Q_1 - 1.5 \times \mathrm{IQR} \quad \text{or} \quad y_i > Q_3 + 1.5 \times \mathrm{IQR} \]

Example 2.25. Y = radish growth in full light condition. The data are

3 5 5 7 7 8 9 10 10 10 10 14 20 21

Find any outliers.

Definition. The sample variance is

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]

Definition. The sample standard deviation is

\[ S = \sqrt{S^2} \]

Example 2.28. In an experiment on chrysanthemums, a botanist measured the stem elongation (mm in 7 days) of five plants grown on the same greenhouse bench.

76 72 65 70 82

Find the sample standard deviation.
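Both examples above can be worked numerically. The sketch below assumes the common 1.5 × IQR fence rule for outliers and the median-of-halves convention for quartiles; both are stated assumptions, and the function names are illustrative. The standard deviation follows the n − 1 formula for $S^2$ directly.

```python
from math import sqrt
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    y = sorted(data)
    n = len(y)
    half = n // 2
    upper = y[half + 1:] if n % 2 else y[half:]
    return median(y[:half]), median(upper)

def find_outliers(data):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed rule)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [y for y in data if y < lower_fence or y > upper_fence]

def sample_sd(data):
    """Square root of the sample variance, using the n - 1 divisor."""
    n = len(data)
    y_bar = sum(data) / n
    variance = sum((y - y_bar) ** 2 for y in data) / (n - 1)
    return sqrt(variance)

# Example 2.25: radish growth in full light.
light = [3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21]
print("outliers:", find_outliers(light))

# Example 2.28: chrysanthemum stem elongation (mm in 7 days).
stems = [76, 72, 65, 70, 82]
print(f"sample SD = {sample_sd(stems):.2f} mm")
```

For the full light data, $Q_1 = 7$ and $Q_3 = 10$, so the fences are 2.5 and 14.5 and the values 20 and 21 are flagged. For the chrysanthemum data, the mean is 73, the squared deviations sum to 164, and $S = \sqrt{164/4} = \sqrt{41} \approx 6.4$ mm.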

Empirical Rule

For unimodal, not too skewed data sets, the empirical rule states the following:
- ~68% of the data lie between $\bar{Y} - S$ and $\bar{Y} + S$
- ~95% of the data lie between $\bar{Y} - 2S$ and $\bar{Y} + 2S$
- >99% of the data lie between $\bar{Y} - 3S$ and $\bar{Y} + 3S$

Example 2.36. Suppose Y = pulse rate after 5 minutes of exercise. For n = 28 subjects, we find $\bar{Y} = 98$ (beats/min) and $S = 13.4$ (beats/min). Thus, e.g., from the empirical rule we expect ~95% of the data to lie between

98 − (2)(13.4) = 98 − 26.8 = 71.2 beats/min

and

98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min

Population

Definition. The population is the larger group of subjects (organisms, plots, regions, ecosystems, etc.) on which we wish to draw inferences.

Definition. A parameter is a quantified population characteristic.

Definition. A statistic is a sample quantity used to estimate a population parameter.

Definition. The population proportion is the proportion of subjects exhibiting a particular trait or outcome in the population. (It generalizes to the probability that any population element will exhibit the trait.) NOTATION: $p$

Definition. The sample proportion is the number of sample elements exhibiting the trait, divided by the sample size, n. NOTATION: $\hat{p}$
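The empirical rule calculation in Example 2.36 extends to all three intervals with a few lines of code. This is a minimal sketch; the function name `empirical_intervals` is an illustrative choice, and the intervals are only approximations that assume unimodal, not too skewed data.

```python
def empirical_intervals(y_bar, s):
    """Return {k: (y_bar - k*s, y_bar + k*s)} for k = 1, 2, 3."""
    return {k: (y_bar - k * s, y_bar + k * s) for k in (1, 2, 3)}

# Example 2.36: pulse rate after 5 minutes of exercise.
y_bar, s = 98, 13.4
coverage = {1: "~68%", 2: "~95%", 3: ">99%"}

for k, (low, high) in empirical_intervals(y_bar, s).items():
    print(f"{coverage[k]} of the data between {low:.1f} and {high:.1f} beats/min")
```

For k = 2 this reproduces the interval from the example, 71.2 to 124.8 beats/min.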