Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)




Definitions

Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume a population consists of N values.

Sample: An observed subset of population values. We assume a sample consists of n values.

Graphic Summaries

Dotplot: A graphic that displays the original data. Draw a number line and put a dot on it for each observation; for identical or very close observations, stack the dots. (The example dotplot, covering values from roughly 40 to 90, did not survive extraction.)

Histogram: A graphic that displays the shape of numeric data by grouping it in intervals.
1. Choose evenly spaced categories.
2. Count the number of observations in each group.
3. Draw a bar graph with the height of each bar equal to the number of observations in the corresponding interval.

Stem-and-leaf plot: Similar to a dotplot. Data are grouped according to their leading digits (the stems), and the last digit is used as a plotting symbol (like a dot in the dotplot): the middle column of digits holds the stems, and the digits to its right are the leaves. The left column of digits is a cumulative count of observations from each end toward the middle, and the bracketed number is how many observations are in the middle row. For example:

      2   4 | 57
      5   5 | 133
     13   5 | 56777899
     23   6 | 0222334444
     39   6 | 5667777788888899
    (18)  7 | 000000111222222334
     38   7 | 5555555666668899999999
     16   8 | 0022333444
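The three histogram steps above can be sketched in a few lines of Python. This is a stdlib-only illustration; the data values and bin edges are made up:

```python
def histogram_counts(data, edges):
    # edges define evenly spaced, half-open intervals [e0, e1), [e1, e2), ...
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [42, 47, 51, 55, 56, 58, 63, 64, 68, 71, 74, 79, 81, 88]
print(histogram_counts(data, edges=[40, 50, 60, 70, 80, 90]))
# [2, 4, 3, 3, 2]: the bar heights for the intervals [40,50), ..., [80,90)
```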

Numeric Summaries of Data

Suppose that we have n observations, labeled x_1, x_2, ..., x_n. Then

    Σ_{i=1}^n x_i = x_1 + x_2 + x_3 + ... + x_n

Some other relations are:

    Σ f(x_i) = f(x_1) + f(x_2) + ... + f(x_n), for any function f,
    Σ c·x_i = c Σ x_i, for any constant c,
    Σ (a·x_i + b·y_i) = a Σ x_i + b Σ y_i, for any constants a and b.

Measures of Location: numeric summaries of the center of a distribution.

Mean (or average):

    µ = (1/N) Σ_{i=1}^N x_i    (population)
    x̄ = (1/n) Σ_{i=1}^n x_i    (sample)

Median: the middle observation. The middle observation of the sorted data if n is odd; otherwise the average of the two middle values.

Measures of Dispersion: numeric summaries of the spread or variation of a distribution.

Variance and Standard Deviation:

    σ² = (1/N) Σ_{i=1}^N (x_i − µ)²         (population)
    s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²    (sample)

The population and sample standard deviations (σ and s) are just the square roots of these quantities.

Interpreting σ (Chebyshev's rule): for any population,
- at least 75% of the observations lie within 2σ of µ,
- at least 89% of the observations lie within 3σ of µ,
- in general, at least 100(1 − 1/m²)% of the observations lie within mσ of the mean µ.
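The location and dispersion formulas above translate directly into code. A minimal stdlib-only Python sketch; the sample data are invented for illustration:

```python
import math

def mean(xs):
    # sample mean: x̄ = (1/n) · Σ x_i
    return sum(xs) / len(xs)

def median(xs):
    # middle of the sorted data; average of the two middle values if n is even
    s, n = sorted(xs), len(xs)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def sample_variance(xs):
    # s² = (1/(n−1)) · Σ (x_i − x̄)²
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))                        # 5.0
print(median(data))                      # 4.5
print(math.sqrt(sample_variance(data)))  # sample standard deviation s
```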

IQR (Interquartile Range): The distance between the (n+1)/4-th and 3(n+1)/4-th observations in an ordered dataset. These two values are called the first and third quartiles (Q1 and Q3), so IQR = Q3 − Q1.

Measure of symmetry: Skewness

    skewness = [Σ (x_i − x̄)³ / n] / s³

Negative means left skewed, 0 means symmetric, positive means right skewed.

Measure of heavy tails: Kurtosis

    kurtosis = [Σ (x_i − x̄)⁴ / n] / s⁴ − 3

A normal distribution has a kurtosis of 3, so if we subtract 3, interpretations are relative to that: positive values mean a sharper peak, and negative values mean a flatter top than a normal distribution.

Box-and-whisker plot: A graphic that summarizes the data using the median and quartiles, and displays outliers. Good for comparing several groups of data. (The diagram shows a box spanning the first quartile to the third quartile with a line at the median; a whisker extends to the largest value below Q3 + 1.5·IQR, and values beyond that are plotted individually as outliers.)
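Skewness and excess kurtosis can be computed straight from their definitions. A stdlib-only Python sketch with made-up data:

```python
import math

def skewness(xs):
    # skewness = [Σ (x_i − x̄)³ / n] / s³, with s the sample standard deviation
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return sum((x - m) ** 3 for x in xs) / n / s ** 3

def excess_kurtosis(xs):
    # kurtosis = [Σ (x_i − x̄)⁴ / n] / s⁴ − 3  (0 for a normal distribution)
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return sum((x - m) ** 4 for x in xs) / n / s ** 4 - 3

print(skewness([1, 2, 3, 4, 5]))           # 0.0: symmetric data
print(skewness([1, 1, 1, 2, 10]) > 0)      # True: a long right tail
print(excess_kurtosis([1, 2, 3, 4, 5]) < 0)  # True: flatter than normal
```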

Probability (Ch. 5)

Definitions and Set Theory

Random experiment: A process leading to at least two possible outcomes with uncertainty as to which will occur.
Basic outcome: A possible outcome of the random experiment.
Sample space: The set of all basic outcomes.
Event: A set of basic outcomes from the sample space. An event is said to occur if any one of its constituent basic outcomes occurs.

Combining events: let A and B be two events.

    Technical term   Symbol   Pronounced   Meaning
    Union            A ∪ B    A or B       A occurs or B occurs or both occur
    Intersection     A ∩ B    A and B      A occurs and B occurs
    Complement       Ā        not A        A does not occur

(Venn diagrams illustrating A ∪ B, A ∩ B, the complement Ā, mutually exclusive events A and B, and collectively exhaustive events A, B, C did not survive extraction.)

Probability Postulates:
1. If A is any event in the sample space S, 0 ≤ P(A) ≤ 1.
2. Let A be an event in S and let O_i denote the basic outcomes. Then P(A) = Σ_A P(O_i), where the notation implies that the summation extends over all the basic outcomes in A.
3. P(S) = 1.

Probability rules for combining events:

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
    P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
    P(Ā) = 1 − P(A)

Conditional Probability:

    P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
    P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
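The addition rule and the definition of conditional probability can be checked by enumerating a small sample space. A hypothetical two-dice example in Python (exact arithmetic via fractions):

```python
from fractions import Fraction
from itertools import product

# sample space: the 36 equally likely outcomes of two die rolls (a toy example)
space = list(product(range(1, 7), repeat=2))

def prob(event):
    # postulate 2: P(A) is the sum of the probabilities of the basic outcomes in A
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[0] + o[1] == 7   # the dice sum to 7
B = lambda o: o[0] == 3          # the first die shows 3

p_a, p_b = prob(A), prob(B)
p_and = prob(lambda o: A(o) and B(o))
p_or = prob(lambda o: A(o) or B(o))

# addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert p_or == p_a + p_b - p_and

# conditional probability: P(A | B) = P(A ∩ B) / P(B)
print(p_and / p_b)   # 1/6

# here P(A ∩ B) = P(A)·P(B), so these two events happen to be independent
assert p_and == p_a * p_b
```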

Independence: Two events are statistically independent if and only if

    P(A ∩ B) = P(A) P(B),

or equivalently P(A | B) = P(A) and P(B | A) = P(B).

General case: events E_1, E_2, ..., E_k are independent if and only if

    P(E_1 ∩ E_2 ∩ ... ∩ E_k) = P(E_1) P(E_2) ... P(E_k)

Bivariate Probability

Probabilities of outcomes for bivariate events:

                B_1            B_2            B_3
    A_1    P(A_1 ∩ B_1)   P(A_1 ∩ B_2)   P(A_1 ∩ B_3)   P(A_1)
    A_2    P(A_2 ∩ B_1)   P(A_2 ∩ B_2)   P(A_2 ∩ B_3)   P(A_2)
              P(B_1)         P(B_2)         P(B_3)

P(A_1 ∩ B_2) is the probability of A_1 and B_2 occurring. P(A_1) = P(A_1 ∩ B_1) + P(A_1 ∩ B_2) + P(A_1 ∩ B_3) is the marginal probability that A_1 occurs.

If we think of A_1, A_2 as a group of attributes A, and B_1, B_2, B_3 as a group of attributes B, then A and B are independent only if every one of {A_1, A_2} is independent of every one of {B_1, B_2, B_3}.

Discrete Random Variables (Ch. 6-7)

Definitions

Random variable (r.v.): A variable that takes on numerical values determined by the outcome of a random experiment.
Discrete random variable: A r.v. that can take on no more than a countable number of values.
Continuous random variable: A r.v. that can take any value in an interval.
Notation: An upper case letter (e.g. X) will represent a r.v.; a lower case letter (e.g. x) will represent one of its possible values.

Discrete Probability Distributions

The probability function, P_X(x), of a discrete r.v. X gives the probability that X takes the value x:

    P_X(x) = P(X = x),

where the function is evaluated at all possible values of x.

Properties:
1. P_X(x) ≥ 0 for any value x.
2. Σ_x P_X(x) = 1.

Cumulative probability function, F_X(x_0), of a r.v. X:

    F_X(x_0) = P(X ≤ x_0) = Σ_{x ≤ x_0} P_X(x).

Properties:
1. 0 ≤ F_X(x) ≤ 1 for any x.
2. If a < b, then F_X(a) ≤ F_X(b).

Expectation of Discrete Random Variables

Expected value of a discrete r.v.:

    E(X) = µ_X = Σ_x x·P_X(x).

For any function g(x),

    E(g(X)) = Σ_x g(x)·P_X(x).

Variance of a discrete r.v.:

    Var(X) = σ²_X = E((X − µ_X)²) = Σ_x (x − µ_X)² P_X(x) = E(X²) − µ²_X.

The standard deviation of X is σ_X.

Plug-in rules: let X be a r.v., and a and b constants. Then

    E(a + bX) = a + b·E(X)
    Var(a + bX) = b²·Var(X).

This only works for linear functions.
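A short Python sketch of discrete expectation and variance, using an assumed toy pmf (number of heads in two fair coin tosses):

```python
# pmf of X = number of heads in two fair coin tosses (a toy example)
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def expectation(pmf, g=lambda x: x):
    # E(g(X)) = Σ g(x) · P_X(x)
    return sum(g(x) * p for x, p in pmf.items())

mu = expectation(pmf)                              # E(X)
var = expectation(pmf, lambda x: (x - mu) ** 2)    # Var(X) = E((X − µ_X)²)
print(mu, var)   # 1.0 0.5

# shortcut form: Var(X) = E(X²) − µ²
assert abs(var - (expectation(pmf, lambda x: x ** 2) - mu ** 2)) < 1e-12

# plug-in rule: E(a + bX) = a + b·E(X)
a, b = 3, 2
assert expectation(pmf, lambda x: a + b * x) == a + b * mu
```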

Jointly Distributed Discrete Random Variables

Joint probability function: Suppose X and Y are r.v.'s. Their joint probability function gives the probability that simultaneously X = x and Y = y:

    P_{X,Y}(x, y) = P({X = x} ∩ {Y = y})

Properties:
1. P_{X,Y}(x, y) ≥ 0 for any pair (x, y).
2. Σ_x Σ_y P_{X,Y}(x, y) = 1.

Marginal probability function:

    P_X(x) = Σ_y P_{X,Y}(x, y).

Conditional probability function:

    P_{Y|X}(y | x) = P_{X,Y}(x, y) / P_X(x)

Independence: X and Y are independent if and only if

    P_{X,Y}(x, y) = P_X(x) P_Y(y) for all possible (x, y) pairs.

Expectation: Let X and Y be r.v.'s, and g(X, Y) any function. Then

    E(g(X, Y)) = Σ_x Σ_y g(x, y) P_{X,Y}(x, y).

Conditional expectation: Let X and Y be r.v.'s, and suppose we know the conditional distribution of X for Y = y, labeled P_{X|Y}(x | y). Then

    E(X | Y = y) = Σ_x x·P_{X|Y}(x | y).

Covariance: If E(X) = µ_X and E(Y) = µ_Y,

    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = Σ_x Σ_y (x − µ_X)(y − µ_Y) P_{X,Y}(x, y)
              = E(XY) − µ_X µ_Y = [Σ_x Σ_y x·y·P_{X,Y}(x, y)] − µ_X µ_Y

If two r.v.'s are independent, their covariance is zero. The converse is not necessarily true.
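Marginal probabilities and covariance can be computed straight from a joint pmf. A stdlib-only sketch with an invented joint distribution:

```python
# joint pmf P_{X,Y}(x, y) for a small made-up example
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def marginal_x(joint):
    # P_X(x) = Σ_y P_{X,Y}(x, y)
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def covariance(joint):
    # Cov(X, Y) = E(XY) − µ_X·µ_Y
    mu_x = sum(x * p for (x, y), p in joint.items())
    mu_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - mu_x * mu_y

print(marginal_x(joint))          # {0: 0.5, 1: 0.5}
print(round(covariance(joint), 4))  # 0.15: positive, X and Y tend to move together
```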

Correlation:

    ρ_XY = Cor(X, Y) = Cov(X, Y) / √(Var(X)·Var(Y))

Always −1 ≤ ρ_XY ≤ 1, and ρ_XY = ±1 if and only if Y = a + bX (a, b constants).

Plug-in rules: Let X and Y be r.v.'s, and a, b constants. Then

    E(aX + bY) = a·E(X) + b·E(Y)
    Var(aX + bY) = a²·Var(X) + b²·Var(Y) + 2ab·Cov(X, Y)

Binomial Distribution

Bernoulli trials: A sequence of repeated experiments are Bernoulli trials if:
1. The result of each trial is either a success or a failure.
2. The probability p of a success is the same for all trials.
3. The trials are independent.

If X is the number of successes in n Bernoulli trials, X is a binomial random variable. It has probability function

    P_X(x) = C(n, x) p^x (1 − p)^(n−x),

where C(n, x) counts the number of ways of getting x successes in n trials:

    C(n, x) = n! / (x!(n − x)!),   with n! = n·(n−1)·(n−2)·...·2·1.

Mean and variance: E(X) = np, Var(X) = np(1 − p).

Continuous Random Variables (Ch. 7)

Probability Distributions

Probability density function: A function f_X(x) of the continuous r.v. X with the following properties:
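The binomial probability function above can be written directly with math.comb; a quick sanity check that the pmf sums to 1 and has mean np (n and p chosen arbitrarily):

```python
import math

def binom_pmf(x, n, p):
    # P_X(x) = C(n, x) · p^x · (1 − p)^(n−x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
print(binom_pmf(5, n, p))   # 0.24609375, i.e. 252/1024

# the pmf sums to 1 over x = 0, ..., n
assert abs(sum(binom_pmf(x, n, p) for x in range(n + 1)) - 1) < 1e-12

# E(X) = np
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
assert abs(mean - n * p) < 1e-12
```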

1. f_X(x) ≥ 0 for all values of x.
2. P(a ≤ X ≤ b) is the area under f_X(x) between a and b, if a < b.
3. The total area under the curve is 1.
4. The area under the curve to the left of any value x is F_X(x), the probability that X does not exceed x.

Cumulative distribution function: Same as before.

    P(a ≤ X ≤ b) = F_X(b) − F_X(a)   (provided a < b).

Expectations, Variances, Covariances, etc.: Same rules as for discrete r.v.'s. The summation (Σ) is replaced by the integral (∫), which is not necessary for this course.

Normal Distribution

Probability density function:

    f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}

for constants µ and σ such that −∞ < µ < ∞ and 0 < σ < ∞.

Mean and variance: E(X) = µ, Var(X) = σ².

Notation: X ~ N(µ, σ²) means X is normal with mean µ and variance σ². If Z ~ N(0, 1), we say it has a standard normal distribution.

If X ~ N(µ, σ²), then Z = (X − µ)/σ ~ N(0, 1). Thus

    P(a < X < b) = P((a − µ)/σ < Z < (b − µ)/σ) = F_Z((b − µ)/σ) − F_Z((a − µ)/σ)

Central Limit Theorem: Let X_1, X_2, ..., X_n be n independent r.v.'s, each with identical distributions, mean µ and variance σ². As n becomes large,

    X̄ ≈ N(µ, σ²/n)
    Σ X_i = n·X̄ ≈ N(nµ, nσ²)
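Standardization and the standard normal CDF can be sketched with math.erf (stdlib only); the interval endpoints below are arbitrary:

```python
import math

def phi(z):
    # standard normal CDF F_Z(z), written via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    # standardize: P(a < X < b) = F_Z((b − µ)/σ) − F_Z((a − µ)/σ)
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# about 95.4% of a normal population lies within 2σ of µ
print(round(normal_prob(-2, 2, 0, 1), 4))   # 0.9545

# standardizing leaves probabilities unchanged
assert abs(normal_prob(90, 110, 100, 10) - normal_prob(-1, 1, 0, 1)) < 1e-12
```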

Sampling & Sampling Distributions

Simple random sample (or random sample): A method of randomly drawing n objects which are independent and identically distributed (i.i.d.).

Statistic: A function of the sample information.

Sampling distribution of a statistic: The probability distribution of the values a statistic can take, over all possible samples of a fixed size n.

Sampling distribution of the mean: Suppose X_1, ..., X_n are a random sample from some population with mean µ_X and variance σ²_X. The sample mean is

    X̄ = (1/n) Σ X_i.

It has the following properties:
1. E(X̄) = µ_X.
2. It has standard deviation σ_X̄ = σ_X/√n.
3. If the population distribution is normal, X̄ ~ N(µ_X, σ²_X̄) = N(µ_X, σ²_X/n).
4. If the population distribution is not normal, but n is large, then (3) is roughly true.

Sampling distribution of a proportion: Suppose the r.v. X is the number of successes in a binomial sample of n trials, whose probability of success is p. The sample proportion is

    p̂ = X/n.

It has the following properties:
1. E(p̂) = p.
2. It has standard deviation σ_p̂ = √(p(1 − p)/n).
3. If n is large (np(1 − p) > 9, or roughly n ≥ 40), p̂ ≈ N(p, σ²_p̂) = N(p, p(1 − p)/n).
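The two standard-error formulas above in a few lines of Python; the numbers are chosen only to illustrate the 1/√n behaviour:

```python
import math

def se_mean(sigma, n):
    # standard deviation of the sample mean: σ_X̄ = σ/√n
    return sigma / math.sqrt(n)

def se_proportion(p, n):
    # standard deviation of the sample proportion: σ_p̂ = √(p(1 − p)/n)
    return math.sqrt(p * (1 - p) / n)

# quadrupling the sample size halves the standard error of the mean
print(se_mean(10, 25), se_mean(10, 100))   # 2.0 1.0
print(se_proportion(0.5, 100))             # 0.05
```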

Point Estimation

Estimator: A random variable that depends on the sample information and whose realizations provide approximations to an unknown population parameter.
Estimate: A specific realization of an estimator.
Point estimator: An estimator that is a single number.
Point estimate: A specific realization of a point estimator.

Bias: Let θ̂ be an estimator of the parameter θ. The bias in θ̂ is

    Bias(θ̂) = E(θ̂) − θ.

If the bias is 0, θ̂ is an unbiased estimator.

Efficiency: Let θ̂_1 and θ̂_2 be two estimators of θ, based on the same sample. Then θ̂_1 is more efficient than θ̂_2 if Var(θ̂_1) < Var(θ̂_2).

Interval Estimation

Confidence interval: Let θ be an unknown parameter. Suppose that from sample information we can find random variables A and B such that P(A < θ < B) = 1 − α. If the observed values are a and b, then (a, b) is a 100(1 − α)% confidence interval for θ. The quantity (1 − α) is called the probability content of the interval.

Student's t distribution: Given a random sample of n observations with mean X̄ and standard deviation s, from a normal population with mean µ, the random variable

    T = (X̄ − µ)/(s/√n)

follows the Student's t distribution with (n − 1) degrees of freedom. For n > 30, the t distribution is quite close to a N(0, 1) distribution.

    Data                                              Parameter    100(1 − α)% C.I.
    N(µ, σ²), σ² known                                µ            x̄ ± z_{α/2}·σ/√n
    mean µ, σ² unknown, n > 30                        µ            x̄ ± z_{α/2}·s/√n
    N(µ, σ²), σ² unknown                              µ            x̄ ± t_{n−1,α/2}·s/√n
    Binomial(n, p), np(1 − p) > 9 or roughly n ≥ 40   p            p̂ ± z_{α/2}·√(p̂(1 − p̂)/n)
    n matched pairs, difference N(µ_X − µ_Y, σ²)      µ_X − µ_Y    d̄ ± t_{n−1,α/2}·s_d/√n
    2 independent samples, means µ_X, µ_Y,
      variances unknown, n > 30                       µ_X − µ_Y    x̄ − ȳ ± z_{α/2}·√(s²_x/n_x + s²_y/n_y)
    2 independent samples, means µ_X, µ_Y,
      variances unknown                               µ_X − µ_Y    x̄ − ȳ ± t_{n*,α/2}·√(s²_x/n_x + s²_y/n_y)
    2 independent samples,
      Binomial(n_X, p_X), Binomial(n_Y, p_Y)          p_X − p_Y    p̂_x − p̂_y ± z_{α/2}·√(p̂_x(1 − p̂_x)/n_x + p̂_y(1 − p̂_y)/n_y)

Notes for the table:
1. All quantities in the C.I. column are either known constants or observed sample quantities.
2. P(Z > z_{α/2}) = α/2 for Z ~ N(0, 1).
3. P(T > t_{n−1,α/2}) = α/2 for T ~ Student's t with (n − 1) d.f.
4. s, s_x, s_y, s_d are the observed sample standard deviations corresponding to x_i, x_i, y_i, and d_i = x_i − y_i respectively.
5. p̂, p̂_x, p̂_y are the observed sample proportions corresponding to x_i, x_i, y_i respectively.
6. d̄ is the sample mean corresponding to d_i = x_i − y_i.
7. n, n_x, n_y are the total, x, and y sample sizes. The degrees of freedom n* are given in note 8.
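The first rows of the table translate directly into code. A sketch of the large-sample z interval for a mean, using statistics.NormalDist for z_{α/2}; the sample summaries are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_z(xbar, s, n, alpha=0.05):
    # large-sample interval from the table: x̄ ± z_{α/2}·s/√n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{α/2}, about 1.96 for α = 0.05
    margin = z * s / sqrt(n)
    return (xbar - margin, xbar + margin)

# hypothetical sample: n = 64 observations, x̄ = 72.0, s = 8.0
lo, hi = ci_mean_z(72.0, 8.0, 64)
print(round(lo, 2), round(hi, 2))   # 70.04 73.96
```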

8. The degrees of freedom for the two-sample t interval are

    n* = (s²_x/n_x + s²_y/n_y)² / [ (s²_x/n_x)²/(n_x − 1) + (s²_y/n_y)²/(n_y − 1) ]

Also note: The text gives a different formula for comparing two means with small sample sizes. It requires that the two variances be the same, which may not be the case. Unless you are sure the variances are equal, it is safer to use the approximation given here (the formula with n*). If you are sure that the variances are equal, using the book's formula is fine.

Estimating the sample size: If you want a 100(1 − α)% interval of ±L (i.e. of length 2L), choose n as follows:

    Situation                 n
    normal, σ known           n = z²_{α/2}·σ²/L²
    Bernoulli, worst case     n = 0.25·z²_{α/2}/L²

Hypothesis Testing

Null hypothesis (H_0): The hypothesis we assume to be true unless there is sufficient evidence to the contrary.
Alternative hypothesis (H_1): The hypothesis we test the null against. If there is evidence that H_0 is false, we accept H_1.
Type I error: Rejecting a true H_0.
Type II error: Not rejecting a false H_0.
Significance level: P(reject H_0 | H_0 true) = P(type I error).
Power: The probability of rejecting a null hypothesis that is false. Note that this depends on the true value of the parameter.
P-value: The smallest significance level at which a null hypothesis can be rejected. This is a measure of how likely the data is, if H_0 is true.

Notes for the following table (in addition to the comments for C.I.'s):
1. The first three tests are examples of one-sided (> and < alternatives) and two-sided (≠ alternative) tests. The remaining tests all have a > alternative, but are easily adaptable to either of the other two alternatives.
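Note 8 and the worst-case sample-size formula can be checked numerically; a stdlib-only sketch (the inputs are invented):

```python
import math
from statistics import NormalDist

def welch_df(s2x, nx, s2y, ny):
    # n* from note 8: (s²x/nx + s²y/ny)² / [(s²x/nx)²/(nx−1) + (s²y/ny)²/(ny−1)]
    num = (s2x / nx + s2y / ny) ** 2
    den = (s2x / nx) ** 2 / (nx - 1) + (s2y / ny) ** 2 / (ny - 1)
    return num / den

def n_worst_case(L, alpha=0.05):
    # Bernoulli worst case (p = 0.5): n = 0.25·z²_{α/2}/L², rounded up
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(0.25 * z ** 2 / L ** 2)

# equal variances and sample sizes give n* close to nx + ny − 2
print(round(welch_df(1.0, 10, 1.0, 10), 1))   # 18.0
# a ±3-percentage-point 95% interval needs about a thousand respondents
print(n_worst_case(0.03))   # 1068
```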

2. In the last test,

    p̂_0 = (n_x·p̂_x + n_y·p̂_y) / (n_x + n_y)

3. In all the tests, we are comparing the unknown parameter (such as µ, p, or µ_X − µ_Y) to constants (µ_0, p_0, and D_0).

Hypothesis tests with significance level α

    Data                              H_0                    H_1       Reject H_0 if
    N(µ, σ²), σ² known                µ = µ_0 or µ ≤ µ_0     µ > µ_0   (x̄ − µ_0)/(σ/√n) > z_α
    same                              µ = µ_0 or µ ≥ µ_0     µ < µ_0   (x̄ − µ_0)/(σ/√n) < −z_α
    same                              µ = µ_0                µ ≠ µ_0   (x̄ − µ_0)/(σ/√n) not in (−z_{α/2}, z_{α/2})
    mean µ, σ² unknown, n > 30        µ = µ_0 or µ ≤ µ_0     µ > µ_0   (x̄ − µ_0)/(s/√n) > z_α
    N(µ, σ²), σ² unknown, n < 30      µ = µ_0 or µ ≤ µ_0     µ > µ_0   (x̄ − µ_0)/(s/√n) > t_{n−1,α}
    Binomial(n, p), np(1 − p) > 9
      or roughly n ≥ 40               p = p_0 or p ≤ p_0     p > p_0   (p̂ − p_0)/√(p_0(1 − p_0)/n) > z_α
    n matched pairs, difference
      N(µ_X − µ_Y, σ²)               µ_X − µ_Y = D_0
                                      or µ_X − µ_Y ≤ D_0     µ_X − µ_Y > D_0   (d̄ − D_0)/(s_d/√n) > t_{n−1,α}

(Continued on next page)

Hypothesis tests with significance level α (continued)

    Data                                  H_0                       H_1               Reject H_0 if
    2 independent samples,
      means µ_X, µ_Y, variances
      unknown, n_x > 30, n_y > 30         µ_X − µ_Y = D_0
                                          or µ_X − µ_Y ≤ D_0        µ_X − µ_Y > D_0   (x̄ − ȳ − D_0)/√(s²_x/n_x + s²_y/n_y) > z_α
    2 independent normal samples,
      means µ_X, µ_Y,
      variances unknown                   µ_X − µ_Y = D_0
                                          or µ_X − µ_Y ≤ D_0        µ_X − µ_Y > D_0   (x̄ − ȳ − D_0)/√(s²_x/n_x + s²_y/n_y) > t_{n*,α}
    2 independent samples,
      Binomial(n_x, p_x) and
      Binomial(n_y, p_y)                  p_x − p_y = 0
                                          or p_x − p_y ≤ 0          p_x − p_y > 0     (p̂_x − p̂_y)/√(p̂_0(1 − p̂_0)(1/n_x + 1/n_y)) > z_α
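The last test in the table, the pooled two-proportion z test, as a stdlib-only Python sketch with hypothetical counts:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes_x, n_x, successes_y, n_y):
    # pooled proportion from note 2: p̂0 = (n_x·p̂_x + n_y·p̂_y)/(n_x + n_y)
    p_x, p_y = successes_x / n_x, successes_y / n_y
    p0 = (successes_x + successes_y) / (n_x + n_y)
    z = (p_x - p_y) / sqrt(p0 * (1 - p0) * (1 / n_x + 1 / n_y))
    p_value = 1 - NormalDist().cdf(z)   # one-sided p-value for H1: p_x − p_y > 0
    return z, p_value

# hypothetical data: 60/100 successes versus 45/100 successes
z, p_value = two_proportion_z(60, 100, 45, 100)
print(round(z, 2))   # 2.12
print(z > 1.645)     # True: reject H_0 at significance level α = 0.05
```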