# Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)


## Transcription

### Descriptive Statistics (Ch. 1-4)

#### Definitions

**Population:** The complete set of numerical information on a particular quantity in which an investigator is interested. We assume a population consists of $N$ values.

**Sample:** An observed subset of population values. We assume a sample consists of $n$ values.

#### Graphic Summaries

**Dotplot:** A graphic that displays the original data. Draw a number line and put a dot on it for each observation. For identical or very close observations, stack the dots.

[Figure: example dotplot]

**Histogram:** A graphic that displays the shape of numeric data by grouping it in intervals.

1. Choose evenly spaced categories.
2. Count the number of observations in each group.
3. Draw a bar graph with the height of each bar equal to the number of observations in the corresponding interval.

**Stem-and-leaf plot:** Similar to a dotplot. Data are grouped according to their leading digits, and the last digit is used as a plotting symbol (like a dot in the dotplot). In the example display, the left column is a cumulative count on each side of the middle, and the bracketed number is how many observations are in the middle row. The middle column holds the leading digits of each number, and the bars to its right are made of the last digits.

[Figure: example stem-and-leaf display; the bracketed count in the middle row is (18)]
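The three histogram steps above can be sketched in a few lines of Python. This is only an illustration: the helper name `histogram_counts`, the data values, and the choice of half-open intervals are assumptions, not part of the original summary.

```python
def histogram_counts(data, low, width, n_bins):
    """Count observations in n_bins evenly spaced intervals
    [low + k*width, low + (k+1)*width). Interval convention is an assumption."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - low) // width)  # index of the interval containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Made-up observations, used only for illustration.
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6]
counts = histogram_counts(data, low=1.0, width=1.0, n_bins=4)

# Step 3: draw one bar per interval, with height equal to the count.
for k, c in enumerate(counts):
    left = 1.0 + k
    print(f"[{left:.0f}, {left + 1:.0f}): {'*' * c}")
```

Printing one star per observation gives a sideways text histogram, which is enough to see the shape of the data.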

#### Numeric Summaries of Data

Suppose that we have $n$ observations, labeled $x_1, x_2, \ldots, x_n$. Then $\sum_{i=1}^{n} x_i$ means

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n.$$

Some other relations are:

- $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + f(x_3) + \cdots + f(x_n)$, for any function $f$,
- $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$, for any constant $c$,
- $\sum_{i=1}^{n} (a x_i + b y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$, for any constants $a$ and $b$.

**Measures of Location:** numeric summaries of the center of a distribution.

*Mean (or average):*

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \quad \text{(population)}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(sample)}$$

*Median:* the middle observation of the sorted data if $n$ is odd, otherwise the average of the two middle values.

**Measures of Dispersion:** numeric summaries of the spread or variation of a distribution.

*Variance and Standard Deviation:*

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \quad \text{(population)}, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{(sample)}$$

The population and sample standard deviations ($\sigma$ and $s$) are just the square roots of these quantities.

*Interpreting $\sigma$, Chebyshev's rule:* for any population,

- at least 75% of the observations lie within $2\sigma$ of $\mu$,
- at least 89% of the observations lie within $3\sigma$ of $\mu$,
- at least $100(1 - 1/m^2)\%$ of the observations lie within $m\sigma$ of the mean $\mu$.
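As a quick check of the definitions above, the following sketch computes the mean, median, and both variances with Python's standard `statistics` module, then compares the observed fraction within $2\sigma$ against Chebyshev's bound. The data values and the helper `chebyshev_lower_bound` are made up for illustration.

```python
import math
import statistics

# Made-up data, treated first as a whole population and then as a sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)       # average of the two middle values (n is even)
pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1
pop_sd = math.sqrt(pop_var)


def chebyshev_lower_bound(m):
    """Minimum fraction of observations within m standard deviations of the mean."""
    return 1 - 1 / m**2


# Observed fraction within 2 sigma of the mean, to compare with the bound.
within_2sd = sum(abs(x - mean) <= 2 * pop_sd for x in data) / len(data)

print(mean, median, pop_var, round(samp_var, 4))
print(within_2sd, ">=", chebyshev_lower_bound(2))
```

Chebyshev's rule is only a lower bound, so the observed fraction is often much larger than the guaranteed one.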

**IQR (Interquartile Range):** The distance between the $(n+1)/4$-th and $3(n+1)/4$-th observations in an ordered dataset. These two values are called the first and third quartiles.

**Measure of symmetry: Skewness**

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3}$$

Negative means left skewed, 0 means symmetric, positive means right skewed.

**Measure of heavy tails: Kurtosis**

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{s^4} - 3$$

Without the subtraction, a normal distribution has a kurtosis of 3, so subtracting 3 makes interpretations relative to the normal. Positive values mean a sharper peak, and negative values mean a flatter top, than a normal distribution.

**Box-and-whisker plot:** A graphic that summarizes the data using the median and quartiles, and displays outliers. Good for comparing several groups of data.

[Figure: box-and-whisker plot with labels: first quartile, median, third quartile, largest value below an upper limit based on the third quartile and the IQR, outlier]
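The quartile, skewness, and kurtosis formulas above translate directly into code. The sketch below is a hedged illustration: the function names are hypothetical, and linear interpolation between neighbouring observations when $(n+1)/4$ is not a whole number is an assumption the summary does not spell out.

```python
import math


def sample_sd(xs):
    """Sample standard deviation s (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def skewness(xs):
    """Third central moment (divisor n) over s cubed."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum((x - xbar) ** 3 for x in xs) / n) / sample_sd(xs) ** 3


def excess_kurtosis(xs):
    """Fourth central moment (divisor n) over s to the fourth, minus 3."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum((x - xbar) ** 4 for x in xs) / n) / sample_sd(xs) ** 4 - 3


def value_at_position(sorted_xs, pos):
    """Observation at 1-based position pos; interpolation is an assumption."""
    n = len(sorted_xs)
    lo = min(max(int(math.floor(pos)), 1), n)
    hi = min(lo + 1, n)
    frac = pos - math.floor(pos)
    return sorted_xs[lo - 1] + frac * (sorted_xs[hi - 1] - sorted_xs[lo - 1])


def iqr(xs):
    """Distance between the (n+1)/4-th and 3(n+1)/4-th ordered observations."""
    s = sorted(xs)
    n = len(s)
    return value_at_position(s, 3 * (n + 1) / 4) - value_at_position(s, (n + 1) / 4)


symmetric = [1, 2, 3, 4, 5]          # made-up symmetric data
right_skewed = [1, 1, 1, 2, 10]      # made-up data with a long right tail
print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
print(iqr([1, 3, 5, 7, 9, 11, 13]))  # positions 2 and 6 of 7 observations
```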

### Probability (Ch. 5)

#### Definitions and Set Theory

**Random experiment:** A process leading to at least two possible outcomes with uncertainty as to which will occur.

**Basic outcome:** A possible outcome of the random experiment.

**Sample space:** The set of all basic outcomes.

**Event:** A set of basic outcomes from the sample space. An event is said to occur if any one of its constituent basic outcomes occurs.

**Combining events:** let $A$ and $B$ be two events.

| Technical | Symbol | Pronounced | Meaning |
|---|---|---|---|
| Union | $A \cup B$ | A or B | A occurs or B occurs or both occur |
| Intersection | $A \cap B$ | A and B | A occurs and B occurs |
| Complement | $\bar{A}$ | not A | A does not occur |

**Venn Diagrams:**

[Figure: five Venn diagrams with captions: shaded = $A \cap B$; shaded = $A \cup B$; $\bar{A}$ shaded; $A$, $B$ mutually exclusive; $A$, $B$, $C$ collectively exhaustive]

**Probability Postulates:**

1. If $A$ is any event in the sample space $S$, then $0 \le P(A) \le 1$.
2. Let $A$ be an event in $S$ and let $O_i$ denote the basic outcomes. Then $P(A) = \sum_{A} P(O_i)$, where the notation implies that the summation extends over all the basic outcomes in $A$.
3. $P(S) = 1$.

**Probability rules for combining events:**

- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- $P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are mutually exclusive.
- $P(\bar{A}) = 1 - P(A)$

**Conditional Probability:**

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$
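The rules above can be checked on a small sample space. This sketch assumes one roll of a fair die with equally likely outcomes; the events $A$ (even roll) and $B$ (roll greater than 3) and the helper `prob` are illustrative choices, not from the original.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely basic outcomes).
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) as the sum of the probabilities of its basic outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

p_union = prob(A | B)                              # P(A or B), computed directly
p_addition = prob(A) + prob(B) - prob(A & B)       # addition rule
p_complement = prob(sample_space - A)              # P(not A)
p_A_given_B = prob(A & B) / prob(B)                # conditional probability

print(p_union, p_addition, p_complement, p_A_given_B)
```

Using `Fraction` keeps every probability exact, so the two sides of each rule can be compared with `==`.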

**Independence:** Two events are statistically independent if and only if

$$P(A \cap B) = P(A)\,P(B),$$

or equivalently $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.

General case: events $E_1, E_2, \ldots, E_k$ are independent if and only if

$$P(E_1 \cap E_2 \cap \cdots \cap E_k) = P(E_1)\,P(E_2) \cdots P(E_k)$$

(strictly, the same product rule must also hold for every sub-collection of the events).

#### Bivariate Probability

Probabilities of outcomes for bivariate events:

| | $B_1$ | $B_2$ | $B_3$ | |
|---|---|---|---|---|
| $A_1$ | $P(A_1 \cap B_1)$ | $P(A_1 \cap B_2)$ | $P(A_1 \cap B_3)$ | $P(A_1)$ |
| $A_2$ | $P(A_2 \cap B_1)$ | $P(A_2 \cap B_2)$ | $P(A_2 \cap B_3)$ | $P(A_2)$ |
| | $P(B_1)$ | $P(B_2)$ | $P(B_3)$ | |

$P(A_1 \cap B_2)$ is the probability of $A_1$ and $B_2$ both occurring.

$P(A_1) = P(A_1 \cap B_1) + P(A_1 \cap B_2) + P(A_1 \cap B_3)$ is the marginal probability that $A_1$ occurs.

If we think of $A_1, A_2$ as a group of attributes $A$, and $B_1, B_2, B_3$ as a group of attributes $B$, then $A$ and $B$ are independent only if every one of $\{A_1, A_2\}$ is independent of every one of $\{B_1, B_2, B_3\}$.

### Discrete Random Variables (Ch. 6-7)

#### Definitions

**Random Variable (r.v.):** A variable that takes on numerical values determined by the outcome of a random experiment.

**Discrete Random Variable:** An r.v. that can take on no more than a countable number of values.

**Continuous Random Variable:** An r.v. that can take any value in an interval.

**Notation:** An upper case letter (e.g. $X$) will represent an r.v.; a lower case letter (e.g. $x$) will represent one of its possible values.
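To make the bivariate table concrete, the sketch below stores a made-up 2 by 3 joint table, recovers the marginal probabilities by summing rows and columns, and checks whether every cell equals the product of its marginals. All numbers are hypothetical and were chosen so that the attributes come out independent.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(A_i and B_j); rows are A1, A2, columns are B1, B2, B3.
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(3, 20), F(3, 10), F(3, 20)],
]

# Marginal probabilities: sum across each row (A) or down each column (B).
row_marginals = [sum(row) for row in joint]
col_marginals = [sum(col) for col in zip(*joint)]

# A and B are independent when every cell equals the product of its marginals.
independent = all(
    joint[i][j] == row_marginals[i] * col_marginals[j]
    for i in range(len(row_marginals))
    for j in range(len(col_marginals))
)

print(row_marginals, col_marginals, independent)
```

Changing any single cell (and rebalancing another so the total stays 1) is enough to break the equality and make the check return `False`.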

#### Discrete Probability Distributions

The probability function, $P_X(x)$, of a discrete r.v. $X$ gives the probability that $X$ takes the value $x$:

$$P_X(x) = P(X = x),$$

where the function is evaluated at all possible values of $x$.

Properties:

1. $P_X(x) \ge 0$ for any value $x$
2. $\sum_{x} P_X(x) = 1$

Cumulative probability function, $F_X(x_0)$, of an r.v. $X$:

$$F_X(x_0) = P(X \le x_0) = \sum_{x \le x_0} P_X(x).$$

Properties:

1. $0 \le F_X(x) \le 1$ for any $x$
2. If $a < b$, then $F_X(a) \le F_X(b)$.

#### Expectation of Discrete Random Variables

Expected value of a discrete r.v.:

$$E(X) = \mu_X = \sum_{x} x\,P_X(x).$$

For any function $g(x)$,

$$E(g(X)) = \sum_{x} g(x)\,P_X(x).$$

Variance of a discrete r.v.:

$$\mathrm{Var}(X) = \sigma_X^2 = E\big((X - \mu_X)^2\big) = \sum_{x} (x - \mu_X)^2 P_X(x) = E(X^2) - \mu_X^2.$$

The standard deviation of $X$ is $\sigma_X$.

**Plug-In Rules:** let $X$ be an r.v., and $a$ and $b$ constants. Then

$$E(a + bX) = a + b\,E(X), \qquad \mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X).$$

This only works for linear functions.
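A short sketch of these formulas for a fair six-sided die, an assumed example. The helper `expectation` and the constants `a = 3`, `b = 2` are illustrative; exact fractions are used so the plug-in rules can be verified without rounding.

```python
from fractions import Fraction

# Probability function of a fair die: P_X(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """E(g(X)) = sum over x of g(x) * P_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expectation(pmf)
var = expectation(pmf, lambda x: (x - mu) ** 2)
var_shortcut = expectation(pmf, lambda x: x**2) - mu**2   # E(X^2) - mu^2

# Plug-in rules for the linear function a + bX.
a, b = 3, 2
mean_linear = expectation(pmf, lambda x: a + b * x)
var_linear = expectation(pmf, lambda x: (a + b * x - mean_linear) ** 2)

print(mu, var, mean_linear, var_linear)
```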

#### Jointly Distributed Discrete Random Variables

**Joint Probability Function:** Suppose $X$ and $Y$ are r.v.'s. Their joint probability function gives the probability that simultaneously $X = x$ and $Y = y$:

$$P_{X,Y}(x, y) = P(\{X = x\} \cap \{Y = y\})$$

Properties:

1. $P_{X,Y}(x, y) \ge 0$ for any pair $(x, y)$
2. $\sum_{x} \sum_{y} P_{X,Y}(x, y) = 1$.

**Marginal probability function:**

$$P_X(x) = \sum_{y} P_{X,Y}(x, y).$$

**Conditional probability function:**

$$P_{Y \mid X}(y \mid x) = \frac{P_{X,Y}(x, y)}{P_X(x)}$$

**Independence:** $X$ and $Y$ are independent if and only if

$$P_{X,Y}(x, y) = P_X(x)\,P_Y(y)$$

for all possible $(x, y)$ pairs.

**Expectation:** Let $X$ and $Y$ be r.v.'s, and $g(X, Y)$ any function. Then

$$E(g(X, Y)) = \sum_{x} \sum_{y} g(x, y)\,P_{X,Y}(x, y).$$

**Conditional Expectation:** Let $X$ and $Y$ be r.v.'s, and suppose we know the conditional distribution of $X$ for $Y = y$, labeled $P_{X \mid Y}(x \mid y)$. Then

$$E(X \mid Y = y) = \sum_{x} x\,P_{X \mid Y}(x \mid y).$$

**Covariance:** If $E(X) = \mu_X$ and $E(Y) = \mu_Y$,

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{x} \sum_{y} (x - \mu_X)(y - \mu_Y)\,P_{X,Y}(x, y)$$

$$= E(XY) - \mu_X \mu_Y = \Big[ \sum_{x} \sum_{y} x y\,P_{X,Y}(x, y) \Big] - \mu_X \mu_Y$$

If two r.v.'s are independent, their covariance is zero. The converse is not necessarily true.
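The last remark, that zero covariance does not imply independence, is easy to see on a small example. The sketch below assumes $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$; this choice of variables is illustrative and does not come from the original.

```python
from fractions import Fraction

# Joint probability function of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}


def expect(g):
    """E(g(X, Y)) = double sum of g(x, y) * P_{X,Y}(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def marginal_x(x0):
    return sum(p for (x, _), p in joint.items() if x == x0)


def marginal_y(y0):
    return sum(p for (_, y), p in joint.items() if y == y0)


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y   # E(XY) - mu_X * mu_Y

# Independence would need P(X=0, Y=0) = P(X=0) * P(Y=0); here it fails.
p_joint_00 = joint[(0, 0)]
p_product_00 = marginal_x(0) * marginal_y(0)

print(cov, p_joint_00, p_product_00)
```

Here $Y$ is completely determined by $X$, yet the covariance is exactly zero because the dependence is not linear.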

**Correlation:**

$$\rho_{XY} = \mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

- $-1 \le \rho_{XY} \le 1$ always.
- $\rho_{XY} = \pm 1$ if and only if $Y = a + bX$ ($a$, $b$ constants, with $b \ne 0$).

**Plug-in rules:** Let $X$ and $Y$ be r.v.'s, and $a$, $b$ constants. Then

$$E(aX + bY) = a\,E(X) + b\,E(Y)$$

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$

#### Binomial distribution

**Bernoulli Trials:** A sequence of repeated experiments is a sequence of Bernoulli trials if:

1. The result of each trial is either a success or a failure.
2. The probability $p$ of a success is the same for all trials.
3. The trials are independent.

If $X$ is the number of successes in $n$ Bernoulli trials, $X$ is a Binomial Random Variable. It has probability function

$$P_X(x) = \binom{n}{x} p^x (1 - p)^{n - x},$$

where $\binom{n}{x}$ counts the number of ways of getting $x$ successes in $n$ trials. The formula for $\binom{n}{x}$ is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}, \qquad \text{where } n! = n (n - 1)(n - 2) \cdots 1.$$

**Mean and Variance:** $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$.

### Continuous Random Variables (Ch. 7)

#### Probability Distributions

**Probability density function:** A function $f_X(x)$ of the continuous r.v. $X$ with the following properties:

1. $f_X(x) \ge 0$ for all values of $x$.
2. $P(a \le X \le b)$ is the area under $f_X(x)$ between $a$ and $b$, if $a < b$.
3. The total area under the curve is 1.
4. The area under the curve to the left of any value $x$ is $F_X(x)$, the probability that $X$ does not exceed $x$.

**Cumulative distribution function:** Same as before.

$$P(a \le X \le b) = F_X(b) - F_X(a) \quad (\text{provided } a < b).$$

**Expectations, Variances, Covariances, etc.:** Same rules as for discrete r.v.'s. The summation ($\sum$) is replaced by the integral ($\int$), which is not necessary for this course.

#### Normal Distribution

**Probability density function:**

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$

for constants $\mu$ and $\sigma$ such that $-\infty < \mu < \infty$ and $0 < \sigma < \infty$.

**Mean and Variance:** $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.

**Notation:** $X \sim N(\mu, \sigma^2)$ means $X$ is normal with mean $\mu$ and variance $\sigma^2$. If $Z \sim N(0, 1)$ we say it has a standard normal distribution.

If $X \sim N(\mu, \sigma^2)$ then $Z = (X - \mu)/\sigma \sim N(0, 1)$. Thus

$$P(a < X < b) = P\left( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} \right) = F_Z\left( \frac{b - \mu}{\sigma} \right) - F_Z\left( \frac{a - \mu}{\sigma} \right)$$

#### Central Limit Theorem

Let $X_1, X_2, \ldots, X_n$ be $n$ independent r.v.'s, each with identical distributions, mean $\mu$ and variance $\sigma^2$. As $n$ becomes large, approximately,

$$\bar{X} \sim N(\mu, \sigma^2 / n), \qquad \sum_{i=1}^{n} X_i = n\bar{X} \sim N(n\mu, n\sigma^2).$$
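The standardization identity above can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The parameters $\mu = 100$, $\sigma = 15$ and the limits $a = 85$, $b = 115$ are made-up values for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters: X ~ N(100, 15^2), and we want P(85 < X < 115).
mu, sigma = 100.0, 15.0
a, b = 85.0, 115.0

# Directly from the distribution of X: F_X(b) - F_X(a).
X = NormalDist(mu=mu, sigma=sigma)
p_direct = X.cdf(b) - X.cdf(a)

# By standardizing: F_Z((b - mu)/sigma) - F_Z((a - mu)/sigma) with Z ~ N(0, 1).
Z = NormalDist()  # defaults to mean 0, standard deviation 1
p_standardized = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(round(p_direct, 4), round(p_standardized, 4))
```

Both routes give the same probability, which is why a single standard normal table is enough for any normal distribution.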

### Sampling & Sampling distributions

**Simple random sample (or random sample):** A method of randomly drawing $n$ objects which are Independent and Identically Distributed (I.I.D.).

**Statistic:** A function of the sample information.

**Sampling distribution of a statistic:** The probability distribution of the values a statistic can take, over all possible samples of a fixed size $n$.

**Sampling distribution of the mean:** Suppose $X_1, \ldots, X_n$ are a random sample from some population with mean $\mu_X$ and variance $\sigma_X^2$. The sample mean is

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

It has the following properties:

1. $E(\bar{X}) = \mu_X$
2. It has standard deviation $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.
3. If the population distribution is normal, $\bar{X} \sim N(\mu_X, \sigma_{\bar{X}}^2) = N(\mu_X, \sigma_X^2 / n)$.
4. If the population distribution is not normal, but $n$ is large, then (3) is roughly true.

**Sampling distribution of a proportion:** Suppose the r.v. $X$ is the number of successes in a binomial sample of $n$ trials, whose probability of success is $p$. The sample proportion is

$$\hat{p} = X / n$$

It has the following properties:

1. $E(\hat{p}) = p$
2. It has standard deviation $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$.
3. If $n$ is large ($np(1 - p) > 9$, or roughly $n \ge 40$), then approximately $\hat{p} \sim N(p, \sigma_{\hat{p}}^2) = N(p, p(1 - p)/n)$.
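To see how rough the normal approximation for a proportion is, the sketch below compares an exact binomial probability with the approximation from property 3. The values $n = 100$, $p = 0.5$ and the cutoff of 55 successes are assumptions chosen for illustration, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

# Hypothetical setting: n = 100 trials, success probability p = 0.5.
n, p = 100, 0.5
large_enough = n * p * (1 - p) > 9     # rule of thumb from the text

# Standard deviation of the sample proportion.
sd_phat = math.sqrt(p * (1 - p) / n)

# Exact P(p_hat <= 0.55) = P(X <= 55) from the binomial probability function.
x_max = 55
exact = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(x_max + 1))

# Normal approximation: p_hat is roughly N(p, p(1 - p)/n).
approx = NormalDist(mu=p, sigma=sd_phat).cdf(x_max / n)

print(large_enough, round(exact, 4), round(approx, 4))
```

The two numbers agree to within a few hundredths here; the gap shrinks as $n$ grows.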

### Point Estimation

**Estimator:** A random variable that depends on the sample information and whose realizations provide approximations to an unknown population parameter.

**Estimate:** A specific realization of an estimator.

**Point estimator:** An estimator that is a single number.

**Point estimate:** A specific realization of a point estimator.

**Bias:** Let $\hat{\theta}$ be an estimator of the parameter $\theta$. The bias in $\hat{\theta}$ is

$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

If the bias is 0, $\hat{\theta}$ is an unbiased estimator.

**Efficiency:** Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two estimators of $\theta$, based on the same sample. Then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if

$$\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2).$$

### Interval Estimation

**Confidence Interval:** Let $\theta$ be an unknown parameter. Suppose that from sample information, we can find random variables $A$ and $B$ such that

$$P(A < \theta < B) = 1 - \alpha.$$

If the observed values are $a$ and $b$, then $(a, b)$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$. The quantity $(1 - \alpha)$ is called the probability content of the interval.

**Student's t distribution:** Given a random sample of $n$ observations with mean $\bar{X}$ and standard deviation $s$, from a normal population with mean $\mu$, the random variable

$$T = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

follows the Student's t distribution with $(n - 1)$ degrees of freedom. For $n > 30$, the t distribution is quite close to a $N(0, 1)$ distribution.
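The definition of bias can be illustrated by enumerating every possible sample from a tiny population. The sketch below assumes a made-up population $\{1, 2, 3\}$ sampled with replacement ($n = 2$) and compares two variance estimators, one dividing by $n - 1$ and one dividing by $n$; the helper `var_estimate` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Made-up population; every value is equally likely on each draw.
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

# All equally likely samples of size n drawn with replacement (I.I.D. draws).
n = 2
samples = list(product(population, repeat=n))


def var_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor


# E(estimator) is the average of the estimator over all possible samples.
mean_unbiased = sum(var_estimate(s, n - 1) for s in samples) / len(samples)
mean_biased = sum(var_estimate(s, n) for s in samples) / len(samples)

bias_unbiased = mean_unbiased - sigma2
bias_biased = mean_biased - sigma2

print(sigma2, mean_unbiased, mean_biased, bias_unbiased, bias_biased)
```

Dividing by $n - 1$ gives zero bias in this enumeration, while dividing by $n$ underestimates the population variance on average.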

| Data | Parameter | $100(1 - \alpha)\%$ C.I. |
|---|---|---|
| $N(\mu, \sigma^2)$, $\sigma^2$ known | $\mu$ | $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ |
| mean $\mu$, $\sigma^2$ unknown, $n > 30$ | $\mu$ | $\bar{x} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$ |
| $N(\mu, \sigma^2)$, $\sigma^2$ unknown | $\mu$ | $\bar{x} \pm t_{n-1,\alpha/2} \dfrac{s}{\sqrt{n}}$ |
| Binomial$(n, p)$, $np(1 - p) > 9$, or roughly $n \ge 40$ | $p$ | $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ |
| $n$ matched pairs, difference $N(\mu_X - \mu_Y, \sigma^2)$ | $\mu_X - \mu_Y$ | $\bar{d} \pm t_{n-1,\alpha/2} \dfrac{s_d}{\sqrt{n}}$ |
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown, $n > 30$ | $\mu_X - \mu_Y$ | $\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$ |
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown | $\mu_X - \mu_Y$ | $\bar{x} - \bar{y} \pm t_{n^*,\alpha/2} \sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$ |
| 2 independent samples, Binomial$(n_X, p_X)$, Binomial$(n_Y, p_Y)$ | $p_X - p_Y$ | $\hat{p}_x - \hat{p}_y \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{n_y}}$ |

Notes for the table:

1. All quantities in the C.I. column are either known constants or observed sample quantities.
2. $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$.
3. $P(T > t_{n-1,\alpha/2}) = \alpha/2$ for $T \sim$ Student's t with $(n - 1)$ d.f.
4. $s$, $s_x$, $s_y$, $s_d$ are observed sample standard deviations corresponding to $x_i$, $x_i$, $y_i$, $d_i = x_i - y_i$ respectively.
5. $\hat{p}$, $\hat{p}_x$, $\hat{p}_y$ are the observed sample proportions corresponding to $x_i$, $x_i$, $y_i$ respectively.
6. $\bar{d}$ is the sample mean corresponding to $d_i = x_i - y_i$.
7. $n$, $n_x$, $n_y$ are the total, $x$ and $y$ sample sizes.
8. $n^* = \left( \dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y} \right)^2 \Big/ \left[ \dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1} \right]$

Also note: The text gives a different formula for comparing two means with small sample sizes. It requires that the two variances be the same, which may not be the case. Unless you're sure the variances are equal, it's safer to use the approximation given here (the formula with $n^*$). If you are sure that the variances are equal, using the book's formula is OK.

**Estimating the sample size:** If you want a $100(1 - \alpha)\%$ interval of $\pm L$ (i.e. length $2L$), choose $n$ so that

| Situation | $n$ |
|---|---|
| normal, $\sigma$ known | $n = \dfrac{z_{\alpha/2}^2\, \sigma^2}{L^2}$ |
| Bernoulli, worst case | $n = \dfrac{0.25\, z_{\alpha/2}^2}{L^2}$ |

### Hypothesis Testing

**Null Hypothesis ($H_0$):** The hypothesis we assume to be true unless there is sufficient evidence to the contrary.

**Alternative Hypothesis ($H_1$):** The hypothesis we test the null against. If there is evidence that $H_0$ is false, we accept $H_1$.

**Type I Error:** Rejecting a true $H_0$.

**Type II Error:** Not rejecting a false $H_0$.

**Significance Level:** $P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\text{type I error})$.

**Power:** The probability of rejecting a null hypothesis that is false. Note that this depends on the true value of the parameter.

**P-value:** The smallest significance level at which a null hypothesis can be rejected. This is a measure of how likely the data is, if $H_0$ is true.

Notes for the following table (in addition to the comments for C.I.'s):

1. The first three tests are examples of one-sided ($>$ and $<$ alternatives) and two-sided ($\ne$ alternative) tests. The remaining tests all have a $>$ alternative, but are easily adaptable to either of the other two alternatives.

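2. In the last test, $\hat{p}_0 = \dfrac{n_x \hat{p}_x + n_y \hat{p}_y}{n_x + n_y}$.
3. In all the tests, we are comparing the unknown parameter (such as $\mu$, $p$ or $\mu_X - \mu_Y$) to constants ($\mu_0$, $p_0$, and $D_0$).

Hypothesis tests with significance level $\alpha$:

| Data | $H_0$ | $H_1$ | Reject $H_0$ if |
|---|---|---|---|
| $N(\mu, \sigma^2)$, $\sigma^2$ known | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha}$ |
| same | $\mu = \mu_0$ or $\mu \ge \mu_0$ | $\mu < \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha}$ |
| same | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ not in $(-z_{\alpha/2}, z_{\alpha/2})$ |
| mean $\mu$, $\sigma^2$ unknown, $n > 30$ | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} > z_{\alpha}$ |
| $N(\mu, \sigma^2)$, $\sigma^2$ unknown, $n < 30$ | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$ |
| Binomial$(n, p)$, $np(1 - p) > 9$ or roughly $n > 40$ | $p = p_0$ or $p \le p_0$ | $p > p_0$ | $\dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > z_{\alpha}$ |
| $n$ matched pairs, difference $N(\mu_X - \mu_Y, \sigma^2)$ | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{d} - D_0}{s_d/\sqrt{n}} > t_{n-1,\alpha}$ |
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown, $n_x > 30$, $n_y > 30$ | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{x} - \bar{y} - D_0}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} > z_{\alpha}$ |
| 2 independent normal samples, means $\mu_X$, $\mu_Y$, variances unknown | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{x} - \bar{y} - D_0}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} > t_{n^*,\alpha}$ |
| 2 independent samples, Binomial$(n_x, p_x)$ and Binomial$(n_y, p_y)$ | $p_x - p_y = 0$ or $p_x - p_y \le 0$ | $p_x - p_y > 0$ | $\dfrac{\hat{p}_x - \hat{p}_y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left( \dfrac{n_x + n_y}{n_x n_y} \right)}} > z_{\alpha}$ |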
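As a worked illustration of the two-independent-samples rows, the sketch below computes the test statistic, compares it with $z_\alpha$ for the large-sample test, and also evaluates the approximate degrees of freedom that the small-sample version would use. All summary statistics are made up, and the variable names (including `df_approx` for the approximate degrees of freedom) are hypothetical.

```python
import math
from statistics import NormalDist

# Made-up summary statistics for two independent samples.
n_x, xbar, s_x = 40, 10.5, 1.2
n_y, ybar, s_y = 50, 9.8, 1.5
D0 = 0.0          # H0: mu_X - mu_Y = D0 (or <= D0); H1: mu_X - mu_Y > D0
alpha = 0.05

# Standard error of xbar - ybar with unknown, possibly unequal variances.
vx, vy = s_x**2 / n_x, s_y**2 / n_y
std_err = math.sqrt(vx + vy)

# Test statistic from the table.
stat = (xbar - ybar - D0) / std_err

# Both samples exceed 30, so the large-sample test compares against z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_h0 = stat > z_alpha

# Approximate degrees of freedom for the t version of the same test.
df_approx = (vx + vy) ** 2 / (vx**2 / (n_x - 1) + vy**2 / (n_y - 1))

print(round(stat, 3), round(z_alpha, 3), reject_h0, round(df_approx, 1))
```

With small samples the same statistic would be compared against a t critical value with `df_approx` degrees of freedom, which requires a t table or a statistics library and is not computed here.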

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

### Covariance and Correlation

Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

### Chapter 4. Probability and Probability Distributions

Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### THE BINOMIAL DISTRIBUTION & PROBABILITY

REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

### ST 371 (IV): Discrete Random Variables

ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

### An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### 5. Continuous Random Variables

5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### 5.1 Identifying the Target Parameter

University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

### AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

### Descriptive Statistics

Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

### Chapter 5. Random variables

Random variables random variable numerical variable whose value is the outcome of some probabilistic experiment; we use uppercase letters, like X, to denote such a variable and lowercase letters, like

### Normal distribution. ) 2 /2σ. 2π σ

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Estimation and Confidence Intervals

Estimation and Confidence Intervals Fall 2001 Professor Paul Glasserman B6014: Managerial Statistics 403 Uris Hall Properties of Point Estimates 1 We have already encountered two point estimators: th e

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week

### Notes on Continuous Random Variables

Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

### Chapter 4 Statistical Inference in Quality Control and Improvement. Statistical Quality Control (D. C. Montgomery)

Chapter 4 Statistical Inference in Quality Control and Improvement 許 湘 伶 Statistical Quality Control (D. C. Montgomery) Sampling distribution I a random sample of size n: if it is selected so that the

### A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

Glossary Brase: Understandable Statistics, 10e A B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

### Correlation in Random Variables

Correlation in Random Variables Lecture 11 Spring 2002 Correlation in Random Variables Suppose that an experiment produces two random variables, X and Y. What can we say about the relationship between

### AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

### The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### ELEMENTARY STATISTICS

ELEMENTARY STATISTICS Study Guide Dr. Shinemin Lin Table of Contents 1. Introduction to Statistics. Descriptive Statistics 3. Probabilities and Standard Normal Distribution 4. Estimates and Sample Sizes

### 1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...

MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution 8 October 2007 In this lecture we ll learn the following: 1. how continuous probability distributions differ

### CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is $f(x; A, B) = \frac{1}{B - A}$ for $A \le x \le B$,

### Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t*. What critical value t* from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

### CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics 4.1 Overview 4.2 Discrete or continuous

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

The Normal Distribution Alan T. Arnholt Department of Mathematical Sciences Appalachian State University arnholt@math.appstate.edu Spring 2006 R Notes Copyright © 2006 Alan T. Arnholt Continuous Random

### MATH4427 Notebook 2 Spring 2016. 2.1 Definitions and Examples. 2.2 Performance Measures for Estimators

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo. Copyright © 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents: 2 MATH4427 Notebook 2. 2.1 Definitions and Examples

### 3: Summary Statistics

3: Summary Statistics Notation Let's start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

### The right edge of the box is the third quartile, Q3, which is the median of the data values above the median.

CONDENSED LESSON 2.1 Box Plots In this lesson you will: create and interpret box plots for sets of data; use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Statistics 100A Homework 4 Solutions

Statistics 100A Homework 4 Solutions Ryan Rosario Note that all of the problems below ask you to prove the statement. We are proving the properties of expectation. Problem 1 For a discrete random variable X,

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Without data, all you are is just another person with an opinion.

OCR Statistics Module Revision Sheet The S exam is 1 hour 30 minutes long. You are allowed a graphics calculator. Before you go into the exam make sure you are fully aware of the contents of the formula booklet

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 About this guide The Know Guides are a suite of documents that provide useful information about

### Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? One-sample t-test (univariate). Two-sample t-test (univariate).

### Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

### TEACHER NOTES MATH NSPIRED

Math Objectives Students will understand that normal distributions can be used to approximate binomial distributions whenever both np and n(1 - p) are sufficiently large. Students will understand that when

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions: Uniform, Exponential, Normal Recall: A random

### Random variables, probability distributions, binomial random variable

Week 4 lecture notes. Random variables, probability distributions, binomial random variable Example 1: Consider the experiment of flipping a fair coin three times. The number of tails that

### Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism's

### HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

HW MATH 461/561 Lecture Notes 15 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density $f(x, y)$, $(x, y) \in \Lambda_{X,Y}$, and marginal densities $f_X(x)$, $x \in \Lambda_X$,

### Lesson 4 Measures of Central Tendency

Outline Measures of a distribution's shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Chapter 2: Systems of Linear Equations and Matrices:

At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 Central limit theorem Let $X_1, X_2, \ldots, X_n$ be a random sample of

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in

### Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

### Chapter 2. Hypothesis testing in one population

Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance

### Stat 5102 Notes: Nonparametric Tests and Confidence Intervals

Stat 5102 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 2003 This handout gives a brief introduction to nonparametrics, which is what you do when you don't believe the assumptions

### UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0

### Martingale Ideas in Elementary Probability. Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996

Martingale Ideas in Elementary Probability Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996 William Faris University of Arizona Fulbright Lecturer, IUM, 1995-1996

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members