# Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

## Transcription

### Descriptive Statistics (Ch. 1-4)

**Definitions**

- **Population:** The complete set of numerical information on a particular quantity in which an investigator is interested. We assume a population consists of $N$ values.
- **Sample:** An observed subset of population values. We assume a sample consists of $n$ values.

**Graphic Summaries**

- **Dotplot:** A graphic that displays the original data. Draw a number line and put a dot on it for each observation. For identical or very close observations, stack the dots.

  [Figure: example dotplot]

- **Histogram:** A graphic that displays the shape of numeric data by grouping it in intervals.
  1. Choose evenly spaced categories.
  2. Count the number of observations in each group.
  3. Draw a bar graph with the height of each bar equal to the number of observations in the corresponding interval.
- **Stem-and-leaf plot:** Similar to a dotplot. Data are grouped according to their leading digits, and the last digit is used as the plotting symbol (like a dot in the dotplot). In the example display, the left column is a cumulative count on each side of the middle, and the bracketed number is how many observations are in the middle row. The middle column holds the leading digits of each number, and the digits to its right are the last digits.

  [Figure: example stem-and-leaf display]

**Numeric Summaries of Data**

Suppose that we have $n$ observations, labeled $x_1, x_2, \ldots, x_n$. Then the summation notation means

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n.$$

Some other relations are:

- $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + f(x_3) + \cdots + f(x_n)$, for any function $f$,
- $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$, for any constant $c$,
- $\sum_{i=1}^{n} (a x_i + b y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$, for any constants $a$ and $b$.

**Measures of Location:** numeric summaries of the center of a distribution.

- **Mean (or average):**

  $$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \quad \text{(population)} \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(sample)}$$

- **Median:** the middle observation of the sorted data if $n$ is odd, otherwise the average of the two middle values.

**Measures of Dispersion:** numeric summaries of the spread or variation of a distribution.

- **Variance and Standard Deviation:**

  $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \quad \text{(population)} \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{(sample)}$$

  The population and sample standard deviations ($\sigma$ and $s$) are the square roots of these quantities.

- **Interpreting $\sigma$ (Chebyshev's rule):** for any population,
  - at least 75% of the observations lie within $2\sigma$ of $\mu$,
  - at least 88.9% (about 89%) of the observations lie within $3\sigma$ of $\mu$,
  - in general, at least $100(1 - 1/m^2)\%$ of the observations lie within $m\sigma$ of the mean $\mu$, for any $m > 1$.
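The mean and variance formulas above can be checked numerically. The following is a minimal sketch on a small made-up data set (the values are an assumption chosen for easy arithmetic, not taken from the notes). It computes both variance versions by hand, compares them with Python's `statistics` module, and checks Chebyshev's bound for $m = 2$.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                                      # (1/n) * sum of x_i
pop_var = sum((x - mean) ** 2 for x in data) / n          # sigma^2: divisor N
samp_var = sum((x - mean) ** 2 for x in data) / (n - 1)   # s^2: divisor n - 1
median = statistics.median(data)                          # average of the two middle values (n is even)

# The standard library gives the same variances as the hand-written formulas.
assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(samp_var - statistics.variance(data)) < 1e-12

# Chebyshev's rule with m = 2, treating the data as a complete population:
# at least 75% of the values must lie within 2 * sigma of the mean.
sigma = pop_var ** 0.5
within_2_sigma = sum(1 for x in data if abs(x - mean) <= 2 * sigma) / n

print(mean, median, pop_var, samp_var, within_2_sigma)
```

Note that the sample variance is always the larger of the two because of the smaller divisor; Chebyshev's bound is usually far from tight, as it is here.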

- **IQR (Interquartile Range):** The distance between the $(n+1)/4$th and $3(n+1)/4$th observations in an ordered dataset. These two values are called the first and third quartiles.

**Measure of symmetry: Skewness**

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3}$$

Negative means left skewed, 0 means symmetric, positive means right skewed.

**Measure of heavy tails: Kurtosis**

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{s^4} - 3$$

A normal distribution has a kurtosis of 3, so after subtracting 3, interpretations are relative to the normal. Positive values mean a sharper peak, and negative values mean a flatter top, than a normal distribution.

**Box-and-whisker plot:** A graphic that summarizes the data using the median and quartiles, and displays outliers. Good for comparing several groups of data.

[Figure: box-and-whisker plot with labels for the first quartile, median, third quartile, the largest value below a cutoff based on the quartiles and the IQR, and an outlier marked "o"]
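As a rough illustration of the quartile, skewness and kurtosis formulas, here is a sketch on a made-up data set with one large value in the right tail. The data, the helper name `order_stat`, and the linear interpolation for fractional positions are all assumptions; the notes do not say how to handle a position such as 3.25.

```python
from math import sqrt

# Hypothetical data with a long right tail; n = 11 makes (n + 1)/4 a whole number.
data = sorted([2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 22])
n = len(data)

def order_stat(sorted_x, pos):
    """Value at 1-based position `pos` in sorted data.

    Fractional positions are linearly interpolated between neighbours
    (an assumption; other conventions exist).
    """
    lo = int(pos)
    frac = pos - lo
    if lo < 1:
        return float(sorted_x[0])
    if lo >= len(sorted_x):
        return float(sorted_x[-1])
    return sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1])

q1 = order_stat(data, (n + 1) / 4)        # first quartile: 3rd ordered value here
q3 = order_stat(data, 3 * (n + 1) / 4)    # third quartile: 9th ordered value here
iqr = q3 - q1

xbar = sum(data) / n
s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))   # sample standard deviation

skewness = (sum((x - xbar) ** 3 for x in data) / n) / s ** 3
kurtosis = (sum((x - xbar) ** 4 for x in data) / n) / s ** 4 - 3

print(q1, q3, iqr, skewness, kurtosis)
```

Both skewness and kurtosis come out positive for this data, which matches the interpretation above: the single large value pulls the right tail out and makes the tails heavier than a normal distribution's.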

### Probability (Ch. 5)

**Definitions and Set Theory**

- **Random experiment:** A process leading to at least two possible outcomes with uncertainty as to which will occur.
- **Basic outcome:** A possible outcome of the random experiment.
- **Sample space:** The set of all basic outcomes.
- **Event:** A set of basic outcomes from the sample space. An event is said to occur if any one of its constituent basic outcomes occurs.

**Combining events:** let $A$ and $B$ be two events.

| Technical term | Symbol | Pronounced | Meaning |
|---|---|---|---|
| Union | $A \cup B$ | A or B | A occurs or B occurs or both occur |
| Intersection | $A \cap B$ | A and B | A occurs and B occurs |
| Complement | $\bar{A}$ | not A | A does not occur |

[Figure: Venn diagrams with panels for $A \cup B$ shaded, $A \cap B$ shaded, $\bar{A}$ shaded, $A$ and $B$ mutually exclusive, and $A$, $B$, $C$ collectively exhaustive]

**Probability Postulates**

1. If $A$ is any event in the sample space $S$, then $0 \le P(A) \le 1$.
2. Let $A$ be an event in $S$ and let $O_i$ denote the basic outcomes. Then $P(A) = \sum_{A} P(O_i)$, where the notation implies that the summation extends over all the basic outcomes in $A$.
3. $P(S) = 1$.

**Probability rules for combining events**

- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- $P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are mutually exclusive.
- $P(\bar{A}) = 1 - P(A)$

**Conditional Probability**

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$
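The rules above can be verified on a tiny sample space. This sketch assumes one roll of a fair die (an example not in the notes) and represents events as Python sets, so that union, intersection and complement map directly onto set operations. The function name `P` and the events `A` and `B` are illustrative choices.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die (assumed example)

def P(event):
    """P(event) as the sum of equal basic-outcome probabilities 1/|S|."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is greater than 3"

p_union = P(A | B)                          # P(A or B)
p_addition_rule = P(A) + P(B) - P(A & B)    # addition rule
p_complement = P(S - A)                     # P(not A)
p_A_given_B = P(A & B) / P(B)               # conditional probability

print(p_union, p_addition_rule, p_complement, p_A_given_B)
```

Using `Fraction` keeps every probability exact, so the addition rule and the complement rule can be compared with `==` rather than with a floating-point tolerance.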

**Independence:** Two events are statistically independent if and only if

$$P(A \cap B) = P(A)\,P(B),$$

or equivalently $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.

General case: events $E_1, E_2, \ldots, E_k$ are independent if and only if

$$P(E_1 \cap E_2 \cap \cdots \cap E_k) = P(E_1)\,P(E_2) \cdots P(E_k)$$

(strictly, the same product rule must also hold for every sub-collection of the events).

**Bivariate Probability**

Probabilities of outcomes for bivariate events:

| | $B_1$ | $B_2$ | $B_3$ | Marginal |
|---|---|---|---|---|
| $A_1$ | $P(A_1 \cap B_1)$ | $P(A_1 \cap B_2)$ | $P(A_1 \cap B_3)$ | $P(A_1)$ |
| $A_2$ | $P(A_2 \cap B_1)$ | $P(A_2 \cap B_2)$ | $P(A_2 \cap B_3)$ | $P(A_2)$ |
| Marginal | $P(B_1)$ | $P(B_2)$ | $P(B_3)$ | |

- $P(A_1 \cap B_2)$ is the probability of $A_1$ and $B_2$ both occurring.
- $P(A_1) = P(A_1 \cap B_1) + P(A_1 \cap B_2) + P(A_1 \cap B_3)$ is the marginal probability that $A_1$ occurs.
- If we think of $A_1, A_2$ as a group of attributes $A$, and $B_1, B_2, B_3$ as a group of attributes $B$, then $A$ and $B$ are independent only if every one of $\{A_1, A_2\}$ is independent of every one of $\{B_1, B_2, B_3\}$.

### Discrete Random Variables (Ch. 6-7)

**Definitions**

- **Random Variable (r.v.):** A variable that takes on numerical values determined by the outcome of a random experiment.
- **Discrete Random Variable:** A r.v. that can take on no more than a countable number of values.
- **Continuous Random Variable:** A r.v. that can take any value in an interval.
- **Notation:** An upper case letter (e.g. $X$) will represent a r.v.; a lower case letter (e.g. $x$) will represent one of its possible values.
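To make the bivariate table concrete, the sketch below fills it with made-up joint probabilities (an assumption, chosen so the arithmetic is easy to follow), computes the marginals as row and column totals, and checks whether every cell equals the product of its marginals.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(A_i and B_j).
# Rows are A_1, A_2; columns are B_1, B_2, B_3.
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(2, 10), F(1, 10), F(3, 10)],
]

p_A = [sum(row) for row in joint]          # marginals P(A_i): row totals
p_B = [sum(col) for col in zip(*joint)]    # marginals P(B_j): column totals

# The attribute groups are independent only if every cell factorises.
independent = all(
    joint[i][j] == p_A[i] * p_B[j]
    for i in range(len(p_A))
    for j in range(len(p_B))
)

print(p_A, p_B, independent)
```

For these numbers the check fails already at the first cell, since $P(A_1 \cap B_1) = 1/10$ while $P(A_1)P(B_1) = 12/100$, so the two attribute groups are not independent.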

**Discrete Probability Distributions**

The probability function, $P_X(x)$, of a discrete r.v. $X$ gives the probability that $X$ takes the value $x$:

$$P_X(x) = P(X = x),$$

where the function is evaluated at all possible values of $x$.

Properties:

1. $P_X(x) \ge 0$ for any value $x$.
2. $\sum_x P_X(x) = 1$.

Cumulative probability function, $F_X(x_0)$, of a r.v. $X$:

$$F_X(x_0) = P(X \le x_0) = \sum_{x \le x_0} P_X(x).$$

Properties:

1. $0 \le F_X(x) \le 1$ for any $x$.
2. If $a < b$, then $F_X(a) \le F_X(b)$.

**Expectation of Discrete Random Variables**

Expected value of a discrete r.v.:

$$E(X) = \mu_X = \sum_x x\,P_X(x).$$

For any function $g(x)$,

$$E(g(X)) = \sum_x g(x)\,P_X(x).$$

Variance of a discrete r.v.:

$$\mathrm{Var}(X) = \sigma_X^2 = E\big((X - \mu_X)^2\big) = \sum_x (x - \mu_X)^2 P_X(x) = E(X^2) - \mu_X^2.$$

The standard deviation of $X$ is $\sigma_X$.

**Plug-In Rules:** let $X$ be a r.v., and $a$ and $b$ constants. Then

$$E(a + bX) = a + b\,E(X), \qquad \mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X).$$

This only works for linear functions.
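The expectation, variance and plug-in rules can be traced through on a small probability function. The sketch below uses a made-up three-point distribution and hypothetical constants `a` and `b`; the helper `expect` is an illustrative name, not something from the notes.

```python
from fractions import Fraction as F

# Hypothetical probability function P_X(x) of a discrete r.v. X.
pmf = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}

def expect(g, pmf):
    """E(g(X)): sum of g(x) * P_X(x) over all possible values x."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x, pmf)                        # E(X)
var = expect(lambda x: (x - mu) ** 2, pmf)           # E((X - mu)^2)
var_shortcut = expect(lambda x: x * x, pmf) - mu ** 2    # E(X^2) - mu^2

# Plug-in rules for the linear function a + bX.
a, b = 3, 2
mean_lin = expect(lambda x: a + b * x, pmf)
var_lin = expect(lambda x: (a + b * x - mean_lin) ** 2, pmf)

# The plug-in idea fails for a non-linear function such as g(x) = x^2.
e_of_square = expect(lambda x: x * x, pmf)
square_of_e = mu ** 2

print(mu, var, mean_lin, var_lin, e_of_square, square_of_e)
```

Here $E(X^2) = 3/2$ but $(E(X))^2 = 1$, which is a small concrete instance of the warning that the plug-in rules only work for linear functions.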

**Jointly Distributed Discrete Random Variables**

**Joint Probability Function:** Suppose $X$ and $Y$ are r.v.'s. Their joint probability function gives the probability that simultaneously $X = x$ and $Y = y$:

$$P_{X,Y}(x, y) = P(\{X = x\} \cap \{Y = y\}).$$

Properties:

1. $P_{X,Y}(x, y) \ge 0$ for any pair $(x, y)$.
2. $\sum_x \sum_y P_{X,Y}(x, y) = 1$.

**Marginal probability function:**

$$P_X(x) = \sum_y P_{X,Y}(x, y).$$

**Conditional probability function:**

$$P_{Y \mid X}(y \mid x) = \frac{P_{X,Y}(x, y)}{P_X(x)}.$$

**Independence:** $X$ and $Y$ are independent if and only if

$$P_{X,Y}(x, y) = P_X(x)\,P_Y(y) \quad \text{for all possible } (x, y) \text{ pairs.}$$

**Expectation:** Let $X$ and $Y$ be r.v.'s, and $g(X, Y)$ any function. Then

$$E(g(X, Y)) = \sum_x \sum_y g(x, y)\,P_{X,Y}(x, y).$$

**Conditional Expectation:** Let $X$ and $Y$ be r.v.'s, and suppose we know the conditional distribution of $X$ for $Y = y$, labeled $P_{X \mid Y}(x \mid y)$. Then

$$E(X \mid Y = y) = \sum_x x\,P_{X \mid Y}(x \mid y).$$

**Covariance:** If $E(X) = \mu_X$ and $E(Y) = \mu_Y$,

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,P_{X,Y}(x, y)$$

$$= E(XY) - \mu_X \mu_Y = \Big[\sum_x \sum_y x y\,P_{X,Y}(x, y)\Big] - \mu_X \mu_Y.$$

If two r.v.'s are independent, their covariance is zero. The converse is not necessarily true.
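The closing remark, that zero covariance does not imply independence, is easy to see with a small example. The sketch below assumes a made-up joint distribution in which $X$ takes the values $-1, 0, 1$ with equal probability and $Y = X^2$; the helper name `E` is illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint probability function: X is -1, 0 or 1 with equal
# probability and Y = X**2, so Y is completely determined by X.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

def E(g):
    """E(g(X, Y)): sum of g(x, y) * P_{X,Y}(x, y) over all pairs."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = E(lambda x, y: x)
mu_y = E(lambda x, y: y)

cov = E(lambda x, y: (x - mu_x) * (y - mu_y))         # definition
cov_shortcut = E(lambda x, y: x * y) - mu_x * mu_y    # E(XY) - mu_X * mu_Y

# Marginal probabilities needed for one independence check.
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)   # P_X(0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)   # P_Y(0)
p_00 = joint.get((0, 0), F(0))                           # P_{X,Y}(0, 0)

print(cov, cov_shortcut, p_00, p_x0 * p_y0)
```

The covariance is exactly zero, yet $P_{X,Y}(0, 0) = 1/3$ differs from $P_X(0)P_Y(0) = 1/9$, so $X$ and $Y$ are dependent.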

**Correlation:**

$$\rho_{XY} = \mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

- $-1 \le \rho_{XY} \le 1$ always.
- $\rho_{XY} = \pm 1$ if and only if $Y = a + bX$ ($a$, $b$ constants, $b \ne 0$).

**Plug-in rules:** Let $X$ and $Y$ be r.v.'s, and $a$, $b$ constants. Then

$$E(aX + bY) = a\,E(X) + b\,E(Y)$$

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$

**Binomial distribution**

**Bernoulli Trials:** A sequence of repeated experiments are Bernoulli trials if:

1. The result of each trial is either a success or a failure.
2. The probability $p$ of a success is the same for all trials.
3. The trials are independent.

If $X$ is the number of successes in $n$ Bernoulli trials, $X$ is a Binomial Random Variable. It has probability function

$$P_X(x) = \binom{n}{x} p^x (1 - p)^{n - x},$$

where $\binom{n}{x}$ counts the number of ways of getting $x$ successes in $n$ trials. The formula for $\binom{n}{x}$ is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}, \quad \text{where } n! = n (n-1)(n-2) \cdots 2 \cdot 1.$$

Mean and Variance: $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$.

### Continuous Random Variables (Ch. 7)

**Probability Distributions**

**Probability density function:** A function $f_X(x)$ of the continuous r.v. $X$ with the following properties:

1. $f_X(x) \ge 0$ for all values of $x$.
2. $P(a \le X \le b)$ is the area under $f_X(x)$ between $a$ and $b$, if $a < b$.
3. The total area under the curve is 1.
4. The area under the curve to the left of any value $x$ is $F_X(x)$, the probability that $X$ does not exceed $x$.

**Cumulative distribution function:** Same as before.

$$P(a \le X \le b) = F_X(b) - F_X(a) \quad \text{(provided } a < b\text{)}.$$

**Expectations, Variances, Covariances, etc.:** Same rules as for discrete r.v.'s. The summation ($\sum$) is replaced by the integral ($\int$), which is not necessary for this course.

**Normal Distribution**

Probability density function:

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$

for constants $\mu$ and $\sigma$ such that $-\infty < \mu < \infty$ and $0 < \sigma < \infty$.

Mean and Variance: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Notation: $X \sim N(\mu, \sigma^2)$ means $X$ is normal with mean $\mu$ and variance $\sigma^2$. If $Z \sim N(0, 1)$ we say it has a standard normal distribution.

If $X \sim N(\mu, \sigma^2)$ then $Z = (X - \mu)/\sigma \sim N(0, 1)$. Thus

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = F_Z\left(\frac{b - \mu}{\sigma}\right) - F_Z\left(\frac{a - \mu}{\sigma}\right).$$

**Central Limit Theorem**

Let $X_1, X_2, \ldots, X_n$ be $n$ independent r.v.'s, each with identical distributions, mean $\mu$ and variance $\sigma^2$. As $n$ becomes large, approximately

$$\bar{X} \sim N(\mu, \sigma^2 / n), \qquad \sum_{i=1}^{n} X_i = n\bar{X} \sim N(n\mu, n\sigma^2).$$
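The standardization step and the CLT statement above can be sketched with the standard library's `statistics.NormalDist`. The parameter values below are assumptions for illustration. Note that `NormalDist` takes the standard deviation as its second argument, whereas the $N(\mu, \sigma^2)$ notation in these notes uses the variance.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0    # hypothetical mean and standard deviation of X
a, b = 85.0, 130.0         # hypothetical interval end points

Z = NormalDist()           # standard normal N(0, 1)

# P(a < X < b) computed through the standardised variable Z = (X - mu) / sigma ...
p_standardised = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

# ... and directly from the distribution of X, as a cross-check.
X = NormalDist(mu, sigma)  # second argument is sigma, not sigma squared
p_direct = X.cdf(b) - X.cdf(a)

# CLT approximation for the mean of n i.i.d. observations:
# X-bar is roughly N(mu, sigma^2 / n), so its standard deviation is sigma / sqrt(n).
n = 25
xbar_dist = NormalDist(mu, sigma / n ** 0.5)
p_xbar = xbar_dist.cdf(103.0) - xbar_dist.cdf(97.0)   # P(97 < X-bar < 103)

print(round(p_standardised, 4), round(p_direct, 4), round(p_xbar, 4))
```

With these numbers the interval for $X$ runs from one standard deviation below the mean to two above it, and the interval for $\bar{X}$ is one standard error either side of $\mu$, so the results line up with familiar standard normal table values.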

### Sampling & Sampling distributions

- **Simple random sample (or random sample):** A method of randomly drawing $n$ objects which are Independent and Identically Distributed (I.I.D.).
- **Statistic:** A function of the sample information.
- **Sampling distribution of a statistic:** The probability distribution of the values a statistic can take, over all possible samples of a fixed size $n$.

**Sampling distribution of the mean:** Suppose $X_1, \ldots, X_n$ are a random sample from some population with mean $\mu_X$ and variance $\sigma_X^2$. The sample mean is

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

It has the following properties:

1. $E(\bar{X}) = \mu_X$.
2. It has standard deviation $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.
3. If the population distribution is normal, $\bar{X} \sim N(\mu_X, \sigma_{\bar{X}}^2) = N(\mu_X, \sigma_X^2 / n)$.
4. If the population distribution is not normal, but $n$ is large, then (3) is roughly true.

**Sampling distribution of a proportion:** Suppose the r.v. $X$ is the number of successes in a binomial sample of $n$ trials, whose probability of success is $p$. The sample proportion is

$$\hat{p} = X / n.$$

It has the following properties:

1. $E(\hat{p}) = p$.
2. It has standard deviation $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.
3. If $n$ is large ($np(1 - p) > 9$, or roughly $n \ge 40$), then approximately $\hat{p} \sim N(p, \sigma_{\hat{p}}^2) = N(p, p(1 - p)/n)$.
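To see how good the normal approximation for $\hat{p}$ is, one can compare it with the exact binomial probability. The sketch below assumes $n = 100$ and $p = 0.5$ (so $np(1-p) = 25 > 9$) and a hypothetical cutoff of 0.55; `binom_pmf` is an illustrative helper implementing the binomial probability function.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5     # hypothetical number of trials and success probability

def binom_pmf(x, n, p):
    """Binomial probability function: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

se = sqrt(p * (1 - p) / n)     # standard deviation of p-hat

# Exact P(p-hat <= 0.55) = P(X <= 55), summing the binomial probabilities.
exact = sum(binom_pmf(x, n, p) for x in range(0, 56))

# Normal approximation p-hat ~ N(p, p(1 - p)/n), without continuity correction.
approx = NormalDist(p, se).cdf(0.55)

total = sum(binom_pmf(x, n, p) for x in range(n + 1))   # should be 1

print(round(se, 4), round(exact, 4), round(approx, 4))
```

The two probabilities agree to within a few hundredths. Most of the remaining gap comes from approximating a discrete distribution by a continuous one; a continuity correction (not covered in these notes) would narrow it.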

### Point Estimation

- **Estimator:** A random variable that depends on the sample information and whose realizations provide approximations to an unknown population parameter.
- **Estimate:** A specific realization of an estimator.
- **Point estimator:** An estimator that is a single number.
- **Point estimate:** A specific realization of a point estimator.
- **Bias:** Let $\hat{\theta}$ be an estimator of the parameter $\theta$. The bias in $\hat{\theta}$ is

  $$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

  If the bias is 0, $\hat{\theta}$ is an unbiased estimator.
- **Efficiency:** Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two estimators of $\theta$, based on the same sample. Then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$.

### Interval Estimation

**Confidence Interval:** Let $\theta$ be an unknown parameter. Suppose that from sample information, we can find random variables $A$ and $B$ such that

$$P(A < \theta < B) = 1 - \alpha.$$

If the observed values are $a$ and $b$, then $(a, b)$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$. The quantity $(1 - \alpha)$ is called the probability content of the interval.

**Student's t distribution:** Given a random sample of $n$ observations with mean $\bar{X}$ and standard deviation $s$, from a normal population with mean $\mu$, the random variable

$$T = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

follows the Student's t distribution with $(n - 1)$ degrees of freedom. For $n > 30$, the t distribution is quite close to a $N(0, 1)$ distribution.
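The definition of bias can be illustrated by exact enumeration, which also shows why the sample variance divides by $n - 1$. The sketch below assumes a tiny made-up population of four values and lists every equally likely sample of size 2 drawn with replacement (so the draws are I.I.D.); the helper name `spread` is illustrative.

```python
from fractions import Fraction as F
from itertools import product

population = [1, 2, 3, 4]      # hypothetical population, N = 4
N = len(population)
mu = F(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance

n = 2
samples = list(product(population, repeat=n))   # all 16 equally likely i.i.d. samples

def spread(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = F(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

# Every sample is equally likely, so each expectation is a plain average.
E_xbar = sum(F(sum(s), n) for s in samples) / len(samples)
E_s2 = sum(spread(s, n - 1) for s in samples) / len(samples)   # divisor n - 1
E_v = sum(spread(s, n) for s in samples) / len(samples)        # divisor n

bias_xbar = E_xbar - mu
bias_s2 = E_s2 - sigma2
bias_v = E_v - sigma2

print(bias_xbar, bias_s2, bias_v)
```

The sample mean and the $n - 1$ version of the variance are unbiased here, while dividing by $n$ underestimates $\sigma^2$ on average.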

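| Data | Parameter | $100(1 - \alpha)\%$ C.I. |
|---|---|---|
| $N(\mu, \sigma^2)$, $\sigma^2$ known | $\mu$ | $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ |
| mean $\mu$, $\sigma^2$ unknown, $n > 30$ | $\mu$ | $\bar{x} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$ |
| $N(\mu, \sigma^2)$, $\sigma^2$ unknown | $\mu$ | $\bar{x} \pm t_{n-1,\alpha/2} \dfrac{s}{\sqrt{n}}$ |
| Binomial$(n, p)$, $np(1 - p) > 9$, or roughly $n \ge 40$ | $p$ | $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ |
| $n$ matched pairs, difference $N(\mu_X - \mu_Y, \sigma^2)$ | $\mu_X - \mu_Y$ | $\bar{d} \pm t_{n-1,\alpha/2} \dfrac{s_d}{\sqrt{n}}$ |
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown, $n > 30$ | $\mu_X - \mu_Y$ | $\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$ |
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown | $\mu_X - \mu_Y$ | $\bar{x} - \bar{y} \pm t_{n^*,\alpha/2} \sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$ |
| 2 independent samples, Binomial$(n_X, p_X)$, Binomial$(n_Y, p_Y)$ | $p_X - p_Y$ | $\hat{p}_x - \hat{p}_y \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{n_y}}$ |

Notes for the table:

1. All quantities in the C.I. column are either known constants or observed sample quantities.
2. $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$.
3. $P(T > t_{n-1,\alpha/2}) = \alpha/2$ for $T \sim$ Student's t with $(n - 1)$ d.f.
4. $s$, $s_x$, $s_y$, $s_d$ are the observed sample standard deviations corresponding to $x_i$, $x_i$, $y_i$, $d_i = x_i - y_i$ respectively.
5. $\hat{p}$, $\hat{p}_x$, $\hat{p}_y$ are the observed sample proportions corresponding to $x_i$, $x_i$, $y_i$ respectively.
6. $\bar{d}$ is the sample mean corresponding to $d_i = x_i - y_i$.
7. $n$, $n_x$, $n_y$ are the total, $x$ and $y$ sample sizes.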

8. The degrees of freedom $n^*$ are

   $$n^* = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}$$

Also note: The text gives a different formula for comparing two means with small sample sizes. It requires that the two variances be the same, which may not be the case. Unless you're sure the variances are equal, it's safer to use the approximation given here (the formula with $n^*$). If you are sure that the variances are equal, using the book's formula is OK.

**Estimating the sample size:** If you want a $100(1 - \alpha)\%$ interval of $\pm L$ (i.e. length $2L$), choose $n$ so that

| Situation | $n$ |
|---|---|
| normal, $\sigma$ known | $n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{L^2}$ |
| Bernoulli, worst case | $n = \dfrac{0.25\,z_{\alpha/2}^2}{L^2}$ |

### Hypothesis Testing

- **Null Hypothesis ($H_0$):** The hypothesis we assume to be true unless there is sufficient evidence to the contrary.
- **Alternative Hypothesis ($H_1$):** The hypothesis we test the null against. If there is evidence that $H_0$ is false, we accept $H_1$.
- **Type I Error:** Rejecting a true $H_0$.
- **Type II Error:** Not rejecting a false $H_0$.
- **Significance Level:** $P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\text{type I error})$.
- **Power:** The probability of rejecting a null hypothesis that is false. Note that this depends on the true value of the parameter.
- **P-value:** The smallest significance level at which a null hypothesis can be rejected. This is a measure of how likely the data are, if $H_0$ is true.

Notes for the following table (in addition to the comments for C.I.s):

1. The first three tests are examples of one-sided ($>$ and $<$ alternatives) and two-sided ($\ne$ alternative) tests. The remaining tests all have a $>$ alternative, but are easily adaptable to either of the other two alternatives.

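2. In the last test,

   $$\hat{p}_0 = \frac{n_x \hat{p}_x + n_y \hat{p}_y}{n_x + n_y}.$$

3. In all the tests, we are comparing the unknown parameter (such as $\mu$, $p$ or $\mu_X - \mu_Y$) to constants ($\mu_0$, $p_0$, and $D_0$).

**Hypothesis tests with significance level $\alpha$**

| Data | $H_0$ | $H_1$ | Reject $H_0$ if |
|---|---|---|---|
| $N(\mu, \sigma^2)$, $\sigma^2$ known | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha$ |
| same | $\mu = \mu_0$ or $\mu \ge \mu_0$ | $\mu < \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < -z_\alpha$ |
| same | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ not in $(-z_{\alpha/2}, z_{\alpha/2})$ |
| mean $\mu$, $\sigma^2$ unknown, $n > 30$ | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} > z_\alpha$ |
| $N(\mu, \sigma^2)$, $\sigma^2$ unknown, $n < 30$ | $\mu = \mu_0$ or $\mu \le \mu_0$ | $\mu > \mu_0$ | $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$ |
| Binomial$(n, p)$, $np(1 - p) > 9$ or roughly $n > 40$ | $p = p_0$ or $p \le p_0$ | $p > p_0$ | $\dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > z_\alpha$ |
| $n$ matched pairs, difference $N(\mu_X - \mu_Y, \sigma^2)$ | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{d} - D_0}{s_d/\sqrt{n}} > t_{n-1,\alpha}$ |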

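**Hypothesis tests with significance level $\alpha$ (continued)**

| Data | $H_0$ | $H_1$ | Reject $H_0$ if |
|---|---|---|---|
| 2 independent samples, means $\mu_X$, $\mu_Y$, variances unknown, $n_x > 30$, $n_y > 30$ | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{x} - \bar{y} - D_0}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} > z_\alpha$ |
| 2 independent normal samples, means $\mu_X$, $\mu_Y$, variances unknown | $\mu_X - \mu_Y = D_0$ or $\mu_X - \mu_Y \le D_0$ | $\mu_X - \mu_Y > D_0$ | $\dfrac{\bar{x} - \bar{y} - D_0}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} > t_{n^*,\alpha}$ |
| 2 independent samples, Binomial$(n_x, p_x)$ and Binomial$(n_y, p_y)$ | $p_x - p_y = 0$ or $p_x - p_y \le 0$ | $p_x - p_y > 0$ | $\dfrac{\hat{p}_x - \hat{p}_y}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\dfrac{n_x + n_y}{n_x n_y}\right)}} > z_\alpha$ |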
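As a worked illustration of the large-sample two-sample rows, the sketch below runs the one-sided test and the matching confidence interval from summary statistics, and also evaluates the degrees-of-freedom formula $n^*$. All the numbers are made up for the example, and the variable names are illustrative. Only the normal quantiles are computed, because the Python standard library has no Student's t quantile function.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics for two independent samples.
xbar, s_x, n_x = 52.0, 10.0, 50
ybar, s_y, n_y = 48.0, 12.0, 60
D0, alpha = 0.0, 0.05

Z = NormalDist()                        # standard normal N(0, 1)
vx, vy = s_x ** 2 / n_x, s_y ** 2 / n_y
se = sqrt(vx + vy)                      # sqrt(s_x^2/n_x + s_y^2/n_y)

# One-sided test of H0: mu_X - mu_Y <= D0 against H1: mu_X - mu_Y > D0.
z = (xbar - ybar - D0) / se
z_alpha = Z.inv_cdf(1 - alpha)          # P(Z > z_alpha) = alpha
reject = z > z_alpha
p_value = 1 - Z.cdf(z)                  # smallest alpha at which H0 is rejected

# 100(1 - alpha)% confidence interval for mu_X - mu_Y.
z_half = Z.inv_cdf(1 - alpha / 2)
ci = (xbar - ybar - z_half * se, xbar - ybar + z_half * se)

# Degrees of freedom n* for the small-sample t version of the same test.
n_star = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))

print(round(z, 3), reject, round(p_value, 4), ci, round(n_star, 1))
```

With these inputs the one-sided test rejects $H_0$ at the 5% level, while the two-sided 95% interval still contains 0; the two answer slightly different questions because the test puts all of $\alpha$ in one tail.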

### The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Topic 8 The Expected Value

Topic 8 The Expected Value Functions of Random Variables 1 / 12 Outline Names for Eg(X ) Variance and Standard Deviation Independence Covariance and Correlation 2 / 12 Names for Eg(X ) If g(x) = x, then

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Math 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304. jones/courses/141

Math 141 Lecture 7: Variance, Covariance, and Sums Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Last Time Variance: expected squared deviation from the mean: Standard

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### 4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

### For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### THE BINOMIAL DISTRIBUTION & PROBABILITY

REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

### Chapters 5. Multivariate Probability Distributions

Chapters 5. Multivariate Probability Distributions Random vectors are collection of random variables defined on the same sample space. Whenever a collection of random variables are mentioned, they are

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### Multiple random variables

Multiple random variables Multiple random variables We essentially always consider multiple random variables at once. The key concepts: Joint, conditional and marginal distributions, and independence of

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

### An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

### Covariance and Correlation

Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG820). December 15, 2012.

Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG8). December 15, 12. 1. (3p) The joint distribution of the discrete random variables X and

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### MT426 Notebook 3 Fall 2012 prepared by Professor Jenny Baglivo. 3 MT426 Notebook 3 3. 3.1 Definitions... 3. 3.2 Joint Discrete Distributions...

MT426 Notebook 3 Fall 2012 prepared by Professor Jenny Baglivo c Copyright 2004-2012 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MT426 Notebook 3 3 3.1 Definitions............................................

### Chapter 4. Probability and Probability Distributions

Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### Exercises with solutions (1)

Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal

### Chapter 2: Data quantifiers: sample mean, sample variance, sample standard deviation Quartiles, percentiles, median, interquartile range Dot diagrams

Review for Final Chapter 2: Data quantifiers: sample mean, sample variance, sample standard deviation Quartiles, percentiles, median, interquartile range Dot diagrams Histogram Boxplots Chapter 3: Set

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

### ST 371 (IV): Discrete Random Variables

ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Topic 4: Multivariate random variables. Multiple random variables

Topic 4: Multivariate random variables Joint, marginal, and conditional pmf Joint, marginal, and conditional pdf and cdf Independence Expectation, covariance, correlation Conditional expectation Two jointly

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### 5. Continuous Random Variables

5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

### Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

### Minitab Guide. This packet contains: A Friendly Guide to Minitab. Minitab Step-By-Step

Minitab Guide This packet contains: A Friendly Guide to Minitab An introduction to Minitab; including basic Minitab functions, how to create sets of data, and how to create and edit graphs of different

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Chapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures- Graphs are used to describe the shape of a data set.

Page 1 of 16 Chapter 2: Exploring Data with Graphs and Numerical Summaries Graphical Measures- Graphs are used to describe the shape of a data set. Section 1: Types of Variables In general, variable can

### We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Jointly Distributed Random Variables

Jointly Distributed Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Jointly Distributed Random Variables 1 1.1 Definition......................................... 1 1.2 Joint cdfs..........................................

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

### 3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

PROBLEM SET 1 For the first three answer true or false and explain your answer. A picture is often helpful. 1. Suppose the significance level of a hypothesis test is α=0.05. If the p-value of the test

### STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution. Richard Lockhart, Simon Fraser University, STAT 830 Fall 2011

### Unit 21 Student's t Distribution in Hypotheses Testing

Unit 21 Student's t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### Chapter 2, part 2. Petter Mostad

Chapter 2, part 2 Petter Mostad mostad@chalmers.se Parametrical families of probability distributions How can we solve the problem of learning about the population distribution from the sample? Usual procedure:

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Bivariate Distributions

Chapter 4 Bivariate Distributions 4.1 Distributions of Two Random Variables In many practical cases it is desirable to take more than one measurement of a random observation: (brief examples) 1. What is

### Continuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore's book (Ed 8) Chapter 4 - and Cengage

4 Continuous Random Variables and Probability Distributions Stat 4570/5570 Material from Devore's book (Ed 8) Chapter 4 - and Cengage Continuous r.v. A random variable X is continuous if possible values

### Examination 110 Probability and Statistics Examination

Examination 110 Probability and Statistics Examination Sample Examination Questions The Probability and Statistics Examination consists of 5 multiple-choice test questions. The test is a three-hour examination

### GCSE Statistics Revision notes

GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

### Expectation Discrete RV - weighted average Continuous RV - use integral to take the weighted average

PHP 2510 Expectation, variance, covariance, correlation Expectation Discrete RV - weighted average Continuous RV - use integral to take the weighted average Variance Variance is the average of $(X - \mu)^2$
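
The excerpt above describes expectation as a probability-weighted average and variance as the average squared deviation from the mean. A minimal sketch for the discrete case follows; the function names and the fair-die example are illustrative assumptions, not taken from the excerpted notes.

```python
from fractions import Fraction

def expectation(pmf):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Weighted average of the squared deviations from the mean."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, stored as value -> probability.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(expectation(die))  # 7/2
print(variance(die))     # 35/12
```

Exact fractions are used so the results can be checked by hand; replacing `Fraction(1, 6)` with `1 / 6` gives the same values up to floating-point rounding.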

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0

Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS Questions 290 Field Statistics Target Audience Science Students Outline Target Level First or Second-year Undergraduate Topics Introduction to Statistics Descriptive Statistics

### Chapter 5. Random variables

Random variables. Random variable: a numerical variable whose value is the outcome of some probabilistic experiment; we use uppercase letters, like X, to denote such a variable and lowercase letters, like

### Joint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore's book (Ed 8), and Cengage

5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore's book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single

### 1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics)

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) As well as displaying data graphically we will often wish to summarise it numerically particularly if we wish to compare two or more data sets.

### Using pivots to construct confidence intervals. In Example 41 we used the fact that

Using pivots to construct confidence intervals In Example 41 we used the fact that $Q(\bar{X}, \mu) = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ for all $\mu$. We then said $|Q(\bar{X}, \mu)| \le z_{\alpha/2}$ with probability $1 - \alpha$, and converted this into a statement
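
The pivot argument in this excerpt, that $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is standard normal when $\sigma$ is known, can be inverted into an interval for $\mu$. The sketch below assumes a known $\sigma$ and a two-sided interval; the function name `z_interval` and the sample numbers are hypothetical and not part of the excerpted notes.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for mu, assuming sigma is known."""
    # z_{alpha/2}: the upper alpha/2 quantile of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample summary: mean 10.0 from n = 25 draws, sigma = 2.0.
lo, hi = z_interval(xbar=10.0, sigma=2.0, n=25)
print(round(lo, 3), round(hi, 3))
```

Solving $|Q| \le z_{\alpha/2}$ for $\mu$ gives $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, which is what the function returns.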

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### 3 Multiple Discrete Random Variables

3 Multiple Discrete Random Variables 3.1 Joint densities Suppose we have a probability space (Ω, F,P) and now we have two discrete random variables X and Y on it. They have probability mass functions f

### Descriptive Statistics

Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

### A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

Glossary Brase: Understandable Statistics, 10e A | B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

### ST 371 (VIII): Theory of Joint Distributions

ST 371 (VIII): Theory of Joint Distributions So far we have focused on probability distributions for single random variables. However, we are often interested in probability statements concerning two or

### Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Estimation and Confidence Intervals

Estimation and Confidence Intervals Fall 2001 Professor Paul Glasserman B6014: Managerial Statistics 403 Uris Hall Properties of Point Estimates 1 We have already encountered two point estimators: the

### Chapter 4 Statistical Inference in Quality Control and Improvement. Statistical Quality Control (D. C. Montgomery)

Chapter 4 Statistical Inference in Quality Control and Improvement 許 湘 伶 Statistical Quality Control (D. C. Montgomery) Sampling distribution I a random sample of size n: if it is selected so that the

### Hypothesis Testing or How to Decide to Decide Edpsy 580

Hypothesis Testing or How to Decide to Decide Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Hypothesis Testing or How to Decide to Decide

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Open book and note Calculator OK Multiple Choice 1 point each MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Find the mean for the given sample data.

### Lean Six Sigma Training/Certification Book: Volume 1

Lean Six Sigma Training/Certification Book: Volume 1 Six Sigma Quality: Concepts & Cases Volume I (Statistical Tools in Six Sigma DMAIC process with MINITAB Applications) Chapter 1 Introduction to Six Sigma,

### Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

### GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

### Continuous Random Variables

Continuous Random Variables COMP 245 STATISTICS Dr N A Heard Contents: 1 Continuous Random Variables; 1.1 Introduction; 1.2 Probability Density Functions; 1.3 Transformations; 2 Mean, Variance and Quantiles

### ECE302 Spring 2006 HW7 Solutions March 11, 2006

ECE302 Spring 2006 HW7 Solutions March 11, 2006 Solutions to HW7 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in italics where

### 3. Continuous Random Variables

3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such

### Montana Common Core Standard

Algebra I Grade Level: 9, 10, 11, 12 Length: 1 Year Period(s) Per Day: 1 Credit: 1 Credit Requirement Fulfilled: A must pass course Course Description This course covers the real number system, solving

### The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

### MIDTERM EXAMINATION Spring 2009 STA301- Statistics and Probability (Session - 2)

MIDTERM EXAMINATION Spring 2009 STA301- Statistics and Probability (Session - 2) Question No: 1 Median can be found only when: Data is Discrete Data is Attributed Data is continuous Data is continuous

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes Contents: 1 PROBABILITY THEORY; 1.1 Experiments and random events; 1.2 Certain event. Impossible event

### Correlation in Random Variables

Correlation in Random Variables Lecture 11 Spring 2002 Correlation in Random Variables Suppose that an experiment produces two random variables, X and Y. What can we say about the relationship between

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Notes on Continuous Random Variables

Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

### , for $x = 0, 1, 2, 3, \ldots$ (4.1); $\lim_{n\to\infty} (1 + 1/n)^n = 2.71828\ldots$; $\sum_{x=0}^{\infty} b^x/x! = e^b$

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781-1840). In addition, poisson is French for fish. In this chapter we will

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.