Dongfeng Li. Autumn 2010

Size: px
Start display at page:

Download "Dongfeng Li. Autumn 2010"

Transcription

1 Autumn 2010

2 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

3 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

4 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

5 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

6 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

7 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

8 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

9 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

10 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

11 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

12 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

13 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

14 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

15 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

16 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

17 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

18 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

19 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

20 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

21 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

22 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 1-1 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be non-unique.)

23 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 1-1 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be non-unique.)

24 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }

25 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }

26 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

27 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

28 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

29 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

30 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

31 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

32 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

33 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

34 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

35 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

36 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

37 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

38 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

39 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.

40 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.

41 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

42 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

43 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

44 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

45 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).

46 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).

47 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

48 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

49 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

50 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

51 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

52 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

53 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

54 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

55 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

56 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

57 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

58 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

59 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

60 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

61 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

62 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

63 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

64 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

65 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

66 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

67 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

68 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

69 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

70 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

71 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

72 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

73 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

74 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

75 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

76 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

77 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

78 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

79 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

80 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

81 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

82 Example: frequency table Use PROC FREQ to list the value set and the frequency counts, percents. proc freq data=sasuser.class; tables sex; run; Gender sex Frequency Percent Cumulative Cumulative Frequency Percent F M statistics Summary statistics

83 Example: bar chart Use PROC GCHART to make a bar chart. hbar could be replaced with vbar, hbar3d, vbar3d. proc gchart data=sasuser.class; hbar sex; run; statistics Summary statistics

84 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

85 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

86 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

87 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

88 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

89 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics

90 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics

91 Output Moments N 19 Sum Weights Sum Observations Std Deviation Skewness Kurtosis Uncorrected SS Corrected SS Coeff Variation Std Error Basic Statistical Measures Location Variability Std Deviation Median Mode Range Interquartile Range statistics Summary statistics Note The mode displayed is the smallest of 2 modes with a count of 2.

92 Tests for Location: Mu0=0 Test Statistic p Value Student s t t Pr > t <.0001 Sign M 9.5 Pr >= M <.0001 Signed Rank S 95 Pr >= S <.0001 statistics Summary statistics

93 Quantiles (Definition 5) Quantile Estimate 100% Max % % % % Q % Median % Q % % % % Min 51.3 statistics Summary statistics

94 Extreme Observations Lowest Highest Value Obs Value Obs statistics Summary statistics

95 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

96 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

97 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

98 Examples of histograms Skewness shown in histograms. statistics Summary statistics

99 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

100 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

101 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

102 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

103 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

104 Example: boxplot of height statistics Summary statistics

105 proc sql; create view work._tmp_0 as select height, 1 as _dummy_ from sasuser.class; title ; axis1 major=none value=none label=none; proc boxplot data=_tmp_0; plot height*_dummy_ / boxstyle=skematic cboxfill=blue haxis=axis1; run; statistics Summary statistics

106 Example: boxplot of GPA and CO Skewness and possible outliers shown in boxplot. statistics Summary statistics

107 Grouped boxplot statistics Summary statistics

108 proc sort data=sasuser.gpa out=_tmp_1; by sex; proc boxplot data=_tmp_1; plot gpa*sex / boxstyle=skematic cboxfill=blue; run; statistics Summary statistics

109 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

110 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

111 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

112 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

113 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

114 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

115 Example: QQ Plot of HEIGHT proc univariate data=sasuser.class noprint; qqplot height / normal(mu=est sigma=est); run; statistics Summary statistics

116 Example: Skewness in QQ Plot statistics Summary statistics

117 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics

118 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics

119 Example: Probability Plot of HEIGHT proc univariate data=sasuser.class noprint; probplot height / normal(mu=est sigma=est); run; statistics Summary statistics

120 The Stem-leaf Plot The stem-leaf plot is a text-based plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates text-based stem-leaf plot, boxplot and QQ plot. statistics Summary statistics

121 The Stem-leaf Plot The stem-leaf plot is a text-based plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates text-based stem-leaf plot, boxplot and QQ plot. statistics Summary statistics

122 proc univariate data=sasuser.class plot; var height; run; Stem Leaf # Multiply Stem.Leaf by 10**+1 statistics Summary statistics

123 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

124 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

125 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

126 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics

127 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics

128 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics

129 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics

130 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

131 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

132 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

133 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

134 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

135 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

136 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

137 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

138 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

139 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

140 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

141 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

142 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

143 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

144 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

145 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

146 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

147 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

4 Other useful features on the course web page. 5 Accessing SAS

4 Other useful features on the course web page. 5 Accessing SAS 1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

How To Test For Significance On A Data Set

How To Test For Significance On A Data Set Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

More information

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

Statistics 104: Section 6!

Statistics 104: Section 6! Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

More information

Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

THE BINOMIAL DISTRIBUTION & PROBABILITY

THE BINOMIAL DISTRIBUTION & PROBABILITY REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

More information

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if: Graphs Exploratory data analysis Dr. David Lucy d.lucy@lancaster.ac.uk Lancaster University A graph is a suitable way of representing data if: A line or area can represent the quantities in the data in

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University The Normal Distribution Alan T. Arnholt Department of Mathematical Sciences Appalachian State University arnholt@math.appstate.edu Spring 2006 R Notes 1 Copyright c 2006 Alan T. Arnholt 2 Continuous Random

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Nathan A. Curtis, SAS Institute Inc., Cary, NC Abstract Exploring and modeling the distribution of a data sample is a key step

More information

5. Continuous Random Variables

5. Continuous Random Variables 5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Module 4: Data Exploration

Module 4: Data Exploration Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

More information

Exact Confidence Intervals

Exact Confidence Intervals Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem) NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

More information

WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

More information

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data A Few Sources for Data Examples Used Introduction to Environmental Statistics Professor Jessica Utts University of California, Irvine jutts@uci.edu 1. Statistical Methods in Water Resources by D.R. Helsel

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

List of Examples. Examples 319

List of Examples. Examples 319 Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Data Cleaning 101. Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ. Variable Name. Valid Values. Type

Data Cleaning 101. Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ. Variable Name. Valid Values. Type Data Cleaning 101 Ronald Cody, Ed.D., Robert Wood Johnson Medical School, Piscataway, NJ INTRODUCTION One of the first and most important steps in any data processing task is to verify that your data values

More information

3: Summary Statistics

3: Summary Statistics 3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

More information

EXPLORATORY DATA ANALYSIS: GETTING TO KNOW YOUR DATA

EXPLORATORY DATA ANALYSIS: GETTING TO KNOW YOUR DATA EXPLORATORY DATA ANALYSIS: GETTING TO KNOW YOUR DATA Michael A. Walega Covance, Inc. INTRODUCTION In broad terms, Exploratory Data Analysis (EDA) can be defined as the numerical and graphical examination

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

The Normal distribution

The Normal distribution The Normal distribution The normal probability distribution is the most common model for relative frequencies of a quantitative variable. Bell-shaped and described by the function f(y) = 1 2σ π e{ 1 2σ

More information

STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

More information