# Dongfeng Li. Autumn 2010

## Chapter Contents

- Some statistics background
- Descriptive statistics
- Comparing means and proportions
- Analysis of variance

Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, and basic analysis of variance.

## Section Contents

Review statistics concepts:

- Distribution, discrete distribution, continuous distribution, PDF, CDF, quantile.
- Normal distribution.
- Mean, median, variance, standard deviation, interquartile range, skewness, kurtosis.
- Population, parameter, sample, statistics, estimates, sampling distribution.
- MLE, standard error.
- Hypothesis tests, two types of errors, p-value.

## Distribution

- Random variable:
  - Discrete, such as sex, patient/control, age group.
  - Continuous, such as weight, blood pressure.
- Distribution: describes the relative chance of the variable taking each value.
  - For a discrete variable $X$, use $P(X = x_i)$, where $\{x_i\}$ is the value set of $X$. This is called the probability mass function (pmf).
  - For a continuous variable $X$, use the probability density function (pdf) $f(x)$, where $P(X \in (x - \varepsilon, x + \varepsilon)) \approx f(x)(2\varepsilon)$ for small $\varepsilon$.
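
As a rough illustration of the pmf and pdf ideas, the following DATA step is a minimal sketch. The binomial example (3 trials, success probability 0.5) and the values `x = 1`, `eps = 0.01` are assumptions chosen only for illustration. It prints a pmf that sums to 1 and compares an exact interval probability with the approximation $f(x)(2\varepsilon)$.

```sas
data _null_;
  /* Discrete case: pmf of the number of successes in 3 trials, p = 0.5 */
  total = 0;
  do m = 0 to 3;
    pm = pdf('BINOMIAL', m, 0.5, 3);   /* P(X = m) */
    total = total + pm;
    put m= pm=;
  end;
  put total=;                          /* the pmf values sum to 1 */

  /* Continuous case: P(X in (x - eps, x + eps)) versus f(x) * 2 * eps */
  x = 1;
  eps = 0.01;
  exact  = cdf('NORMAL', x + eps) - cdf('NORMAL', x - eps);
  approx = pdf('NORMAL', x) * 2 * eps;
  put exact= approx=;
run;
```

The two values on the last log line should agree closely, and the agreement improves as `eps` shrinks.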

## CDF

- Cumulative distribution function (cdf): $F(x) = P(X \le x)$.
- $P(X \in (a, b]) = F(b) - F(a)$.
- For a discrete distribution, $F(x) = \sum_{x_i \le x} P(X = x_i)$.
- For a continuous distribution, $F(x) = \int_{-\infty}^{x} f(t)\,dt$.

## Quantile function

- Quantile function: the inverse of the CDF, $q(p) = F^{-1}(p)$, $p \in (0, 1)$, if $F(x)$ is a one-to-one mapping.
- Generally, $q(p) = x_p$ where $P(X \le x_p) \ge p$ and $P(X \ge x_p) \ge 1 - p$ ($x_p$ can be non-unique).

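## The normal distribution

- Standard normal distribution $N(0, 1)$, PDF

  $$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$$

- Normal distribution $N(\mu, \sigma^2)$, PDF

  $$f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$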
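
The PDF, CDF and quantile function of the normal distribution can be evaluated in a DATA step. The sketch below uses the SAS functions `PDF`, `CDF` and `QUANTILE`; the numbers (1.96, 0.975, and an $N(100, 15^2)$ population) are assumptions picked for illustration.

```sas
data _null_;
  x = 1.96;
  density = pdf('NORMAL', x);            /* phi(x) for N(0,1) */
  prob    = cdf('NORMAL', x);            /* F(x) = P(X <= x) */
  q975    = quantile('NORMAL', 0.975);   /* q(p) = inverse CDF at p = 0.975 */

  /* For N(mu, sigma^2), pass the mean and the standard deviation (not the variance) */
  prob2   = cdf('NORMAL', 110, 100, 15); /* P(X <= 110) for N(100, 15^2) */

  put density= prob= q975= prob2=;
run;
```

Here `prob` is about 0.975 and `q975` is about 1.96, which shows the quantile function undoing the CDF.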

## Numerical characteristics

- A PDF is a curve with infinitely many points. A few numbers can describe the key features of a distribution.
- Location measures:
  - The mean $EX$.
  - The median $x_{0.5} = q(0.5)$, where $P(X \le x_{0.5}) \ge 0.5$ and $P(X \ge x_{0.5}) \ge 0.5$ (can be non-unique).
- Variability measures:
  - The standard deviation $\sigma_X = \sqrt{E(X - EX)^2}$.
  - The interquartile range $q(0.75) - q(0.25)$.
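
As a worked example of these measures, consider a normal population $N(\mu, \sigma^2)$. The symbol $\Phi$ below denotes the standard normal CDF, which is notation introduced here for illustration. The quantile function is $q(p) = \mu + \sigma \Phi^{-1}(p)$, so

$$
\begin{aligned}
x_{0.5} &= q(0.5) = \mu + \sigma\,\Phi^{-1}(0.5) = \mu, \\
q(0.75) - q(0.25) &= \sigma\left[\Phi^{-1}(0.75) - \Phi^{-1}(0.25)\right] = 2\,\Phi^{-1}(0.75)\,\sigma \approx 1.349\,\sigma .
\end{aligned}
$$

For a normal distribution the mean and median coincide, and the interquartile range is a fixed multiple of the standard deviation.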

## Shape characteristics

- Skewness: $E\left( \dfrac{X - EX}{\sigma_X} \right)^3$. It distinguishes symmetric, left-skewed and right-skewed densities.
- Kurtosis: $E\left( \dfrac{X - EX}{\sigma_X} \right)^4 - 3$. It relates to the heavy tail problem.
- Multimodal distribution, e.g., the weights of 10-year-olds and 20-year-olds mixed together.

## Population

- A population is modeled by a random variable or a distribution.
- Population parameters: unknown numbers that determine the distribution. E.g., for an $N(\mu, \sigma^2)$ population, $(\mu, \sigma^2)$ are the two unknown population parameters.

## Sample

- A sample (typically) is $n$ observations drawn independently from the population, $X_1, X_2, \dots, X_n$. A sample can be regarded as $n$ numbers, or as $n$ iid random variables.
- Statistics: values calculated from the sample to estimate features of the population distribution.
  - A statistic can be regarded as a random variable.
  - Examples: sample mean, sample standard deviation.
  - Statistics used to estimate unknown population parameters are called estimates.
- Sampling distribution: the distribution of an estimator or other statistic. E.g., if $X_1, \dots, X_n$ is a sample from $N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
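
The sampling distribution of the sample mean can be checked by simulation. The following is a sketch under assumed settings: 1000 replicate samples of size $n = 25$ from $N(50, 10^2)$, with hypothetical data set names `sim` and `xbar` and an arbitrary seed.

```sas
/* Draw 1000 samples, each with n = 25 observations from N(50, 10^2) */
data sim;
  call streaminit(2010);              /* arbitrary seed for reproducibility */
  do rep = 1 to 1000;
    do i = 1 to 25;
      x = rand('NORMAL', 50, 10);     /* mean 50, standard deviation 10 */
      output;
    end;
  end;
run;

/* Compute the sample mean of each replicate */
proc means data=sim noprint nway;
  class rep;
  var x;
  output out=xbar mean=xbar;
run;

/* Summarize the 1000 sample means */
proc means data=xbar n mean std;
  var xbar;
run;
```

If the theory holds, the mean of the sample means should be near 50 and their standard deviation near $10/\sqrt{25} = 2$.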

## MLE (Maximum Likelihood Estimate)

- MLE is a commonly used parameter estimation method.
- For a random sample $X_1, X_2, \dots, X_n$ from population $X$, if the PDF or PMF of $X$ is $f(x; \beta)$, then

  $$\hat{\beta} = \arg\max_{\beta} L(\beta) = \arg\max_{\beta} \prod_{i=1}^{n} f(X_i; \beta)$$

  is called the MLE of the unknown parameter $\beta$.
- Under some conditions, when the sample size $n \to \infty$, $\hat{\beta}$ has a limiting (approximate) distribution $N(\beta, \sigma_\beta^2)$.
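
A short worked example may help. Assume, for illustration only, a sample from $N(\mu, \sigma^2)$ with $\sigma^2$ known, so that the unknown parameter is $\beta = \mu$. Writing $\ell(\mu) = \log L(\mu)$ for the log-likelihood (notation introduced here):

$$
\begin{aligned}
\ell(\mu) &= \sum_{i=1}^{n} \log f(X_i; \mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (X_i - \mu)^2, \\
\frac{d\ell}{d\mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \mu) = 0
\quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.
\end{aligned}
$$

In this case the MLE is the sample mean, whose sampling distribution is exactly $N(\mu, \sigma^2/n)$.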

## Standard Error

- If $\hat{\beta} = \psi(X_1, \dots, X_n)$ is an estimate of a population parameter $\beta$, it has a sampling distribution $F_{\hat{\beta}}(x)$.
- The standard deviation of the sampling distribution is $\sigma_{\hat{\beta}}$.
- An estimate of $\sigma_{\hat{\beta}}$ is called the standard error (SE) of $\hat{\beta}$.
- If $\hat{\beta}$ has a limiting normal distribution, then $\hat{\beta}$ is approximately distributed $N(\beta, \mathrm{SE}(\hat{\beta})^2)$.
- SE measures the precision of estimation.
- SE can be used to construct (approximate) confidence intervals.
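
For the sample mean, PROC MEANS can report the standard error and a confidence interval directly. This is a minimal sketch that assumes the `sasuser.class` data set and its `height` variable used in the examples of this chapter; the 95% level is an illustrative choice.

```sas
/* Sample mean, its standard error, and a 95% confidence interval */
proc means data=sasuser.class n mean std stderr clm alpha=0.05;
  var height;
run;
```

Here `stderr` is $S/\sqrt{n}$, and `clm` gives limits of the form mean $\pm$ (t quantile) $\times$ SE, using the $t(n-1)$ distribution rather than the normal approximation.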

## Hypothesis Tests

- Goal: test the null hypothesis $H_0$ against the alternative hypothesis $H_a$ about the population.
- Given a sample $X_1, X_2, \dots, X_n$ from population $F(x; \theta)$, construct a statistic $\xi$ whose distribution does not depend on $\theta$, but whose value can indicate whether $H_0$ or $H_a$ is the more plausible choice.
- Traditionally, we first choose a significance level $\alpha$, then find a rejection area $W$ such that $\sup_{H_0} P(\xi \in W) \le \alpha$.
- We reject $H_0$ when $\xi \in W$.

## Errors and P-Values

- Two types of errors are possible in a hypothesis test:
  - Type I error: $H_0$ is true but rejected. The error rate is at most the significance level $\alpha$.
  - Type II error: $H_0$ is false but accepted. The error rate can be as large as $1 - \alpha$.
- To reduce type II error:
  - Construct theoretically good tests.
  - Don't choose $\alpha$ too small.
  - Choose a big enough sample size $n$.
- p-value: the minimum $\alpha$ we can use if we want to reject $H_0$, after the test statistic value is known.
  - The smaller the p-value is, the more confidence we have when we reject $H_0$.
  - Reject $H_0$ if and only if the p-value is less than $\alpha$.

## Example hypothesis test

- $X \sim N(\mu, \sigma^2)$. The sample is $X_1, \dots, X_n$.
- $H_0: \mu \le \mu_0 \longleftrightarrow H_a: \mu > \mu_0$.
- Test statistic

  $$T = \frac{\bar{X} - \mu_0}{\mathrm{SE}(\bar{X})}$$

  where $\mathrm{SE}(\bar{X}) = S/\sqrt{n}$ and $S$ is the sample standard deviation.
- Rejection area: $W = \{T > \lambda\}$, $\lambda = F^{-1}(1 - \alpha; n - 1)$, where $F^{-1}(p; n)$ is the quantile function of the $t(n)$ distribution.
- The p-value is $1 - F(T; n - 1)$, where $F(x; n)$ is the CDF of the $t(n)$ distribution.
- The larger $T$ is, the smaller the p-value is, and the more confidence we have when we reject $H_0$.
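
The one-sided t test above can be run with PROC TTEST or computed by hand. The sketch below assumes the `sasuser.class` data set with variable `height`, a hypothetical null value $\mu_0 = 60$, and $\alpha = 0.05$. The `SIDES=U` option assumes SAS 9.2 or later, and `stats` is a hypothetical work data set name.

```sas
/* Upper-tailed one-sample t test of H0: mu <= 60 against Ha: mu > 60 */
proc ttest data=sasuser.class h0=60 sides=u alpha=0.05;
  var height;
run;

/* The same test computed step by step */
proc means data=sasuser.class noprint;
  var height;
  output out=stats n=n mean=xbar stderr=se;   /* se = S / sqrt(n) */
run;

data _null_;
  set stats;
  mu0    = 60;                        /* hypothetical null value */
  t      = (xbar - mu0) / se;         /* test statistic T */
  lambda = tinv(0.95, n - 1);         /* critical value: 1 - alpha quantile of t(n-1) */
  pvalue = 1 - probt(t, n - 1);       /* upper-tail p-value */
  put t= lambda= pvalue=;
run;
```

Rejecting when `t > lambda` and rejecting when `pvalue < 0.05` lead to the same decision, which illustrates the p-value rule stated earlier.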

## Section Contents

- Measurement level of variables.
- Statistics for nominal variables.
- Statistics for interval variables.
- Histograms, boxplots, QQ plots, probability plots, stem-leaf plots.
- Using PROC FREQ, PROC MEANS, PROC UNIVARIATE.

## Measurement levels

- Nominal, such as sex, job type, race.
- Ordinal, such as age group (child, youth, medium, old), dose level (none, low, high).
- Interval, such as weight, blood pressure, heart rate.

## Statistics for nominal variables

- The value set.
- The count of each value (called frequency), and the percentage.
- Use a bar chart to show the distribution.

## Example: frequency table

Use PROC FREQ to list the value set with the frequency counts and percents.

    proc freq data=sasuser.class;
      tables sex;
    run;

For each value of `sex` (F and M), the output table lists Frequency, Percent, Cumulative Frequency and Cumulative Percent.

## Example: bar chart

Use PROC GCHART to make a bar chart. `hbar` could be replaced with `vbar`, `hbar3d` or `vbar3d`.

    proc gchart data=sasuser.class;
      hbar sex;
    run;

## Statistics for interval variables

- Location statistics: sample mean, sample median.
- Variability statistics: sample standard deviation, sample interquartile range, range, coefficient of variation.
- Sample skewness:

  $$\frac{n}{(n-1)(n-2)} \sum_{i} \left( \frac{y_i - \bar{y}}{s} \right)^3$$

- Sample kurtosis:

  $$\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i} \frac{(y_i - \bar{y})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}$$

- Moments, quantiles.
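
These statistics can be requested by keyword in PROC MEANS. The following is a small sketch assuming the `sasuser.class` data set and its `height` variable; `maxdec=3` is only a display choice.

```sas
/* Location, variability and shape statistics for an interval variable */
proc means data=sasuser.class
           n mean median std qrange range cv skewness kurtosis maxdec=3;
  var height;
run;
```

Here `qrange` is the interquartile range and `cv` is the coefficient of variation expressed as a percentage.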

89 PROC UNIVARIATE

Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example:

    proc univariate data=sasuser.class;
      var height;
    run;

91 Output

Moments: N (19), Sum Weights, Mean, Sum Observations, Std Deviation, Variance, Skewness, Kurtosis, Uncorrected SS, Corrected SS, Coeff Variation, Std Error Mean.

Basic Statistical Measures: Location (Mean, Median, Mode); Variability (Std Deviation, Variance, Range, Interquartile Range).

Note: The mode displayed is the smallest of 2 modes with a count of 2.

92 Tests for Location: Mu0=0

Test: Statistic; p Value

Student's t: t; Pr > |t| <.0001

Sign: M = 9.5; Pr >= |M| <.0001

Signed Rank: S = 95; Pr >= |S| <.0001

93 Quantiles (Definition 5)

Quantile (Estimate): 100% Max, 99%, 95%, 90%, 75% Q3, 50% Median, 25% Q1, 10%, 5%, 1%, 0% Min (51.3).

94 Extreme Observations

Columns: Lowest (Value, Obs) and Highest (Value, Obs).

95 Histograms

The histogram is a nonparametric estimate of the population density function. It divides the range of values into intervals and counts the percentage of observations in each interval. Example:

    proc univariate data=sasuser.class noprint;
      var height;
      histogram;
    run;
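The counting step behind a histogram can be sketched in a few lines of Python (again outside SAS, purely as an illustration). The function name, the bin layout (equal-width intervals, closed on the left, with the last interval also closed on the right) and the data are assumptions; PROC UNIVARIATE chooses its own bin midpoints.

```python
def histogram_percents(values, start, width, bins):
    """Percent of observations in each of `bins` equal-width intervals from `start`."""
    counts = [0] * bins
    for v in values:
        k = int((v - start) // width)
        if k == bins and v == start + bins * width:
            k = bins - 1  # include the right end point in the last interval
        if 0 <= k < bins:
            counts[k] += 1
    n = len(values)
    return [100 * c / n for c in counts]


heights = [51, 56, 57, 62, 63, 64, 66, 72]  # hypothetical heights
print(histogram_percents(heights, start=50, width=5, bins=5))
```

Each returned number is the bar height for one interval; dividing by the interval width would turn the percentages into a density estimate.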

98 Examples of histograms

Skewness shown in histograms.

99 Box plot

Quantiles can be used to describe key characteristics of a distribution.

The box plot shows the median, the 1/4 and 3/4 quantiles, and the minimum and maximum of a variable.

The box spans the middle 50% of the data. Its length is the IQR (interquartile range).

The whiskers extend toward the minimum and the maximum, but the length of a whisker is not allowed to exceed 1.5 times the IQR.

Points beyond the whiskers are plotted as individual points. They are candidate outliers.
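The whisker rule can be made concrete with a small Python sketch. The quartiles below follow the empirical-distribution-function rule with averaging, which is assumed here to correspond to the "Definition 5" quantiles reported by PROC UNIVARIATE; the function names and the data are hypothetical.

```python
import math


def quantile_def5(sorted_y, p):
    """Quantile for 0 < p < 1: average two order statistics when n*p is an integer."""
    n = len(sorted_y)
    pos = n * p
    j = math.floor(pos)
    if pos > j:
        return sorted_y[j]  # order statistic number j+1 (1-based)
    return (sorted_y[j - 1] + sorted_y[j]) / 2


def box_plot_summary(y):
    """Quartiles, whisker ends, and candidate outliers under the 1.5*IQR rule."""
    ys = sorted(y)
    q1, median, q3 = (quantile_def5(ys, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ys if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0],    # most extreme point within the lower fence
        "whisker_high": inside[-1],  # most extreme point within the upper fence
        "outliers": [v for v in ys if v < low_fence or v > high_fence],
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

In this hypothetical data set the value 30 lies more than 1.5 IQR above the third quartile, so the upper whisker stops at 8 and 30 is drawn as a separate point.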

104 Example: boxplot of height

105

    /* PROC BOXPLOT needs a group variable, so add a constant one */
    proc sql;
      create view work._tmp_0 as
        select height, 1 as _dummy_ from sasuser.class;
    title;
    axis1 major=none value=none label=none;
    proc boxplot data=_tmp_0;
      plot height*_dummy_ / boxstyle=schematic cboxfill=blue haxis=axis1;
    run;

106 Example: boxplot of GPA and CO

Skewness and possible outliers shown in boxplot.

107 Grouped boxplot

108

    /* PROC BOXPLOT expects the data sorted by the group variable */
    proc sort data=sasuser.gpa out=_tmp_1;
      by sex;
    run;
    proc boxplot data=_tmp_1;
      plot gpa*sex / boxstyle=schematic cboxfill=blue;
    run;

109 QQ Plot

The normal distribution is the most commonly used distribution, and several graphs are designed to show discrepancy from it. The QQ plot is one of them.

Suppose $Y_1, Y_2, \ldots, Y_n$ is a sample from $N(\mu, \sigma^2)$, sorted in ascending order. The CDF is $F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$, and $F(Y_i) \approx \frac{i}{n}$, so

$$\Phi\left(\frac{Y_i-\mu}{\sigma}\right) \approx \frac{i}{n}, \qquad Y_i \approx \mu + \sigma \Phi^{-1}\left(\frac{i}{n}\right).$$

Let $x_i = \Phi^{-1}\left(\frac{i}{n}\right)$ and plot $(x_i, Y_i)$, $i = 1, \ldots, n$. The points should lie around the line $y = \mu + \sigma x$.

Continuity adjustment: $x_i = \Phi^{-1}\left(\frac{i}{n+0.25}\right)$.
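The construction of the QQ plot coordinates can be sketched in Python as follows. The function names and the `offset` parameter are hypothetical. With `offset=0.0` the code uses the adjustment as it appears on the slide, i/(n + 0.25); the SAS documentation for PROC UNIVARIATE describes (i - 0.375)/(n + 0.25), which corresponds to `offset=0.375`, so the slide's numerator may have lost a term in transcription.

```python
from statistics import NormalDist


def qq_points(y, offset=0.0):
    """Pairs (x_i, Y_(i)) with x_i = Phi^{-1}((i - offset) / (n + 0.25))."""
    ys = sorted(y)
    n = len(ys)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    xs = [std_normal.inv_cdf((i - offset) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(xs, ys))


def fit_line(points):
    """Least-squares intercept and slope; rough graphical estimates of mu and sigma."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(v for _, v in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


heights = [62.8, 51.3, 69.0, 59.0, 64.3, 56.5, 66.5, 57.5]  # hypothetical data
intercept, slope = fit_line(qq_points(heights, offset=0.375))
print(intercept, slope)
```

If the points follow the line closely, the intercept and slope give rough values for µ and σ; systematic curvature suggests skewness or heavy tails.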

115 Example: QQ Plot of HEIGHT

    proc univariate data=sasuser.class noprint;
      qqplot height / normal(mu=est sigma=est);
    run;

116 Example: Skewness in QQ Plot

117 Probability Plot

A probability plot is the same plot as a QQ plot, except that the x axis is labeled with $\Phi(x_i)$ instead of $x_i$.

Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters.

119 Example: Probability Plot of HEIGHT

    proc univariate data=sasuser.class noprint;
      probplot height / normal(mu=est sigma=est);
    run;

120 The Stem-leaf Plot

The stem-leaf plot is a text-based plot that displays information like the histogram, but with detail on each data value. Each leaf corresponds to one data value.

PROC UNIVARIATE has an option PLOT, which generates a text-based stem-leaf plot, boxplot, and QQ plot.

122

    proc univariate data=sasuser.class plot;
      var height;
    run;

Output columns: Stem, Leaf, #. Multiply Stem.Leaf by 10**+1.

123 PROC MEANS and PROC SUMMARY

PROC MEANS and PROC SUMMARY are used to produce summary statistics.

They can display overall summary statistics and classified summary statistics. PROC MEANS displays results by default; PROC SUMMARY needs the PRINT option to display results.

They can write output data sets containing the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same.

126 Example of overall statistics

Use the VAR statement to specify which variables to summarize. Example of overall statistics:

    proc means data=sasuser.class;
      var height weight;
    run;
    proc summary data=sasuser.class print;
      var height weight;
    run;

128 Classified summary

Use the CLASS statement to specify one or more class variables. Example:

    proc means data=sasuser.class;
      var height weight;
      class sex;
    run;

130 Other statistical measures

PROC MEANS calculates N, Mean, Standard Deviation, Minimum, and Maximum by default. Use PROC MEANS options to request other statistical measures. Example:

    proc means data=sasuser.class MEAN VAR CV;
      var height weight;
      class sex;
    run;
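To make clear what a classified summary with MEAN, VAR and CV contains, here is a minimal Python sketch of the same idea, not a reproduction of PROC MEANS itself. The function name, the row layout and the data are hypothetical; the coefficient of variation is computed as 100 times the standard deviation divided by the mean, which is assumed to match the CV keyword.

```python
import math
import statistics
from collections import defaultdict


def classified_summary(rows, class_key, var_key):
    """N, mean, sample variance, and CV (in percent) of var_key per level of class_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row[var_key])
    summary = {}
    for level, values in sorted(groups.items()):
        mean = statistics.mean(values)
        var = statistics.variance(values)  # divisor n - 1
        summary[level] = {
            "n": len(values),
            "mean": mean,
            "var": var,
            "cv": 100 * math.sqrt(var) / mean,
        }
    return summary


rows = [  # hypothetical records
    {"sex": "F", "height": 60.0}, {"sex": "F", "height": 62.0},
    {"sex": "F", "height": 64.0}, {"sex": "M", "height": 66.0},
    {"sex": "M", "height": 70.0},
]
for level, stats in classified_summary(rows, "sex", "height").items():
    print(level, stats)
```

The result has one entry per class level, which mirrors the one-row-per-group layout of a classified summary table.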

133 Producing output datasets

Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax:

    OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME;

where keyword is the name of a statistical measure, such as MEAN, STD, N, MIN, or MAX. Example:

    proc means data=sasuser.class MEAN VAR CV;
      var height weight;
      class sex;
      output out=res N= CV= / AUTONAME;
    run;
    proc print data=res;
    run;

137 One-sample Z tests and t tests. Two-sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two samples.

Subsections: One Sample Test of the Mean; Comparing Two Groups; Paired Comparison; Comparing Proportions.

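142 One Sample Z Test

Two-sided case: $X \sim N(\mu, \sigma_0^2)$, $\sigma_0$ known.

$$H_0: \mu = \mu_0 \longleftrightarrow H_a: \mu \neq \mu_0$$

$$Z = \frac{\bar{X} - \mu_0}{\sigma_0 / \sqrt{n}} \overset{H_0}{\sim} N(0, 1).$$

Rejection region: $W = \{|Z| > \lambda\}$, $\lambda = \Phi^{-1}(1 - \alpha/2)$.

The p-value is $2(1 - \Phi(|Z|))$.

Wald test: based on the central limit theorem, to test $H_0: \theta = \theta_0 \longleftrightarrow H_a: \theta \neq \theta_0$, use $W = \frac{\hat{\theta} - \theta_0}{\mathrm{SE}(\hat{\theta})}$ as the test statistic, if $W \sim N(0, 1)$ approximately.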
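The two-sided Z test can be worked through numerically with a short Python sketch. The function name and the input numbers are hypothetical; the code simply evaluates the statistic, the critical value λ and the p-value as defined on this slide.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma0, n, alpha=0.05):
    """Two-sided Z test of H0: mu = mu0 with known sigma0."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma0 / sqrt(n))
    critical = std_normal.inv_cdf(1 - alpha / 2)     # lambda
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value, abs(z) > critical


# Hypothetical numbers: sample mean 11 from n = 16 observations, sigma0 = 2, mu0 = 10.
z, p_value, reject = one_sample_z_test(xbar=11.0, mu0=10.0, sigma0=2.0, n=16)
print(z, p_value, reject)
```

Here Z = (11 - 10) / (2 / 4) = 2, which exceeds the 5% critical value of about 1.96, so H0 is rejected at that level and the p-value is a little below 0.05.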

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Seminar paper Statistics

Seminar paper Statistics The seminar paper must contain: - the title page - the characterization of the data (origin, reason why you have chosen this analysis,...) - the list of the data (in the table)

### 4 Other useful features on the course web page. 5 Accessing SAS

1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu

### Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia

Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what data look like before more rigorous

### Using pivots to construct confidence intervals. In Example 41 we used the fact that

Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia

Paper HW-05 How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Univariate Descriptive Statistics

Univariate Descriptive Statistics Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stemleaf plots, tables, lists. Example: sea urchin sizes Boxplot Histogram Urchin

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### Chapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures- Graphs are used to describe the shape of a data set.

Page 1 of 16 Chapter 2: Exploring Data with Graphs and Numerical Summaries Graphical Measures- Graphs are used to describe the shape of a data set. Section 1: Types of Variables In general, variable can

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Content DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS

Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction

### Death on the Titanic

Death on the Titanic Introduction On its maiden voyage, the cruise ship Titanic collided with an iceberg and sank. There was much loss of life. It is of interest to test how well sample proportions from

### Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

### Principle of Data Reduction

Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Statistics Chapter 3 Averages and Variations

Statistics Chapter 3 Averages and Variations Measures of Central Tendency Average a measure of the center value or central tendency of a distribution of values. Three types of average: Mode Median Mean

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Chapter 2. Objectives. Tabulate Qualitative Data. Frequency Table. Descriptive Statistics: Organizing, Displaying and Summarizing Data.

Objectives Chapter Descriptive Statistics: Organizing, Displaying and Summarizing Data Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### The Chi-Square Distributions

MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation " of a normally distributed measurement and to test the goodness

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### 6. Distribution and Quantile Functions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 6. Distribution and Quantile Functions As usual, our starting point is a random experiment with probability measure P on an underlying sample spac

### Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

### Null Hypothesis H 0. The null hypothesis (denoted by H 0

Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### Guido's Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido's Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC MEANS is a basic procedure within BASE SAS

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo © Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

### Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

### 1 SAMPLE SIGN TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1. A non-parametric equivalent of the 1 SAMPLE T-TEST.

Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### Lecture 2. Summarizing the Sample

Lecture 2 Summarizing the Sample WARNING: Today's lecture may bore some of you. It's (sort of) not my fault. I'm required to teach you about what we're going to cover today. I'll try to make it as exciting

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### Null Hypothesis Significance Testing Significance Level, Power, t-tests. 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom

Null Hypothesis Significance Testing Significance Level, Power, t-tests 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Simple and composite hypotheses Simple hypothesis: the sampling distribution is

### 2. Describing Data. We consider 1. Graphical methods 2. Numerical methods

2. Describing Data We consider 1. Graphical methods 2. Numerical methods General Use of Graphical and Numerical Methods Graphical methods can be used to visually and qualitatively present data and

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

The Normal Distribution Alan T. Arnholt Department of Mathematical Sciences Appalachian State University arnholt@math.appstate.edu Spring 2006 R Notes 1 Copyright © 2006 Alan T. Arnholt 2 Continuous Random

### Frequency Distributions

Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data to get a general overview of the results. Remember, this is the goal

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism's

### Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

### Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Nathan A. Curtis, SAS Institute Inc., Cary, NC Abstract Exploring and modeling the distribution of a data sample is a key step

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

### 5. Continuous Random Variables

5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### Descriptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods

Descriptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods Qualitative data Data are classified in categories Non numerical (although may be numerically codified) Elements

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### Statistical Analysis I

CTSI BERD Research Methods Seminar Series Statistical Analysis I Lan Kong, PhD Associate Professor Department of Public Health Sciences December 22, 2014 Biostatistics, Epidemiology, Research Design(BERD)

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

### Normal distribution

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections - we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6 Comparing the

### Chapter 8 Introduction to Hypothesis Testing

Chapter 8 Student Lecture Notes 8-1 Chapter 8 Introduction to Hypothesis Testing Fall 26 Fundamentals of Business Statistics 1 Chapter Goals After completing this chapter, you should be able to: Formulate

### Biostatistics Lab Notes

Biostatistics Lab Notes Page 1 Lab 1: Measurement and Sampling Biostatistics Lab Notes Because we used a chance mechanism to select our sample, each sample will differ. My data set (GerstmanB.sav), looks

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots

One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis

### Measuring the Power of a Test

Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection

### DISTRIBUTION FITTING 1

1 DISTRIBUTION FITTING What Is Distribution Fitting? Distribution fitting is the procedure of selecting a statistical distribution that best fits to a data set generated by some random process. In other