Dongfeng Li. Autumn 2010

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Dongfeng Li. Autumn 2010"

Transcription

1 Autumn 2010

2 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

3 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

4 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

5 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

6 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.

7 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

8 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

9 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

10 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

11 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

12 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, p-value.

13 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

14 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

15 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

16 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

17 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

18 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).

19 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

20 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

21 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt

22 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 1-1 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be non-unique.)

23 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 1-1 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be non-unique.)

24 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }

25 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }

26 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

27 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

28 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

29 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

30 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

31 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

32 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

33 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be non-unique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).

34 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

35 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

36 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

37 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

38 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, left-skewed, right-skewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.

39 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.

40 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.

41 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

42 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

43 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

44 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).

45 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).

46 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).

47 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

48 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

49 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

50 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

51 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

52 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.

53 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

54 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

55 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.

56 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

57 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

58 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

59 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

60 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

61 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

62 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

63 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

64 Errors and P-Values Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. p-value: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the p-value is, the more confidence we have when we reject H 0. Reject H 0 if and only if the p-value is less than α.

65 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

66 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

67 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

68 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

69 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

70 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. P-value is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the p-value is, the more confidence we have when we reject H 0.

71 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

72 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

73 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

74 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

75 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stem-leaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics

76 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

77 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

78 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics

79 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

80 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

81 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics

82 Example: frequency table Use PROC FREQ to list the value set and the frequency counts, percents. proc freq data=sasuser.class; tables sex; run; Gender sex Frequency Percent Cumulative Cumulative Frequency Percent F M statistics Summary statistics

83 Example: bar chart Use PROC GCHART to make a bar chart. hbar could be replaced with vbar, hbar3d, vbar3d. proc gchart data=sasuser.class; hbar sex; run; statistics Summary statistics

84 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

85 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

86 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

87 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

88 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics

89 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics

90 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics

91 Output Moments N 19 Sum Weights Sum Observations Std Deviation Skewness Kurtosis Uncorrected SS Corrected SS Coeff Variation Std Error Basic Statistical Measures Location Variability Std Deviation Median Mode Range Interquartile Range statistics Summary statistics Note The mode displayed is the smallest of 2 modes with a count of 2.

92 Tests for Location: Mu0=0 Test Statistic p Value Student s t t Pr > t <.0001 Sign M 9.5 Pr >= M <.0001 Signed Rank S 95 Pr >= S <.0001 statistics Summary statistics

93 Quantiles (Definition 5) Quantile Estimate 100% Max % % % % Q % Median % Q % % % % Min 51.3 statistics Summary statistics

94 Extreme Observations Lowest Highest Value Obs Value Obs statistics Summary statistics

95 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

96 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

97 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics

98 Examples of histograms Skewness shown in histograms. statistics Summary statistics

99 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

100 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

101 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

102 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

103 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics

104 Example: boxplot of height statistics Summary statistics

105 proc sql; create view work._tmp_0 as select height, 1 as _dummy_ from sasuser.class; title ; axis1 major=none value=none label=none; proc boxplot data=_tmp_0; plot height*_dummy_ / boxstyle=skematic cboxfill=blue haxis=axis1; run; statistics Summary statistics

106 Example: boxplot of GPA and CO Skewness and possible outliers shown in boxplot. statistics Summary statistics

107 Grouped boxplot statistics Summary statistics

108 proc sort data=sasuser.gpa out=_tmp_1; by sex; proc boxplot data=_tmp_1; plot gpa*sex / boxstyle=skematic cboxfill=blue; run; statistics Summary statistics

109 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

110 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

111 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

112 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

113 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

114 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics

115 Example: QQ Plot of HEIGHT proc univariate data=sasuser.class noprint; qqplot height / normal(mu=est sigma=est); run; statistics Summary statistics

116 Example: Skewness in QQ Plot statistics Summary statistics

117 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics

118 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics

119 Example: Probability Plot of HEIGHT proc univariate data=sasuser.class noprint; probplot height / normal(mu=est sigma=est); run; statistics Summary statistics

120 The Stem-leaf Plot The stem-leaf plot is a text-based plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates text-based stem-leaf plot, boxplot and QQ plot. statistics Summary statistics

121 The Stem-leaf Plot The stem-leaf plot is a text-based plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates text-based stem-leaf plot, boxplot and QQ plot. statistics Summary statistics

122 proc univariate data=sasuser.class plot; var height; run; Stem Leaf # Multiply Stem.Leaf by 10**+1 statistics Summary statistics

123 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

124 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

125 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics

126 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics

127 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics

128 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics

129 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics

130 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

131 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

132 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics

133 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

134 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

135 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

136 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=output-dataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;

137 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

138 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

139 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

140 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

141 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

142 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

143 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

144 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

145 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

146 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

147 One Sample Z Test Two-sided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). p-value=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Seminar paper Statistics

Seminar paper Statistics Seminar paper Statistics The seminar paper must contain: - the title page - the characterization of the data (origin, reason why you have chosen this analysis,...) - the list of the data (in the table)

More information

4 Other useful features on the course web page. 5 Accessing SAS

4 Other useful features on the course web page. 5 Accessing SAS 1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu

More information

Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia

Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what data look like before more rigorous

More information

Using pivots to construct confidence intervals. In Example 41 we used the fact that

Using pivots to construct confidence intervals. In Example 41 we used the fact that Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement

More information

Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test... Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

More information

How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia

How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia Paper HW-05 How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010 Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

More information

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

More information

F. Farrokhyar, MPhil, PhD, PDoc

F. Farrokhyar, MPhil, PhD, PDoc Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

More information

Univariate Descriptive Statistics

Univariate Descriptive Statistics Univariate Descriptive Statistics Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stemleaf plots, tables, lists. Example: sea urchin sizes Boxplot Histogram Urchin

More information

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table 2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Chapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures- Graphs are used to describe the shape of a data set.

Chapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures- Graphs are used to describe the shape of a data set. Page 1 of 16 Chapter 2: Exploring Data with Graphs and Numerical Summaries Graphical Measures- Graphs are used to describe the shape of a data set. Section 1: Types of Variables In general, variable can

More information

9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test. Neyman-Pearson lemma 9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Content DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS

Content DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction

More information

Death on the Titanic

Death on the Titanic Death on the Titanic Introduction On its maiden voyage, the cruise ship Titanic collided with an iceberg and sank. There was much loss of life. It is of interest to test how well sample proportions from

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures. Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

More information

Statistics Chapter 3 Averages and Variations

Statistics Chapter 3 Averages and Variations Statistics Chapter 3 Averages and Variations Measures of Central Tendency Average a measure of the center value or central tendency of a distribution of values. Three types of average: Mode Median Mean

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Chapter 2. Objectives. Tabulate Qualitative Data. Frequency Table. Descriptive Statistics: Organizing, Displaying and Summarizing Data.

Chapter 2. Objectives. Tabulate Qualitative Data. Frequency Table. Descriptive Statistics: Organizing, Displaying and Summarizing Data. Objectives Chapter Descriptive Statistics: Organizing, Displaying and Summarizing Data Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation " of a normally distributed measurement and to test the goodness

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

6. Distribution and Quantile Functions

6. Distribution and Quantile Functions Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 6. Distribution and Quantile Functions As usual, our starting point is a random experiment with probability measure P on an underlying sample spac

More information

Statistics and research

Statistics and research Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Statistics 104: Section 6!

Statistics 104: Section 6! Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test. The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information

Null Hypothesis H 0. The null hypothesis (denoted by H 0

Null Hypothesis H 0. The null hypothesis (denoted by H 0 Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC MEANS is a basic procedure within BASE SAS

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

More information

Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

More information

1 SAMPLE SIGN TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1. A non-parametric equivalent of the 1 SAMPLE T-TEST.

1 SAMPLE SIGN TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1. A non-parametric equivalent of the 1 SAMPLE T-TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

More information

Null Hypothesis Significance Testing Signifcance Level, Power, t-tests. 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom

Null Hypothesis Significance Testing Signifcance Level, Power, t-tests. 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Null Hypothesis Significance Testing Signifcance Level, Power, t-tests 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Simple and composite hypotheses Simple hypothesis: the sampling distribution is

More information

2. Describing Data. We consider 1. Graphical methods 2. Numerical methods 1 / 56

2. Describing Data. We consider 1. Graphical methods 2. Numerical methods 1 / 56 2. Describing Data We consider 1. Graphical methods 2. Numerical methods 1 / 56 General Use of Graphical and Numerical Methods Graphical methods can be used to visually and qualitatively present data and

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University The Normal Distribution Alan T. Arnholt Department of Mathematical Sciences Appalachian State University arnholt@math.appstate.edu Spring 2006 R Notes 1 Copyright c 2006 Alan T. Arnholt 2 Continuous Random

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data to get a general overview of the results. Remember, this is the goal

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Nathan A. Curtis, SAS Institute Inc., Cary, NC Abstract Exploring and modeling the distribution of a data sample is a key step

More information

10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation 10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

13.2 Measures of Central Tendency

13.2 Measures of Central Tendency 13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

More information

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

More information

5. Continuous Random Variables

5. Continuous Random Variables 5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Desciptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods

Desciptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods Desciptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods Qualitative data Data are classified in categories Non numerical (although may be numerically codified) Elements

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Statistical Analysis I

Statistical Analysis I CTSI BERD Research Methods Seminar Series Statistical Analysis I Lan Kong, PhD Associate Professor Department of Public Health Sciences December 22, 2014 Biostatistics, Epidemiology, Research Design(BERD)

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections Chapter 9: Hypothesis Testing Sections - we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6 Comparing the

More information

Chapter 8 Introduction to Hypothesis Testing

Chapter 8 Introduction to Hypothesis Testing Chapter 8 Student Lecture Notes 8-1 Chapter 8 Introduction to Hypothesis Testing Fall 26 Fundamentals of Business Statistics 1 Chapter Goals After completing this chapter, you should be able to: Formulate

More information

Biostatistics Lab Notes

Biostatistics Lab Notes Biostatistics Lab Notes Page 1 Lab 1: Measurement and Sampling Biostatistics Lab Notes Because we used a chance mechanism to select our sample, each sample will differ. My data set (GerstmanB.sav), looks

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots

One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots 1 / 27 One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis

More information

Measuring the Power of a Test

Measuring the Power of a Test Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection

More information

DISTRIBUTION FITTING 1

DISTRIBUTION FITTING 1 1 DISTRIBUTION FITTING What Is Distribution Fitting? Distribution fitting is the procedure of selecting a statistical distribution that best fits to a data set generated by some random process. In other

More information

THE BINOMIAL DISTRIBUTION & PROBABILITY

THE BINOMIAL DISTRIBUTION & PROBABILITY REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

More information

Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

More information