Dongfeng Li. Autumn 2010


 Garey Flynn
 1 years ago
 Views:
Transcription
1 Autumn 2010
2 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.
3 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.
4 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.
5 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.
6 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis testing, basic analysis of variance.
7 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
8 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
9 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
10 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
11 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
12 Section Contents Review statistics concepts: Distribution, discrete distribution, continuouse distribution, PDF, CDF, quantile. Normal distribution., median, variance, standard deviation, interquantile range, skewness, kurtosis. Population, parameter, sample, statistics, estimates, sampling distribution. MLE, standard error. Hypothesis tests, two types of errors, pvalue.
13 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
14 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
15 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
16 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
17 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
18 Distribution Random Variable: Discrete, such as sex, patient/control, age group. Continuous, such as weight, blood pressure. Distribution: used to describe the relative chance of taking some value. For discrete variable X, use P(X = xi ), where {x i } are the value set of X. Called probability mass function(pmf). For continuous variable X, use the probability density function(pdf) f (x), where P(X (x ɛ, x + ɛ) f (x)(2ɛ).
19 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt
20 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt
21 CDF Cumulative distribution function(cdf) F(x): F(x) = P(X x) P(x (a, b]) = F(b) F(a) For discrete distribution, F(x) = P(X = x i ) x i x For continuous distribution, F(x) = x f (t)dt
22 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 11 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be nonunique.)
23 Quantile function Quantile function: the inverse of CDF q(p) = F 1 (p), p (0, 1) if F (x) is 11 mapping. Generally, q(p) = x p where P(X x p ) p, P(X x p ) 1 p (x p can be nonunique.)
24 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }
25 The normal distribution Standard normal distribution(n(0,1)), PDF ϕ(x) = 1 2π e x2 2 Normal distribution N(µ, σ 2 ), PDF f (x) = ϕ( x µ σ ) = 1 2πσ exp{ (x µ)2 2σ 2 }
26 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
27 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
28 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
29 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
30 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
31 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
32 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
33 Numerical charasteristics PDF is a curve with infinite number of points. We can use some numbers to describe the key part of a distribution. Firstly, the location measurement. The mean EX. The meadian, x0.5 = q(0.5) where (can be nonunique). Variability measurement. P(x x 0.5 ) 0.5, P(x x 0.5 ) 0.5 The standard deviation σ X = E(X EX) 2. Interquantile range q(0.75) q(0.25).
34 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, leftskewed, rightskewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.
35 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, leftskewed, rightskewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.
36 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, leftskewed, rightskewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.
37 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, leftskewed, rightskewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.
38 Shape characteristics. Skewness ( X EX E σ X ) 3 Symmetric, leftskewed, rightskewed density. Shape characteristics. Kurtosis ( ) X EX 4 E 3 σ X Heavy tail problem. Multimodel distribution. E.g., the weight of 10 year old and 20 year old mixed together.
39 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.
40 Population A population is modeled by a random variable or a distribution. Population parameters: unknown numbers which could decide the distribution. E.g., for N(µ, σ 2 ) population, (µ, σ 2 ) are the two unknown population parameters.
41 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).
42 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).
43 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).
44 Sample A sample(typically) is n observations draw independently from the population, X 1, X 2,..., X n. A sample can be regarded as n numbers, or n iid random variables. Statistics: calculate some values to estimate the distribution of the population. Can be regarded as a random variable. Such as sample mean, sample standard deviation. Can be used to estimate unknown population parameters, called estimates. Sampling distribution: The distribution of an estimator or other statistic. E.g., if X 1,..., X n is a sample from N(µ, σ 2 ), then X N(µ, σ 2 /n).
45 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).
46 MLE(Maximimum Likelihood Estimate) MLE is a commonly used parameter estimation method. For random sample X 1, X 2,..., X n from population X, if PDF of PMF of X is f (x; β), then ˆβ = arg max β L(β) = arg max β n f (X i ; β) i=1 is called the MLE of the unknow parameter β. Under some conditions, when the sample size n, ˆβ has a limiting(approximate) distribution N(β, σ 2 β ).
47 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
48 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
49 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
50 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
51 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
52 Standard Error If ˆβ = ψ(x 1,..., X n ) is a estimate of an population parameter β, it has a sampling distribution F ˆβ(x). The standard deviation of the sampling distribution is σ ˆβ. An estimate of σ ˆβ is called the standard error(se) of ˆβ. If ˆβ has a limiting normal distribution, then ˆβ is approximately distributed N(β, SE( ˆβ)). SE can measure the precision of estimation. SE can be used to construct (approximate) confidence intervals.
53 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.
54 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.
55 Hypothesis Tests To test the null hypothesis H 0 agaist the alternative hypothesis H a on the population; Given sample X 1, X 2,..., X n from population F(x; θ), construct some statistic ξ, whose distribution does not depend on θ, but its value can indicate the possible choice of H 0 or H a. Traditionally, we first choose an significance level α, then we find a rejection area W, where sup H0 P(ξ W ) α. We reject H 0 when ξ W.
56 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
57 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
58 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
59 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
60 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
61 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
62 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
63 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
64 Errors and PValues Two types of possible errors in a hypothesis test: Type I error, H 0 true but rejected. Error rate is at most the significance level α. Type II error, H0 false but accepted. Error rate can be as large as 1 α. To reduce type II error: Construct theoretically good tests. Don t choose α too small. Choose a big enough sample size n. pvalue: the minimum α we can use if we want to reject H 0, after the test statistic value is known. The smaller the pvalue is, the more confidence we have when we reject H 0. Reject H 0 if and only if the pvalue is less than α.
65 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
66 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
67 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
68 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
69 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
70 Example hypothesis test X N(µ, σ 2 ). Sample is X 1,..., X n. H 0 : µ µ 0 H a : µ > µ 0. Test statistic T = X µ 0 SE( X) where SE( X) = S/ n, S is the sample standard deviation. Rejection area: W = {ξ > λ}, λ = F 1 (1 α, n 1), F 1 (p, n) is the quantile function of the t(n 1) distribution. Pvalue is 1 F(T ; n 1), where F(x; n) is the CDF of the t(n) distribution. The larger T is, the smaller the pvalue is, the more confidence we have when we reject H 0.
71 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stemleaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics
72 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stemleaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics
73 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stemleaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics
74 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stemleaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics
75 Section Contents Measurment level of variables. statistics for nominal variables. statistics for interval variables. Histograms, boxplots, QQ plots, probability plots, stemleaf plots. Using PROC FREQ, PROC MEANS, PROC UNIVARIATE. statistics Summary statistics
76 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics
77 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics
78 Measurement levels Nominal, such as sex, job type, race. Ordinal, such as age group(child, youth, medium, old), dose level(none, low, high). Interval, such as weight, blood pressure, heart rate. statistics Summary statistics
79 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics
80 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics
81 Statistics for nominal variables The value set. The count of each value(called frequency), and the percentage. Use a bar chart to show the distribution. statistics Summary statistics
82 Example: frequency table Use PROC FREQ to list the value set and the frequency counts, percents. proc freq data=sasuser.class; tables sex; run; Gender sex Frequency Percent Cumulative Cumulative Frequency Percent F M statistics Summary statistics
83 Example: bar chart Use PROC GCHART to make a bar chart. hbar could be replaced with vbar, hbar3d, vbar3d. proc gchart data=sasuser.class; hbar sex; run; statistics Summary statistics
84 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics
85 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics
86 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics
87 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics
88 statistics for interval variables Location statistics: sample mean, sample median. Variability statistics: sample standard deviation, sample interquantile range, range, coefficient of variation. Sample skewness Sample kurtosis Moments, quantiles. n (n 1)(n 2) n(n+1) (n 1)(n 2)(n 3) ( ) y i y 3. s (yi y) 4 3(n 1)2 s 4 (n 2)(n 3) statistics Summary statistics
89 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics
90 PROC UNIVARIATE Use PROC UNIVARIATE to compute sample statistics for an interval variable. Example: proc univariate data=sasuser.class; var height; run; statistics Summary statistics
91 Output Moments N 19 Sum Weights Sum Observations Std Deviation Skewness Kurtosis Uncorrected SS Corrected SS Coeff Variation Std Error Basic Statistical Measures Location Variability Std Deviation Median Mode Range Interquartile Range statistics Summary statistics Note The mode displayed is the smallest of 2 modes with a count of 2.
92 Tests for Location: Mu0=0 Test Statistic p Value Student s t t Pr > t <.0001 Sign M 9.5 Pr >= M <.0001 Signed Rank S 95 Pr >= S <.0001 statistics Summary statistics
93 Quantiles (Definition 5) Quantile Estimate 100% Max % % % % Q % Median % Q % % % % Min 51.3 statistics Summary statistics
94 Extreme Observations Lowest Highest Value Obs Value Obs statistics Summary statistics
95 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics
96 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics
97 Histograms The histogram is a nonparametric estimate of the population density function. It divide the value set into intervals, and count the percent of observations in each interval. Example: proc univariate data=sasuser.class noprint; var height; histogram; run; statistics Summary statistics
98 Examples of histograms Skewness shown in histograms. statistics Summary statistics
99 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics
100 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics
101 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics
102 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics
103 Box plot Quantiles could be used to describe key distribution charasteristics. The box plot shows the median, 1/4 and 3/4 quantile, and the minimum and maximum of a variable. The box holds the middle 50% range. Length is the IQR(inter quantile range). The wiskers extend to the minimum and the maximum; but the length of the wisker is not allowed to exceed 1.5 times the IQR. Points beyond wiskers are plotted as individual points. They are candidate for outliers. statistics Summary statistics
104 Example: boxplot of height statistics Summary statistics
105 proc sql; create view work._tmp_0 as select height, 1 as _dummy_ from sasuser.class; title ; axis1 major=none value=none label=none; proc boxplot data=_tmp_0; plot height*_dummy_ / boxstyle=skematic cboxfill=blue haxis=axis1; run; statistics Summary statistics
106 Example: boxplot of GPA and CO Skewness and possible outliers shown in boxplot. statistics Summary statistics
107 Grouped boxplot statistics Summary statistics
108 proc sort data=sasuser.gpa out=_tmp_1; by sex; proc boxplot data=_tmp_1; plot gpa*sex / boxstyle=skematic cboxfill=blue; run; statistics Summary statistics
109 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
110 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
111 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
112 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
113 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
114 QQ Plot The normal distribution is the most commonly used distribution. Graphs are designed to show discrepency from the normal distribution. QQ plot is one of them. Suppose Y 1, Y 2,..., Y n is from N(µ,σ 2 ) and sorted ascendingly. CDF F(x) = Φ( x µ σ ), F(Y i) i n, Φ( Y i µ σ ) i n, Y i µ + σφ 1 ( i n ). Let x i = Φ 1 ( i n ), plot (x i, Y i ),i = 1,..., n. The points should lie around the line y = µ + σx. Continuity adjustment: x i = Φ 1 ( i n+0.25 ). statistics Summary statistics
115 Example: QQ Plot of HEIGHT proc univariate data=sasuser.class noprint; qqplot height / normal(mu=est sigma=est); run; statistics Summary statistics
116 Example: Skewness in QQ Plot statistics Summary statistics
117 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas QQ plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics
118 Probability Plot Probability plot is the same plot as a QQ plot, exept that the label of the x axis Φ(x i ) instead of x i. Probability plots are preferable for graphical estimation of percentiles, whereas QQ plots are preferable for graphical estimation of distribution parameters. statistics Summary statistics
119 Example: Probability Plot of HEIGHT proc univariate data=sasuser.class noprint; probplot height / normal(mu=est sigma=est); run; statistics Summary statistics
120 The Stemleaf Plot The stemleaf plot is a textbased plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates textbased stemleaf plot, boxplot and QQ plot. statistics Summary statistics
121 The Stemleaf Plot The stemleaf plot is a textbased plot, which display information like the histogram, but with detail on each data value. Each leaf corresponds to one data value. PROC UNIVARIATE has an option PLOT, which generates textbased stemleaf plot, boxplot and QQ plot. statistics Summary statistics
122 proc univariate data=sasuser.class plot; var height; run; Stem Leaf # Multiply Stem.Leaf by 10**+1 statistics Summary statistics
123 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics
124 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics
125 PROC MEANS and PROC SUMMARY PROC MEANS and PROC SUMMARY are used to produce summary statistics. They can display overall summary statistics and classified summary statistics. PROC MEANS displays result by default; PROC SUMMARY need the PRINT option to display result. They can output data sets with the summary statistics. PROC SUMMARY is designed to do this, although PROC MEANS can do the same. statistics Summary statistics
126 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics
127 Example of overall statistics Use the VAR statement to specify which varibles to summarize. Example of overall statistics: proc means data=sasuser.class; var height weight; run; proc summary data=sasuser.class print; var height weight; run; statistics Summary statistics
128 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics
129 Classified summary Use the CLASS statement to specify one or more class variables. Example: proc means data=sasuser.class ; var height weight; class sex; run; statistics Summary statistics
130 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics
131 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics
132 Other statistical measures PROC MEANS calculate summaries N,, Standard Deviation, Minimum, Maximum by default. Use PROC MEANS options to specify other statistical measures. Example: proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; run; statistics Summary statistics
133 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=outputdataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;
134 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=outputdataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;
135 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=outputdataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;
136 Producing output datasets Use the OUTPUT statement to save the summary statistics to an output dataset. Syntax: OUTPUT OUT=outputdataset keyword= keyword= / AUTONAME; Where keyword is a name of some statistical measure, like MEAN, STD, N, MIN, MAX, etc. Example: statistics Summary statistics proc means data=sasuser.class MEAN VAR CV; var height weight; class sex; output out=res N= CV= / AUTONAME; run; proc print data=res;run;
137 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
138 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
139 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
140 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
141 One sample Z tests, t tests. Two sample t tests. The Wilcoxon rank sum test. Paired t tests. Comparing proportions: one sample and two sample. One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
142 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
143 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
144 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
145 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
146 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
147 One Sample Z Test Twosided X N(µ, σ 2 0 ), σ 0 known. H 0 : µ = µ 0 H a : µ µ 0 Z = X µ 0 σ 0 / H 0 n N(0, 1). W = { Z > λ}, λ = Φ 1 (1 α/2). pvalue=2(1 Φ( Z ). Wald test: Based on the central limit theorem, to test H 0 : θ = θ 0 H a : θ θ 0, use W = ˆθ θ 0 as the SE(ˆθ) test statistic, if W N(0, 1). One Sample Test of the Comparing Two Groups Paired Comparison Comparing Proportions
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More informationExploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationSeminar paper Statistics
Seminar paper Statistics The seminar paper must contain:  the title page  the characterization of the data (origin, reason why you have chosen this analysis,...)  the list of the data (in the table)
More information4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
More informationStatistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia
Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia ABSTRACT For both statisticians and nonstatisticians, knowing what data look like before more rigorous
More informationUsing pivots to construct confidence intervals. In Example 41 we used the fact that
Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement
More informationHypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...
Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................
More informationHow to Perform and Interpret ChiSquare and TTests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia
Paper HW05 How to Perform and Interpret ChiSquare and TTests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia ABSTRACT For both statisticians and nonstatisticians, knowing what
More informationDescriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
More informationStatistics 641  EXAM II  1999 through 2003
Statistics 641  EXAM II  1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationData Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010
Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References
More informationA frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes
A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that
More informationExercise 1.12 (Pg. 2223)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS OneSample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationDescriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination
Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg  ) Chromium (mg kg  ) Copper (mg kg  ) Lead (mg kg  ) Mercury
More informationF. Farrokhyar, MPhil, PhD, PDoc
Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How
More informationUnivariate Descriptive Statistics
Univariate Descriptive Statistics Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stemleaf plots, tables, lists. Example: sea urchin sizes Boxplot Histogram Urchin
More information2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table
2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations
More informationChapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures Graphs are used to describe the shape of a data set.
Page 1 of 16 Chapter 2: Exploring Data with Graphs and Numerical Summaries Graphical Measures Graphs are used to describe the shape of a data set. Section 1: Types of Variables In general, variable can
More information93.4 Likelihood ratio test. NeymanPearson lemma
93.4 Likelihood ratio test NeymanPearson lemma 91 Hypothesis Testing 91.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental
More informationStatistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.11.6) Objectives
More informationContent DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS
Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction
More informationDeath on the Titanic
Death on the Titanic Introduction On its maiden voyage, the cruise ship Titanic collided with an iceberg and sank. There was much loss of life. It is of interest to test how well sample proportions from
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationNCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, twosample ttests, the ztest, the
More informationIntroduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.
Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.
More informationStatistics Chapter 3 Averages and Variations
Statistics Chapter 3 Averages and Variations Measures of Central Tendency Average a measure of the center value or central tendency of a distribution of values. Three types of average: Mode Median Mean
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationChapter 2. Objectives. Tabulate Qualitative Data. Frequency Table. Descriptive Statistics: Organizing, Displaying and Summarizing Data.
Objectives Chapter Descriptive Statistics: Organizing, Displaying and Summarizing Data Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 111) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationBNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
More informationChapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs
Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)
More informationThe ChiSquare Distributions
MATH 183 The ChiSquare Distributions Dr. Neal, WKU The chisquare distributions can be used in statistics to analyze the standard deviation " of a normally distributed measurement and to test the goodness
More informationMEASURES OF LOCATION AND SPREAD
Paper TU04 An Overview of Nonparametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
More information6. Distribution and Quantile Functions
Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 6. Distribution and Quantile Functions As usual, our starting point is a random experiment with probability measure P on an underlying sample spac
More informationStatistics and research
Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,
More informationGeostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt
More informationFoundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10  October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
More informationStatistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Deardra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm3pm in SC 109, Thursday 5pm6pm in SC 705 Office Hours: Thursday 6pm7pm SC
More informationLecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
More informationVariables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.
The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide
More informationCA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
More informationBowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology StepbyStep  Excel Microsoft Excel is a spreadsheet software application
More informationJoint Exam 1/P Sample Exam 1
Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question
More informationPart 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
More informationNull Hypothesis H 0. The null hypothesis (denoted by H 0
Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property
More informationDescriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion
Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research
More informationGuido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC MEANS is a basic procedure within BASE SAS
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More information7 Hypothesis testing  one sample tests
7 Hypothesis testing  one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X
More informationMATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationVISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density
More informationDensity Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:
Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve
More informationNumerical Summarization of Data OPRE 6301
Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting
More informationExperimental Design for Influential Factors of Rates on Massive Open Online Courses
Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu
More information1 SAMPLE SIGN TEST. NonParametric Univariate Tests: 1 Sample Sign Test 1. A nonparametric equivalent of the 1 SAMPLE TTEST.
NonParametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A nonparametric equivalent of the 1 SAMPLE TTEST. ASSUMPTIONS: Data is nonnormally distributed, even after log transforming.
More informationOnce saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
More informationLecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
More informationIntroduction to Statistics for Psychology. Quantitative Methods for Human Sciences
Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html
More informationSYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation
SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline
More informationNull Hypothesis Significance Testing Signifcance Level, Power, ttests. 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom
Null Hypothesis Significance Testing Signifcance Level, Power, ttests 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Simple and composite hypotheses Simple hypothesis: the sampling distribution is
More information2. Describing Data. We consider 1. Graphical methods 2. Numerical methods 1 / 56
2. Describing Data We consider 1. Graphical methods 2. Numerical methods 1 / 56 General Use of Graphical and Numerical Methods Graphical methods can be used to visually and qualitatively present data and
More informationDescriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
More informationThe Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University
The Normal Distribution Alan T. Arnholt Department of Mathematical Sciences Appalachian State University arnholt@math.appstate.edu Spring 2006 R Notes 1 Copyright c 2006 Alan T. Arnholt 2 Continuous Random
More informationFrequency Distributions
Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data to get a general overview of the results. Remember, this is the goal
More informationBASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi110 012 seema@iasri.res.in Genomics A genome is an organism s
More informationExperimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test
Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely
More informationAre Histograms Giving You Fits? New SAS Software for Analyzing Distributions
Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Nathan A. Curtis, SAS Institute Inc., Cary, NC Abstract Exploring and modeling the distribution of a data sample is a key step
More information103 Measures of Central Tendency and Variation
103 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.
More informationChapter 7 Section 1 Homework Set A
Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More information13.2 Measures of Central Tendency
13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers
More informationSTT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables
Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random
More information5. Continuous Random Variables
5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationDesciptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods
Desciptive Statistics Qualitative data Quantitative data Graphical methods Numerical methods Qualitative data Data are classified in categories Non numerical (although may be numerically codified) Elements
More information4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
More informationStatistical Analysis I
CTSI BERD Research Methods Seminar Series Statistical Analysis I Lan Kong, PhD Associate Professor Department of Public Health Sciences December 22, 2014 Biostatistics, Epidemiology, Research Design(BERD)
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationNonparametric TwoSample Tests. Nonparametric Tests. Sign Test
Nonparametric TwoSample Tests Sign test MannWhitney Utest (a.k.a. Wilcoxon twosample test) KolmogorovSmirnov Test Wilcoxon SignedRank Test TukeyDuckworth Test 1 Nonparametric Tests Recall, nonparametric
More informationNormal distribution. ) 2 /2σ. 2π σ
Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a
More informationChapter 9: Hypothesis Testing Sections
Chapter 9: Hypothesis Testing Sections  we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 TwoSided Alternatives 9.5 The t Test 9.6 Comparing the
More informationChapter 8 Introduction to Hypothesis Testing
Chapter 8 Student Lecture Notes 81 Chapter 8 Introduction to Hypothesis Testing Fall 26 Fundamentals of Business Statistics 1 Chapter Goals After completing this chapter, you should be able to: Formulate
More informationBiostatistics Lab Notes
Biostatistics Lab Notes Page 1 Lab 1: Measurement and Sampling Biostatistics Lab Notes Because we used a chance mechanism to select our sample, each sample will differ. My data set (GerstmanB.sav), looks
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationVariables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
More informationOnesample normal hypothesis Testing, paired ttest, twosample normal inference, normal probability plots
1 / 27 Onesample normal hypothesis Testing, paired ttest, twosample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis
More informationMeasuring the Power of a Test
Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection
More informationDISTRIBUTION FITTING 1
1 DISTRIBUTION FITTING What Is Distribution Fitting? Distribution fitting is the procedure of selecting a statistical distribution that best fits to a data set generated by some random process. In other
More informationTHE BINOMIAL DISTRIBUTION & PROBABILITY
REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution
More informationStats on the TI 83 and TI 84 Calculator
Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and
More information