Module 4 Confidence Intervals


Objective: At the completion of this module you will learn how to take into account the sampling variability or uncertainty in the sample estimates of population parameters, e.g., the population mean and proportion, and you will also know how to compare two groups of patients.

4.1 Introduction

In Section 3.3 of Module 3 we discussed that the mean, proportion, relative risk and odds ratio are unknown for a population. These unknown quantities are known as parameters and they are estimated from sample data. Two methods commonly used for estimating parameters are: (a) point estimation and (b) interval estimation.

Point estimation involves calculating a single number as an estimate of the parameter of interest. For example, let us assume that we are interested in the average/mean body mass index (BMI) of cardiac surgery patients in Victoria. Calculating the true average BMI is indeed difficult, but it can be estimated from sample data. Consider a random sample of 30 patients from the cardiac surgery population and calculate the sample mean BMI (26.86 kg/m², see Table 4.1). This sample mean is a point estimate of the true mean BMI of cardiac surgery patients in Victoria. A point estimate, however, does not provide any information about the inherent variability of the estimator; we do not know how close the sample estimate is to the true parameter.

Table 4.1: Body mass index (kg/m²) in a sample of 30 patients

A sample mean is rarely the same as the true mean. A difference between the sample mean and the true mean may occur purely by chance or sampling variation. So it is sensible to estimate the true mean by an interval centred on the sample mean, called a

Confidence Interval (see the following figure). Confidence intervals take into account the sampling variability or uncertainty in the sample estimates by incorporating the standard error into the calculation. In the graph below, the dot on the horizontal line is the sample mean BMI, the estimated true mean BMI, and the vertical lines at the ends of the horizontal line are the two limits, known as confidence limits, for the true mean BMI. It is expected that the true mean BMI will lie within these limits. A confidence interval has an associated percentage, for example 95%, to show how confident we are that the interval contains the true mean. Since we put some confidence in this interval procedure, the interval is called a Confidence Interval.

[Figure: a confidence interval centred on the sample mean BMI of 26.86 kg/m², with lower and upper confidence limits]

Confidence intervals can be calculated for various parameters of interest such as the mean, proportion, relative risk, odds ratio, etc. However, in this module we will discuss confidence intervals for the true mean and true proportion; confidence intervals for the relative risk and odds ratio will be discussed in Module 7. Note that the confidence interval for the true mean (or simply the mean) is appropriate for continuous data, and the confidence interval for the true proportion (or simply the proportion) is appropriate for categorical data. Calculation of confidence intervals for a true mean always assumes that the sampled population is normally distributed.
In this module we will discuss the following topics:

o Confidence intervals for a single true mean
o Comparing two groups: confidence intervals for the difference between the means of two independent populations
o Confidence intervals for a single true proportion
o Comparing two groups: confidence intervals for the difference between the true proportions of two independent populations
o Comparing two groups: confidence intervals for the difference between two true means, paired data (paired samples)

Notation used in this module:

o Sample size: n
o True/population mean: μ
o Sample mean: x
o True/population standard deviation: σ
o Sample standard deviation: s
o Standard error: SE
o True/population proportion: π

o Number of patients in a sample of n patients with a special characteristic: r
o Sample proportion: p = r / n
o The normal distribution multiplier: Z
o The t-distribution multiplier: T

4.2 Confidence Intervals for a Single True Mean

In medical research we are sometimes interested in calculating a confidence interval for a single true mean. For example, assume that we are interested in the confidence interval for the true mean BMI of cardiac surgery patients in Victoria. As discussed in Module 3, for a large sample the sampling distribution of the sample mean (the distribution of the mean in repeated samples) follows the normal distribution regardless of the shape of the sampled population (see Figure 3.5, Module 3). Thus, according to the normal distribution probability law discussed in Module 3: (a) 68% of sample means will be within one SE of the true mean, (b) 95% of sample means will be within 1.96 (or approximately 2) SE of the true mean, and (c) 99% of sample means will lie within 2.58 (or approximately 3) SE of the true mean.

If a sample mean is within 2 SE of the true mean, the true mean is also within 2 SE of the sample mean. This means that if we draw many samples and calculate the interval Sample Mean ± 2.0 SE of Mean for each sample, 95% of these intervals will include the true mean and 5% will not. Equivalently, in notation this interval can be written as x ± 2.0 SE, where x is the sample mean and SE is the standard error of the mean. This interval is called the 95% confidence interval for the true mean (or simply for the mean) because in 95% of samples the true mean will be covered by this random interval. More specifically, if we take, for example, 100 random samples of the same size, each sample may yield a different 95% confidence interval. Among these 100 confidence intervals, we expect 95 to cover the true mean and 5 not to cover it.
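The repeated-sampling interpretation above can be checked with a small simulation. The sketch below assumes a normally distributed BMI population with a true mean of 27 and standard deviation of 3 (both made-up values for illustration), draws many samples of size 30, and counts how often the interval x ± 2.045 SE covers the true mean:

```python
import math
import random
import statistics

random.seed(42)  # make the run reproducible

TRUE_MEAN, TRUE_SD = 27.0, 3.0  # hypothetical population values
N, T = 30, 2.045                # sample size and 95% t-multiplier (d.f. = 29)

covered = 0
n_samples = 1000
for _ in range(n_samples):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # one 95% confidence interval from this sample
    if mean - T * se <= TRUE_MEAN <= mean + T * se:
        covered += 1

coverage = covered / n_samples
print(f"coverage: {coverage:.3f}")  # close to 0.95 in the long run
```

In the long run roughly 95% of the intervals cover the true mean, matching the interpretation in the text.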
In other words, if we were to select 100 random samples of the same size from the population and calculate 100 different confidence intervals for the true mean, approximately 95 of the intervals would cover the true mean and 5 would not. We never know whether the confidence interval calculated from a particular sample is a good one (actually contains the true mean) or a bad one (does not contain the true mean). All we know is that, in the long run, 95% of the confidence intervals calculated are good because they include the true mean. Let us draw 20 samples, each of size 30, from the cardiac surgery population and for each sample calculate the sample mean BMI and the SE of the mean. Then we calculate x ± 2.0 SE, the 95% confidence interval, for each sample; these intervals are presented in Figure 4.1.

Figure 4.1: Confidence intervals for μ (true mean) from repeated samples [sample means and 95% CIs for 20 samples, each of size 30; vertical axis: sample mean BMI, horizontal axis: random sample number]

A description of Figure 4.1 is as follows: the horizontal line is the unknown true mean BMI for cardiac surgery patients in Victoria. The thick dot in the middle of each vertical line shows the sample mean for that sample. The ends of each vertical line are the lower and upper limits calculated using the formula x ± 2.0 SE. All the intervals except the first from the left include the true mean.

A 95% confidence interval is common in health science research. However, one may be interested in calculating confidence intervals for other confidence levels, e.g., 90%, 93%, 98%, etc. Further, if the sample size is not sufficiently large, the reference ranges ±1 SE, ±2 SE and ±3 SE do not hold. (How large is large? There is no specific answer to this question; however, some textbooks consider a sample of size greater than 30 as large.) This means that for small samples the interval x ± 2.0 SE may not cover the true mean in 95% of samples. Similarly, the interval x ± 3 SE may fail to include the true mean in 99% of samples. Hence we require a general formula for constructing confidence intervals for a single true mean, which is as follows:

Sample Mean ± Multiplier × SE of Mean

Two multipliers, namely the normal distribution multiplier denoted by Z and the t-distribution multiplier denoted by T, are widely used in the construction of confidence intervals. The value of a multiplier relates to the amount of confidence (e.g., 95%) used for obtaining a confidence interval.

How do we choose between the Z and T multipliers? Usually, if the sample size is small we use the t-distribution multiplier, and we use the normal distribution multiplier when the sample size is large. However, the t-distribution and normal distribution multipliers are the same for large samples; therefore the t-distribution multiplier can be used for both small and large samples. So throughout this module we use the t-distribution multiplier T for constructing confidence intervals for the population mean. Thus the formula for the confidence interval for the true mean is given by:

Sample Mean ± T × SE of Mean

Here SE = s / √n, where s is the sample standard deviation. Thus the final formula for the confidence interval for a single true mean is as follows:

Sample Mean ± T × s / √n

Confidence intervals for a true mean can be calculated under either of two assumptions: (a) the true standard deviation is known, or (b) the true standard deviation is unknown. In practice it is unlikely that the true mean is unknown while the true standard deviation is known. Therefore in this module we will discuss confidence intervals under the assumption that the true standard deviation is unknown.

How do we calculate multipliers? The T multiplier value is obtained from the t-distribution table, Table 4.7, presented at the end of this module. For a small sample size, the T value in Table 4.7 depends on:

o the degrees of freedom (d.f.) of the t-distribution; for a single sample, the d.f. can be obtained by subtracting one from the total number of observations, and
o the confidence level (e.g., 95%).

The first column in Table 4.7 shows the d.f. and the first row presents the confidence level. For example, let us assume that we want to calculate a 95% confidence interval for the true mean BMI and draw a random sample of 30 patients from the population. The t-distribution multiplier value T can then be obtained as stated in the following steps.
Step 1: Open the t-distribution table, Table 4.7, at the end of this module.
Step 2: Go to the row with d.f. of 29 (d.f. = sample size minus one) in the first column.
Step 3: Then go along to the column for the 95% confidence level.
Step 4: The value at the intersection of d.f. 29 and confidence level 95% is the required T value; here T = 2.045.

Note: If the d.f. does not exactly match any of those presented in Table 4.7, round the d.f. to the nearest value in the table. For example, consider a d.f. of 37, which lies between 30 and 40; we round this d.f. to 40 (the nearest value) and hence T = 2.021. If the d.f. is larger than 120, we treat it as infinite (∞; see the last row of Table 4.7).
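The table-lookup steps above, including the rounding rule, can be sketched in Python. The dictionary below holds a few 95% rows copied from a standard t-table (only selected d.f. values are included; the helper name t_multiplier is invented for illustration):

```python
# Selected 95% two-tailed t-multipliers from a standard t-table (d.f. -> T);
# float("inf") plays the role of the table's last row.
T_95 = {8: 2.306, 29: 2.045, 40: 2.021, 60: 2.000, 120: 1.980, float("inf"): 1.960}

def t_multiplier(df):
    """Round the d.f. to the nearest tabulated row, as the module suggests;
    d.f. above 120 use the infinity row."""
    if df > 120:
        return T_95[float("inf")]
    finite_rows = (k for k in T_95 if k != float("inf"))
    nearest = min(finite_rows, key=lambda k: abs(k - df))
    return T_95[nearest]

print(t_multiplier(29))   # 2.045 (a sample of 30 patients)
print(t_multiplier(37))   # rounds to d.f. 40 -> 2.021
print(t_multiplier(500))  # treated as infinite -> 1.96
```

A fuller table would include every d.f. row; the rounding rule only matters when the exact row is missing.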

Assumptions for confidence intervals: we make the following assumptions when constructing confidence intervals for the true mean:

o The sample is drawn randomly.
o Observations within a sample are independent.
o The sampled population is normally distributed.

Steps for construction of confidence intervals:

o Draw a random sample and then calculate the sample mean and the sample standard deviation (use Excel).
o Compute the SE of the mean.
o Calculate the d.f. and then find the T value from the t-distribution table (Table 4.7 in the appendix).
o Put the sample mean, the multiplier T and the SE together in the following formula to give the interval of plausible values for the true mean:

Sample Mean ± T × s / √n

Consider the variable body mass index (BMI) for cardiac surgery patients in Victoria as discussed earlier; consider the BMI data in Table 4.1 and calculate the sample mean and the SE of the mean. We are interested in the 95% confidence interval for the true mean BMI, or simply the mean BMI, in the population.

Calculation of the 95% confidence interval (you can use Microsoft Excel for most of the calculations shown below):

Sample size: n = 30
Degrees of freedom: d.f. = n − 1 = 29
Sample mean: x = 26.86 kg/m²
Sample standard deviation: s = 2.995 kg/m²
Standard error of the mean: SE = s / √n = 2.995 / √30 = 0.547 kg/m²
Multiplier: T = 2.045 (see Table 4.7)
Lower limit = Sample Mean − T × SE = 26.86 − 2.045 × 0.547 = 25.74
Upper limit = Sample Mean + T × SE = 26.86 + 2.045 × 0.547 = 27.98
Confidence interval: (25.74, 27.98) kg/m²

We are 95% confident that the mean BMI of cardiac surgery patients in Victoria is in the interval 25.74 to 27.98 kg/m²; this indicates that, in general, patients who had cardiac surgery are overweight (healthy: 18.5 ≤ BMI ≤ 25; overweight: 25 < BMI ≤ 30; obese: BMI > 30 kg/m²).
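The worked example above uses only the summary statistics n, x and s, so it can be reproduced in a few lines of Python (a minimal sketch, using the values from the BMI example):

```python
import math

# Summary statistics from the BMI example (Table 4.1)
n, mean, s = 30, 26.86, 2.995
T = 2.045  # 95% t-multiplier for d.f. = 29

se = s / math.sqrt(n)
lower, upper = mean - T * se, mean + T * se
width = upper - lower

print(f"SE = {se:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) kg/m^2")  # (25.74, 27.98)
print(f"width = {width:.2f} kg/m^2")                 # 2.24
```

The same three inputs (mean, SD, sample size) are all Excel needs as well; only the multiplier changes with the confidence level or d.f.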

Width of a confidence interval: The width of a confidence interval is the difference between the upper and lower limits of the interval (in the above example the width is 27.98 − 25.74 = 2.24 kg/m²). A smaller width indicates a better confidence interval, and vice versa. In practice we prefer a higher confidence level and a narrower interval; a narrower width indicates smaller sampling variability or uncertainty in the sample estimate of the population/true mean. If the confidence level increases, the width increases, and vice versa. However, the width of a confidence interval can be reduced without compromising the confidence level by increasing the sample size: if the sample size increases, the standard error decreases (uncertainty decreases), and this results in a narrower confidence interval.

4.3 Comparing Two Groups

Comparison between two groups of patients is very common in medical research as well as other scientific research. For example, a clinician may be interested in comparing a new drug A with an old drug B; a baby food producer may wish to compare whether its product increases babies' weights faster than a competitor's product; a research nurse may wish to compare the serum iron level in two groups of children; a cardiologist may be interested in comparing the preoperative creatinine level of patients by diabetic status; a researcher may compare BMI between male and female cardiac surgery patients; etc. Two groups of patients can be compared by comparing specific parameters of interest from each group (e.g., mean, proportion). The two methods commonly used for comparing parameters of two groups are confidence intervals and hypothesis testing. In this module we discuss the confidence interval method; the hypothesis testing technique will be discussed in Module 5. In this section we compare two groups by constructing a confidence interval for the difference between two true means.
The construction of a confidence interval for the difference between two true means also requires knowledge of the sampling distribution of the difference between two sample means. As discussed in Module 3, for large samples the sampling distribution of the difference between two sample means follows the normal distribution.

CI for the difference between two true means

Since the sampling distribution of the difference between two sample means follows a normal distribution, 95% of the differences in sample means fall within 2 SE of the true mean difference. Then, 95% of the time, the difference between the population means will also be within 2 SE of the difference between the sample means. Similarly, in 99% of cases the true mean difference will be within 3 SE of the difference between the sample means. Thus the general formula for calculating confidence intervals for the difference between two true means is given by:

Difference in Two Sample Means ± Multiplier × SE of the Difference Between Two Sample Means

The formula for the SE will be discussed later; the manual calculation of the SE is a bit complex, but Microsoft Excel can help us calculate it. As discussed in Section 4.2, we use the t-distribution multiplier T for large as well as small samples. Thus the final formula for confidence intervals for the difference between two true means is given by:

Diff ± T × SE

We encounter two main cases when calculating confidence intervals for the difference between two true means: (a) Case 1: the true standard deviations are unknown but assumed equal, and (b) Case 2: the true standard deviations are unknown and assumed unequal. Many textbooks consider the case where the population standard deviations are known; however, in medical research we avoid making assumptions that we are unlikely to meet in real life. In fact, it is impractical for the true means to be unknown while the true standard deviations are known.

How do we choose between Case 1 and Case 2 in real life? We calculate the standard deviation for each sample; if the standard deviations are close to each other we use Case 1, otherwise we use Case 2. There is no rule of thumb on how close is close enough; just use your judgement. Alternatively, statistical theory can be used to assess the significance of the difference between two standard deviations, but this topic is beyond the scope of this subject.

Assumptions:
o Sampled populations are independent and normally distributed.
o Samples are drawn randomly.
o Observations within a sample are independent.

Steps for the calculation of confidence intervals:
o Calculate the sample means and standard deviations (use Excel).
o Calculate the difference between the sample means.
o Calculate the SE of the difference between the two sample means (the formula depends on the choice between Case 1 and Case 2).
o Calculate the d.f.
(the formula depends on the assumption about the equality of the standard deviations).
o Obtain the t-distribution multiplier T (see Table 4.7).
o Finally, put the sample mean difference, SE and T into the confidence interval formula; this gives the confidence interval for the difference between the two true means.

Case 1: True SDs are Unknown but Assumed Equal

Consider a study conducted to investigate risk factors for heart disease among male and female patients in Victoria. One of the characteristics examined was body mass index (BMI), a measure of the extent to which an individual is overweight. We wish to determine whether the mean BMI of male patients equals that of female patients in the population. Assume we have a random sample of size 25 from the male cardiac surgery patients and another random sample of size 19 from the female patients. The BMI of the sampled male and female patients is presented in Table 4.2.

Table 4.2: Body mass indices (kg/m²) in samples of 25 male and 19 female patients

Using Microsoft Excel, the sample mean and standard deviation of BMI for male patients are 29.21 kg/m² and 4.97 kg/m² respectively. Similarly, for female patients the sample mean and standard deviation are 27.04 kg/m² and 4.75 kg/m² respectively. Clearly, the sample standard deviations are close enough to assume that the population standard deviations are equal. The formula for the SE is as follows:

SE = √(s_p²/n1 + s_p²/n2), where s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

The d.f. is calculated using the formula: d.f. = (n1 − 1) + (n2 − 1) = n1 + n2 − 2.

Here n1 and n2 are the sample sizes and s1 and s2 the standard deviations for male and female patients respectively. The pooled variance s_p² can be calculated using Microsoft Excel (see Excel Help posted on MUSO WebCT).

Calculations:
Sample mean difference (Male − Female) = 29.21 − 27.04 = 2.17 kg/m²
Degrees of freedom: d.f. = 25 + 19 − 2 = 42
Multiplier: T = 2.021
Pooled variance: s_p² = 23.78

Standard error: SE = √(s_p²/n1 + s_p²/n2) = √(23.78/25 + 23.78/19) = 1.484
Lower limit = Diff − T × SE = 2.17 − 2.021 × 1.484 = −0.83
Upper limit = Diff + T × SE = 2.17 + 2.021 × 1.484 = 5.17
Hence the 95% confidence interval is (−0.83 kg/m², 5.17 kg/m²).

On the basis of the sample data, we are 95% confident that the difference between the true mean BMIs of male and female patients lies between −0.83 kg/m² and 5.17 kg/m². Since this interval includes zero, we conclude that the difference between the mean BMIs of male and female patients in the cardiac surgery population in Victoria is not statistically significant; the sample data do not support a difference. Note that if the confidence interval for the difference between the means of two populations includes zero, the difference is not statistically significant; otherwise the difference is significant. A significant difference means that we have enough evidence from the sample data to say that a difference exists between the two population means; otherwise we do not have enough evidence to conclude a difference.

Case 2: True SDs are Unknown and Assumed Unequal

We now turn to the situation where the true standard deviations are unknown and assumed unequal. This case is likely to be encountered in real-life medical research as well as other data. Consider a study where the investigator is interested in comparing the preoperative creatinine level of cardiac surgery patients with and without diabetes. A random sample of 50 patients was selected from each group (population); the data are shown in Table 4.3. Using Microsoft Excel, the sample mean creatinine level for diabetics in Table 4.3 is 0.16 mmol/L, and the sample standard deviations are 0.1095 mmol/L for diabetics and 0.0346 mmol/L for non-diabetics. The standard error is calculated using the following formula:

SE = √(s1²/n1 + s2²/n2)

For this case the formula for the d.f. is complex, so use Microsoft Excel to calculate it. The formula is as follows:

d.f. = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]

Table 4.3: Preoperative creatinine level (mmol/L) for 50 diabetic and 50 non-diabetic cardiac surgery patients

Calculations:
Difference in means = (Diabetes − Non-diabetes)
Degrees of freedom: d.f. = 59 (using Excel)
SE = √(s1²/n1 + s2²/n2) = √((0.1095)²/50 + (0.0346)²/50) = 0.0162
Multiplier: T = 2.00
Lower limit = Diff − T × SE
Upper limit = Diff + T × SE

Thus we are 95% confident that the difference between the preoperative mean creatinine levels of diabetic and non-diabetic cardiac surgery patients in Victoria lies within the resulting interval. The difference between the population means of creatinine level for diabetics and non-diabetics is not statistically significant because the interval does not exclude the value of zero. Thus we do not have sufficient evidence from the data to conclude a difference between the mean preoperative creatinine levels of these two groups of patients in the population.
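Both cases can be sketched in Python from summary statistics alone. The function and the input values below follow the two worked examples (the BMI summary figures and the helper names pooled_ci and welch_df are taken from, or invented for, this illustration):

```python
import math

def pooled_ci(n1, m1, s1, n2, m2, s2, t):
    """Case 1: CI for a mean difference using a pooled standard deviation."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff - t * se, diff + t * se

def welch_df(n1, s1, n2, s2):
    """Case 2: Welch-Satterthwaite degrees of freedom for unequal SDs."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Case 1: male vs female BMI (d.f. = 42, rounded to the 40 row -> T = 2.021)
lo, hi = pooled_ci(25, 29.21, 4.97, 19, 27.04, 4.75, 2.021)
print(f"Case 1 95% CI: ({lo:.2f}, {hi:.2f})")  # (-0.83, 5.17), includes zero

# Case 2: the creatinine SDs give d.f. that rounds to 59, as in the text
print(round(welch_df(50, 0.1095, 50, 0.0346)))  # 59
```

The Welch d.f. is what Excel computes behind the scenes for the unequal-variance case; once it is known, the CI itself uses the same Diff ± T × SE formula.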

4.4 Confidence Intervals for the Population Proportion

According to the central limit theorem, for large samples 95% of sample proportions in repeated sampling fall within 1.96 SE of the population proportion. Therefore, 95% of the time the population proportion will be within 1.96 SE of the sample proportion. Similarly, in 99% of samples the population proportion will be within 2.57 SE of the sample proportion. Thus the formula for confidence intervals for the population proportion is given by:

Sample Proportion ± Z × SE of Sample Proportion

Here we always use the Z multiplier because the sampling distribution of the sample proportion is normal provided the sample size is large. The Z multiplier values for confidence levels of 90%, 95% and 99% are 1.65, 1.96 and 2.57 respectively (see Table 4.8 at the end of this module). The SE of the sample proportion is given by √(p(1 − p)/n), where n is the sample size and p is the estimated population proportion, or simply the sample proportion.

Let us assume that we want to calculate the confidence interval for the population proportion of diabetic cardiac surgery patients with a preoperative creatinine level greater than 0.133 mmol/L. Consider the data in Table 4.4. The calculation of the 95% confidence interval is as follows:

Sample size: n = 50
Number of patients in the sample with creatinine level > 0.133 mmol/L: 8
Sample proportion: p = 8/50 = 0.16
SE of the proportion: √(p(1 − p)/n) = √(0.16 × (1 − 0.16)/50) = 0.052
Multiplier: Z = 1.96 (see Table 4.8 at the end of this module)
Lower limit: p − Z × SE = 0.16 − 1.96 × 0.052 = 0.058
Upper limit: p + Z × SE = 0.16 + 1.96 × 0.052 = 0.262
Confidence interval: (0.058, 0.262)

We are 95% confident that the proportion of diabetic cardiac surgery patients in Victoria with a preoperative creatinine level greater than 0.133 mmol/L is between 0.058 and 0.262.

Comparing Two Groups: CI for the Difference Between Two Population Proportions

Comparison between two groups of patients is also very common when the information from each patient is collected or recorded on a categorical scale. This type of comparison arises in both observational and experimental studies. For

example, a researcher may be interested in comparing the mortality of cardiac surgery patients between two hospitals; a clinician may compare reported pain relief in two groups of cancer patients, where one group receives a treatment and the other a placebo; a cardiac surgeon may compare the proportions of obesity between male and female cardiac surgery patients in the populations; etc. The above comparisons can be made by constructing a confidence interval for the difference between two population proportions. Construction of this confidence interval requires knowledge of the sampling distribution of the difference between the sample proportions calculated from two groups of patients. As discussed in Module 3, for large samples the sampling distribution of the difference between two sample proportions approaches the normal distribution, and hence in 95% of cases the difference between the two population proportions falls within 1.96 standard errors of the difference between the sample proportions. Similarly, 99% of the time the population proportion difference falls within 2.57 standard errors of the sample proportion difference. Thus the general formula for the confidence interval for the difference between two population proportions is given by:

Difference in Sample Proportions ± Z × SE of the Difference Between Sample Proportions

The standard error of the difference between sample proportions is given by:

SE = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)

Here n1 and n2 are the sample sizes, and p1 and p2 the sample proportions, for the first group (e.g., treatment) and second group (e.g., placebo) of patients. Consider a study where we want to compare the risk of overweight between male and female patients. We have random samples of 40 male and 50 female patients from the population; the data are shown in Table 4.5. If a patient's BMI is greater than 25 kg/m² we record it as 1; otherwise it is recorded as 0.
The data in Table 4.5 show that 31 of the 40 male patients and 32 of the 50 female patients have a BMI above 25 kg/m².

Table 4.5: BMI for 40 male and 50 female patients (BMI > 25: Yes = 1, No = 0)
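From counts like these, the interval for a difference in proportions can be sketched in a few lines of Python (a minimal sketch; the helper name prop_diff_ci is invented for illustration):

```python
import math

def prop_diff_ci(r1, n1, r2, n2, z=1.96):
    """95% CI for the difference between two population proportions."""
    p1, p2 = r1 / n1, r2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 31/40 males and 32/50 females with BMI > 25 kg/m^2
lo, hi = prop_diff_ci(31, 40, 32, 50)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # (-0.051, 0.321), includes zero
```

Passing a different z (e.g. 2.57) gives the corresponding 99% interval.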

In the above study we are interested in comparing the proportions of male and female patients with BMI greater than 25 kg/m². The calculation of the 95% confidence interval is as follows:

Sample sizes: male (M): n1 = 40 and female (F): n2 = 50
Sample proportions: M: p1 = 31/40 = 77.5% and F: p2 = 32/50 = 64%
Multiplier: Z = 1.96 (see Table 4.8)
Standard error: SE = √(0.775 × (1 − 0.775)/40 + 0.64 × (1 − 0.64)/50) = 0.0947
Difference between sample proportions (M − F): 0.775 − 0.64 = 0.135
Lower limit = Diff − Z × SE = 0.135 − 1.96 × 0.0947 = −0.051
Upper limit = Diff + Z × SE = 0.135 + 1.96 × 0.0947 = 0.321
95% confidence interval: (−0.051, 0.321)

We are 95% confident that the difference between the proportions of male and female patients with BMI above 25 kg/m² in the population lies between −0.051 and 0.321. Since the interval includes zero, the difference may not be statistically significant; the data do not suggest a difference.

4.5 Comparing Two Groups: Paired Data

In Section 4.3 we discussed the difference between two population means assuming that the samples were independent. However, there are studies where the data consist of pairs of measurements. These pairs may be two outcomes measured on the same subjects/patients under two different treatments, or the same subjects may be measured before and after receiving a treatment. The pairs may also be two individuals matched during sample selection to share key characteristics such as age and sex, or pairs of twins or siblings assigned randomly to two treatments such that the members of a single pair receive different treatments. It sometimes happens that true differences do not exist between the two populations with respect to the variable of interest, but the presence of extraneous sources of variation may cause rejection of the hypothesis of no difference. On the other hand, real differences may be masked by the presence of extraneous factors.
The objective in paired comparisons is to eliminate as many sources of extraneous variation as possible by making the pairs similar with respect to as many variables as possible. This is done by considering the differences between each pair of observations: we convert our data from pairs of values into a single sample of differences. Instead of performing the analysis with the individual observations, we use the difference between the pairs of observations as the variable of interest. We denote these differences by d and the standard deviation of the differences by s_d. Here we assume that the differences between pairs of observations are random and follow the normal distribution. It can be shown that the sampling distribution of the mean of the differences follows the normal distribution. Thus 95% of the means of differences will be within 2 SE of the difference in true means, i.e., in repeated

sampling, 95% of the time the difference in true means will be within two SE of the mean of the differences. Hence a 95% confidence interval for the difference between two true means for paired data can be calculated using the following formula:

Mean of Differences ± T × SE of Differences

Or, equivalently, in notation: d ± T × SE = d ± T × s_d/√n

Here T is the t-distribution multiplier with (n − 1) degrees of freedom, where n is the number of pairs (see Table 4.7 for T values). Please note that the number of observations in each sample must be the same.

Consider a study conducted to determine weight loss in obese women before and after 12 weeks of treatment with a very-low-calorie diet (VLCD). The 9 women participating in the study were from an outpatient, hospital-based treatment program for obesity. The women's weights before and after the 12-week VLCD treatment are shown in Table 4.6 (columns 1 and 2), and the difference in weight (After − Before) is in column 3.

Table 4.6: Weight (kg) loss of 9 women, with weight before VLCD, weight after VLCD and d = After − Before

Steps for calculation of the 95% confidence interval:

Number of pairs: n = 9
Sample mean of the observed differences: d = −22.59 kg
Standard deviation of the differences: s_d = 5.32 kg
Standard error: SE = s_d/√n = 5.32/√9 = 1.773 kg
Multiplier: T = 2.306 (see Table 4.7)
Lower limit = −22.59 − 2.306 × 1.773 = −26.68 kg
Upper limit = −22.59 + 2.306 × 1.773 = −18.50 kg
95% confidence interval: (−26.68, −18.50) kg

We are 95% confident that the true mean of the differences in weight before and 12 weeks after VLCD lies between −26.68 kg and −18.50 kg. Since the interval does not include zero, the difference is statistically significant; the sample data support that the weights before and 12 weeks after VLCD differ. The negative limits of the 95% confidence interval indicate that VLCD reduces weight significantly (since we calculated the confidence interval for the difference of the after weights minus the before weights, that is, for the population difference After − Before).
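Once the pairs have been reduced to a single column of differences, the paired interval is just the single-mean interval applied to those differences. A sketch using the summary values from the VLCD example:

```python
import math

# Summary values from the VLCD example (d = After - Before, in kg)
n, d_bar, s_d = 9, -22.59, 5.32
T = 2.306  # 95% t-multiplier for d.f. = n - 1 = 8

se = s_d / math.sqrt(n)
lower, upper = d_bar - T * se, d_bar + T * se
print(f"SE = {se:.3f} kg")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) kg")  # (-26.68, -18.50)
```

With raw data, d_bar and s_d would be computed from the column of pairwise differences first; everything after that is identical to Section 4.2.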

Summary of This Module

Terms and notation:
o Sample mean; true/population mean
o Sample standard deviation; true standard deviation
o Difference between two sample means; difference between two true means
o Sample proportion; true proportion; difference between two true proportions
o T multiplier; Z multiplier
o Standard error (SE)
o Confidence intervals (CI): paired samples; independent samples

95% confidence intervals, continuous data:
o Confidence intervals for a single true/population mean
o Confidence intervals for the difference between two true means assuming equal standard deviations
o Confidence intervals for the difference between two true means assuming unequal standard deviations
o Confidence intervals for the difference between two true means, paired data

95% confidence intervals, categorical data:
o Confidence intervals for a single true proportion
o Confidence intervals for the difference between two true proportions

Table 4.7: The t-distribution table for 2-tailed p-values (and confidence levels)

        p-value α (confidence level)
d.f.    0.20 (80%)  0.10 (90%)  0.05 (95%)  0.02 (98%)  0.01 (99%)
5       1.476       2.015       2.571       3.365       4.032
8       1.397       1.860       2.306       2.896       3.355
10      1.372       1.812       2.228       2.764       3.169
20      1.325       1.725       2.086       2.528       2.845
30      1.310       1.697       2.042       2.457       2.750
∞       1.282       1.645       1.960       2.326       2.576

Table 4.8: Normal distribution multiplier (Z)

Confidence Level    Z-Value
90%                 1.645
95%                 1.96
99%                 2.57
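The multipliers in Table 4.8 come from the standard normal distribution: for confidence level C, Z is the (1 + C)/2 quantile, so that (1 − C)/2 probability sits in each tail. A quick check with Python's standard library:

```python
from statistics import NormalDist

for level in (0.90, 0.95, 0.99):
    # Two-sided interval: put (1 - level)/2 probability in each tail
    z = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%} confidence: Z = {z:.3f}")
```

This prints 1.645, 1.960 and 2.576; printed tables often round the last value to 2.57 or 2.58.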


More information

Analyzing Data with GraphPad Prism

Analyzing Data with GraphPad Prism 1999 GraphPad Software, Inc. All rights reserved. All Rights Reserved. GraphPad Prism, Prism and InStat are registered trademarks of GraphPad Software, Inc. GraphPad is a trademark of GraphPad Software,

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information