Biostatistics Quantitative Data

Descriptive Statistics, Statistical Models, One-sample and Two-sample Tests, Introduction to SAS Analyst, t- and Rank Tests using Analyst
Thomas Scheike

Quantitative Data
This course focuses on the analysis of quantitative data, which are encountered in many areas of experimental research. Data may roughly be divided into three groups:
Quantitative data: sperm concentration (mill/ml), height in cm, hormone levels (measured on a continuous scale).
Qualitative data: sex, race, occupation, groupings of quantitative data (high/medium/low).
Survival data: the waiting time until some event. For some individuals, however, the event is never recorded. These individuals are censored, which makes special methods necessary.
We concentrate on quantitative data and describe:
Descriptive techniques (histograms, scatter plots, means, standard deviations, quantiles, percentiles, ...).
Non-parametric methods. These are based on the ranks of the data and may be used for one-sample tests, two-sample tests (paired and unpaired), one-way analysis of variance and measures of association (Spearman correlation).
Regression techniques for normally distributed residuals. These include t-tests (paired and unpaired), analysis of variance (one- and two-way), regression analysis, multiple regression analysis and analysis of covariance.
We do not, however, discuss repeated measures, where subjects are followed and measured repeatedly. Repeated measures can often be reduced to one summary number per subject, which can then be analysed with the techniques covered in this course.

Descriptive Statistics
We consider data on sperm concentration (mill/ml) for two groups of men in a study.
One group consists of members of an association that promotes organic agriculture (n=55). The other consists of workers from a major Scandinavian airline carrier (n=141). How these data were collected is very important if we want to draw more general conclusions from them. The data for each group must be representative of members of organic agriculture associations and of airline workers, respectively. This must be validated very carefully, but for now we assume that it is the case. Plotting the data is the most important part of the statistical analysis.

The Histogram
The histogram is a different and better summary. It describes the distribution of the sperm concentrations in the two groups:
[Figure: histograms of sperm concentration for the organic farmers (eco) and the airline workers (sas).]
A histogram shows how the data are distributed. For example, we can read off how many men have a sperm concentration below, say, 100 mill/ml. For the airline workers this is 110 of 141 men, and for the organic farmers it is 35 of 55.
A histogram is made by grouping the sperm concentrations into intervals and choosing the height of each bar such that
height × width = proportion of the observations in the group.
If all bars have the same width, this scaling does not change the shape of the histogram. A difficulty is deciding the width of the bars. Here are two different histograms:

[Figure: two density histograms, titled "Histogram of conc" and "Histogram of conc^0.3".]

The Histogram
The histogram describes the variability of the data. We can approximate the chance that a data point lies below some limit, above some limit, or between two limits by calculating the area of the histogram over the appropriate range:
Area = chance (= number of observations in the range / total number of observations).
What is the probability of seeing a sperm concentration below, say, 40 mill/ml in a man chosen at random among the men in the study?

Percentiles
To describe the histogram, we may find the value for which 50% of the data are above or equal to it and 50% are below or equal to it. This is the median. After sorting the data by size, the median is the middle value. For an even number of data points, it is the average of the two middle values. In the small example on the slide, the median is (6 + 7)/2 = 6.5.
Similarly, the 25% percentile (quantile) is the value for which at least 25% of the data points are lower than or equal to it and at least 75% are higher than or equal to it. In the same example, the 25% percentile is 4.
Find an approximate median in the histogram.

Simple Summary Statistics
We can calculate the mean (average) and the standard deviation for the two groups:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, Variance $= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, SD $= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$.
The mean describes the midpoint of the data, and the standard deviation describes the spread. These numbers can always be calculated. Symmetric distributions are well characterized by them, whereas skewed distributions are not.
If a distribution does not appear symmetric, one should instead report the median and some percentiles (the 25% and 75% percentiles, say) or the range of the data (smallest and largest value). For the sperm data, the concentration was 77 (77) (mean (SD)), and the median and range were 56 and [0, 402], respectively. Which numbers are best suited to describe how the sperm concentration varies?
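The histogram scaling rule and the summary statistics above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, and the data are a hypothetical example whose two middle values are 6 and 7. The percentile function implements the "empirical distribution with averaging" rule: when n·p is a whole number j, it averages the j-th and (j+1)-th smallest values, and otherwise it takes the next value up. The SAS output later in these notes labels its quantiles "Def=5". As far as we know, that is this rule, but treat the correspondence as an assumption. Note that this rule can give a different 25% percentile than other valid choices satisfying the definition in the text.

```python
import math
import statistics


def density_heights(data, edges):
    """Bar heights for a density histogram: height * width = proportion in the bar.

    Bars are [lo, hi), except the last one, which also includes its right edge.
    """
    n = len(data)
    last = len(edges) - 2
    heights = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        count = sum(1 for x in data if lo <= x < hi or (i == last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights


def percentile_def5(data, p):
    """Percentile by the 'empirical distribution with averaging' rule.

    Write n*p = j + g (j whole, 0 <= g < 1). If g > 0, return the (j+1)-th
    smallest value; otherwise average the j-th and (j+1)-th smallest values.
    """
    x = sorted(data)
    n = len(x)
    np_ = n * p
    j = math.floor(np_ + 1e-9)    # guard against floating-point noise such as 3.0000000000000004
    g = np_ - j
    if g > 1e-9:
        return x[j]               # (j+1)-th smallest value (0-based index j)
    if j == 0:
        return x[0]
    if j >= n:
        return x[-1]
    return (x[j - 1] + x[j]) / 2


def summary(data):
    """Mean, SD (n - 1 denominator), median, quartiles and range."""
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": percentile_def5(data, 0.5),
        "q1": percentile_def5(data, 0.25),
        "q3": percentile_def5(data, 0.75),
        "range": (min(data), max(data)),
    }


if __name__ == "__main__":
    example = [2, 4, 5, 6, 7, 8, 9, 10]     # hypothetical data; middle values 6 and 7
    print(summary(example))
    heights = density_heights(example, [0, 4, 8, 12])
    print(heights, sum(h * 4 for h in heights))   # bars of width 4; total area is 1
```

Because every bar area equals the proportion of observations in that bar, the areas always sum to 1. This is what makes "area = chance" work.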
The Histogram
The histogram of the data approximates the distribution in the population from which the data are a representative sample. A particularly nice histogram curve is the normal distribution, which is a good approximation to many symmetric histograms.
[Figure: normal density.]
Some properties of the normal curve:
The normal curve is symmetric around its mean.
It is completely described by its mean and SD.
By saying that data are normally distributed, we mean that the histogram of the data is close to (well approximated by) the normal curve. Sometimes a transformation of the data is necessary to make this true.

[Figure: log-normal distribution (meanlog=3, sdlog=1) and histogram for a sample from it.]

The Normal Distribution
Just as with the histogram, we can use the normal curve to work out how the data are distributed. The normal curve satisfies:
50% of the area is below the mean.
95% of the area is between mean − 1.96 SD and mean + 1.96 SD.
68% of the area is between mean − 1 SD and mean + 1 SD.
2.5% of the area is below mean − 1.96 SD.
There are tables of the standard normal distribution, which has mean 0 and SD 1. The area between two values for any other normal curve can be found from this table by converting the values to standard scores.

Example: The heights of Danish women are approximately normal with mean 165 cm and standard deviation 30 cm. If a woman is chosen at random, what is the chance that she is shorter than 180 cm?
Standard score = (180 − 165)/30 = 0.5, i.e., 180 cm is 0.5 standard deviations above the mean. The chance of being below 0.5 in a standard normal distribution is 0.69.
Is this a reasonable statistical model?
What is the chance that a randomly chosen woman is between 175 cm and 190 cm tall? Converting to standard scores gives (190 − 165)/30 = 0.83 and (175 − 165)/30 = 0.33.
[Figure: cumulative standard normal distribution (pnorm), i.e., the percentage of the distribution below a given value.]
Formally, with X standard normal: P(X < 0.83) = 0.80, P(X < 0.33) = 0.63, and P(0.33 < X < 0.83) = P(X < 0.83) − P(X < 0.33) = 0.80 − 0.63 = 0.17.
This is based on the following precise statement about standard scores: if Z is normal with mean µ and variance σ², then (Z − µ)/σ is standard normal.

Distributions
We often draw histogram curves to show how the data are distributed (how they vary). How do these two histograms differ from the normal curve?
[Figure: a standard log-normal density and a multi-modal density.]
The first distribution is right skewed, i.e., data from this distribution contain some very high values. The other curve has several modes (multi-modal).

Example: Suppose that the sperm concentration in the Danish population is right skewed (log-normal with meanlog=3 and sdlog=1). If we draw 50 men at random from this distribution, the calculations give mean=27, SD=29, median=17, range=2,250.
Drawing again gives mean=34, SD=27, median=21, range=4,153.
And again: mean=53, SD=115, median=16, range=2,287.
And again: mean=26, SD=31, median=20, range=2, …
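The height calculation and the repeated log-normal sampling above can both be checked with Python's standard library. This is a minimal sketch: `statistics.NormalDist().cdf` plays the role of the standard normal table, and `random.lognormvariate(3, 1)` draws from a log-normal with meanlog=3 and sdlog=1. The seed is arbitrary, so the simulated summaries will not reproduce the slide's numbers, only their pattern. The raw means and SDs jump around between samples, while the medians and the log-scale summaries are much steadier.

```python
import math
import random
import statistics
from statistics import NormalDist


def prob_below(limit, mean, sd):
    """P(X < limit) for X ~ N(mean, sd^2), via the standard score."""
    z = (limit - mean) / sd
    return NormalDist().cdf(z)


def prob_between(low, high, mean, sd):
    """P(low < X < high) as a difference of two cumulative probabilities."""
    return prob_below(high, mean, sd) - prob_below(low, mean, sd)


def describe(sample):
    """(mean, SD, median, min, max) of a sample."""
    return (statistics.mean(sample), statistics.stdev(sample),
            statistics.median(sample), min(sample), max(sample))


if __name__ == "__main__":
    # Height example from the text: mean 165 cm, SD 30 cm.
    print(round(prob_below(180, 165, 30), 2))          # 0.69
    print(round(prob_between(175, 190, 165, 30), 2))   # 0.17
    # Four samples of 50 from a log-normal with meanlog=3, sdlog=1 (arbitrary seed).
    rng = random.Random(2024)
    for _ in range(4):
        conc = [rng.lognormvariate(3.0, 1.0) for _ in range(50)]
        raw = describe(conc)
        logged = describe([math.log(c) for c in conc])
        print("raw: mean=%.0f SD=%.0f median=%.0f" % raw[:3],
              "| log: mean=%.1f SD=%.2f median=%.1f" % logged[:3])
```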

[Figure: normal distribution (mean=3, sd=1) and histogram for the log of a sample of 50.]

Example cont'd: On the log scale, the population concentrations are normally distributed with mean 3 and SD 1.
Drawing 50 men at random from the population and taking logs gives a histogram for which the calculations give mean=2.9, SD=0.99, median=2.8, range=[0.8, 5.5].
Drawing another random sample of 50 gives mean=3.0, SD=0.85, median=3.0, range=[1.3, 5.0].
And again: mean=2.9, SD=1.00, median=2.9, range=[0.4, 5.6].
And again: mean=3.1, SD=1.07, median=3.1, range=[0.7, 4.9].
We conclude that for the right-skewed data the mean and SD are highly variable, whereas for the normal data the mean and SD provide a very effective summary. The median is fairly stable for both distributions.

Descriptive Statistics: Summary
The histogram shows how the data are distributed, i.e., how they vary. The area of the histogram represents frequency.
The normal distribution is a histogram curve that is a good approximation to many histograms.
The mean and standard deviation are useful summaries of how data are distributed. They are most useful when the data are approximately normally distributed.
The median and range are useful summaries of how data are distributed. They should be used when the data are not (approximately) normally distributed.

Statistical Models
When a physical quantity is measured several times, we get different results because of measurement error and biological variation. For example, measuring the height of a subject repeatedly may yield the following histogram:
[Figure: histogram of repeated height measurements.]
What we see is variation around the average height. The variation is due to both measurement error and biological variation. Based on the histogram, it appears reasonable to claim that the variation may be described by a normal distribution.
We may phrase this as a statistical model:
Individual measurement = overall mean + noise.
If we call the individual measurements $Y_i$ (the observed data), the overall mean $\mu$ (unknown) and the noise $\epsilon_i$, we have
$Y_i = \mu + \epsilon_i$.
This statistical model describes how the observed measurements arise. The model claims that the individual observations vary around a fixed value ($\mu$) and that the variation is $\epsilon_i$. A model contains two parts: a systematic part, which is of scientific interest, and a random part, which is due to biological and measurement-error variation. To complete the specification of the model, we also specify how the random variation $\epsilon_i$ varies, by giving its distribution. It is assumed that $\epsilon_i \sim N(0, \sigma^2)$, i.e., normal with mean 0 and variance $\sigma^2$.

Estimation in Statistical Models
In a statistical model, one primarily wishes to learn about the parameters of the model. However, to understand what can be learned about them, one must also study the variability present. Consider the statistical model
$Y_i = \mu + \epsilon_i, \quad i = 1, \ldots, 200,$
where $\epsilon_i \sim N(0, \sigma^2)$ are independent noise terms. We want to know $\mu$ and $\sigma$. We may estimate these quantities by the sample average and standard deviation:
$\bar{y} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $SD = \hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{y})^2}$.
Looking at $\bar{y}$ and using the statistical model, we get
$\bar{y} = \hat{\mu} = \mu + \frac{1}{n}\sum_{i=1}^{n} \epsilon_i$.
The last term is an average of independent $N(0, \sigma^2)$ noise terms, and mathematical arguments show that it is distributed as $N(0, \sigma^2/n)$. By finding the distribution of $\hat{\mu}$, namely $N(\mu, \sigma^2/n)$, we have described exactly what $\hat{\mu}$ tells us about $\mu$. One way to think about this is that we have a description of how the sample average varies if we repeat the sampling. The variance of the average is n times smaller than the variance of the individual noise terms.
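The claim that the average varies as $N(\mu, \sigma^2/n)$ can be checked by simulation. Here is a minimal sketch with hypothetical values µ = 3, σ = 1 and n = 25, using Python's `random.Random.gauss` for the noise terms. It repeats the sampling many times and compares the variance of the sample averages with σ²/n.

```python
import random
import statistics


def simulate_means(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples Y_i = mu + eps_i (eps_i ~ N(0, sigma^2)), i = 1..n,
    and return the sample average of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(mu + rng.gauss(0.0, sigma) for _ in range(n))
            for _ in range(reps)]


if __name__ == "__main__":
    mu, sigma, n = 3.0, 1.0, 25          # hypothetical parameter values
    means = simulate_means(mu, sigma, n, reps=4000)
    print("average of the averages:", round(statistics.fmean(means), 3))
    print("variance of the averages:", round(statistics.variance(means), 4),
          "| theory sigma^2/n:", sigma ** 2 / n)
```

The simulated variance should come out close to 1/25 = 0.04, which is 25 times smaller than the variance of a single noise term.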

[Figure: histogram for the log of a sample of 200 sperm concentrations, with normal approximation.]

Sperm analysis
There is scientific interest in the level of sperm concentration in the Danish population, and we have a representative sample from that population. We wish to see whether the level in Denmark equals what the WHO considers the minimum level (20 mill/ml). A sample of 200 Danish men looks as shown in the histogram. The log-transformed data appear to be normally distributed.
A statistical model is now proposed to describe how the population varies. It contains a systematic part ($\mu$), which is the average log(sperm concentration) in the population. It also contains a random part $\epsilon_i$, which is independent normal random variation $N(0, \sigma^2)$:
$Y_i = \mu + \epsilon_i, \quad i = 1, \ldots, 200.$
We do not know $\mu$ and $\sigma$. We estimate them by the sample average and standard deviation:
$\bar{y} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i = 3.9$ and $SD = \hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{y})^2} = 0.95$.
This means that our best guess is that the population has mean 3.9 and that the random variation is described by a normal distribution with standard deviation 0.95.

Sperm analysis, cont'd
Draw the best guess of the population distribution on top of the histogram. We see that the histogram and the normal curve agree well, so the statistical model appears adequate. This means that we have a reasonable description of the random variation and of the systematic variation.
We wish to investigate whether the data are consistent with the null hypothesis $H_0: \mu = \log(20)$. If not, we are left with the alternative $H_A: \mu \neq \log(20)$. In statistical terms, "consistent with the null hypothesis" means checking whether the data could have arisen if the null distribution were true.
The null hypothesis claims that the data are distributed around log(20). If we use the description of the variation found above, the data should arise as a random sample from the left-hand curve:
[Figure: normal curve centred at log(20) (left) and normal approximation to the data (right).]
The right-hand curve is the normal approximation to the data. Formally, we write
$Y_i = \log(20) + \epsilon_i, \quad i = 1, \ldots, 200, \quad \epsilon_i \sim N(0, 0.95^2)$.

Sperm analysis, cont'd
The question now is: how well does this fit with the average of 3.9 found in our data? The sample average is distributed as $N(\mu, \sigma^2/n)$. So if $H_0$ is true, the sample average varies around log(20) with standard deviation $\sigma/\sqrt{n}$, which we estimate as $0.95/\sqrt{200} \approx 0.95/14.1 \approx 0.067$. Thus our guess at how the average varies under the null is $N(\log(20), 0.067^2)$.
[Figure: distribution of the mean under the null, centred at log(20), and the observed average.]
How well does this fit with the data?

Sperm analysis, the t-test
To summarize further how the observed sample average compares with the null hypothesis, we can calculate how many standard errors it lies from the null value:
$T = \frac{\bar{y} - \log(20)}{SD/\sqrt{n}} \approx \frac{3.9 - 3.0}{0.067} \approx 13.5$.
Under $H_0$, T is t-distributed with n − 1 = 199 degrees of freedom, and here the p-value is extremely small. We define SEM = $SD/\sqrt{n}$, the standard error of the mean.
A t-distribution varies slightly more than a normal distribution, because we have only a variable estimate of the SD of the population:
[Figure: t distributions with 199, 19 and 9 degrees of freedom compared with the standard normal density.]
Note that the t-test has the form
$T = \frac{\text{observed} - \text{expected}}{\text{standard error of observed}}$.
We now calculate the chance of getting a test statistic as extreme as, or more extreme than, the observed one. This chance is computed under the null $H_0$ and is called the p-value. The smaller this chance, the more evidence against the null. If the p-value is less than 5%, we reject the null (at the 5% level).
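The arithmetic of the t-test can be written directly from the summary numbers in the text (mean 3.9, SD 0.95, n = 200). This is a minimal sketch. The standard library has no t-distribution, so the p-value helper below uses the standard normal instead. That is an approximation, reasonable here because with 199 degrees of freedom the t-distribution is very close to the normal. For small samples, use a t table or a statistics package.

```python
import math
from statistics import NormalDist


def one_sample_t(mean, sd, n, mu0):
    """T = (observed - expected) / standard error of observed, with df = n - 1."""
    sem = sd / math.sqrt(n)
    return (mean - mu0) / sem, n - 1, sem


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal. This only approximates the
    t-distribution and is reasonable only when the degrees of freedom are large."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    t, df, sem = one_sample_t(3.9, 0.95, 200, math.log(20))
    print(f"SEM={sem:.3f}  T={t:.1f}  df={df}")   # SEM=0.067  T=13.5  df=199
    print("approximate two-sided p:", two_sided_p_normal(t))
```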

[Figure: histograms of log concentration for the ECO and SAS groups, log(eco[eco > 0]) and log(sas[sas > 0]).]

Statistical Models
The random variation in a statistical model is described by a distribution, often a normal distribution. The random variation may consist of several components, depending on the context. Different sources may be:
Measurement error.
Inter-individual variation.
Intra-individual variation.
Variation over time.

Statistical Models, Summary
The recipe for a statistical analysis:
Formulate the scientific hypothesis.
Make graphs of the data to get a feel for the data and their variability.
Propose and validate a statistical model:
Systematic variation: contains the parameters about which the scientific hypotheses are formulated.
Random variation: described as normal, $N(0, \sigma^2)$.
Draw inference about the parameters in the statistical model.
The random variation is not the object of interest. We must still specify a reasonable model for it in order to understand correctly how much can be learned about the systematic part of the variation.

One-sample Comparisons, the t-test
Consider the 55 ecological farmers and the 141 airline workers:
[Figure: histograms of sperm concentration for the organic farmers (eco) and the airline workers (sas).]
We now wish to investigate whether the sperm level for the group of ecological farmers equals the level of 40 mill/ml found in the literature. We know that the data are approximately normal on a log scale, and we therefore investigate the scientific hypothesis on this scale. A statistical model is
$Y_i = \mu + \epsilon_i, \quad i = 1, \ldots, 55,$
where $\epsilon_i \sim N(0, \sigma^2)$ are independent noise terms. We estimate $\mu$ and $\sigma$ by the sample average and the sample standard deviation:
$\bar{y} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i = 4.2$ and $SD = \hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{y})^2}$.

The t-test
We use the one-sample t-test for the hypothesis $H_0: \mu = \log(40)$ versus $H_A: \mu \neq \log(40)$. The null claims that what we see is a sample from a population that varies symmetrically around log(40). The test statistic for $H_0$ is
$T = \frac{\bar{y} - \log(40)}{SEM} = 0.51/0.14 = 3.6$,
which should be looked up in a t-distribution with 54 = 55 − 1 degrees of freedom, where SEM = $SD/\sqrt{n}$. The resulting p-value is small. If the null were true and we drew 55 men from the population, an average as different as, or more different than, the observed one would occur only with this small chance. We conclude that the sperm level is significantly higher than 40 mill/ml in the population of ecological farmers.
A 95% confidence interval consists of the mean values we cannot reject by a 5% test:
$(\hat{\mu} - 1.96\, SD/\sqrt{n},\ \hat{\mu} + 1.96\, SD/\sqrt{n}) = (4.2 - 1.96 \cdot 0.14,\ 4.2 + 1.96 \cdot 0.14) \approx (3.9, 4.5)$.
This is the range of values for the mean log sperm concentration that we find plausible.
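Starting from raw concentrations, the one-sample analysis on the log scale can be sketched as follows. This is a minimal illustration with hypothetical input. Concentrations of 0 cannot be log-transformed, so they are dropped as missing values, which is also what the SAS output later reports. The interval uses 1.96, as in the formula above. With 54 degrees of freedom, the exact t quantile (about 2.00) would give a slightly wider interval.

```python
import math
import statistics


def log_scale_t(conc, reference, z=1.96):
    """One-sample t statistic for mean(log(conc)) = log(reference), plus an
    approximate 95% CI (mean +/- z * SEM). Zero concentrations are dropped
    because log(0) is undefined."""
    logs = [math.log(c) for c in conc if c > 0]
    n = len(logs)
    mean = statistics.fmean(logs)
    sem = statistics.stdev(logs) / math.sqrt(n)
    t = (mean - math.log(reference)) / sem
    return {"n": n, "mean": mean, "sem": sem, "t": t, "df": n - 1,
            "ci": (mean - z * sem, mean + z * sem)}


def ci_from_summary(mean, sem, z=1.96):
    """Approximate 95% CI from a mean and its standard error."""
    return mean - z * sem, mean + z * sem


if __name__ == "__main__":
    low, high = ci_from_summary(4.2, 0.14)      # summary values from the slide
    print(round(low, 2), round(high, 2))        # 3.93 4.47
```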

A Non-parametric One-sample Test: the Signed-rank Test
Non-parametric techniques avoid the assumption of normally distributed residuals and instead ask questions about the median of the population. We are still looking at the ecological farmers, but we now take a subset of 10 men and wish to test whether they vary symmetrically around 40 mill/ml. We do not specify a detailed statistical model. Instead, we test $H_0$: the distribution of the differences (concentration − 40) is symmetric around 0, versus $H_A$: the distribution is not symmetric around 0 (skewed, for example).
We carry out a Wilcoxon one-sample test, also called a signed-rank test. We subtract 40 from each sperm level, order the differences by absolute size and assign them ranks. We then check whether the sum of the ranks of the negative differences is about as big as the sum of the ranks of the positive differences, as it should be under symmetry. The sum of the ranks of the negative differences is 4.5. Looking this up in a statistical table gives p > 0.01 and p < … . Doing the test on all the data gives a p-value of … .
For n > 20, one may use a normal approximation to compute the p-value: compute $\mu = n(n+1)/4$, $\sigma = \sqrt{n(n+1)(2n+1)/24}$ and $Z = (T - \mu)/\sigma$. For smaller values of n, use a table.

Two-sample Comparisons, the t-test
Consider the 55 ecological farmers and the 141 airline workers on a log scale:
[Figure: histograms of log concentration, log(eco[eco > 0]) and log(sas[sas > 0]).]
One may want to know whether these two groups could really be varying around the same level, so that the differences we see are due to random variation. We start by proposing a statistical model in which we can answer the question:
$Y_{i,j} = \mu_i + \epsilon_{i,j}, \quad i = 1, 2, \quad j = 1, \ldots, n_i,$
where $\epsilon_{i,j} \sim N(0, \sigma_i^2)$ are independent noise terms. The histograms show that the model is a good description of the data on the log scale.
Estimating the mean and variance in the two populations underlying the samples gives $\hat{\mu}_1 = 3.9$, $\hat{\sigma}_1^2 = 1.08$, $\hat{\mu}_2 = 4.2$ and $\hat{\sigma}_2^2 = …$ .

Two-sample Comparisons, the t-test
To carry out a two-sample t-test, we first need to check whether the variability is the same in the two groups. We test $H_0: \sigma_1 = \sigma_2$ versus $H_A: \sigma_1 \neq \sigma_2$ using the test statistic
$F = \frac{\max(\hat{\sigma}_1^2, \hat{\sigma}_2^2)}{\min(\hat{\sigma}_1^2, \hat{\sigma}_2^2)} = 1.27$,
which we look up in an F-distribution with (140, 54) degrees of freedom (p = 0.32). So we do not reject the hypothesis of equal variances. We can now calculate a combined (pooled) estimate of the variability:
$SD^2 = \frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{(n_1 - 1) + (n_2 - 1)}$.
With the combined variability estimate SD, we can proceed to the two-sample t-test for $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 \neq \mu_2$:
$T = \frac{\bar{y}_1 - \bar{y}_2}{SD\sqrt{1/n_1 + 1/n_2}} = 2.82$,
which we look up in a t-distribution with $n_1 + n_2 - 2 = f_1 + f_2 = 194$ degrees of freedom (p = 0.006). We conclude that the ecological farmers have a significantly higher sperm level than the airline workers. A 95% confidence interval for the difference in means between the two groups is given by
$(\bar{y}_1 - \bar{y}_2 - 1.96\, SED,\ \bar{y}_1 - \bar{y}_2 + 1.96\, SED)$,
where $SED = SD\sqrt{1/n_1 + 1/n_2}$.

Non-parametric Two-sample Comparisons, the Rank Test
The non-parametric rank test is also called the Wilcoxon-Mann-Whitney test. Consider two groups of data as before. We now wish to test whether the distributions of the two populations could be equal, or whether this must be rejected. The statistical model:
$Y_{i,j}$ has an arbitrary distribution $F_i(\cdot)$.
All data points are independent.
In this non-parametric model, we wish to test $H_0$: the distributions are the same, versus $H_A$: the distributions are not the same. We calculate a test statistic as follows:
Pool all data and assign ranks.
Sum the ranks of the smaller group.
Look up the sum of ranks in a statistical table to get the p-value.
The sum of ranks, T, for the ecological farmers is 6342. The total sum of ranks is 19306, so under $H_0$ the expected sum for the farmers is 19306 × 55/196 = 5417.5. This results in a p-value of … (computer program).
For $n_1, n_2 > 10$, one may use a normal approximation to compute the p-value: compute $\mu = n_1(n_1 + n_2 + 1)/2$ (here 5417.5), $\sigma = \sqrt{n_2 \mu / 6}$ (here about 356.8) and $Z = (T - \mu)/\sigma$. For smaller values, use a table.
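The rank tests and the pooled two-sample t-test from the last few slides can be sketched together in Python, since they share the ranking step. This is a minimal stdlib illustration. Ties get average ranks, as in the SAS output. Zero differences are dropped in the signed-rank test, and no tie correction is applied to the normal-approximation σ, which is a simplification. The two-sample CI uses 1.96, as on the slide. The usage block reproduces the normal approximation for the farmers' rank sum of 6342.

```python
import math
import statistics


def average_ranks(values):
    """Ranks 1..n; tied values get the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank(data, center):
    """Sum of ranks of the negative differences (data - center) and its normal-
    approximation Z, with mu = n(n+1)/4 and sigma^2 = n(n+1)(2n+1)/24."""
    diffs = [x - center for x in data if x != center]   # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    t_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_neg, (t_neg - mu) / sigma


def rank_sum(small, large):
    """Wilcoxon-Mann-Whitney: rank the pooled data, sum the ranks of the smaller
    group, and compare with mu = n1(n1+n2+1)/2 and sigma = sqrt(n2*mu/6)."""
    ranks = average_ranks(list(small) + list(large))
    n1, n2 = len(small), len(large)
    t = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n2 * mu / 6)
    return t, mu, sigma, (t - mu) / sigma


def two_sample_t(x, y, z=1.96):
    """F ratio (larger/smaller variance), pooled-SD t statistic and approximate CI."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    f_ratio = max(v1, v2) / min(v1, v2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / ((n1 - 1) + (n2 - 1)))
    sed = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = statistics.fmean(x) - statistics.fmean(y)
    return {"F": f_ratio, "t": diff / sed, "df": n1 + n2 - 2,
            "ci": (diff - z * sed, diff + z * sed)}


if __name__ == "__main__":
    mu = 55 * (55 + 141 + 1) / 2
    sigma = math.sqrt(141 * mu / 6)
    print(mu, round(sigma, 1), round((6342 - mu) / sigma, 2))   # 5417.5 356.8 2.59
```

For paired data, the same `signed_rank` function applies to the list of within-pair differences with `center=0`, in line with the advice to analyse the differences with one-sample techniques.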

Paired Comparisons
When data are paired, the two measurements are often not independent. Examples:
Measuring the right and left biceps.
Growth before and after treatment.
Height of men and women when sampled as couples.
Make graphs of the data.
With only two correlated measurements per subject, the data may nevertheless be analysed by simple techniques. A correct analysis is obtained by performing a one-sample analysis on the differences, because the differences (e.g., after − before) are independent across subjects. One should therefore simply test whether the differences vary around 0, using either a t-test or a signed-rank test.
When investigating the effect of a lotion that prevents sunburn, say, we could apply the lotion to one arm and a placebo to the other. The difference between the arms may then be ascribed to the lotion. The difference is a measure that is corrected for inter-individual variation, which may be large.

Summary
One-sample test: When the variation is approximately normal, the t-test may be used to test a hypothesis about the mean of the underlying population. The p-value provided is valid only if the variation is approximately normal. A nice summary of the data is provided by the confidence interval for the mean. When the data are not normally distributed and interest is concentrated on testing rather than estimation, the signed-rank test may be used. This test does not require normality, but it does not directly give a confidence interval for the mean. Right-skewed data may be transformed towards approximate normality by transformations such as the cube root ($x^{1/3}$) or the log.
Two-sample test: Two groups of data may be compared by the t-test when the variation is approximately normal and the variance of the random variation is equal in the two groups. A nice summary of the difference between the groups is given by the confidence interval for the difference between the means. When the data are not normally distributed and interest is concentrated on testing rather than estimation, the rank test may be used. This test does not require normality.
It does not directly give a confidence interval for the difference between the groups.
Paired data are handled by one-sample techniques applied to the differences between the pairs.

Statistical Analysis using Analyst (SAS)
Analyst is a Windows-based application in the SAS statistical software. SAS is started by clicking Start → statistik → SAS in the lower left-hand corner. Analyst is then started with Solutions → Analysis → Analyst.
Commands will be presented as we need them for the various analyses. Remember that the focus is on the statistical analyses rather than on the details of how to do things in the program.
We consider data on sperm concentration (mill/ml) for two groups of men in a study. One group consists of members of an association that promotes organic agriculture (n=55). The other consists of workers from a major Scandinavian airline carrier (n=141).
The data set contains the following variables:
obs: observation number.
abstime: length of abstinence in days.
age: age of subject.
s1e2: group indicator.
conc: sperm concentration (mill/ml).
volume: volume of sperm sample (ml).
The data are loaded with File → Open... from n:\human\oeko, which is a SAS data set. The data then appear in the data table, which contains a record for each subject with the variables described above.
To make your own new variables when you work with the data, you must create your own version of the data. You do this by saving the data under a new name with File → Save: type a new name, e.g. oeko12 if you are in front of machine 12.

Data Manipulations
A little data manipulation is needed. New transformed variables are constructed as follows:
Put the data table in edit mode (Edit → Mode → Edit).
Choose Data → Transform → Compute....
Type a new variable name (e.g. conc3) and, in the box below the equals sign, an expression that defines the new variable (e.g. conc**.3333).
A new variable called conc3, equal to the concentration on the cube-root scale, is then defined and appears in the data table. Alternatively, one may apply one of the standard transformations to conc. Highlight the column you wish to transform and choose Data → Transform.
To make a variable that can be used for the one-sample test (e.g. dl40 = lconc − log(40)), choose Data → Transform → Compute.... Type the new variable name (dl40) and, in the box below the equals sign, the expression that defines it, log(conc)-log(40).
To group a continuous variable according to its value and define a classification variable based on it, choose Data → Transform → Recode Ranges.... In the recode dialog, give the column name (volume) and the name of the new grouped version (gvol), and click OK. In the next window, give the bounds 0,3; 3,4; and 4,15 for the three groups, name them (1, 2, 3) in the rightmost column, and click OK.
To delete a variable, highlight the column in the data table and choose Edit → Delete.
To construct a subset of the data, e.g. one group for a group-specific analysis, choose Data → Filter → Subset Data.... In the subset dialog, you can apply a Where clause to the data. Click s1e2, EQ and constant value, followed by 1, to select s1e2=1, the airline workers. Use 2 to select the ecological farmers.

Histograms
To make a histogram of concentration (conc), choose Graphs → Histogram.... Select conc as the analysis variable and s1e2 as the class (classification) variable, and click OK. If the class variable is omitted, a single histogram for all the data is drawn.
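For readers without SAS, the same manipulations (cube root, log difference from log(40), recoding volume into three groups, subsetting by s1e2) can be sketched in plain Python. This is a minimal illustration, not a description of what SAS does internally. The records and their values are hypothetical. Two further points are assumptions: each recode interval is taken to include its lower bound and exclude its upper bound, and s1e2 = 2 is taken to code the ecological farmers (s1e2 = 1 being the airline workers).

```python
import math

# Hypothetical records mimicking the oeko data set (values invented for illustration).
records = [
    {"obs": 1, "s1e2": 1, "conc": 48.0, "volume": 2.5},
    {"obs": 2, "s1e2": 2, "conc": 0.0, "volume": 3.0},
    {"obs": 3, "s1e2": 2, "conc": 69.0, "volume": 4.2},
]


def recode_volume(v, groups=((0, 3, 1), (3, 4, 2), (4, 15, 3))):
    """Group volume with bounds 0,3; 3,4; 4,15 into codes 1, 2, 3.
    Assumption: each interval includes its lower bound and excludes its upper bound."""
    for low, high, code in groups:
        if low <= v < high:
            return code
    return None                                   # outside all ranges: missing


for r in records:
    r["conc3"] = r["conc"] ** (1 / 3)                                        # cube-root scale
    r["lconc"] = math.log(r["conc"]) if r["conc"] > 0 else None              # log(0) is missing
    r["dl40"] = r["lconc"] - math.log(40) if r["lconc"] is not None else None
    r["gvol"] = recode_volume(r["volume"])

eco = [r for r in records if r["s1e2"] == 2]      # subset: assumed ecological farmers
print([(r["obs"], r["gvol"], r["dl40"]) for r in eco])
```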
Simple Descriptive Statistics
To compute means, standard deviations, variances, medians, percentiles and the range, choose Statistics → Descriptive → Distributions.... Select conc as the analysis variable and s1e2 as the class (classification) variable, and click OK.
To examine the normality of a variable, one may draw a normal distribution on the same plot as the histogram. To do this, click Fit in the Distributions dialog, select Normal and click OK in the Fit dialog, then click OK in the Distributions dialog.
[Figure: histograms of conc for the organic farmers (eco) and the airline workers (sas).]
Output: Univariate Procedure, Variable=CONC, one block per S1E2 group. The Moments table reports N, Sum Wgts, Mean, Sum, Std Dev, Variance, Skewness, Kurtosis, USS, CSS, CV, Std Mean, T:Mean=0, Pr>|T|, Num ^= 0, Num > 0, M(Sign), Pr>=|M|, Sgn Rank and Pr>=|S|. This is followed by Quantiles(Def=5), Range, Q3-Q1, Mode and Extremes. Recoverable values for the first group (N = 141, the airline workers):
N = 141, Sum Wgts = 141; M(Sign) = 69.5; Sgn Rank = 4865.
Quantiles (Def=5): Median = 48; 0% (Min) = 0; 5% = 3.3; 1% = 0; Range = 402; Q3-Q1 = 68; Mode = 12.
Extremes, lowest (Obs): 0 (40), 0 (1), 0.75 (67), 1.88 (60), 2.3 (132).
Extremes, highest (Obs): 233 (92), 284 (102), 308 (32), 358 (104), 402 (69).

Recoverable values for the second group (N = 55, the ecological farmers), Variable=CONC:
N = 55, Sum Wgts = 55; Num ^= 0 = 54; Num > 0 = 54; M(Sign) = 27.
Quantiles (Def=5): Median = 69; 0% (Min) = 0; 5% = 9.1; 1% = 0; Range = 354; Q3-Q1 = 105; Mode = 69.
Extremes, lowest (Obs): 0 (40), 5.5 (15), 9.1 (35), 11 (42), 14 (10).
Extremes, highest (Obs): 264 (32), 264 (33), 297 (47), 322 (14), 354 (51).

One-sample t-test and Signed-rank Test
We wish to examine whether the hypothesis that the sperm level varies around 40 mill/ml can be statistically rejected. To make a one-sample t-test, first transform to the log scale to obtain approximate normality. Then compute a new variable dl40 = lconc − log(40) (see above). Finally, choose Statistics → Descriptive → Distributions... and select the variable dl40 with class equal to s1e2. For dl40, the output gives the t-test (T:Mean=0, Pr>|T|), the sign test (M(Sign), Pr>=|M|) and the signed-rank test (Sgn Rank, Pr>=|S|).
Output for the first group (airline workers), Variable=DL40: N = 139 (2 missing values, from the two concentrations of 0, which cannot be log-transformed); Num > 0 = 79; M(Sign) = 9.5; Sgn Rank = 816.
Output for the second group (ecological farmers), Variable=DL40: N = 54 (1 missing value); Num ^= 0 = 54; Num > 0 = 41; M(Sign) = 14.

One-sample t-test
Alternatively, one may use a menu designed especially for the one-sample t-test: Statistics → Hypothesis Tests → One-Sample t-test.... Select the variable lconc and enter the mean we wish to test, here 4. To test the level of 40 mill/ml, one would enter log(40) ≈ 3.69.
Note that the t-test should be carried out on one group only, the ecological farmers, say, so the active data set should contain only this group. To make the test, it is therefore necessary to construct a new data set consisting of the group of interest, as described in the data manipulation section above.
Output: One Sample T Test for a Mean. The output lists the sample statistics for LCONC (N, Mean, Std. Dev., Std. Error). It then gives the hypothesis test (null hypothesis: Mean of LCONC = 4; alternative: Mean of LCONC ^= 4) with the t statistic, Df and Prob > |t|.
To carry out the t-test for each of the two groups, you can instead specify s1e2 as the By variable under the Variables button.

Two-sample T-test for Means (un-paired data)

To compare the concentrations of the two groups, choose Statistics → Hypothesis Tests → Two-Sample t-Test for Means... and select the variable lconc and the group variable s1e2.

Output:

Two Sample T Test for the Means of LCONC within S1E2
Sample Statistics
Group   N   Mean   Std. Dev.   Std. Error
Hypothesis Test
Null hypothesis: Mean 1 - Mean 2 = 0
Alternative: Mean 1 - Mean 2 ^= 0
If Variances Are   t statistic   Df   Pr > |t|
Equal
Not Equal

It is useful to supplement the analysis with some plots. Try, for example, the Plots button and select one of the plots.

The conclusions are based on an assumption of equal variances, and this should be validated. The output may indicate that this is the case, for instance through similar standard deviations in the two groups. If in doubt, one can carry out a test that shows how serious the deviation from equal variances is.

Two-sample Test for Variances (un-paired data)

To compare the variances of the two groups, choose Statistics → Hypothesis Tests → Two-Sample Test for Variances... and select the variable lconc and the group variable s1e2.

Output:

Two Sample Test for Variances of LCONC within S1E2
Sample Statistics
S1E2 Group   N   Mean   Std. Dev.   Variance
Hypothesis Test
Null hypothesis: Variance 1 / Variance 2 = 1
Alternative: Variance 1 / Variance 2 ^= 1
- Degrees of Freedom -
F   Numer.   Denom.   Pr > F

Two-Sample Wilcoxon Rank-Sum Test

The two-sample rank-sum test is the special case with k = 2 groups of the Kruskal-Wallis test, which tests whether k groups have the same distribution. To carry out the two-sample rank-sum test, choose Statistics → ANOVA → Nonparametric One-Way ANOVA... and select the variable conc and the group variable s1e2. Because the test uses only the ranks of the data, conc and lconc give the same result.

Output:

Wilcoxon Scores (Rank Sums) for Variable CONC Classified by Variable S1E2
S1E2   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
Average Scores Were Used for Ties

Wilcoxon 2-Sample Test (Normal Approximation)
(with Continuity Correction of .5)
S =   Z =   Prob > |Z| =
T-Test Approx. Significance =

Kruskal-Wallis Test (Chi-Square Approximation)
CHISQ =   DF = 1   Prob > CHISQ =

Exercise-I

Rather than the concentration, we shall now consider the volume of each sperm sample as the parameter of interest. We wish to compare the ecological farmers and the airline workers. A volume of 3 ml is considered normal; investigate further whether the two groups are normal in this respect.

3) Without doing any computer work, make a strategy for how such an analysis can and should be carried out. What descriptive plots and statistics are needed? What hypotheses are formulated and tested? How will you validate the necessary assumptions for the suggested analysis?

4) Do the analyses, make the plots and so on. Remember to interpret the results in terms of the subject matter.
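For the two-sample procedures described before the exercise, the following minimal Python sketch (standard library only) computes the quantities behind the three outputs. These are the t statistics for equal and unequal variances, the variance ratio F, and the Wilcoxon rank-sum statistics together with the Kruskal-Wallis chi-square.

A few assumptions apply. All function names are hypothetical, and the group data are invented. The t p-values come from numerically integrating the t density rather than from a statistics library. S is taken as the rank sum of the first group passed in, and the F p-value is not computed. Details may therefore differ from SAS's own conventions.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) for Student's t with df degrees of freedom (df may be
    non-integer, as in Welch's test), integrating the density by Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even
    area = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                    for i in range(1, steps))
    return max(0.0, 1.0 - 2.0 * area * h / 3)


def two_sample_t(x, y):
    """Un-paired t-tests of H0: Mean 1 - Mean 2 = 0, under both variance assumptions."""
    n1, n2 = len(x), len(y)
    d = statistics.mean(x) - statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    # Variances equal: pooled variance, df = n1 + n2 - 2.
    df_eq = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_eq
    t_eq = d / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Variances not equal: Welch statistic with Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_ne = d / math.sqrt(a + b)
    df_ne = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"Equal": (t_eq, df_eq, t_two_sided_p(t_eq, df_eq)),
            "Not Equal": (t_ne, df_ne, t_two_sided_p(t_ne, df_ne))}


def variance_ratio(x, y):
    """F = Variance 1 / Variance 2 with (numerator, denominator) degrees of freedom."""
    return statistics.variance(x) / statistics.variance(y), len(x) - 1, len(y) - 1


def midranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Rank-sum test for group x vs y (normal approximation with continuity
    correction 0.5) and the Kruskal-Wallis chi-square for k = 2 groups."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    s = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n + 1) / 2
    # Null variance of S, adjusted for ties (tied values share a midrank).
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = s - expected
    cc = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - cc) / math.sqrt(var)
    chisq = diff ** 2 / var  # Kruskal-Wallis statistic; no continuity correction
    return {"S": s, "Expected": expected, "Z": z,
            "Prob > |Z|": math.erfc(abs(z) / math.sqrt(2)),
            "CHISQ": chisq, "Prob > CHISQ": math.erfc(math.sqrt(chisq / 2))}


# Hypothetical log-concentrations for two groups; NOT the study data.
group1 = [4.1, 3.8, 4.6, 4.4, 3.9, 4.7, 4.2]
group2 = [3.6, 3.9, 4.0, 3.5, 4.1, 3.7, 3.8, 3.4]
print(two_sample_t(group1, group2))
print(variance_ratio(group1, group2))
print(wilcoxon_rank_sum(group1, group2))
```

Because the rank-sum test uses only ranks, `wilcoxon_rank_sum` gives the same result on concentrations and on log-concentrations. By contrast, the t-tests and the variance ratio depend on the transformation.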


1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information