Descriptive and Inferential Statistics


General Sir John Kotelawala Defence University
Workshop on Descriptive and Inferential Statistics
Faculty of Research and Development
14th May 2013

1. Introduction to Statistics

1.1 What is Statistics?

In common usage, 'statistics' refers to numerical information. (Here, 'statistics' is the plural of 'statistic', which means one piece of numerical information.) For example:

Percentage of male nurses in Sri Lanka: 5%
Birth rate: births/1,000 population
Death rate: 5.92 deaths/1,000 population
Infant mortality rate: 9.7 deaths/1,000 live births
Life expectancy at birth: male: years; female: years
GDP (value of all final goods and services produced in a year): \$106.5 billion
Unemployment rate (the percentage of the labor force that is without jobs): 5.8%
Inflation rate (the annual percent change in consumer prices compared with the previous year's consumer prices): 5.9% (2010 est.)

In the more specific sense, 'statistics' refers to a field of study. It has been defined in several ways. For example:

Statistics is the study of the collection, organization, analysis, and interpretation of data.
Statistics is the mathematical science involved in the application of quantitative principles to the collection, analysis, and presentation of numerical data.
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

1.2 Data and Information

These words are often used interchangeably. However, there are some differences. Data are the numbers, characters, symbols, images etc. collected in raw form for analysis, whereas information is processed data. Data are unprocessed facts and figures without any added interpretation or analysis.

Information is data that has been interpreted so that it has meaning for the user. Knowledge is a combination of information, experience and insight that may benefit the individual or the organization.

1.3 Distinguishing between Variables and Data

A variable is some characteristic which has different 'values' or categories for different units (items/subjects/individuals).

Examples of variables on which data are collected at a prenatal clinic: Gender, Ethnicity, Age, Body temperature, Pulse rate, Blood pressure, Fasting blood sugar level, Urine pH value, Income group, Number of children.

We collect data on variables. Data are raw numbers or facts that must be processed (analyzed) to get useful information. We get information by processing data.

Variable: Age (in years) of patients
Data: 31, 42, 34, 33, 41, 45, 35, 39, 28, 41
Information: the mean age is 36.9 years; the percentage of patients above 40 years of age is 40%.

1.4 Population and sample

Statistics is used for drawing conclusions about a group of units (individuals/items/subjects). Such a group of interest is called a population. In research, the 'population' represents the group of units to which one wishes to generalize the conclusions. The populations of interest are usually large.

Even though decisions have to be made about the population of interest, it is often impossible or very difficult to collect data from the whole population, due to practical constraints on the available money, time and labour, or due to the nature of the population. Therefore, data are often collected from only a subset of the population. Such a subset is called a sample.
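The step from data to information in the age example can be reproduced with a few lines of Python. This is a minimal sketch using only the ten ages listed in the text; the variable names are illustrative.

```python
# Ages (in years) of the ten patients listed in the text
ages = [31, 42, 34, 33, 41, 45, 35, 39, 28, 41]

# Information 1: the mean age
mean_age = sum(ages) / len(ages)

# Information 2: the percentage of patients older than 40
pct_above_40 = 100 * sum(1 for age in ages if age > 40) / len(ages)

print(f"Mean age: {mean_age:.1f} years")
print(f"Patients above 40: {pct_above_40:.0f}%")
```

The two printed figures are the "information" obtained by processing the raw ages.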

1.5 Descriptive Statistics and Inferential Statistics

Descriptive Statistics is the branch of Statistics that includes methods of organizing, summarizing and presenting data in an informative way. Commonly used methods are frequency tables, graphs, and summary measures.

Inferential Statistics is the branch of Statistics that includes methods used to make decisions, estimates, predictions, or generalizations about a population, based on a sample. This includes point estimation, interval estimation, tests of hypotheses, regression analysis, time series analysis, multivariate analysis, etc.

1.6 Classification of Variables

Why do we need to know about types of variables?

You need to know in order to evaluate the appropriateness of the statistical techniques used, and consequently whether the conclusions derived from them are valid. In other words, you cannot tell whether the results of a particular medical research study are credible unless you know what types of variables or measures were used in obtaining the data.

Qualitative Variables
The characteristic is a quality. The data are categories. They cannot be given numerical values; however, they may be given numerical labels.
Examples: Gender of patient, Ethnicity, Income group

Quantitative Variables
The characteristic is a quantity. The data are numbers. They are obtained by counting or by measuring with some scale.
Examples: Age, Body temperature, Pulse rate, Blood pressure, Fasting blood sugar level, Urine pH value, Number of children

Discrete Variables
Quantitative. Usually, the data are counts. There are impossible values between any two possible values.
Examples: Pulse rate, Number of children

Continuous Variables
Quantitative. Usually, the data are obtained by measuring with a scale. There are no impossible values between any two possible values: any value between two possible values is also a possible value.
Examples: Age, Fasting blood sugar level, Body temperature, Urine pH value

1.6.5 Scales of measurement

Nominal Variables
Qualitative. There is no order or ranking in the categories.
Examples: Gender, Ethnicity

Ordinal Variables
Qualitative. The categories can be ordered or ranked.
Examples: Income group

Interval Variables
Quantitative. Data can be ordered or ranked. There is no absolute zero; zero is only an arbitrary point with which other values can be compared. The difference between two numbers is a meaningful numerical value. They are called interval variables because the intervals between the numbers represent something real, which is not the case with ordinal variables. The ratio of two numbers is not a meaningful numerical value.
Examples: Temperature (in °C or °F)

Ratio Variables
Possess all the characteristics of an interval variable. There exists an absolute (true) zero. The ratio between different measurements is meaningful.
Examples: Age, Pulse rate, Fasting blood sugar level, Number of children

2. Data Analysis with SPSS

2.1 Running SPSS for Windows

Method 01
Click on the Start button at the lower left of your screen and, among the programs listed, find SPSS for Windows and select SPSS 16.0 for Windows.

Method 02
If there is an SPSS shortcut on the desktop, simply put the cursor on it and double-click the left mouse button.

Shown below is an image of the screen you will see when SPSS is ready.

Figure 01 (labelled parts: Menu Bar, Tool Bar, Start-up dialog box)

You can select any one of the options in the start-up dialog box and click OK, or you can simply click Cancel. If you click Cancel, you can either enter new data in the blank Data Editor or open an existing file using the File menu, as explained later.

2.2 Different Types of Windows in SPSS

SPSS uses the following windows: the Data Editor (with its Variable View and Data View), the Output Viewer, the Syntax Editor and the Script window.

2.2.1 The Data Editor

As shown in Figure 01, you will first see the start-up dialog box listing several options; behind it is the Data Editor. The Data Editor is a worksheet used for entering and editing data. It has two panes: Variable View and Data View.

Naming and defining variables

When preparing a new dataset in SPSS, the following attributes must be set in the Variable View. Move your cursor to the bottom of the Data Editor, where you will see a tab labeled Variable View. Click on that tab. A different grid appears, with a set of column headings. For each variable we create, we need to specify all or most of the attributes described by these column headings.

Name
Should be a single word. Spaces and special characters (!, ?, *) are not allowed. Each variable name must be unique; duplication is not allowed. The underscore character is frequently used where a space is desired in a name.

Type
Click within the Type column, and a small gray button marked with three dots will appear; click on it and you will see the variable type dialog box. Numeric is the default type. (Numeric and string types are suitable for most variables. For a full description of each of the variable types, click on the Help button.)

Width & Decimals
Applicable to numeric variables.

Label
This is an optional attribute which can be used for entering a more detailed name.

Values
This option allows the user to configure the coding structure for categorical variables. In the Values column, click on the word None and then click the gray box with three dots. This opens the Value Labels dialog box. For example, type 1 in the Value box and type male in the Label box, then click Add. Then type 0 in the Value box and female in the Label box. Click Add and then click OK.

Missing
The user can assign codes to represent missing observations.

Measure
The scale of measurement applicable to the variable. Both interval and ratio scales are referred to as the Scale type.

Entering Data

The Data View pane of the Data Editor window is used to enter the data. Displayed initially is an empty spreadsheet with the variable names you have defined appearing as the column headings.

Saving a Data File

On the File menu, choose Save As. In the Save in box, select the destination directory (in our example, we are saving to the Desktop). Then give a suitable file name and click Save.

2.2.2 Output Viewer

Displays outputs and errors. The extension of the saved file is .spv.

2.3 Reading data into SPSS

Data can be entered directly or imported from a number of different sources. The process for reading data stored in SPSS-format data files, or in spreadsheet applications such as Microsoft Excel, will be covered in the classroom session.

SPSS-format data files are organized by cases (rows) and variables (columns).

3. Descriptive Analysis of Data

Descriptive statistics consists of organizing and summarizing the information collected. It describes the information collected through numerical measures, charts, graphs and tables. The main purpose of descriptive statistics is to provide an overview of the information collected.

3.1 Organizing Qualitative Data

Recall that qualitative data provide non-numerical measures that categorize or classify an individual. When qualitative data are collected, we are often interested in determining the number of individuals that fall within each category.

Tabular Data Summaries

A frequency table (frequency distribution) is a listing of the values a variable takes in a data set, along with how often (the frequency) each value occurs.

Definition 3.1: The frequency is the number of observations in the data set that fall into a particular class.

Definition 3.2: The relative frequency is the class frequency divided by the total number of observations in the data set; that is,
Relative frequency = Frequency / Total number of observations

Definition 3.3: The percentage is the relative frequency multiplied by 100; that is,
Percentage = Relative frequency × 100

When comparing categories or groups, relative frequencies are usually more useful than absolute frequencies.

One-way frequency tables (simple frequency tables)

Analyze → Descriptive Statistics → Frequencies (select the variable and click OK)
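Definitions 3.1 to 3.3 can be illustrated outside SPSS with a short Python sketch. The `activity` values below are made-up illustration data, not the workshop data set, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)          # Definition 3.1: class frequencies
    total = len(values)
    table = {}
    for category, freq in counts.items():
        rel_freq = freq / total       # Definition 3.2
        percentage = rel_freq * 100   # Definition 3.3
        table[category] = (freq, rel_freq, percentage)
    return table


# Hypothetical activity levels for ten respondents (illustration only)
activity = ["none", "slight", "moderate", "slight", "none",
            "moderate", "moderate", "a lot", "none", "moderate"]

table = frequency_table(activity)
for category, (freq, rel_freq, pct) in table.items():
    print(f"{category:10s} {freq:3d} {rel_freq:6.2f} {pct:6.1f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built frequency table.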

Table 01: Composition of the sample by activity

Note: The Valid Percent column takes missing values into account. For instance, if there were one missing value in this data set, the valid number of cases would be 91. In that case, the valid percentage of the slight category would be 11%. Note that Percent and Valid Percent will both always total 100%.

The Cumulative Percent is the cumulative percentage of the cases for a category and all categories listed above it in the table. The cumulative percentages are not meaningful, of course, unless the scale has ordinal properties.

3.2 Cross classification tables

Cross classification tables (contingency tables/two-way tables) display the relationship between two or more categorical (nominal or ordinal) variables.

Analyze → Descriptive Statistics → Crosstabs

Note: By default, the Crosstabs command does not present percentages. You can add Row, Column and Total percentages as appropriate using the Cells option in the Crosstabs dialog box.

Table 02: Composition of the sample by smoke and gender
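To show what a cross classification table with row percentages contains, here is a minimal Python sketch. The `gender` and `smoke` values are invented for illustration and do not come from the workshop data; `crosstab` and `row_percentages` are hypothetical helper names.

```python
from collections import Counter


def crosstab(row_var, col_var):
    """Count how many cases fall in each (row category, column category) cell."""
    counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


# Hypothetical data for eight respondents (illustration only)
gender = ["male", "male", "female", "female", "male", "female", "male", "female"]
smoke = ["yes", "no", "no", "no", "yes", "yes", "no", "no"]

counts = crosstab(gender, smoke)
row_pct = row_percentages(counts)
print(counts)
print(row_pct)
```

Column and total percentages follow the same pattern, dividing by the column total or the grand total instead of the row total.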

3.3 Graphical Presentation for Categorical Data

A visual display is often the most effective way to present information. Graphs are frequently used in statistical analyses both as a means of uncovering patterns in a set of data and as a means of conveying the important information from a survey in a concise and accurate fashion.

Bar Charts

Simple Bar Chart
Graphs → Legacy Dialogs → Bar
Choose the options Simple and Summaries for groups of cases.
Choose the relevant variable as the category axis.

Cluster Bar Chart
Graphs → Legacy Dialogs → Bar
Choose the options Cluster and Summaries for groups of cases.

Component Bar Chart (Sub-divided bar diagram)
These diagrams show the total of the values and its break-up into parts. The bar is subdivided into parts in proportion to the values given in the data, and may be drawn on absolute figures or on percentages. Each component occupies a part of the bar proportional to its share of the total. To distinguish the components from one another, different colors or shades may be used. When a sub-divided bar diagram is drawn on a percentage basis, it is called a percentage bar diagram. The components should be kept in the same order in each bar.

Pie Chart
SPSS command: Graphs → Legacy Dialogs → Pie → Define

3.2 Organizing Quantitative Data

Grouped frequency tables

In order to construct a grouped frequency distribution, the numerical variable must first be classified into groups. We can use the Recode option in SPSS to perform this classification. Once the variable has been classified into a different variable, a frequency table can be prepared to present the grouped frequency distribution.

SPSS command for Recode (into different variables):
Transform → Recode into Different Variables
or
Transform → Visual Binning

Graphical Presentation of Numerical Data

When presenting and analyzing the behavior of a numerical variable, graphical options such as the histogram, dot plot and box plot can be used.

SPSS commands:
Histogram: Graphs → Legacy Dialogs → Histogram
Dot plot: Graphs → Legacy Dialogs → Scatter/Dot → Simple Dot → Define
Box plot: Graphs → Legacy Dialogs → Boxplot → Simple → Define

3.3 Summary measures

SPSS commands:
Analyze → Descriptive Statistics → Frequencies → Statistics
Analyze → Descriptive Statistics → Descriptives
Analyze → Descriptive Statistics → Explore

Central Tendency

Mean: the sum of the observations divided by the number of observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Median: the value that lies in the middle of the data when arranged in ascending order. That is, half the data are below the median and half the data are above the median.
Mode: the most frequent observation of the variable in the data set.

Measures of Dispersion

Range: the difference between the largest data value and the smallest data value.
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Sample standard deviation: $s = \sqrt{s^2}$.
Inter-quartile range: measures the spread of the data around the median. The range of the middle 50% of the data is called the inter-quartile range.

Quartiles: the quartiles of a set of values are the three points that divide the ordered data set into four groups, each containing a quarter of the observations.

Measures of skewness: skewness is the characteristic that describes the lack of symmetry of a distribution.

Kurtosis: the degree of peakedness of a distribution, usually taken relative to a normal distribution.
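The summary measures above can be computed with Python's standard `statistics` module. This sketch reuses the ten patient ages from Section 1.3; note that `statistics.quantiles` uses one particular quartile convention by default, so its quartiles may differ slightly from those reported by SPSS.

```python
import statistics

# Ages (in years) of the ten patients from Section 1.3
ages = [31, 42, 34, 33, 41, 45, 35, 39, 28, 41]

# Central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Dispersion
data_range = max(ages) - min(ages)
variance = statistics.variance(ages)   # sample variance (divisor n - 1)
std_dev = statistics.stdev(ages)       # sample standard deviation

# Quartiles and inter-quartile range (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, sd={std_dev:.2f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```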

3.4 Scatter Plot

When you analyze bivariate data, it is best to start with a suitable graph. In a quantitative bivariate data set, we have an (x, y) pair for each sampling unit, where x denotes the independent variable and y denotes the dependent variable. Each (x, y) pair can be considered as a point on the Cartesian plane. A scatter plot is a plot of all the (x, y) pairs in the data set.

The purpose of a scatter plot is to illustrate diagrammatically any relationship between two quantitative variables. If the variables are related, what kind of relationship is it: linear or nonlinear? If the relationship is linear, the scatter plot will show whether it is negative or positive.

SPSS command: Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define

3.5 Correlation

The correlation coefficient r lies between -1 and +1.
When r = 1, it signifies a perfect positive linear relationship.
When r = -1, it signifies a perfect negative linear relationship.
The further r is from 0, the stronger the linear correlation.
Figure 6.5 shows some examples.

SPSS command: Analyze → Correlate → Bivariate
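The handout does not write out the formula for r; assuming it refers to the usual Pearson product-moment correlation coefficient, the following sketch computes it from first principles. The function name `pearson_r` and the small data sets are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: one perfectly increasing and one perfectly decreasing line
x = [1, 2, 3, 4]
r_positive = pearson_r(x, [2, 4, 6, 8])
r_negative = pearson_r(x, [8, 6, 4, 2])
print(r_positive, r_negative)
```

The two calls show the extreme cases described in the text: points lying exactly on a rising line and on a falling line.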

4. Fundamentals of Statistical Inference

The need for making educated guesses and drawing conclusions about some group of units of interest arises in almost every field. Such a group of interest is called a population. In research, the population represents the group of units to which you wish to generalize your conclusions. Even though decisions have to be made about the population of interest, it is often impossible or very difficult to collect data from the whole population, due to practical constraints on the available money, time and labour, or due to the nature of the population. Therefore, data are often collected from only a subset of the population. Such a subset is called a sample.

The process of making educated guesses and drawing conclusions about a population, using a sample from that population, is called statistical inference. Usually this involves collecting suitable data, analyzing the data using suitable statistical techniques, measuring the uncertainty of the results and drawing conclusions.

Statistical inference problems usually involve one or more unknown constants related to the population of interest. Such unknown constants are called parameters. Examples of parameters are:

the total of the values of a variable X over the units of a finite population (the population total);
the mean of the values of X over the units of a finite population (the population mean);
the proportion of units with some specified characteristic (the population proportion);
the mean of some random variable (the expected value).

In addition, we come across parameters in various models, such as regression models and probability distributions.

Statistical inference problems often involve estimation of parameters and tests of hypotheses concerning parameters. Estimation can take the form of point estimation and/or interval estimation.

4.1 Point Estimation

Point estimation uses the sample data to calculate a single number to estimate the parameter of interest. For instance, we might use the sample mean $\bar{x}$ to estimate the population mean μ. The problem is that two different samples are very likely to result in different sample means, and thus there is some degree of uncertainty involved. A point estimate does not provide any information about the inherent variability of the estimator; we do not know how close $\bar{x}$ is to μ in any given situation. However, $\bar{x}$ is more likely to be near the true population mean if the sample on which it is based is large.

4.2 Interval Estimation

This method is often preferred. The technique provides a range of reasonable values that is intended to contain the parameter of interest; this range of values is called a confidence interval. In interval estimation we derive an interval such that we can say that the parameter lies within the interval with a given level of confidence.

4.3 Terminology and Notation

Estimate
An approximate value for a parameter, determined using a sample of data, is called a point estimate or, in short, an estimate.

Estimator
We obtain an estimate by substituting the sample data into a formula. Such a formula is called an estimator. An estimator is a function of the data.

Notation
We usually use Greek letters to denote parameters. For example, the population mean, population standard deviation and population proportion are usually denoted by µ, σ and θ respectively.

Example: Suppose that we are interested in estimating the mean µ and the variance σ² of a population. Let X1, X2, ..., X5 be 5 random observations from this population. Let {3, 5, 2, 1, 2} be one observed sample from this population and {4, 1, 3, 2, 1} be another observed sample from this population. Table 01 illustrates the terms parameter, estimator and estimate.

Table 01
Parameter | Estimator | Estimate 01 (using {3, 5, 2, 1, 2}) | Estimate 02 (using {4, 1, 3, 2, 1})
µ | | |
σ² | | |

4.4 Point Estimation of Population Mean

Suppose X is a variable defined on the units of a large population and we are interested in the population mean μ. Suppose we have selected a random sample of n units and we have observed X on those units. Let $x_1, x_2, \dots, x_n$ be the observed values of X. Then

$\bar{x} = (x_1 + x_2 + \dots + x_n)/n$

can be used as an approximate value for the population mean. Therefore, we say that $\bar{x}$ is an estimate of μ. It is a point estimate.

In order to estimate the population mean using the sample mean, one of the following options can be used. These were introduced in the previous section.

Analyze → Descriptive Statistics → Frequencies → Statistics
Analyze → Descriptive Statistics → Descriptives
Analyze → Descriptive Statistics → Explore

Bound on the error of the estimate and confidence intervals

Usually an estimate is not exactly equal to the parameter. The difference between the actual value of the parameter and the estimate is called the error of the estimate. Since we do not know the actual value of the parameter, we cannot know the exact error in our estimate. However, we can place a bound on the error with a known level of confidence. For example, using statistical theory, we may be able to make a statement like "we are 95% confident that the error of the estimate is less than 75". This is equivalent to saying that we are 95% confident that $|\bar{x} - \mu| < 75$, which in turn is equivalent to saying that we are 95% confident that $\bar{x} - 75 < \mu < \bar{x} + 75$. This means we are 95% confident that μ is in the interval $(\bar{x} - 75, \bar{x} + 75)$. Such an interval is called a 95% confidence interval.
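The distinction between an estimator (a formula) and an estimate (the number it produces for one sample) can be seen by applying the same estimators to the two observed samples in the example above. This sketch assumes the sample mean as the estimator of µ and the sample variance with divisor n - 1 as the estimator of σ²; the handout's own table may have used a different variance estimator.

```python
import statistics

# The two observed samples from the example
sample_1 = [3, 5, 2, 1, 2]
sample_2 = [4, 1, 3, 2, 1]

# Estimators are functions of the data (assumed choices, see lead-in)
estimators = {
    "mu": statistics.mean,           # sample mean
    "sigma^2": statistics.variance,  # sample variance, divisor n - 1
}

# Estimates are the values the estimators take on each observed sample
estimates = {
    name: (estimator(sample_1), estimator(sample_2))
    for name, estimator in estimators.items()
}

for name, (est_1, est_2) in estimates.items():
    print(f"{name}: estimate 01 = {est_1:.2f}, estimate 02 = {est_2:.2f}")
```

The same estimator gives different estimates for the two samples, which is exactly the sampling uncertainty that a bound on the error, or a confidence interval, is meant to quantify.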

Computing an Appropriate Confidence Interval for a Population Mean

Is n ≥ 30?
  Yes: Is the value of σ known?
    Yes: Use $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
    No: Use the sample standard deviation s to estimate σ and use $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$ or, more correctly, use $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$. Since n is large, there is little difference between these intervals.
  No: Is the population Normal?
    Yes: Is the value of σ known?
      Yes: Use $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
      No: Use $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$.
    No: Use a nonparametric technique, or increase the sample size to at least 30 to develop a confidence interval.

Small sample from a normal population

Example 1
A researcher wishes to estimate the average number of heart beats per minute for a certain population. In one such study the following data were obtained from 16 individuals.

77, 92, 93, 77, 98, 81, 76, 71, 100, 87, 88, 86, 97, 95, 81, 96

It is known from past research that the number of heart beats per minute among humans is normally distributed. Find a 90% confidence interval for the mean.

SPSS command for the interval estimation of a population mean:
Analyze → Descriptive Statistics → Explore

Note: Use Statistics in the Explore dialog box to set the confidence level if it needs to be changed. The default confidence level is 95%.
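For readers who want to check the SPSS output by hand, the following sketch computes the t-based 90% confidence interval for Example 1. Python's standard library has no t distribution, so the critical value 1.753 (upper 5% point of t with 15 degrees of freedom) is taken from a standard t table and hard-coded; it is an input to the sketch, not something the code derives.

```python
import math
import statistics

# Heart beats per minute for the 16 individuals in Example 1
heart_rate = [77, 92, 93, 77, 98, 81, 76, 71, 100, 87, 88, 86, 97, 95, 81, 96]

# Upper 5% point of the t distribution with n - 1 = 15 degrees of freedom,
# read from a t table (hard-coded assumption for a 90% interval)
T_CRIT_90_DF15 = 1.753

n = len(heart_rate)
mean = statistics.mean(heart_rate)
std_error = statistics.stdev(heart_rate) / math.sqrt(n)  # s / sqrt(n)

margin = T_CRIT_90_DF15 * std_error
lower, upper = mean - margin, mean + margin

print(f"n = {n}, mean = {mean:.2f}, standard error = {std_error:.3f}")
print(f"90% confidence interval: ({lower:.2f}, {upper:.2f})")
```

For a different confidence level or sample size, only the tabulated critical value needs to change.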

Interpretation: We are 90% confident that the mean heart beat level for the population is between ( , ).

Interpretation
What do we mean by saying that we are 90% confident that the mean heart beat level for the population is between ( , )?

Example 02
As reported by the US National Center for Health Statistics, the mean serum high density lipoprotein (HDL) cholesterol of female years old is μ = 53. Dr. Paul wants to estimate the mean serum HDL cholesterol of his years old female patients. He randomly selects 15 of his year old patients and obtains the data shown below.

65, 47, 51, 54, 70, 55, 44, 48, 36, 53, 45, 34, 59, 45, 54

a) Use the data to compute a point estimate for the population mean serum HDL cholesterol in patients.
b) Construct a 95% confidence interval for the mean serum HDL cholesterol for the patients. Interpret the result.

Note: In this problem it is not given that the population is normally distributed. Since the sample size is small, we must verify that serum HDL cholesterol is normally distributed. If a population cannot be assumed normal, we must use large-sample or nonparametric techniques. However, if we can assume that the parent population is normal, then small samples can be handled using the t distribution.

Assessing normality

The assumption of normality is a prerequisite for many inferential statistical techniques. There are a number of different ways to explore this assumption graphically:

Histogram
Stem-and-leaf plot
Boxplot
Normal probability plot

Furthermore, a number of statistics are available to test normality:

Kolmogorov-Smirnov statistic, with a Lilliefors significance level, and the Shapiro-Wilk statistic
Skewness
Kurtosis

Normal probability plots

1. Select the Analyze menu.
2. Click on Descriptive Statistics and then Explore to open the Explore dialogue box.
3. Select the variable you require (i.e. HDL) and click on the arrow button to move this variable into the Dependent List: box.
4. Click on the Plots command pushbutton to obtain the Explore: Plots subdialogue box.
5. Click on the Normality plots with tests check box, and ensure that the Factor levels together radio button is selected in the Boxplots display.
6. Click on Continue.
7. In the Display box, ensure that Both is activated.
8. Click on the Options command pushbutton to open the Explore: Options subdialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio button. If this option is not selected then, by default, any case with missing data on any variable will be excluded from the analysis. That is, plots and statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.

Normal Probability Plot

In a normal probability plot, each observed value is paired with its expected value from the normal distribution. If the sample is from a normal distribution, then the points fall more or less on a straight line.
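The pairing described for the normal probability plot can be sketched in Python for the HDL data of Example 02. The expected normal values below use Blom's plotting positions (i - 0.375)/(n + 0.25), which is one common convention and an assumption here; the correlation between the two columns is used only as a rough numerical indication of straightness, not as a formal test.

```python
import math
from statistics import NormalDist

# Serum HDL cholesterol for the 15 patients in Example 02
hdl = [65, 47, 51, 54, 70, 55, 44, 48, 36, 53, 45, 34, 59, 45, 54]


def expected_normal_scores(n):
    """Expected standard normal values for ranks 1..n (Blom's positions, assumed)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


observed = sorted(hdl)
scores = expected_normal_scores(len(hdl))
pairs = list(zip(observed, scores))      # the points of the probability plot
straightness = correlation(observed, scores)

for value, score in pairs:
    print(f"{value:4d} {score:7.3f}")
print(f"Correlation of observed vs expected: {straightness:.3f}")
```

Plotting the pairs (for example with any charting tool) gives the normal probability plot; a correlation close to 1 corresponds to points lying close to a straight line.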

Kolmogorov-Smirnov and Shapiro-Wilk statistics

The Kolmogorov-Smirnov statistic, with a Lilliefors significance level for testing normality, is produced together with the normal probability and detrended probability plots. If the significance level is greater than 0.05, normality is assumed.

Since the conditions are satisfied, we can proceed with the t-based confidence intervals.

Large sample from a normal distribution (σ unknown)

Example 03
A researcher is interested in obtaining an estimate of the average level of some enzyme in a certain human population. He has taken a sample of 35 individuals and determined the level of the enzyme in each individual. It is known from past research that the level of this enzyme among humans is normally distributed. The following are the values:

20, 11, 32, 25, 6, 23, 19, 24, 15, 31, 19, 23, 21, 27, 17, 20, 23, 23, 22, 13, 15, 28, 27, 18, 11, 32, 23, 28, 14, 23, 21, 25, 19, 29, 17

Construct a 95% confidence interval for the population mean and interpret the result.

Large sample from a non-normal distribution, or when we do not know whether the data are normally distributed (σ unknown)

Example 04 (Pulse data set)
1. Construct a 95% confidence interval for the mean pulse rate of all males.
2. Construct a 95% confidence interval for the mean pulse rate of all females.

3. Compare the preceding results. Can we conclude that the population means for males and females are different? Why or why not?

Note: We said that if we do not know σ (which is almost always the case) and the sample size n is large (say at least 30), then we can estimate σ by s in the z-based confidence interval $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$. It can be argued, however, that because the t-based confidence interval $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ is a statistically correct interval that does not require that we know σ, it is best, if we do not know σ, to use this interval for any sample size, even for a large sample. Most common t tables give t points for degrees of freedom from 1 to 30, so we would need a more complete t table or a computer software package to use the t-based confidence interval for a sample whose size n exceeds 31.

For large samples (n > 30), the traditional by-hand approach is to invoke the Central Limit Theorem, to estimate σ using the sample standard deviation s and to construct an interval using the normal distribution, but this is just a practical approach from pre-computing days. With software like SPSS, the default presumption is that we do not know σ, and so the Explore command automatically uses the sample standard deviation and builds an interval using the t distribution rather than the normal. However, because these intervals do not differ by much when n is at least 30, it is reasonable, if n is at least 30, to use the large-sample, z-based interval as an approximation to the t-based interval. In practice, the values of the normal and t distributions become very close when n exceeds
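The claim that the z-based and t-based intervals differ little for large samples can be checked on the enzyme data of Example 03 (n = 35). In this sketch the z point comes from Python's `statistics.NormalDist`; the t point 2.032 (upper 2.5% point with 34 degrees of freedom) is taken from a t table and hard-coded, since the standard library has no t distribution.

```python
import math
import statistics
from statistics import NormalDist

# Enzyme levels for the 35 individuals in Example 03
enzyme = [20, 11, 32, 25, 6, 23, 19, 24, 15, 31, 19, 23, 21, 27, 17, 20, 23, 23,
          22, 13, 15, 28, 27, 18, 11, 32, 23, 28, 14, 23, 21, 25, 19, 29, 17]

n = len(enzyme)
mean = statistics.mean(enzyme)
std_error = statistics.stdev(enzyme) / math.sqrt(n)   # s / sqrt(n)

# Critical values for a 95% interval
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
T_CRIT_95_DF34 = 2.032                 # from a t table (hard-coded assumption)

z_interval = (mean - z_crit * std_error, mean + z_crit * std_error)
t_interval = (mean - T_CRIT_95_DF34 * std_error, mean + T_CRIT_95_DF34 * std_error)

print(f"mean = {mean:.3f}, standard error = {std_error:.3f}")
print(f"z-based 95% CI: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
print(f"t-based 95% CI: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

The t-based interval is slightly wider because its critical value is slightly larger; both intervals share the same centre and standard error.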

5. Hypothesis testing

5.1 Introduction

Sometimes the objective of an investigation is not to estimate a parameter, but instead to decide which of two contradictory statements about the parameter is correct. This is called hypothesis testing. Hypothesis testing typically begins with some theory, claim or assertion about a particular parameter or several parameters.

In any hypothesis testing problem, there are two contradictory hypotheses under consideration. One is called the null hypothesis; the other is called the alternative hypothesis. The validity of a hypothesis is tested by analyzing the sample. The procedure which enables us to decide whether or not a certain hypothesis is supported by the data is called a test of hypothesis.

5.2 Terminology and Notation

Hypothesis: A hypothesis is a statement or claim regarding a characteristic of one or more populations.

Test of Hypothesis: The testing of a hypothesis is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.

Hypothesis testing is based upon two types of hypotheses. The null hypothesis, denoted by H0, is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise. The alternative hypothesis, denoted by H1, is a claim to be tested. We are trying to find evidence for the alternative hypothesis.

(Table: Two-Tailed, Left-Tailed and Right-Tailed tests)

Computation of Test Statistics

A function of the sample observations (i.e. a statistic) whose computed value determines the final decision regarding acceptance or rejection of H0 is called a test statistic. The appropriate test statistic has to be chosen very carefully, and knowledge of its sampling distribution under H0 (i.e. when the null hypothesis is true) is essential in framing the decision rule. If the value of the test statistic falls in the critical region, the null hypothesis is rejected.

Types of Errors in Hypothesis Testing - Type I and Type II Errors

As stated earlier, we use sample data to determine whether or not to reject the null hypothesis. Because this decision is based upon incomplete (i.e. sample) information, there is always the possibility of making an incorrect decision. In fact, there are four possible outcomes from hypothesis testing.

Table 5.2: Four Outcomes from Hypothesis Testing
Conclusion | Reality: H0 is True | Reality: H1 is True
Do not reject H0 | Correct decision | Type II error
Reject H0 | Type I error | Correct decision

The Level of Significance

The level of significance is the maximum probability of making a Type I error and is denoted by α:
α = P(Type I error) = P(rejecting H0 when H0 is true)
The probability of making a Type I error is chosen by the researcher before the sample data are collected. Traditionally, 0.01, 0.05 or 0.1 is taken as α.

Critical Region or Rejection Region

The rejection region or critical region is the region of the standard normal curve corresponding to a predetermined level of significance α. The region under the normal curve which is not covered by the rejection region is known as the acceptance region. Thus the values of the test statistic which lead to rejection of the null hypothesis H0 form the region known as the rejection region or critical region. The computed value of the test statistic is compared with the critical value, which is the value that separates the rejection region from the acceptance region.

Table 5.3 (Two-Tailed, Left-Tailed and Right-Tailed tests)

Methods for making conclusion

Method 01: Compare the critical value with the test statistic.

(Table: Two-Tailed, Left-Tailed and Right-Tailed tests)

Method 02: Compare the p-value with the significance level.

Table 5.5: decision rules for two-tailed, left-tailed and right-tailed tests (entries not reproduced in this transcription).

Power

The probability of rejecting a false null hypothesis is called the power of the test. The probability of committing a Type II error is denoted by β.

Power = 1 − β

5.3 Formulating a hypothesis

It is ideal if a test can be derived such that both errors are minimized simultaneously. However, this may not be possible with the available data. Instead, we consider tests for which the probability of one error is controlled. Conventionally, the Type I error is controlled.

Usually, out of the two errors, one error is more serious than the other. In such situations it is reasonable to minimize the probability of the more serious error. In order to achieve this, the hypotheses are constructed so that the more serious error will be the Type I error.

An alternative way is to take the initially favored claim as the null hypothesis. The initially favored claim will not be rejected in favor of the alternative unless the sample evidence contradicts it and provides strong support for the alternative assertion.

If one of the hypotheses is an equality and the other is an inequality, then the equality hypothesis is taken to be the null hypothesis.
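The definition Power = 1 − β given above can be made concrete with a small numerical sketch. The example below assumes a right-tailed Z test with known σ; the function name `power_right_tailed_z` and all numbers are illustrative assumptions, not part of the workshop material.

```python
from statistics import NormalDist


def power_right_tailed_z(mu0, mu1, sigma, n, alpha):
    """Power of a right-tailed Z test of H0: mu = mu0 when the true mean is mu1.

    Illustrative helper (not from the handout); assumes sigma is known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)        # critical value z_alpha
    shift = (mu1 - mu0) / (sigma / n ** 0.5)      # true mean in standard-error units
    beta = std_normal.cdf(z_crit - shift)         # P(do not reject H0 | mean is mu1)
    return 1 - beta


# Hypothetical numbers: H0: mu = 100, true mean 105, sigma = 15, n = 36.
power = power_right_tailed_z(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)
print(round(power, 3))
```

When the true mean equals the hypothesized mean, the same function returns α, which is the probability of a Type I error.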

5.4 Steps in test of hypothesis

1. Set up the Null Hypothesis H0 and the Alternative Hypothesis H1.
2. State the appropriate test statistic and also its sampling distribution when the null hypothesis is true.
3. Select the level of significance α of the test, if it is not specified in the given problem.
4. Find the critical region of the test at the chosen level of significance.
5. Compute the value of the test statistic on the basis of the sample data and the null hypothesis.
6. If the computed value of the test statistic lies in the critical region, reject H0; otherwise do not reject H0.
7. Write the conclusion in plain non-technical language.
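The steps above can be sketched in code for the simplest case, a one-sample Z test with known σ. This is a minimal illustration under that assumption: the function `z_test` and the sample figures are hypothetical, and it reports both decision methods (critical value and p-value) so that they can be compared.

```python
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample Z test with known sigma (illustrative helper)."""
    std_normal = NormalDist()                       # step 2: distribution under H0
    z = (sample_mean - mu0) / (sigma / n ** 0.5)    # step 5: test statistic
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical                  # steps 4 and 6
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)        # a negative value
        reject = z < critical
        p_value = std_normal.cdf(z)
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return {"z": z, "critical": critical, "p_value": p_value, "reject_h0": reject}


# Hypothetical data: H0: mu = 50 vs H1: mu != 50, sigma = 6, n = 36, sample mean 52.
result = z_test(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36, alpha=0.05, tail="two")
print(result)
```

Comparing the statistic with the critical value and comparing the p-value with α lead to the same decision, so `result["reject_h0"]` and `result["p_value"] < 0.05` agree.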

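5.5 One Sample Hypothesis Tests about Population Mean

Selecting an Appropriate Test Statistic to Test a Hypothesis about a Population Mean

Is n ≥ 30?
- Yes: Is the value of σ known?
  - Yes: Use $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
  - No: Use the sample standard deviation s to estimate σ and use $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ or, more correctly, use $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n − 1 degrees of freedom. Since n is large, there is little difference between these tests.
- No: Is the population Normal?
  - Yes: Is the value of σ known?
    - Yes: Use $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
    - No: Use $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n − 1 degrees of freedom.
  - No: Use a nonparametric technique, or increase the sample size to at least 30 to conduct a parametric hypothesis test.

Here $\bar{x}$ is the sample mean and $\mu_0$ is the value of the population mean stated in the null hypothesis.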
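The selection chart can also be written as a short function, which makes the order of the questions explicit. This is only a sketch of the decision logic; the function name and the returned labels are assumptions made for illustration.

```python
def choose_test_statistic(n, sigma_known, population_normal):
    """Return the test suggested by the selection chart (illustrative labels)."""
    if n >= 30:
        if sigma_known:
            return "Z with known sigma"
        # For large n, Z with s in place of sigma gives nearly the same result.
        return "t with s"
    if not population_normal:
        return "nonparametric technique, or increase n to at least 30"
    if sigma_known:
        return "Z with known sigma"
    return "t with s"


# A small sample from a normal population with unknown sigma.
print(choose_test_statistic(n=14, sigma_known=False, population_normal=True))
```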

5.5.1 A small sample two sided hypothesis

Example 5.1
File: ph.sav

An engineer wants to measure the bias in a pH meter. She uses the meter to measure the pH in 14 neutral substances (pH = 7) and obtains the data given in the file. Is there sufficient evidence to support the claim that the pH meter is not correctly calibrated at the α = 0.05 level of significance?

Approach: In this case we have only fourteen observations, so we cannot rely on the Central Limit Theorem. With a small sample, we should only use the t test if we can reasonably assume that the parent population is normally distributed. Therefore, before proceeding to the test, we must verify that pH is normally distributed.

Hypothesis to be tested
H0: Data are normally distributed.
H1: Data are not normally distributed.

Analyze → Descriptive Statistics → Explore

According to the Kolmogorov-Smirnov test, the p-value is 0.2 > 0.05. Hence we do not reject H0 at the 0.05 level of significance, and we can conclude that the data are consistent with a normal distribution. Since the conditions are satisfied, we can proceed with the t test.

Hypothesis to be tested: ........

To conduct a one-sample t-test
1. Select the Analyze menu.
2. Click on Compare Means and then One-Sample T Test to open the One-Sample T Test dialogue box.
3. Select the variable you require (i.e. ph) and click on the arrow button to move the variable into the Test Variable(s): box.
4. In the Test Value: box type the hypothesized mean (i.e. 7).
5. Click on OK.

In the SPSS output, identify the calculated value of the test statistic and the p-value.

Note: In SPSS a column labeled Sig. (usually Sig. (2-tailed)) displays the p-value of a particular hypothesis test.

Decision: ........

Conclusion: ........
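For readers who want to check the SPSS output by hand, the one-sample t statistic can be computed directly. The sketch below uses fourteen made-up pH readings, not the contents of ph.sav, and takes the two-tailed critical value 2.160 for 13 degrees of freedom at α = 0.05 from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(values)
    t_stat = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t_stat, n - 1


# Hypothetical readings for illustration only (NOT the data in ph.sav).
ph_readings = [7.01, 7.04, 6.97, 7.00, 6.99, 6.97, 7.04,
               7.04, 7.01, 7.00, 6.99, 7.04, 7.07, 6.97]
T_CRITICAL = 2.160  # t table value for alpha/2 = 0.025 and 13 degrees of freedom

t_stat, df = one_sample_t(ph_readings, mu0=7.0)
reject_h0 = abs(t_stat) > T_CRITICAL   # two-tailed decision rule
print(round(t_stat, 3), df, reject_h0)
```

With these illustrative readings the statistic stays inside the acceptance region, so H0: μ = 7 would not be rejected; the real data may of course lead to a different decision.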

Note: Performing One-tail Tests using One-Sample T Test Procedure

The One-Sample T Test procedure in SPSS is designed to test a two-tail hypothesis. However, a researcher may need to test a one-tail (left-tail or right-tail) hypothesis. In this situation the p-value for the corresponding test has to be computed using the following criteria, where μ0 is the hypothesized mean.

1. For left-tail tests (i.e. H1: μ < μ0):
If the sample mean is less than μ0 (i.e. t < 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

2. For right-tail tests (i.e. H1: μ > μ0):
If the sample mean is greater than μ0 (i.e. t > 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

Example 5.2
In a study conducted by the U.S. Department of Agriculture, the mean daily caffeine intake of 20 to 29 year old females in 2010 was found to be [value missing in this transcription] milligrams. A nutritionist claims that the mean daily caffeine intake has increased since then. She obtains a simple random sample of 35 females between 20 and 29 years of age and determines their daily caffeine intakes. The results are presented in caffine.sav. Test the nutritionist's claim at the α = 0.05 level of significance.

Approach: The dataset represents a large sample (n = 35), so we can rely on the Central Limit Theorem to assert that the sampling distribution of the sample mean is approximately normal.

Hypothesis: ........

P-value: ........

Decision: ........

Conclusion: ........
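The conversion rule in the note above can be written as a small helper. The function name `one_tailed_p` and the example values are illustrative assumptions; `sig_two_tailed` stands for the Sig. value reported by SPSS and `t_stat` for the reported t value.

```python
def one_tailed_p(sig_two_tailed, t_stat, tail):
    """Convert a two-tailed Sig. value into a one-tailed p-value."""
    if tail == "left":
        sample_agrees_with_h1 = t_stat < 0
    elif tail == "right":
        sample_agrees_with_h1 = t_stat > 0
    else:
        raise ValueError("tail must be 'left' or 'right'")
    half = sig_two_tailed / 2
    # If the sample points in the direction of H1 the p-value is Sig/2,
    # otherwise it is 1 - Sig/2.
    return half if sample_agrees_with_h1 else 1 - half


# Hypothetical SPSS output: t = 2.10, Sig. (2-tailed) = 0.04.
print(one_tailed_p(0.04, 2.10, "right"))
print(one_tailed_p(0.04, 2.10, "left"))
```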

Nonparametric Binomial Test for the One-Sample Test procedure

The Binomial Test procedure compares an observed proportion of cases to the proportion expected under a binomial distribution with a specified probability parameter. The observed proportion is defined either by the number of cases having the first value of a dichotomous variable (a variable that has two possible values) or by the number of cases at or below a given cut point on a scale (quantitative) variable.

Hypothesis (to be tested on a quantitative variable)
H0: median = m0 vs. H1: median ≠ m0

SPSS command
Analyze → Nonparametric → Binomial Test

Note: Set the cut point to the hypothesized median value.
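The idea behind using the binomial test for a median can be sketched as follows: if the median really is m0, each observation falls at or below the cut point with probability 0.5. The code below is a minimal illustration with made-up data; it computes an exact two-sided p-value by doubling the smaller tail, which is one common convention and may not match the SPSS output digit for digit.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_median_test(values, m0):
    """Two-sided exact test of H0: median = m0 (illustrative sketch)."""
    n = len(values)
    k = sum(1 for v in values if v <= m0)    # cases at or below the cut point
    lower_tail = binom_cdf(k, n)             # P(X <= k)
    upper_tail = 1 - binom_cdf(k - 1, n)     # P(X >= k)
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return k, n, p_value


# Hypothetical data: only one of ten values lies at or below m0 = 4.
k, n, p_value = binomial_median_test([3, 5, 6, 7, 8, 9, 10, 11, 12, 13], m0=4)
print(k, n, round(p_value, 4))
```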

6. Inferences on Two Samples

In the preceding chapter, we used a statistical test of hypothesis to compare the unknown mean or proportion of a single population to some fixed known value. In practical applications, however, it is far more common to compare the means of two different populations, where both parameters are unknown.

In order to perform inference on the difference of two population means, we must first determine whether the data come from independent or dependent samples. Samples are independent when the individuals selected for one sample do not dictate which individuals are to be in the second sample. Samples are dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.

6.1 Testing hypotheses concerning two population means μ1 and μ2: Dependent Samples

Let $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ be a random sample of paired observations. Suppose that the x's are identically distributed with population mean $\mu_1$ and population variance $\sigma_1^2$. Also suppose that the y's are identically distributed with population mean $\mu_2$ and population variance $\sigma_2^2$. Let $\mu_d$ be a known constant. Consider the following hypotheses:

Two-Tailed | Left-Tailed | Right-Tailed
H0: $\mu_1 - \mu_2 = \mu_d$ | H0: $\mu_1 - \mu_2 = \mu_d$ | H0: $\mu_1 - \mu_2 = \mu_d$
H1: $\mu_1 - \mu_2 \neq \mu_d$ | H1: $\mu_1 - \mu_2 < \mu_d$ | H1: $\mu_1 - \mu_2 > \mu_d$

Rather than consider the two sets of observations to be distinct samples, we focus on the difference in measurements within each pair. Suppose that the observations in our two groups are as follows:

Sample 01 | Sample 02 | Differences within each pair
$x_{11}$ | $x_{12}$ | $d_1 = x_{11} - x_{12}$
$x_{21}$ | $x_{22}$ | $d_2 = x_{21} - x_{22}$
$x_{31}$ | $x_{32}$ | $d_3 = x_{31} - x_{32}$
... | ... | ...
$x_{n1}$ | $x_{n2}$ | $d_n = x_{n1} - x_{n2}$

The sample mean and sample variance of the differences are

$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$ and $s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2$

If the differences are normally distributed or the sample size n is large, the test statistic is

$U = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$

Compare the critical value with the test statistic, using the guideline below.

Two-tailed: if $U < -t_{\alpha/2,\,n-1}$ or $U > t_{\alpha/2,\,n-1}$, reject the null hypothesis.
Left-tailed: if $U < -t_{\alpha,\,n-1}$, reject the null hypothesis.
Right-tailed: if $U > t_{\alpha,\,n-1}$, reject the null hypothesis.

Confidence Interval for Matched Pairs Data

We can also create a confidence interval for the mean difference $\mu_d$, using the sample mean difference $\bar{d}$, the sample standard deviation of the differences $s_d$, the sample size n and $t_{\alpha/2,\,n-1}$. Remember, a confidence interval about a population mean has the following form:

Point estimate ± Margin of error
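The paired calculations can be checked by hand with a few lines of code. The sketch below assumes the usual t-based interval, that is the point estimate ± margin of error form with margin $t_{\alpha/2,\,n-1}\, s_d/\sqrt{n}$. The six before/after values are invented for illustration and are not the data of any example in this workshop; the critical values 3.365 and 4.032 are the t table values for 5 degrees of freedom at one-tail areas 0.01 and 0.005.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, mu_d=0.0):
    """Return (d_bar, s_d, n, U) for paired samples."""
    d = [b - a for b, a in zip(before, after)]   # differences within each pair
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    u = (d_bar - mu_d) / (s_d / sqrt(n))
    return d_bar, s_d, n, u


def paired_ci(d_bar, s_d, n, t_crit):
    """Confidence interval: point estimate +/- margin of error."""
    margin = t_crit * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical measurements for six subjects (illustration only).
before = [220, 240, 215, 198, 185, 250]
after = [205, 228, 210, 195, 186, 236]

d_bar, s_d, n, u = paired_t(before, after)
reject_h0 = u > 3.365                              # right-tailed test, alpha = 0.01
low, high = paired_ci(d_bar, s_d, n, t_crit=4.032)  # 99% interval
print(round(u, 3), reject_h0, (round(low, 2), round(high, 2)))
```

For a two-tailed test at level α, rejecting H0: μd = 0 corresponds to 0 lying outside the (1 − α)100% interval, which gives a quick consistency check between the two outputs.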

Based on the preceding format, a (1 − α)100% confidence interval for $\mu_d$ is given by

$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$

SPSS Command

Command for Paired-Samples T Test:
Analyze → Compare Means → Paired-Samples T Test

Example 6.1
A dietitian hopes to reduce a person's cholesterol level by using a special diet supplemented with a combination of vitamin pills. Six (6) subjects were pre-tested and then placed on the diet for two weeks. Their cholesterol levels were checked after the two week period. The results are shown below. Cholesterol levels are measured in milligrams per deciliter.

2.1 Test the claim that the cholesterol level before the special diet is greater than the cholesterol level after the special diet at the α = 0.01 level of significance.
2.2 Construct a 99% confidence interval for the difference in mean cholesterol levels. Assume that the cholesterol levels are normally distributed both before and after.

Table: Subject, Before, After (data not reproduced in this transcription).

Example 6.2
A physician is evaluating a new diet for patients with a family history of heart disease. To test the effectiveness of this diet, 16 patients are placed on the diet for 6 months. Their weights are measured before and after the study, and the physician wants to know whether the weights have changed. Test whether there is a statistically significant difference between the pre-diet and post-diet weights of these patients. Use the 5% level of significance.

Step 01: Calculating differences

Transform → Compute Variable

Step 02: Because the sample size is small, we must verify that the differences are normally distributed.

Analyze → Descriptive Statistics → Explore

Note: Use Plots in the Explore command and select Normality plots with tests.

Step 03: Command for Paired-Samples T Test

Analyze → Compare Means → Paired-Samples T Test

6.4 Performing One tail Tests using Paired Samples T Test procedure

The Paired-Samples T Test procedure in SPSS is designed to test a two-tail hypothesis. However, a researcher may need to test a one-tail (left-tail or right-tail) hypothesis. In this situation the p-value for the corresponding test has to be computed using the following criteria.

1. For left-tail tests (i.e. H1: $\mu_d$ < 0):
If the sample mean of differences is less than 0 (i.e. t < 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

2. For right-tail tests (i.e. H1: $\mu_d$ > 0):
If the sample mean of differences is greater than 0 (i.e. t > 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

Example: If a researcher wants to find out whether post-diet weights have increased significantly, determine the p-value and state your findings at the 5% level of significance.

6.5 Nonparametric Wilcoxon Test for Two Related Samples

Hypothesis
H0: median of the differences = 0 vs. H1: median of the differences ≠ 0

SPSS command
Analyze → Nonparametric → 2 Related Samples

Note: Ensure that Wilcoxon is checked in the Test Type dialog box.

6.6 Testing hypotheses concerning two population means μ1 and μ2: Independent samples

Let $x_1, x_2, x_3, \ldots, x_m$ be a random sample of observations from a certain population with population mean $\mu_1$ and population variance $\sigma_1^2$. Also let $y_1, y_2, \ldots, y_n$ be a random sample of observations from a certain population with population mean $\mu_2$ and population variance $\sigma_2^2$. Further suppose that the two samples are independent. Let $\mu_d$ be a known constant. Consider the following hypotheses:

Two-Tailed | Left-Tailed | Right-Tailed
H0: $\mu_1 - \mu_2 = \mu_d$ | H0: $\mu_1 - \mu_2 = \mu_d$ | H0: $\mu_1 - \mu_2 = \mu_d$
H1: $\mu_1 - \mu_2 \neq \mu_d$ | H1: $\mu_1 - \mu_2 < \mu_d$ | H1: $\mu_1 - \mu_2 > \mu_d$

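Case 01: Data from normal distributions, both variances are known

The test statistic is

$U = \frac{(\bar{x} - \bar{y}) - \mu_d}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}$

Compare the critical value with the test statistic, using the guideline below.

Two-tailed: if $U < -z_{\alpha/2}$ or $U > z_{\alpha/2}$, reject the null hypothesis.
Left-tailed: if $U < -z_{\alpha}$, reject the null hypothesis.
Right-tailed: if $U > z_{\alpha}$, reject the null hypothesis.

Case 02: Data from two normal distributions with unequal variances (both variances are unknown, m and n are small)

The test statistic is

$U = \frac{(\bar{x} - \bar{y}) - \mu_d}{\sqrt{s_1^2/m + s_2^2/n}}$

where $s_1^2$ and $s_2^2$ are the sample variances. Compare the critical value with the test statistic, using the guideline below.

Two-tailed: if $U < -t_{\alpha/2,\,\nu}$ or $U > t_{\alpha/2,\,\nu}$, reject the null hypothesis.
Left-tailed: if $U < -t_{\alpha,\,\nu}$, reject the null hypothesis.
Right-tailed: if $U > t_{\alpha,\,\nu}$, reject the null hypothesis.

Where

$\nu = \frac{\left(s_1^2/m + s_2^2/n\right)^2}{\frac{(s_1^2/m)^2}{m-1} + \frac{(s_2^2/n)^2}{n-1}}$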

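(1 − α)100% Confidence Interval about the Difference of Two Means

$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$

Case 03: Data normal, both variances are unknown, but it is known that they are equal

Let

$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$, $\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$, $s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})^2$, $s_2^2 = \frac{1}{n-1}\sum_{j=1}^{n}(y_j - \bar{y})^2$

Also let the pooled variance be

$s_p^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}$

The test statistic is

$U = \frac{(\bar{x} - \bar{y}) - \mu_d}{s_p\sqrt{1/m + 1/n}}$

Compare the critical value with the test statistic, using the guideline below.

Two-tailed: if $U < -t_{\alpha/2,\,m+n-2}$ or $U > t_{\alpha/2,\,m+n-2}$, reject the null hypothesis.
Left-tailed: if $U < -t_{\alpha,\,m+n-2}$, reject the null hypothesis.
Right-tailed: if $U > t_{\alpha,\,m+n-2}$, reject the null hypothesis.

(1 − α)100% Confidence Interval about the Difference of Two Means

$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,m+n-2}\; s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$

SPSS Command for the Independent-Samples T Test

Analyze → Compare Means → Independent-Samples T Test

Note: In the Define Groups option, enter the relevant codes of the groups to be compared.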
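To see how the equal-variance and unequal-variance statistics differ in practice, both can be computed side by side. The following is a minimal sketch using the standard pooled-variance and Welch-Satterthwaite forms; the function names and the two small samples are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, mu_d=0.0):
    """Equal-variance case: return (U, degrees of freedom)."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    u = (mean(x) - mean(y) - mu_d) / sqrt(sp2 * (1 / m + 1 / n))
    return u, m + n - 2


def welch_t(x, y, mu_d=0.0):
    """Unequal-variance case: return (U, approximate degrees of freedom nu)."""
    m, n = len(x), len(y)
    a, b = variance(x) / m, variance(y) / n
    u = (mean(x) - mean(y) - mu_d) / sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return u, nu


# Hypothetical samples (illustration only).
x = [10, 12, 14, 16, 18]
y = [11, 13, 15, 17, 19, 21]

u_pooled, df_pooled = pooled_t(x, y)
u_welch, nu = welch_t(x, y)
print(round(u_pooled, 4), df_pooled)
print(round(u_welch, 4), round(nu, 2))
```

The degrees of freedom ν are generally not a whole number; when reading a t table, rounding ν down is the conservative choice.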

6.6.1 Performing One tail Tests using Independent Samples T Test procedure

The Independent-Samples T Test procedure in SPSS is designed to test a two-tail hypothesis. However, a researcher may need to test a one-tail (left-tail or right-tail) hypothesis. In this situation the p-value for the corresponding test has to be computed using the following criteria.

1. For left-tail tests (i.e. H1: $\mu_1 < \mu_2$):
If the difference of the sample means is less than 0 (i.e. t < 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

2. For right-tail tests (i.e. H1: $\mu_1 > \mu_2$):
If the difference of the sample means is greater than 0 (i.e. t > 0) then p-value = Sig/2.
Otherwise, p-value = 1 − Sig/2.

6.7 The Nonparametric Mann-Whitney U Test for Two Independent Samples

What should you do if the t test assumptions are markedly violated (e.g. if the response variable is not normal)? One answer is to run the appropriate nonparametric test, which in this case is called the Mann-Whitney (M-W) U test.

Hypothesis
H0: the two population medians are equal vs. H1: the two population medians are not equal

SPSS command
Analyze → Nonparametric → 2 Independent Samples

Note: Ensure that Mann-Whitney U test is checked. In the Define Groups option, enter the relevant codes of the groups to be compared.
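The Mann-Whitney procedure works on ranks rather than on the raw values. As a rough sketch of what is computed, the code below ranks the combined sample (tied values share the average rank) and derives the two U statistics from the rank sum of the first sample. The function name and the tiny data sets are illustrative; statistical packages typically report the smaller of the two U values together with a p-value, which is not computed here.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2) computed from midranks of the combined sample."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[pos] = midrank
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    m, n = len(x), len(y)
    u1 = r1 - m * (m + 1) / 2              # U for the first sample
    u2 = m * n - u1                        # U for the second sample
    return u1, u2


# Hypothetical samples: every x value is smaller than every y value.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
# Hypothetical samples with one tie between the groups.
print(mann_whitney_u([1, 2], [2, 3]))
```

A U value close to 0 for one sample means its observations almost all rank below those of the other sample, which is evidence against H0.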

Example 6.3: The purpose of a study by Eidelman et al. was to investigate the nature of lung destruction in cigarette smokers before the development of marked emphysema. Three lung destructive index measurements were made on the lungs of lifelong nonsmokers and smokers who died suddenly outside the hospital of nonrespiratory causes. A larger score indicates greater lung damage. For one of the indexes, the scores yielded by the lungs of a sample of nine nonsmokers and a sample of 16 smokers are shown in Table 02. We wish to know if we may conclude, on the basis of these data, that smokers, in general, have greater lung damage as measured by this destructive index than do nonsmokers.

Table 02: Nonsmokers, Smokers (data not reproduced in this transcription).

Example 6.4: Researchers wished to know if they could conclude that two populations of infants differ with respect to the mean age at which they walked alone. The following data (age in months) were collected:

Sample from population A: 9.5, 10.5, 9.0, 9.75, 10.0, 13.0, 10.0, 13.5, 10.0, 9.5, 10.0, 9.75
Sample from population B: 12.5, 9.5, 13.5, 13.75, 12.0, 13.75, 12.5, 9.5, 12.0, 13.5, 12.0,

7. Comparison of Multiple Groups

In the preceding chapter, we covered techniques for determining whether a difference exists between the means of two independent populations. It is not unusual, however, to encounter situations in which we wish to test for differences among three or more independent means rather than just two. The extension of the two sample t test to three or more samples is known as the Analysis of Variance, or ANOVA for short.

Definition: Analysis of Variance (ANOVA) is an inferential method that is used to test the equality of three or more population means.

7.1 One-Way Analysis of Variance

This is the simplest type of analysis of variance. The one-way analysis of variance is a form of design and subsequent analysis utilized when the data can be classified into k categories or levels of a single factor, and the equality of the k class means in the population is to be investigated.

For example, five fertilizers are applied to four plots each of wheat, and the yield of wheat on each plot is recorded. We may be interested in finding out whether the effects of these fertilizers on the yield are significantly different or, in other words, whether the samples have come from the same normal population. The answer to this problem is provided by the technique of analysis of variance. The basic purpose of the analysis of variance is to test the homogeneity of several means.

In order to perform an ANOVA test, certain requirements must be satisfied.

7.2 Requirements of ANOVA Test

1. Independent random samples have been taken from each population.
2. The populations are normally distributed.
3. The population variances are all equal.

7.3 The Hypothesis test of Analysis of Variance

H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
H1: At least one of the population means differs from the others

7.4 Decomposition of Total Sum of Squares

The name analysis of variance is derived from a partitioning of the total variability into its component parts. Let $y_{ij}$ be the j-th observation at the i-th factor level. The data collected under the factor levels can be summarized as follows.

Group (Factor Level / Treatment) | 1 | 2 | 3 | ... | k
Number of observations | $n_1$ | $n_2$ | $n_3$ | ... | $n_k$
Mean | $\bar{y}_1$ | $\bar{y}_2$ | $\bar{y}_3$ | ... | $\bar{y}_k$
Variance | $s_1^2$ | $s_2^2$ | $s_3^2$ | ... | $s_k^2$

Grand mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \frac{1}{n}\sum_{i=1}^{k} n_i\,\bar{y}_i$, where $n = n_1 + n_2 + \cdots + n_k$.

The total variation present in the data is measured by the sum of squares of the deviations of all observations from the grand mean. Thus

Total Sum of Squares (SSTo) = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$

The total variation in the observations can be split into the following two components.

1. The variation between the classes, or the variation due to different bases of classification, commonly known as treatments.
2. The variation within the classes, i.e. the inherent variation of the random variable among the observations of a class. This variation is due to chance causes which are beyond human control.

The sum of squares due to differences in the treatment means is called the treatment sum of squares or between sum of squares, and is given by the expression

Treatment Sum of Squares (SSTr) = $\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$

The sum of squares due to inherent variability in the experimental material is called the sum of squares of the differences within the treatments, or error sum of squares.

Sum of squares of differences within the treatments (SSE) = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$

It can be shown that

$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$

Total sum of squares (SSTo) = Sum of squares between treatments (SSTr) + Sum of squares within treatments (SSE)

7.5 The Mean Squares

In finding the average squared deviations due to treatment and to error, we divide each sum of squares by its degrees of freedom. We call the two resulting averages mean square treatment (MSTr) and mean square error (MSE), respectively.

The number of degrees of freedom associated with SSTr = k − 1

MSTr = $\frac{SSTr}{k-1}$

The number of degrees of freedom associated with SSE = n − k

MSE = $\frac{SSE}{n-k}$

The Expected Values of the Statistics MSTr and MSE

$E(MSE) = \sigma^2$ ........ (1)

$E(MSTr) = \sigma^2 + \frac{\sum_{i=1}^{k} n_i (\mu_i - \mu)^2}{k-1}$ ........ (2)

$\mu_i$: mean of population i
$\mu$: combined mean of all k populations

When the null hypothesis of ANOVA is true and all k population means are equal, MSTr and MSE are two independent, unbiased estimators of the common population variance $\sigma^2$. If, on the other hand, the null hypothesis is not true and differences do exist among the k population means, then MSTr will tend to be larger than MSE. This happens because, when not all population means are equal, the second term in equation (2) is a positive number.

7.6 The test statistic in analysis of variance

Under the assumptions of ANOVA, the ratio F = MSTr/MSE has an F distribution with k − 1 degrees of freedom for the numerator and n − k degrees of freedom for the denominator when the null hypothesis is true.

Decision rule

If $F > F_{\alpha;\,k-1,\,n-k}$, reject H0.

Alternatively, p-value = Pr($F_{k-1,\,n-k}$ > F), the probability under the $F_{k-1,\,n-k}$ distribution of exceeding the computed value of F. Thus reject H0 if p-value < α (level of significance).

ANOVA Table

Source of variation | Sum of Squares | Degrees of freedom | Mean Squares | F test statistic | p-value
Treatment | SSTr | k − 1 | MSTr | F = MSTr/MSE | Pr($F_{k-1,\,n-k}$ > F)
Error | SSE | n − k | MSE | |
Total | SSTo | n − 1 | | |
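The entries of the ANOVA table can be reproduced with a short calculation, which also confirms the decomposition SSTo = SSTr + SSE numerically. The sketch below uses three tiny made-up groups; the function name `one_way_anova` is an assumption for illustration, and the critical value 5.14 is the F table value for α = 0.05 with 2 and 6 degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, mean squares and F for a list of groups."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    ss_tr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ss_to = sum((y - grand_mean) ** 2 for y in all_obs)
    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / (n - k)
    return {"SSTr": ss_tr, "SSE": ss_e, "SSTo": ss_to,
            "MSTr": ms_tr, "MSE": ms_e, "F": ms_tr / ms_e,
            "df": (k - 1, n - k)}


# Hypothetical data: k = 3 groups with 3 observations each.
table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
F_CRITICAL = 5.14   # F table value, alpha = 0.05, df = (2, 6)
print(table)
print("reject H0:", table["F"] > F_CRITICAL)
```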

Example 7.1
A family doctor claims that the mean HDL cholesterol levels of males in three age groups are equal. He obtains a simple random sample of 12 individuals from each age group and determines their HDL cholesterol level. The results are presented in Table 7.1.

Table 7.1: HDL cholesterol levels for the three age groups (age ranges and data not reproduced in this transcription).

Approach: We must verify the requirements.
1. As was stated in the problem, the data were collected using a random sampling method.
2. None of the subjects selected are related in any way, so the samples are independent.
3. A normality test suggests that the sample data come from populations that are normally distributed.

Because all requirements are satisfied, we can perform a one-way ANOVA.

Hypothesis: ........

Decision: ........

Conclusion: ........

Example 7.2
An experimenter wished to study the effect of 5 fertilizers on the yield of a crop. He divided the field into 45 plots and assigned each fertilizer at random to 9 plots. The data in table 4 represent the number of pods on soybean plants for the various plot types.

Table 4: Fertilizer (A, B, C, D, E) and Pods (data not reproduced in this transcription).

Test at the 5% level to see whether the fertilizers differed significantly.

Part 01:

Hypothesis: ........

Decision: ........

Conclusion: ........

Part 02: Where are the differences?

After performing a one-factor independent measures ANOVA and finding that the results are significant, we know that the means are not all the same. This relatively simple conclusion, however, actually raises more questions. Is one particular mean different from another? Are all five means different? Post hoc tests provide answers to these questions whenever we have a significant ANOVA result.

There are many different kinds of post hoc tests that examine which means are different from each other. One commonly used procedure is Tukey's Honestly Significant Difference Test.

SPSS Command

Analyze → Compare Means → One-Way ANOVA

The variables are selected as earlier. Click on Post Hoc and select only Tukey.
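The logic of Tukey's procedure can be sketched for the equal-sample-size case: two means are declared different when their absolute difference exceeds the honestly significant difference, HSD = q × sqrt(MSE / n per group), where q comes from a studentized range table. In the sketch below the caller supplies q; the value 4.34 is assumed to be the 5% table value for 3 groups and 6 error degrees of freedom and should be checked against a table. The data, the MSE and the function name are illustrative, and SPSS output remains the reference for real analyses.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_hsd_pairs(groups, mse, q_crit):
    """Flag pairs of group means whose difference exceeds the HSD.

    groups: dict mapping group label to list of observations (equal sizes).
    mse:    mean square error from the ANOVA table.
    q_crit: studentized range critical value supplied by the user.
    """
    sizes = {len(obs) for obs in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    hsd = q_crit * sqrt(mse / sizes.pop())
    results = {}
    for (a, obs_a), (b, obs_b) in combinations(groups.items(), 2):
        diff = mean(obs_a) - mean(obs_b)
        results[(a, b)] = (diff, abs(diff) > hsd)
    return hsd, results


# Hypothetical data with MSE = 1.0 from the corresponding ANOVA table.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}
hsd, results = tukey_hsd_pairs(groups, mse=1.0, q_crit=4.34)
print(round(hsd, 3))
for pair, (diff, significant) in results.items():
    print(pair, diff, significant)
```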

SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

More information

January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

More information

Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

More information

IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

More information

Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

More information

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Using SPSS, Chapter 2: Descriptive Statistics

1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

A Guide for a Selection of SPSS Functions

A Guide for a Selection of SPSS Functions IBM SPSS Statistics 19 Compiled by Beth Gaedy, Math Specialist, Viterbo University - 2012 Using documents prepared by Drs. Sheldon Lee, Marcus Saegrove, Jennifer

More information

1 SAMPLE SIGN TEST. Non-Parametric Univariate Tests: 1 Sample Sign Test 1. A non-parametric equivalent of the 1 SAMPLE T-TEST.

Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

More information

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

More information

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
