# Using Stata for Categorical Data Analysis

Size: px
Start display at page:

Transcription

1 Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata, use the commands ssc install tab_chi and ssc install ipf to get the most current versions of these programs. Thanks to Nick Cox, Richard Campbell and Philip Ender for helping me to identify the Stata routines needed for this handout. This handout shows how to work the problems in Stata; see the related handouts for the underlying statistical theory and for SPSS solutions. Most of the commands have additional optional parameters that may be useful; type help commandname for more information. CASE I. COMPARING SAMPLE AND POPULATION DISTRIBUTIONS. Suppose that a study of educational achievement of American men were being carried on. The population studied is the set of all American males who are 25 years old at the time of the study. Each subject observed can be put into 1 and only 1 of the following categories, based on his maximum formal educational achievement: 1 = college grad 2 = some college 3 = high school grad 4 = some high school 5 = finished 8th grade 6 = did not finish 8th grade Note that these categories are mutually exclusive and exhaustive. The researcher happens to know that 10 years ago the distribution of educational achievement on this scale for 25 year old men was: 1-18% 2-17% 3-32% 4-13% 5-17% 6-3% A random sample of 200 subjects is drawn from the current population of 25 year old males, and the following frequency distribution obtained: Using Stata for Categorical Data Analysis - Page 1

2 The researcher would like to ask if the present population distribution on this scale is exactly like that of 10 years ago. That is, he would like to test H 0 : H A : There has been no change across time. The distribution of education in the present population is the same as the distribution of education in the population 10 years ago There has been change across time. The present population distribution differs from the population distribution of 10 years ago. Stata Solution. Surprisingly, Stata does not seem to have any built-in routines for Case I, but luckily Nick Cox s chitesti routine (part of his tab_chi package) is available. Like other Stata immediate commands, chitesti obtains data not from the data stored in memory but from numbers typed as arguments. The format (without optional parameters) is chitesti #obs1 #obs2 [...] [ \ #exp1 #exp2 [...] ] In this case,. chitesti \ , sep(6) observed frequencies from keyboard; expected frequencies from keyboard Pearson chi2(5) = Pr = likelihood-ratio chi2(5) = Pr = observed expected obs - exp Pearson The significant chi-square statistics imply that the null should be rejected, i.e. the distribution today is not the same as 10 years ago. Alternatively, we could have the data in a file and then use the chitest command, e.g. the data would be. list observed expected, sep(6) observed expected Using Stata for Categorical Data Analysis - Page 2

3 We then give the command. chitest observed expected, sep(6) observed frequencies from observed; expected frequencies from expected Pearson chi2(5) = Pr = likelihood-ratio chi2(5) = Pr = observed expected obs - exp Pearson Other Hypothetical Distributions: In the above example, the hypothetical distribution we used was the known population distribution of 10 years ago. Another possible hypothetical distribution that is sometimes used is specified by the equi-probability model. The equiprobability model claims that the expected number of cases is the same for each category; that is, we test H 0 : H A : E 1 = E 2 =... = E c The frequencies are not all equal. The expected frequency for each cell is (Sample size/number of categories). Such a model might be plausible if we were interested in, say, whether birth rates differed across months. If for some bizarre reason we believed the equi-probability model might apply to educational achievement, we would hypothesize that people would fall into each of our 6 categories. With the chitesti and chitest commands, if you DON T specify expected frequencies, the equi-probability model is assumed. Hence,. chitesti , sep(6) observed frequencies from keyboard; expected frequencies equal Pearson chi2(5) = Pr = likelihood-ratio chi2(5) = Pr = observed expected obs - exp Pearson Using Stata for Categorical Data Analysis - Page 3

4 Or, using a data file,. chitest observed, sep(6) observed frequencies from observed; expected frequencies equal Pearson chi2(5) = Pr = likelihood-ratio chi2(5) = Pr = observed expected obs - exp Pearson Obviously, the equi-probability model does not work very well in this case, but there is no reason we would have expected it to. CASE II. TESTS OF ASSOCIATION A researcher wants to know whether men and women in a particular community differ in their political party preferences. She collects data from a random sample of 200 registered voters, and observes the following: Dem Rep Male Female Do men and women significantly differ in their political preferences? Use α =.05. Stata Solution. There are various ways to do this in Stata. Nick Cox s tabchii and tabchi commands, which are part of his tab_chi package, can be used. See their help files. But, Stata s tabi and tabulate commands are already available for Case II. tabi has the following format: tabi #11 #12 [...] \ #21 #22 [...] [\...], tabulate_options i.e. you enter the data for row 1, then row 2, etc. The command also includes several options for displaying various statistics and other types of information, e.g. chi2 gives you the Pearson chisquare, lrchi2 gives you the Likelihood Ratio Chi-Square, and exact gives you Fisher s Exact Test. For this problem, Using Stata for Categorical Data Analysis - Page 4

5 . tabi \50 30, chi2 lrchi2 exact col row 1 2 Total Total Pearson chi2(1) = Pr = likelihood-ratio chi2(1) = Pr = Fisher's exact = sided Fisher's exact = You could also enter the data like this: let gender = 1 if male, 2 if female; party = 1 if Democrat, 2 = Republican; wgt = frequency. Then,. list gender party wgt gender party wgt We can now use Stata s tabulate command (which can be abbreviated tab). The [freq=wgt] parameter tells it to weight each of the four combinations by its frequency.. tab gender party [freq = wgt], chi2 lrchi2 exact -> tabulation of gender by party party gender 1 2 Total Total Pearson chi2(1) = Pr = likelihood-ratio chi2(1) = Pr = Fisher's exact = sided Fisher's exact = If you have individual-level data, e.g. in this case the data set would have 200 individual-level records, the tab command is Using Stata for Categorical Data Analysis - Page 5

6 . tab gender party, chi2 lrchi2 exact -> tabulation of gender by party party gender 1 2 Total Total Pearson chi2(1) = Pr = likelihood-ratio chi2(1) = Pr = Fisher's exact = sided Fisher's exact = Sidelights. (1) I used the command expand wgt to create an individual-level dataset. This duplicated records based on their frequencies, i.e. it took the tabled data and expanded it into 200 individual-level records. (2) Yates correction for continuity is sometimes used for 1 X 2 and 2 X 2 tables. I personally don t know of any straightforward way to do this in Stata. [UPDATE OCT 2014: The user-written exactcc command (findit exactcc) can calculate the Yates correction if you really need it.] Fisher s Exact Test is generally better anyway. (3) Fisher s Exact Test is most useful when the sample is small, e.g. one or more expected values is less than 5. With larger N, it might take a while to calculate. Alternative Approach for 2 X 2 tables. Note that, instead of viewing this as one sample of 200 men and women, we could view it as two samples, a sample of 120 men and another sample of 80 women. Further, since there are only two categories for political party, testing whether men and women have the same distribution of party preferences is equivalent to testing whether the same proportion of men and women support the Republican party. Hence, we could also treat this as a two sample problem, case V, test of p 1 = p 2. We can use the prtesti and prtest commands. We ll let p = the probability of being Republican. Using prtesti,. prtesti , count Two-sample test of proportion x: Number of obs = 120 y: Number of obs = Variable Mean Std. Err. z P> z [95% Conf. Interval] x y diff under Ho: Ho: proportion(x) - proportion(y) = diff = 0 Ha: diff < 0 Ha: diff!= 0 Ha: diff > 0 z = z = z = P < z = P > z = P > z = Using a data file, we first create a new version of party that is coded 0 = Democrat, 1 = Republican, and then use the prtest command. Using Stata for Categorical Data Analysis - Page 6

7 . gen party2 = party - 1. prtest party2, by( gender) Two-sample test of proportion Male: Number of obs = 120 Female: Number of obs = Variable Mean Std. Err. z P> z [95% Conf. Interval] Male Female diff under Ho: Ho: proportion(male) - proportion(female) = diff = 0 Ha: diff < 0 Ha: diff!= 0 Ha: diff > 0 z = z = z = P < z = P > z = P > z = * z squared = the chi-square value we got earlier. display r(z) ^ A small advantage of this approach in this case is that the sign of the test statistic is meaningful. The positive and significant z value tells us men are more likely than women to be Republicans. CASE III: CHI-SQUARE TESTS OF ASSOCIATION FOR N-DIMENSIONAL TABLES A researcher collects the following data: Gender/Party Republican Democrat W NW W NW Male Female Test the hypothesis that sex, race, and party affiliation are independent of each other. Use α =.10. Stata Solution. Problems like this can be addressed using advanced Stata routines like poisson and glm. For our current purposes, however, Adrian Mander s ipf command (iterative proportional fitting) provides a simple, straightforward solution. (ipf also could have been used for some of the previous problems.) The format of the ipf command depends on how the data have been entered. One approach is to enter the data as 8 cases, with the variables gender, race, party and freq: Using Stata for Categorical Data Analysis - Page 7

8 . list, sep(4) gender race party freq Male White Republican Male NonWhite Republican 5 3. Male White Democrat Male NonWhite Democrat Female White Republican Female NonWhite Republican 2 7. Female White Democrat Female NonWhite Democrat You then use ipf specifying [fw = freq], i.e. you weight by the frequency count. (If instead your data set consists of the 100 individual-level cases, then just leave this parameter off.) The fit parameter tells ipf what model to fit; by specifying fit(gender+race+party) we tell ipf to fit the model of independence, i.e. we fit the main effects only but do not allow for any interactions (dependence) among the variables.. ipf [fw = freq], fit(gender + race + party) Deleting all matrices... Expansion of the various marginal models marginal model 1 varlist : gender marginal model 2 varlist : race marginal model 3 varlist : party unique varlist gender race party N.B. structural/sampling zeroes may lead to an incorrect df Residual degrees of freedom = 4 Number of parameters = 4 Number of cells = 8 Loglikelihood = Loglikelihood = Goodness of Fit Tests df = 4 Likelihood Ratio Statistic G^2 = p-value = Pearson Statistic X^2 = p-value = These are the same chi-square statistics we got before. If we are using the (rather generous).10 level of significance, we should reject the model of independence. However, we do not know where the dependence is at this point. CONDITIONAL INDEPENDENCE IN N-DIMENSIONAL TABLES Using the same data as in the last problem, test whether party vote is independent of sex and race, WITHOUT assuming that sex and race are independent of each other. Use α =.05. Stata Solution. We are being asked to test the model of conditional independence. This model says that party vote is not affected by either race or sex, although race and sex may be associated Using Stata for Categorical Data Analysis - Page 8

9 with each other. Such a model makes sense if we are primarily interested in the determinants of party vote, and do not care whether other variables happen to be associated with each other. To estimate this model with ipf, we use the * parameter to allow for an interaction (dependence) between gender and race, but we do not allow for gender or race to interact with party:. ipf [fw = freq], fit(gender + race + party + gender*race) Deleting all matrices... Expansion of the various marginal models marginal model 1 varlist : gender marginal model 2 varlist : race marginal model 3 varlist : party marginal model 4 varlist : gender race unique varlist gender race party N.B. structural/sampling zeroes may lead to an incorrect df Residual degrees of freedom = 3 Number of parameters = 5 Number of cells = 8 Loglikelihood = Loglikelihood = Goodness of Fit Tests df = 3 Likelihood Ratio Statistic G^2 = p-value = Pearson Statistic X^2 = p-value = Again, the chi-square statistics are the same as before. Because they are not significant at the.05 level (or.10 for that matter) we do NOT reject the model of conditional independence. Having said that, however, it can be noted that the model probably should include an effect of race on party affiliation, as the fit improves significantly when this interaction is added to the model:. ipf [fw = freq], fit(gender + race + party + gender*race + race*party) N.B. structural/sampling zeroes may lead to an incorrect df Residual degrees of freedom = 2 Number of parameters = 6 Number of cells = 8 Loglikelihood = Loglikelihood = Goodness of Fit Tests df = 2 Likelihood Ratio Statistic G^2 = p-value = Pearson Statistic X^2 = p-value = display display chi2tail(1, ) Using Stata for Categorical Data Analysis - Page 9

10 Note that, when the race*party interaction is added to the model, the Likelihood Ratio Chi- Square drops from to.1838, i.e. by This change (which has 1 degree of freedom) is significant at the.0175 level, implying that we should allow for a race*party interaction. We ll talk more about chi-square contrasts between models during 2 nd semester. Using Stata for Categorical Data Analysis - Page 10

### Is it statistically significant? The chi-square test

UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 130) Spring Semester 011 Chapter 10- TWO-SAMPLE TESTS Practice

### Mind on Statistics. Chapter 15

Mind on Statistics Chapter 15 Section 15.1 1. A student survey was done to study the relationship between class standing (freshman, sophomore, junior, or senior) and major subject (English, Biology, French,

### Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

### From this it is not clear what sort of variable that insure is so list the first 10 observations.

MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is

### Using Stata for One Sample Tests

Using Stata for One Sample Tests All of the one sample problems we have discussed so far can be solved in Stata via either (a) statistical calculator functions, where you provide Stata with the necessary

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Elementary Statistics

lementary Statistics Chap10 Dr. Ghamsary Page 1 lementary Statistics M. Ghamsary, Ph.D. Chapter 10 Chi-square Test for Goodness of fit and Contingency tables lementary Statistics Chap10 Dr. Ghamsary Page

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

### CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

### The Chi-Square Test. STAT E-50 Introduction to Statistics

STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

### Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

### TABLE OF CONTENTS. About Chi Squares... 1. What is a CHI SQUARE?... 1. Chi Squares... 1. Hypothesis Testing with Chi Squares... 2

About Chi Squares TABLE OF CONTENTS About Chi Squares... 1 What is a CHI SQUARE?... 1 Chi Squares... 1 Goodness of fit test (One-way χ 2 )... 1 Test of Independence (Two-way χ 2 )... 2 Hypothesis Testing

### Chi-square test Fisher s Exact test

Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Comparing Means Between Groups

Comparing Means Between Groups Michael Ash Lecture 6 Summary of Main Points Comparing means between groups is an important method for program evaluation by policy analysts and public administrators. The

### Two Correlated Proportions (McNemar Test)

Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### Row vs. Column Percents. tab PRAYER DEGREE, row col

Bivariate Analysis - Crosstabulation One of most basic research tools shows how x varies with respect to y Interpretation of table depends upon direction of percentaging example Row vs. Column Percents.

### Section 12 Part 2. Chi-square test

Section 12 Part 2 Chi-square test McNemar s Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Stats Review Chapters 9-10

Stats Review Chapters 9-10 Created by Teri Johnson Math Coordinator, Mary Stangler Center for Academic Success Examples are taken from Statistics 4 E by Michael Sullivan, III And the corresponding Test

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 1. Does vigorous exercise affect concentration? In general, the time needed for people to complete

### In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%

Hypothesis Testing for a Proportion Example: We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home In the

### An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA

ABSTRACT An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA Often SAS Programmers find themselves in situations where performing

### Chapter 19 The Chi-Square Test

Tutorial for the integration of the software R with introductory statistics Copyright c Grethe Hystad Chapter 19 The Chi-Square Test In this chapter, we will discuss the following topics: We will plot

### Binary Diagnostic Tests Two Independent Samples

Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption

Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption Last time, we used the mean of one sample to test against the hypothesis that the true mean was a particular

### Math 108 Exam 3 Solutions Spring 00

Math 108 Exam 3 Solutions Spring 00 1. An ecologist studying acid rain takes measurements of the ph in 12 randomly selected Adirondack lakes. The results are as follows: 3.0 6.5 5.0 4.2 5.5 4.7 3.4 6.8

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Analysis of categorical data: Course quiz instructions for SPSS

Analysis of categorical data: Course quiz instructions for SPSS The dataset Please download the Online sales dataset from the Download pod in the Course quiz resources screen. The filename is smr_bus_acd_clo_quiz_online_250.xls.

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Tests for Two Proportions

Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE

### MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

### Common Univariate and Bivariate Applications of the Chi-square Distribution

Common Univariate and Bivariate Applications of the Chi-square Distribution The probability density function defining the chi-square distribution is given in the chapter on Chi-square in Howell's text.

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS Hypothesis 1: People are resistant to the technological change in the security system of the organization. Hypothesis 2: information hacked and misused. Lack

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

### Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

### In the past, the increase in the price of gasoline could be attributed to major national or global

Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Rockefeller College University at Albany

Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### Nonparametric Statistics

Nonparametric Statistics J. Lozano University of Goettingen Department of Genetic Epidemiology Interdisciplinary PhD Program in Applied Statistics & Empirical Methods Graduate Seminar in Applied Statistics

### CHAPTER 5 COMPARISON OF DIFFERENT TYPE OF ONLINE ADVERTSIEMENTS. Table: 8 Perceived Usefulness of Different Advertisement Types

CHAPTER 5 COMPARISON OF DIFFERENT TYPE OF ONLINE ADVERTSIEMENTS 5.1 Descriptive Analysis- Part 3 of Questionnaire Table 8 shows the descriptive statistics of Perceived Usefulness of Banner Ads. The results

### Chi Square Distribution

17. Chi Square A. Chi Square Distribution B. One-Way Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Introduction to STATA 11 for Windows

1/27/2012 Introduction to STATA 11 for Windows Stata Sizes...3 Documentation...3 Availability...3 STATA User Interface...4 Stata Language Syntax...5 Entering and Editing Stata Commands...6 Stata Online

### Erik Parner 14 September 2016. Basic Biostatistics - Day 2-21 September, 2016 1

PhD course in Basic Biostatistics Day Erik Parner, Department of Biostatistics, Aarhus University Log-transformation of continuous data Exercise.+.4+Standard- (Triglyceride) Logarithms and exponentials

### General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n

### statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### Introduction; Descriptive & Univariate Statistics

Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of

### 3. Analysis of Qualitative Data

3. Analysis of Qualitative Data Inferential Stats, CEC at RUPP Poch Bunnak, Ph.D. Content 1. Hypothesis tests about a population proportion: Binomial test 2. Chi-square testt for goodness offitfit 3. Chi-square

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### 12.5: CHI-SQUARE GOODNESS OF FIT TESTS

125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

### An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### 25 Working with categorical data and factor variables

25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

### CHAPTER 12. Chi-Square Tests and Nonparametric Tests LEARNING OBJECTIVES. USING STATISTICS @ T.C. Resort Properties

CHAPTER 1 Chi-Square Tests and Nonparametric Tests USING STATISTICS @ T.C. Resort Properties 1.1 CHI-SQUARE TEST FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS (INDEPENDENT SAMPLES) 1. CHI-SQUARE TEST FOR

### HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see. level(#) , options2

Title stata.com ttest t tests (mean-comparison tests) Syntax Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see One-sample t test ttest varname

### SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### Topic 8. Chi Square Tests

BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test