Row vs. Column Percents. tab PRAYER DEGREE, row col



Similar documents
Association Between Variables

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Is it statistically significant? The chi-square test

Additional sources Compilation of sources:

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

MULTIPLE REGRESSION EXAMPLE

SPSS Guide How-to, Tips, Tricks & Statistical Techniques

Chapter 7 Factor Analysis SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Chapter 5 Analysis of variance SPSS Analysis of variance

Using Stata for Categorical Data Analysis

7TH ANNUAL PARENTS, KIDS & MONEY SURVEY: SUPPLEMENTAL DATA

Types of Data, Descriptive Statistics, and Statistical Tests for Nominal Data. Patrick F. Smith, Pharm.D. University at Buffalo Buffalo, New York

Hypothesis testing - Steps

Linear Models in STATA and ANOVA

Profiles and Data Analysis. 5.1 Introduction

Release #2443 Release Date: Thursday, February 28, 2013

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

II. DISTRIBUTIONS distribution normal distribution. standard scores

Projects Involving Statistics (& SPSS)

SPSS Guide: Regression Analysis

Introduction to Quantitative Methods

Solución del Examen Tipo: 1

Enrollment Data Undergraduate Programs by Race/ethnicity and Gender (Fall 2008) Summary Data Undergraduate Programs by Race/ethnicity

Econometrics Simple Linear Regression

Institut für Soziologie Eberhard Karls Universität Tübingen

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

The Roles of Men and Women

Binary Logistic Regression

3. Analysis of Qualitative Data

The correlation coefficient

Two Correlated Proportions (McNemar Test)

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Mind on Statistics. Chapter 15

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

VI. Introduction to Logistic Regression

Caregiving Impact on Depressive Symptoms for Family Caregivers of Terminally Ill Cancer Patients in Taiwan

The Dummy s Guide to Data Analysis Using SPSS

Using the National Longitudinal Survey

Contingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables

The Ariel Mutual Funds/Charles Schwab & Co., Inc. Black Investor Survey. Saving and Investing Among High Income African-American and White Americans

Chapter 7: Simple linear regression Learning Objectives

How to set the main menu of STATA to default factory settings standards

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

The Chi-Square Test. STAT E-50 Introduction to Statistics

Finding Supporters. Political Predictive Analytics Using Logistic Regression. Multivariate Solutions

Weight of Evidence Module

Odds ratio, Odds ratio test for independence, chi-squared statistic.

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

An introduction to IBM SPSS Statistics

The Impact of Familial and Marital Status on the Performance of Life Insurance Agents The Case of Taiwan

Mind on Statistics. Chapter 4

Release #2301 Release Date and Time: 6:00 a.m., Tuesday, March 10, 2009

A Framework for Analyses with Numeric and Categorical Dependent Variables. An Exercise in Using GLM. Analyses with Categorical Dependent Variables

Measurement and Measurement Scales

How to Make APA Format Tables Using Microsoft Word

SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

MULTIPLE REGRESSION WITH CATEGORICAL DATA

The AP-Viacom Survey of Youth on Education March, 2011

Research Methods & Experimental Design

Estimation of σ 2, the variance of ɛ

Chapter 23. Two Categorical Variables: The Chi-Square Test

Does it balance? Exploring family and careers in accounting

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

Scientific Method. 2. Design Study. 1. Ask Question. Questionnaire. Descriptive Research Study. 6: Share Findings. 1: Ask Question.

Statistical tests for SPSS

Pearson Student Mobile Device Survey 2013

Pearson's Correlation Tests

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

11. Analysis of Case-control Studies Logistic Regression

Relationships by Affiliation

Comparing Means Between Groups

Poisson Models for Count Data

UNDERSTANDING THE TWO-WAY ANOVA

First-year Statistics for Psychology Students Through Worked Examples

Interaction effects and group comparisons Richard Williams, University of Notre Dame, Last revised February 20, 2015

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Simple Linear Regression Inference

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

Gender Differences in Employed Job Search Lindsey Bowen and Jennifer Doyle, Furman University

Crosstabulation & Chi Square

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Inferential Statistics. What are they? When would you use them?

Statistics 2014 Scoring Guidelines

SPSS Explore procedure

Multinomial and Ordinal Logistic Regression

Running Descriptive Statistics: Sample and Population Values

Least Squares Estimation

CHAPTER IV FINDINGS AND CONCURRENT DISCUSSIONS

Using Excel for inferential statistics

Descriptive Statistics

Effects of public relations in fund raising events (A study of selected churches in Aba metropolis)

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Transcription:

Bivariate Analysis - Crosstabulation One of most basic research tools shows how x varies with respect to y Interpretation of table depends upon direction of percentaging example Row vs. Column Percents. tab PRAYER DEGREE, row col frequency row percentage column percentage BIBLE PRAYER IN PUBLIC RS HIGHEST DEGREE SCHO JUNIOR CO BACHELOR SCHOOLS LT HIGH S HIGH GRADUATE Total APPROVE 1,492 4,797 569 1,942 1,064 9,864 15.13 48.63 5.77 19.69 10.79 100.00 27.29 36.96 41.99 56.18 64.52 39.60 DISAPPROVE 3,976 8,183 786 1,515 585 15,045 26.43 54.39 5.22 10.07 3.89 100.00 72.71 63.04 58.01 43.82 35.48 60.40 Total 5,468 12,980 1,355 3,457 1,649 24,909 21.95 52.11 5.44 13.88 6.62 100.00 100.00 100.00 100.00 100.00 100.00 100.00 119. A. The United States Supreme Court has ruled that no state or local government may require the reading of the Lord's Prayer or Bible verses in public schools. What are your views on this--do you approve or disapprove of the court ruling? Row vs. Column Percents. by RACE, sort : tabulate PRAYER DEGREE, column nofreq row -> RACE = W row percentage column percentage BIBLE PRAYER IN PUBLIC RS HIGHEST DEGREE SCHO JUNIOR CO BACHELOR SCHOOLS LT HIGH S HIGH GRADUATE Total APPROVE 13.71 48.49 5.75 20.79 11.26 100.00 28.39 38.30 44.43 58.47 65.91 41.59 DISAPPROVE 24.62 55.60 5.12 10.51 4.15 100.00 71.61 61.70 55.57 41.53 34.09 58.41 Total 20.08 52.64 5.38 14.78 7.10 100.00 100.00 100.00 100.00 100.00 100.00 100.00 -> RACE = B row percentage column percentage BIBLE PRAYER IN PUBLIC RS HIGHEST DEGREE SCHO JUNIOR CO BACHELOR SCHOOLS LT HIGH S HIGH GRADUATE Total APPROVE 27.87 51.99 5.74 9.25 5.15 100.00 21.44 25.71 27.07 31.35 41.12 25.29 DISAPPROVE 34.56 50.85 5.23 6.86 2.50 100.00 78.56 74.29 72.93 68.65 58.88 74.71 Total 32.87 51.14 5.36 7.46 3.17 100.00 100.00 100.00 100.00 100.00 100.00 100.00 1

Reading Tables - Basics What s the difference between 20% and 30%? What do row, column, and total percents tell you? Why are percent distributions more informative than distributions of raw numbers? Always best to either show or mention your numbers two forms of info clearer than one. Example Attitudes towards Women's Work 70 60 50 59.2 51.5 Percent 40 30 30.5 37 20 10 10.4 11.5 0 WORK FULL-TIME WORK PART-TIME STAY HOME Men Women Example before recoding Attitudes toward Women's Work, by Education Precent 100 80 60 40 20 0 Work Full-Time Work Part-Time Stay Home 0 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2

Example after recoding Attitudes toward Women's Work, by Education Percent 70 60 50 40 30 20 10 0 59 58 51 51 39 37 32 30 10 12 10 12 Work Full-Time Work Part-Time Stay Home 0-11 12 13-15 16-20 Extensions Three-way crosstabulations Web: use the control option Stata: use by or table Crosstabulations for subsample Web:use the filter option Stata: use the if option Example: Educational differences in attitudes toward women s roles could do either way (wrkbaby x educ x sex or wrkbaby x educ if sex=male). Chi-square statistic Introduction Chi-square statistic is useful for testing statistical significance of relationships in crosstabulations, or contingency tables. Contingency tables: tables cross-classifying categorical variables. 2-way and 3-way (involving two or three variables. 3

Chi-square statistic Chi-square statistics reported by a cross-tabulation tests the hypothesis that the row variable and the column variable are independent of each other. If they are independent, joint probabilities can be determined by marginal probabilities. (N i+ /N ++ ) x (N +j /N ++ ) For a two-way contingency table of dimension IxJ, the degrees of freedom are (I-1)(J-1). A large chi-square rejects the hypothesis. Larger the chi-square => the stronger the evidence against independence => the stronger the relationship. Example of religious homogamy. Chi-square statistic. tabulate RELIG RELIGSP, chi2 expected frequency expected frequency RS RELIGIOUS RS SPOUSE'S RELIGION CATHOLIC JEWISH PREFERENCE PROTESTAN NONE OTHER (SP Total PROTESTANT 659 60 3 41 9 772 456.9 212.1 20.3 65.4 17.3 772.0 CATHOLIC 51 256 4 19 2 332 196.5 91.2 8.8 28.1 7.4 332.0 JEWISH 1 2 22 2 1 28 16.6 7.7 0.7 2.4 0.6 28.0 NONE 19 19 4 41 3 86 50.9 23.6 2.3 7.3 1.9 86.0 OTHER (SPECIFY) 11 7 0 3 13 34 20.1 9.3 0.9 2.9 0.8 34.0 Total 741 344 33 106 28 1,252 741.0 344.0 33.0 106.0 28.0 1,252.0 Pearson chi2(16) 1.7e+03 Pr = 0.000 = Comparison of means Like crosstabulations but better for continuous (i.e., interval) variables. Example: Mean age at first marriage by sex and race (agewed x sex x race) 4

Comparison of means Mean age at Marriage, by Sex and Race Age at marriage 27 25 23 21 19 17 15 23.6 23.8 MALE 25.1 21.0 20.5 21.9 FEMALE White Black Other Regression: Concepts In essence, regression is conditional mean. e.g., X = independent variable, and Y= dependent variable Regression says Y = F(X) + ε F(X) is called the regression function. If F(X) is linear, F(X) = β 0 + β 1 X Example: income and happiness F(X) could be nonlinear. e.g., F(X) = β 0 + β 1 X + β 2 X 2. Regression: Error Term Because a regression model does not predict observed values exactly, there is an error term. This is due to our desire for data reduction. In bivariate case, a linear model looks like y i = β 0 + β 1 x i + ε i 5

Multivariate Regression Regression one of many alternatives in bivariate case, but only real alternative in multivariate case y i = β 0 + β 1 x i1 + β 2 x i2 + β 3 x i3 + ε i, β's are also called "partial" regression coefficients. Say β 3 = 2.0 (with x 1i = $ of income in thousands), e.g., controlling for sex and marital status, every $1,000 buys you two points on the happiness scale. Multivariate Regression cont d y i = β 0 + β 1 x i1 + β 2 x i2 + β 3 x i3 + ε i, If x 1 is defined so 0=not married & 1=married If x 2 is defined so 0=male & 1=female β 0 is predicted happiness for unmarried man w/ no income β 0 +β 1 is predicted happiness for married man w/ no income β 0 +β 1 +β 2 is predicted happiness for married woman w/ no income β 0 +β 1 +β 2 +β 3 x i3 is predicted happiness for married woman w/ income level x i Regression: Interpretations A. Causal Interpretation (Classical) Observed = True Model + Error B. Descriptive Interpretation (Demographic) Observed = Fitted + Residual C. Best Predictor Interpretation (Programmatic) Observed = Prediction + Deviation 6

Tradeoff between Goodness-of- Fit and Parsimony In modeling observed data with a statistical model, you always have the problem of balancing the need for accuracy on the one hand and the need for parsimony on the other hand. By accuracy I mean the ability of your statistical model to reproduce the observed data. (Prediction). Consequences of Including Irrelevant Independent Variables Should we always include as many independent variables as possible? The answer is no. Do not include irrelevant independent variables. Why? 1. Missing theoretically interesting findings 2. Violating the parsimony rule (Occam's Razor) 3. Wasting degrees of freedom 4. Making estimates imprecise. 7