Data Analysis for categorical variables and its application to happiness studies


Data Analysis for categorical variables and its application to happiness studies
Thanawit Bunsit
Department of Economics, University of Bath
The economics of happiness and wellbeing workshop: building on theory, method through practice
Universidade Federal Fluminense, Rio de Janeiro, Brazil
20th May, 2011

Workshop objective
By the end of this workshop, participants will be able to:
i) examine the correlation between two or more variables using correlation analysis techniques;
ii) conduct an analysis using probit and logit models;
iii) use statistical software (SPSS and STATA) for categorical data analysis and reliability tests.

Outline
- Recap of categorical variables
- Correlation analysis
- Probit and logit models
- Data analysis using SPSS and STATA
- Questions and comments

Correlation analysis
Pearson's correlation coefficient
Analyze---Correlate---Bivariate
The Pearson correlation coefficient is a measure of linear association between two variables. Its values range from -1 to 1. The sign indicates the direction of the relationship (positive or negative). The absolute value indicates the strength, with larger absolute values indicating stronger relationships.
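As a rough illustration outside SPSS, the coefficient can be computed directly from its definition. The sketch below uses only the Python standard library; the function name `pearson_r` and the two short data series are hypothetical and are not taken from the workshop files.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (not from the workshop data set)
income = [1.0, 2.0, 3.0, 4.0, 5.0]
happiness = [2.0, 4.0, 6.0, 8.0, 10.0]

r_up = pearson_r(income, happiness)          # points on an increasing line
r_down = pearson_r(income, happiness[::-1])  # points on a decreasing line
print(r_up, r_down)
```

The two calls show the extremes of the range: a perfectly increasing line and a perfectly decreasing line.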

Correlation analysis
Chi-square (χ²)
Analyze---Descriptive Statistics---Crosstabs
- Select variables (row = independent, column = dependent)
- Select Chi-square from the Statistics tab
- Select percentages from the Cells tab
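The statistic that Crosstabs reports can be sketched by hand as the sum of (observed - expected)² / expected over all cells, where the expected count is row total × column total / grand total. The function name `chi_square` and the 2×2 counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical counts: rows = independent variable, columns = dependent variable
table = [[30, 10],
         [20, 40]]
stat, df = chi_square(table)
print(round(stat, 3), df)
```

The sketch assumes every expected count is positive; a row or column of zeros would need to be dropped first.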

Correlation analysis
Other correlation coefficients
Nominal ------> Phi and Cramér's V
Ordinal ------> Somers' D, Gamma, Kendall's tau
Nominal + Interval ------> Eta
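For nominal variables, Cramér's V rescales the chi-square statistic to the range 0 to 1 using V = sqrt(χ² / (n × min(r - 1, c - 1))); for a 2×2 table this has the same magnitude as Phi. A minimal sketch, again with a hypothetical function name and hypothetical counts:

```python
import math

def cramers_v(table):
    """Cramér's V for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals) - 1, len(col_totals) - 1)
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 counts
v = cramers_v([[30, 10],
               [20, 40]])
print(round(v, 3))
```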

Activity 1: correlation
Find the association between these factors:
- Religion and the level of HIV/AIDS cases (Low = up to 10% of the population with HIV/AIDS, High = more than 10% of the population)
- Region and the level of HIV/AIDS cases
Which factors do you think relate to GDP per capita? Use correlation coefficients and regression analysis to answer this question.

Binary logit model
- Dichotomous dependent variable
- Categorical dependent variable
- Yes or No
- Value 1 or 0
- Logistic regression or logit model

Case 1: Training for new employees
Variables
1) Score = Applicant's aptitude test score
2) Experience = Months of relevant prior experience that the applicant had before this job
3) Pass = Whether the applicant actually passed the test after the training period (Yes = 1, No = 0)

Normal linear regression

Problems can be seen from the slope
[Figure: scatter plot with regions labelled "Likely to fail", "May pass" and "Almost definitely pass"]

The logit transformation
Data -----> Probabilities

Score         1     2     3     4     5
Pass  N       7     5     6     4     2
      Prob.   0.7   0.5   0.6   0.4   0.2
Fail  N       3     5     4     6     8
      Prob.   0.3   0.5   0.4   0.6   0.8

Probabilities and odds

Score         1     2     3     4     5
Pass  N       7     5     6     4     2
      Prob.   0.7   0.5   0.6   0.4   0.2
Fail  N       3     5     4     6     8
      Prob.   0.3   0.5   0.4   0.6   0.8
Odds          2.33  1.00  1.50  0.67  0.25

Odds
Data ----> Probabilities ----> Odds
Odds(event) = P(event) / [1 - P(event)]
P(event) = The probability of a particular event occurring
1 - P(event) = The probability of the event not occurring
P(event) = odds(event) / [1 + odds(event)]
Strictly, this quantity is the odds of the event; an odds ratio is the ratio of two such odds.

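The odds and logit

Score         1     2     3     4     5
Pass  N       7     5     6     4     2
      Prob.   0.7   0.5   0.6   0.4   0.2
Fail  N       3     5     4     6     8
      Prob.   0.3   0.5   0.4   0.6   0.8
Odds          2.33  1.00  1.50  0.67  0.25
Logit         0.37  0.00  0.18  -0.18 -0.60

The logit values shown are base-10 logarithms of the odds. With natural logarithms, the usual convention in logistic regression, they would be 0.85, 0.00, 0.41, -0.41 and -1.39.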
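The chain Data ----> Probabilities ----> Odds ----> Logit can be reproduced from the counts in the table. The sketch below recomputes each row for the Pass outcome and prints the log-odds in both base 10 (which matches the Logit row as printed) and base e (the form used in logistic regression). Variable names are illustrative only.

```python
import math

scores = [1, 2, 3, 4, 5]
pass_counts = [7, 5, 6, 4, 2]
fail_counts = [3, 5, 4, 6, 8]

rows = []
for score, n_pass, n_fail in zip(scores, pass_counts, fail_counts):
    prob = n_pass / (n_pass + n_fail)   # proportion passing at this score
    odds = prob / (1 - prob)            # odds of passing
    rows.append((score, prob, odds, math.log10(odds), math.log(odds)))

print("Score  Prob  Odds  log10(odds)  ln(odds)")
for score, prob, odds, log10_odds, ln_odds in rows:
    print(f"{score:5d}  {prob:4.1f}  {odds:4.2f}  {log10_odds:11.2f}  {ln_odds:8.2f}")
```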

The logit curve
[Figure: probability (vertical axis) plotted against the logit (horizontal axis, from -2 to 2)]

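Probit model
[Figure: the interval from 0 to 1 divided by the cutpoints ξ1 and ξ2 into three regions: Pr[H_i = 1] "Not too happy" (from 0 to ξ1), Pr[H_i = 2] "Fairly happy" (from ξ1 to ξ2) and Pr[H_i = 3] "Very happy" (from ξ2 to 1)]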

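Probit model
$$H_i^* = X_i \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \quad i = 1, \dots, N. \tag{1}$$
Generally, an m-alternative ordered model is defined as
$$H_i = j \quad \text{if} \quad \alpha_{j-1} < H_i^* \le \alpha_j, \tag{2}$$
where $\alpha_0 = -\infty$ and $\alpha_m = +\infty$, and here $j = 1, 2, 3$. Then
$$\begin{aligned}
\Pr[H_i = j] &= \Pr[\alpha_{j-1} < H_i^* \le \alpha_j] \\
&= \Pr[\alpha_{j-1} < X_i \beta + \varepsilon_i \le \alpha_j] \\
&= \Pr[\alpha_{j-1} - X_i \beta < \varepsilon_i \le \alpha_j - X_i \beta] \\
&= \Phi(\alpha_j - X_i \beta) - \Phi(\alpha_{j-1} - X_i \beta)
\end{aligned} \tag{3}$$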
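Equation (3) can be evaluated numerically once a value of the index X_i β and the cutpoints are given. The sketch below uses the standard normal CDF from the Python standard library; the function name `ordered_probit_probs`, the index value 0.5 and the cutpoints -0.5 and 1.0 are hypothetical illustration values, not estimates from the workshop data.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cutpoints):
    """Pr[H = j], j = 1..m, for index value xb and increasing interior cutpoints.

    The outer cutpoints alpha_0 = -inf and alpha_m = +inf are implied,
    so their cumulative probabilities are 0 and 1.
    """
    phi = NormalDist().cdf  # standard normal CDF
    cumulative = [0.0] + [phi(a - xb) for a in cutpoints] + [1.0]
    # Difference of adjacent cumulative probabilities, as in equation (3)
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]

# Hypothetical values: three categories need two interior cutpoints
probs = ordered_probit_probs(xb=0.5, cutpoints=[-0.5, 1.0])
labels = ["Not too happy", "Fairly happy", "Very happy"]
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")

# With a single cutpoint at the index value, the two categories are equally likely
half = ordered_probit_probs(xb=0.0, cutpoints=[0.0])
```

Raising X_i β shifts probability mass from the lowest category towards the highest, which is how covariates act in the ordered model.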

Logit model results
- -2 Log Likelihood
- Cox & Snell R²
- Nagelkerke R²
- Classification table

SPSS command for logit
Analyze---Regression---Binary Logistic
Define the dependent and independent variables

Logit model
$$\chi^2 = (-2LL_0) - (-2LL_M)$$
$-2LL_0$ = -2LL for the baseline or null model
$-2LL_M$ = -2LL for the model after the variable(s) were entered
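To make the model chi-square concrete, the sketch below fits a one-predictor logit model to the grouped Score table by Newton-Raphson and compares -2LL for the null and fitted models. It treats the Fail row as the event; this is an assumption, made because that coding appears to reproduce the coefficients used in the worked example in these slides. The p-value line is valid only for 1 degree of freedom, and all function names are illustrative.

```python
import math

scores = [1, 2, 3, 4, 5]
events = [3, 5, 4, 6, 8]        # Fail counts from the table (assumed event coding)
totals = [10, 10, 10, 10, 10]   # applicants per score

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg2_log_lik(a, b):
    """-2 log-likelihood of the grouped binomial logit model."""
    ll = 0.0
    for x, y, n in zip(scores, events, totals):
        p = sigmoid(a + b * x)
        ll += y * math.log(p) + (n - y) * math.log(1 - p)
    return -2.0 * ll

def fit(iterations=25):
    """Newton-Raphson estimates of the intercept a and slope b."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(scores, events, totals):
            p = sigmoid(a + b * x)
            w = n * p * (1 - p)
            g0 += y - n * p            # score for the intercept
            g1 += x * (y - n * p)      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

a, b = fit()
p_null = sum(events) / sum(totals)
a_null = math.log(p_null / (1 - p_null))   # intercept-only model

neg2ll_0 = neg2_log_lik(a_null, 0.0)
neg2ll_m = neg2_log_lik(a, b)
chi2 = neg2ll_0 - neg2ll_m
p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail, 1 df only

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"-2LL0 = {neg2ll_0:.3f}, -2LLM = {neg2ll_m:.3f}, chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A statistics package would normally be used for this step; the hand-written fit is only meant to show where the two -2LL values come from.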

Logit model

Logit model
Now we can calculate logit(pass), the odds and the probability for a person who scores 1 or 5. For a score of 1:
Odds = exp(logit(pass))
Logit(pass) = a + Bx = -1.314 + 0.467(1) = -0.847
Odds = exp(-0.847) = 0.429
Prob. = 0.429/(0.429 + 1) = 0.30
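The same three steps can be wrapped in a small function and applied to both scores mentioned on the slide. The coefficients -1.314 and 0.467 are the ones quoted above; the function name `predict` is illustrative. Note that the 0.30 obtained for a score of 1 equals the Fail proportion at that score in the earlier table, so the outcome coding behind these coefficients may differ from the Pass row.

```python
import math

A = -1.314   # intercept quoted on the slide
B = 0.467    # slope for Score quoted on the slide

def predict(score):
    """Return (logit, odds, probability) for a given aptitude score."""
    logit = A + B * score
    odds = math.exp(logit)
    prob = odds / (1 + odds)
    return logit, odds, prob

logit_1, odds_1, prob_1 = predict(1)
logit_5, odds_5, prob_5 = predict(5)

print(f"Score 1: logit = {logit_1:.3f}, odds = {odds_1:.3f}, prob = {prob_1:.2f}")
print(f"Score 5: logit = {logit_5:.3f}, odds = {odds_5:.3f}, prob = {prob_5:.2f}")
```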

Activity 2
1. Analyse the previous model with two independent variables (Score and Experience)
2. Formulate a logit model
3. Interpret the results

Multinomial logit model
- Nominal or categorical dependent variable
- Dependent variable cannot be ordered
- More than two categories

Probit model
Analyze---Regression---Probit
Use the example file Libongdata

Thank you very much (Muito obrigado)
Thanawit Bunsit, Department of Economics, University of Bath. Presented at Universidade Federal Fluminense, RJ, Brazil, 20th May, 2011.