Analyzing Titanic Survival Rates Carly Barry 12 April, 2012


http://blog.minitab.com/blog/real-world-quality-improvement/analyzing-titanic-survival-rates

April 15, 2012 marks the 100th anniversary of the sinking of the Titanic. It's hard to imagine that 100 years have passed since more than 2,000 people boarded the luxury ship in hopes of making the maiden voyage from Southampton, England to New York City. Unfortunately, fewer than half of the people on board the Titanic survived its tragic sinking.

Using the actual demographic and survival data from the Titanic voyage, obtained from the American Statistical Association, I used Minitab to determine how survival rates vary according to class, gender, and age. I set up the data in my Minitab worksheet like this:

[Screenshot: Minitab worksheet]

Note: The Coach class includes crew, second-class passengers, and third-class passengers.

To compare the survival rates for first class and coach class, I chose Stat > Tables > Cross Tabulation and Chi-Square in Minitab and completed the dialog box as shown below:

[Screenshot: Cross Tabulation and Chi-Square dialog box]

The results reveal a difference of 35.38 percentage points between the survival rates for first class and coach class (subtract the percentage of coach class passengers who survived from the percentage of first class passengers who survived):

[Screenshot: cross tabulation output for Class]

Of the 1,876 passengers who made up the coach class, 508 (or about 27%) survived, and of the 325 passengers who made up first class, 203 (or about 62.5%) survived. It seems to make sense that first class passengers, whose cabins were away from the bottom of the ship (where water entered first), were more often able to make it aboard lifeboats.

I also compared the survival rates for males and females:

[Screenshot: cross tabulation output for Gender]
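The class comparison can be checked by hand from the counts quoted above. What follows is a minimal Python sketch, not Minitab's output: the helper name `pearson_chi_square` is my own, the "died" counts are derived by subtracting survivors from the class totals, and the statistic is the textbook Pearson chi-square for a 2x2 table, which I am assuming is what the Cross Tabulation and Chi-Square command reports.

```python
# Counts quoted in the post: (survived, total) per class.
counts = {"First": (203, 325), "Coach": (508, 1876)}

# Survival rate per class, in percent.
rates = {cls: 100 * s / n for cls, (s, n) in counts.items()}
difference = rates["First"] - rates["Coach"]  # in percentage points


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: First, Coach. Columns: survived, died (died = total - survived).
table = [[s, n - s] for s, n in counts.values()]
chi_square = pearson_chi_square(table)

print(f"First: {rates['First']:.2f}%  Coach: {rates['Coach']:.2f}%")
print(f"Difference: {difference:.2f} percentage points")
print(f"Pearson chi-square (1 df): {chi_square:.1f}")
```

A statistic far above 3.84 (the 5% critical value for one degree of freedom) indicates that the gap between the two classes is unlikely to be a chance effect.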

The results reveal a difference of 52 percentage points between the survival rates for females and males. Of the 470 females aboard the Titanic, 344 or 73.2% survived. Of the 1,731 males aboard the Titanic, 367 or 21.2% survived.

Lastly, I compared the survival rates for adults and children. In Minitab, I chose Calc > Calculator, and typed in the variable name ChildorAdult. I entered the formula IF(Age>=18, "Adult", "Child") as my criterion for labeling passengers as adults or children. Here are the results:

[Screenshot: cross tabulation output for ChildorAdult]

The results reveal a difference of 21 percentage points between the survival rates for adults and children. Of the 109 children aboard the Titanic, 57 or 52.3% survived. Of the 2,092 adults on the ship, 654 or 31.3% survived.

It's interesting to note that women and children were clearly the passengers of choice to save! If you'd like to use Minitab to analyze the Titanic data yourself, download the data here.
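The labeling rule and the two remaining comparisons can be sketched in a few lines of Python. This is an illustration built only from the counts quoted above; the function names `child_or_adult` and `survival_rate` are hypothetical and are not part of Minitab.

```python
def child_or_adult(age):
    """Mirror of the post's rule IF(Age>=18, "Adult", "Child")."""
    return "Adult" if age >= 18 else "Child"


def survival_rate(survived, total):
    """Survival rate in percent."""
    return 100 * survived / total


# (survived, total) counts quoted in the post.
gender = {"Female": (344, 470), "Male": (367, 1731)}
age_group = {"Child": (57, 109), "Adult": (654, 2092)}

gender_gap = survival_rate(*gender["Female"]) - survival_rate(*gender["Male"])
age_gap = survival_rate(*age_group["Child"]) - survival_rate(*age_group["Adult"])

print(child_or_adult(17), child_or_adult(18))
print(f"Female vs. male gap: {gender_gap:.1f} percentage points")
print(f"Child vs. adult gap: {age_gap:.1f} percentage points")
```

Note that the cutoff treats a 17-year-old as a child and an 18-year-old as an adult, which matters later when the two ages are compared.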

http://blog.minitab.com/blog/fun-with-statistics/analyzing-titanic-survival-rates-part-ii-v1

Analyzing Titanic Survival Rates, Part II: Binary Logistic Regression
Joel Smith
17 April, 2012

In honor of the 100th anniversary of the sinking of the Titanic, we recently posted a dataset on the passengers aboard the ship that included Class (coach or first), Gender (female or male), Age, and Status (survived or died). From Age, an additional column was created indicating Child (17 years or younger) or Adult (18 years or older).

In an earlier post, we showed how survival rates could be compared between levels of one variable (for example, females versus males) using Stat > Tables > Cross Tabulation and Chi-Square. But what if we wanted to take all factors into consideration to paint a complete picture of survival rates?

Applying Binary Logistic Regression

In Minitab Statistical Software, Stat > Regression > Binary Logistic Regression allows us to create models when the response of interest (Status, in this case) is binary, taking only two values. To begin, include all terms and two-way interactions in the model and reduce it from there:

[Screenshot: Binary Logistic Regression dialog box]

By clicking on Options, you can choose whether the model will predict the odds of Status = Died or Status = Survived. As an optimist, I chose Survived:

[Screenshot: Binary Logistic Regression Options dialog box]

You can also try different Link Functions in Options to find the model that best fits your data. By removing terms from my model that are not statistically significant and choosing different Link Functions, I ultimately came up with this Logistic Regression Table, similar to an ANOVA table from typical ANOVA (Stat > ANOVA) or Regression (Stat > Regression) output in Minitab:

Logistic Regression Table

Predictor           Coef         SE Coef       Z       P
Constant           -0.191839     0.175568     -1.09   0.275
Class
  First             0.971320     0.0952002    10.20   0.000
Gender
  Male             -1.03799      0.200630     -5.17   0.000
Age                 0.0044963    0.0033885     1.33   0.185
ChildorAdult
  Child             0.387517     0.174976      2.21   0.027
Gender*Age
  Male             -0.0123825    0.0040596    -3.05   0.002

From the p-values, you can determine which factors are significant: Class, Gender, ChildorAdult, and the Gender*Age interaction. (The Age term is left in the model because it is part of the interaction term.)
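The Z and P columns can be reproduced from Coef and SE Coef. The sketch below assumes that Z is the Wald statistic (Coef divided by SE Coef) and that P is the two-sided p-value from a standard normal distribution; that reading matches the printed values to their displayed precision, but it is my assumption rather than something the post states. The function name `wald_test` is hypothetical.

```python
import math

# (coefficient, standard error) pairs copied from the Logistic Regression Table.
terms = {
    "Constant": (-0.191839, 0.175568),
    "Class First": (0.971320, 0.0952002),
    "Gender Male": (-1.03799, 0.200630),
    "Age": (0.0044963, 0.0033885),
    "ChildorAdult Child": (0.387517, 0.174976),
    "Gender*Age Male": (-0.0123825, 0.0040596),
}


def wald_test(coef, se):
    """Return (z, two-sided p-value) under a standard normal reference."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


results = {name: wald_test(coef, se) for name, (coef, se) in terms.items()}
for name, (z, p) in results.items():
    print(f"{name:<20} Z = {z:6.2f}  P = {p:.3f}")
```

Terms whose p-value falls below 0.05 are the ones the text calls significant; Age alone does not meet that bar, but it stays in the model because of the Gender*Age interaction.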

Next, we can use the Goodness-of-Fit Tests in the output to determine whether or not the model adequately fits the data:

Goodness-of-Fit Tests

Method             Chi-Square    DF      P
Pearson               270.946   272  0.507
Deviance              313.073   272  0.044
Hosmer-Lemeshow         8.815     8  0.358

For these tests, a significant p-value indicates our model does not fit the data adequately. While we do have one significant test (Deviance), the other two tests provide no evidence of lack of fit, and we are fairly comfortable that our model provides a good fit. If you find you have significant terms but the Goodness-of-Fit Tests are showing an inadequate model fit, it may be worth trying a different Link Function back in the Options dialog. In this case, I found the Gompit link function to provide the best fit.

Measures of Association to Assess the Regression Model

Finally, we can assess our model using Measures of Association:

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs          Number   Percent    Summary Measures
Concordant     785712      74.2    Somers' D                0.49
Discordant     262124      24.7    Goodman-Kruskal Gamma    0.50
Ties            11554       1.1    Kendall's Tau-a          0.22
Total         1059390     100.0
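The three summary measures follow from the pair counts in the table. The sketch below uses the standard textbook definitions, which I am assuming are the ones Minitab applies; they do reproduce the printed values. The total of 2,201 people is taken from the class totals in the first post (1,876 coach plus 325 first class).

```python
# Pair counts copied from the Measures of Association table.
concordant = 785712
discordant = 262124
ties = 11554
total_pairs = concordant + discordant + ties

# Assumption: 2,201 people in total (1,876 coach + 325 first class in Part I).
n_people = 1876 + 325
all_possible_pairs = n_people * (n_people - 1) // 2

# Each measure is the concordant-minus-discordant count over a different base.
somers_d = (concordant - discordant) / total_pairs
gamma = (concordant - discordant) / (concordant + discordant)
tau_a = (concordant - discordant) / all_possible_pairs

print(f"Somers' D:             {somers_d:.2f}")
print(f"Goodman-Kruskal Gamma: {gamma:.2f}")
print(f"Kendall's Tau-a:       {tau_a:.2f}")
```

The total of 1,059,390 pairs is also the product of the 711 survivors and the 1,490 people who died, since every survivor is paired with every non-survivor.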

Measures of Association compare how often passengers who survived had a higher predicted probability of survival than passengers who did not survive. By comparing every surviving passenger with every passenger who died, Minitab determines how often the model correctly or incorrectly predicted which would survive. In our analysis, 74.2% of the time the surviving passenger had the higher predicted probability of survival, while 24.7% of the time they had the lower one and 1.1% of the time the two were the same. With a good model you want a high percentage of concordant pairs and a low percentage of discordant pairs.

Using the Regression Model to Predict Survival

Finally, back in the main Binary Logistic Regression dialog box, choose Prediction and choose to store the predicted probability of survival for each passenger (shown below) or for new data points, as well as confidence intervals:

[Screenshot: Binary Logistic Regression Prediction dialog box]

Using this information, I created a graph demonstrating the chance of survival for passengers aboard the Titanic based on all of our significant factors:

[Graph: predicted survival by Age, Gender, Class, and ChildorAdult]
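The pairwise comparison described above can be illustrated with a toy example. The predicted probabilities below are hypothetical and are not values from the Titanic model; the function name `count_pairs` is likewise my own. The sketch simply compares every survivor with every non-survivor and tallies the outcome.

```python
from itertools import product


def count_pairs(event_probs, non_event_probs):
    """Compare every survivor with every non-survivor by predicted probability."""
    concordant = discordant = ties = 0
    for p_event, p_non_event in product(event_probs, non_event_probs):
        if p_event > p_non_event:
            concordant += 1  # the survivor was ranked higher: model agrees
        elif p_event < p_non_event:
            discordant += 1  # the survivor was ranked lower: model disagrees
        else:
            ties += 1
    return concordant, discordant, ties


# Hypothetical predicted probabilities, NOT values from the Titanic model.
survived = [0.90, 0.60, 0.30]
died = [0.20, 0.30, 0.70]

concordant, discordant, ties = count_pairs(survived, died)
total = len(survived) * len(died)
print(f"Concordant: {concordant}/{total}")
print(f"Discordant: {discordant}/{total}")
print(f"Ties:       {ties}/{total}")
```

With three survivors and three non-survivors there are nine pairs; on the real data the same procedure yields the 1,059,390 pairs reported in the table.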

Interestingly, there was only one female child in the first-class cabin on that voyage, so we could not model the chance of survival for female children in first class. Otherwise, it is clear from the graph that if you were an adult female in first class, your chance of survival was quite high and increased slightly if you were older. Even for an 18-year-old female in first class, the probability of survival is estimated at 90.6%, compared to 32.3% for passengers in general!

Unlike females, whose chance of survival increased with age, a male's chance of survival decreased with age. (Remember that Gender*Age interaction?) So for an 80-year-old male passenger in coach, your estimated chance of survival was a mere 14.4%! The dataset shows that of the 25 passengers meeting these criteria, a mere 3 survived for a true rate of 12%, which is consistent with the model.

Had you been a male passenger who knew ahead of time about the impending tragedy, the cost of a first class ticket would have felt like a bargain. The same 80-year-old male would have enjoyed a relatively good 33.7% chance of survival had he booked in first class. Likewise, taking this voyage as a 17-year-old, who would have been boarded on a lifeboat, instead of as an 18-year-old, who would remain on the sinking ship, increases your chance of survival by 10-14%, depending on Gender and Class.

By looking at multiple factors at once, we are able to get a clear and accurate look at the chance of survival for any passenger based on just a few factors!
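The percentages quoted above can be recovered from the Logistic Regression Table. The sketch below rests on two assumptions that the post does not spell out: that the Gompit link is the complementary log-log link, so the predicted probability is 1 - exp(-exp(eta)), and that the reference levels (coefficient zero) are Coach, Female, and Adult. The function name `survival_probability` is hypothetical.

```python
import math

# Coefficients copied from the Logistic Regression Table.
COEF = {
    "constant": -0.191839,
    "class_first": 0.971320,
    "gender_male": -1.03799,
    "age": 0.0044963,
    "child": 0.387517,
    "male_x_age": -0.0123825,
}


def survival_probability(age, male, first_class):
    """Predicted survival probability under an assumed gompit (cloglog) link."""
    child = age < 18  # same cutoff as the ChildorAdult column
    # Linear predictor; booleans act as 0/1 indicator variables.
    eta = (
        COEF["constant"]
        + COEF["class_first"] * first_class
        + COEF["gender_male"] * male
        + COEF["age"] * age
        + COEF["child"] * child
        + COEF["male_x_age"] * male * age
    )
    # Inverse gompit link: p = 1 - exp(-exp(eta))
    return 1 - math.exp(-math.exp(eta))


cases = {
    "18-year-old female, first class": survival_probability(18, False, True),
    "80-year-old male, coach": survival_probability(80, True, False),
    "80-year-old male, first class": survival_probability(80, True, True),
}
for label, p in cases.items():
    print(f"{label}: {p:.1%}")
```

With the coefficients as printed, these three cases come out at about 90.6%, 14.4%, and 33.7%, in line with the figures in the text, which supports the assumed link and reference levels.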

User comments (taken from the Minitab Blog site)

(...) the menus and dialog boxes for regression are different between Minitab 16 (shown above) and Minitab 17, as you note. In 17, go to Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model... to get to the equivalent dialog box. You can then use the Model, Options, Results, Stepwise, and other buttons to control how Minitab performs the analysis and the information that it includes in the output. Minitab 17 doesn't automatically graph Delta Chi-Square vs. Leverage, but you can store the Delta Chi-Square and Leverage data using the "Storage" button, then plot them against each other manually.