http://blog.minitab.com/blog/real-world-quality-improvement/analyzing-titanic-survival-rates

Analyzing Titanic Survival Rates
Carly Barry
12 April, 2012

April 15, 2012 marks the 100th anniversary of the sinking of the Titanic. It's hard to imagine that 100 years have passed since more than 2,000 people boarded the luxury ship in hopes of making the maiden voyage from Southampton, England to New York City. Unfortunately, fewer than half of the people on board the Titanic survived its tragic sinking.

Using the actual demographic and survival data from the Titanic voyage obtained from the American Statistical Association, I used Minitab to determine how survival rates vary according to class, gender, and age. I set up the data in my Minitab worksheet like this:
Note: The Coach class includes crew, second-class passengers, and third-class passengers.

To compare the survival rates for first class and coach class, I chose Stat > Tables > Cross Tabulation and Chi-Square in Minitab and completed the dialog box as shown below:

The results reveal a difference of 35.38 percentage points between the survival rates for first class and coach class (subtract the percentage of coach class passengers who survived from the percentage of first class passengers who survived, as shown below):
Of the 1,876 passengers who made up the coach class, 508 (or about 27%) survived, and of the 325 passengers who made up first class, 203 (or about 62.5%) survived. It seems to make sense that first class passengers, with cabins away from the bottom of the ship (where water entered first), were more likely to make it aboard lifeboats.

I also compared the survival rates for males and females:
The results reveal a difference of 52 percentage points between the survival rates for females and males. Of the 470 females aboard the Titanic, 344 or 73.2% survived. Of the 1,731 males aboard the Titanic, 367 or 21.2% survived.

Lastly, I compared the survival rates for adults and children. In Minitab, I chose Calc > Calculator, and typed in the variable name ChildorAdult. I entered the formula IF(Age>=18, "Adult", "Child") as my criterion for labeling passengers as adults or children. Here are the results:

The results reveal a difference of 21 percentage points between the survival rates for adults and children. Of the 109 children aboard the Titanic, 57 or 52.3% survived. Of the 2,092 adults on the ship, 654 or 31.3% survived.

It's interesting to note that women and children were clearly the passengers of choice to save! If you'd like to use Minitab to analyze the Titanic data yourself, download the data here.
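The percentages in this post follow directly from the survived and total counts it quotes. Here is a minimal Python sketch of that arithmetic. Python is not part of the original analysis, the function names are hypothetical, and the chi-square helper is the textbook Pearson formula for a 2x2 table without continuity correction, which is an assumption about what the Cross Tabulation and Chi-Square dialog would report.

```python
def child_or_adult(age):
    """Python analogue of the Calculator formula IF(Age>=18, "Adult", "Child")."""
    return "Adult" if age >= 18 else "Child"


def survival_rate(survived, total):
    """Percentage of a group that survived."""
    return 100.0 * survived / total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (survived, total) counts quoted in the post.
counts = {
    "First": (203, 325), "Coach": (508, 1876),
    "Female": (344, 470), "Male": (367, 1731),
    "Child": (57, 109), "Adult": (654, 2092),
}
rates = {group: survival_rate(s, n) for group, (s, n) in counts.items()}

# Differences between survival rates, in percentage points.
class_gap = rates["First"] - rates["Coach"]
gender_gap = rates["Female"] - rates["Male"]
age_gap = rates["Child"] - rates["Adult"]

# Class by Status table: rows First/Coach, columns survived/died.
chi2_class = chi_square_2x2(203, 325 - 203, 508, 1876 - 508)

for label, gap in [("class", class_gap), ("gender", gender_gap), ("age", age_gap)]:
    print(f"{label} gap: {gap:.2f} percentage points")
print(f"chi-square (Class by Status): {chi2_class:.1f}")
```

With 1 degree of freedom, a chi-square value above 3.84 is significant at the 5% level, so the printed statistic can be read against that threshold.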
http://blog.minitab.com/blog/fun-with-statistics/analyzing-titanic-survival-rates-part-ii-v1

Analyzing Titanic Survival Rates, Part II: Binary Logistic Regression
Joel Smith
17 April, 2012

In honor of the 100th anniversary of the sinking of the Titanic, we recently posted a dataset on the passengers aboard the ship that included Class (coach or first), Gender (female or male), Age, and Status (survived or died). From Age, an additional column was created indicating Child (17 years or younger) or Adult (18 years or older).

In an earlier post, we showed how survival rates could be compared between levels of one variable (for example, females versus males) using Stat > Tables > Cross Tabulation and Chi-Square. But what if we wanted to take all factors into consideration to paint a complete picture of survival rates?

Applying Binary Logistic Regression

In Minitab Statistical Software, Stat > Regression > Binary Logistic Regression allows us to create models when the response of interest (Status, in this case) is binary, that is, takes only two values. To begin, include all terms and two-way interactions in the model and reduce it from there:
Click Options to choose whether the model will predict the odds of Status = Died or Status = Survived. As an optimist, I chose Survived:
You can also try different Link Functions in Options to find the model that best fits your data. By removing terms from my model that are not statistically significant and choosing different Link Functions, I ultimately came up with this Logistic Regression Table, similar to an ANOVA table from typical ANOVA (Stat > ANOVA) or Regression (Stat > Regression) output in Minitab:

Logistic Regression Table

Predictor          Coef         SE Coef        Z       P
Constant          -0.191839     0.175568     -1.09   0.275
Class
  First            0.971320     0.0952002    10.20   0.000
Gender
  Male            -1.03799      0.200630     -5.17   0.000
Age                0.0044963    0.0033885     1.33   0.185
ChildorAdult
  Child            0.387517     0.174976      2.21   0.027
Gender*Age
  Male            -0.0123825    0.0040596    -3.05   0.002

From the p-values, you can determine which factors are significant: Class, Gender, ChildorAdult, and the Gender*Age interaction. (The Age term is left in the model because it is part of the interaction term.)
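The Z and P columns can be checked from the first two columns. The sketch below assumes that Z is the coefficient divided by its standard error and that P is the two-sided tail probability of a standard normal distribution. That is the usual Wald test and an assumption about how the table was produced, although it agrees with the rounded values shown. The function names are hypothetical.

```python
import math

# (Coef, SE Coef) pairs copied from the Logistic Regression Table.
terms = {
    "Constant": (-0.191839, 0.175568),
    "Class First": (0.971320, 0.0952002),
    "Gender Male": (-1.03799, 0.200630),
    "Age": (0.0044963, 0.0033885),
    "ChildorAdult Child": (0.387517, 0.174976),
    "Gender*Age Male": (-0.0123825, 0.0040596),
}


def wald_z(coef, se):
    """Wald statistic: the estimate measured in standard errors."""
    return coef / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = {}
for name, (coef, se) in terms.items():
    z = wald_z(coef, se)
    results[name] = (z, two_sided_p(z))
    print(f"{name:<20} Z = {z:6.2f}   P = {results[name][1]:.3f}")
```

A term is conventionally called significant when its P value falls below 0.05, which is how Age ends up as the only non-significant predictor here.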
Next, we can use the Goodness-of-Fit Tests in the output to determine whether or not the model adequately fits the data:

Goodness-of-Fit Tests

Method             Chi-Square     DF       P
Pearson               270.946    272   0.507
Deviance              313.073    272   0.044
Hosmer-Lemeshow         8.815      8   0.358

For these tests, a significant p-value indicates our model does not fit the data adequately. While we do have one significant test (Deviance), the other two tests provide no evidence of lack of fit, so we are fairly comfortable that our model provides a good fit. If you find you have significant terms but the Goodness-of-Fit Tests are showing an inadequate model fit, it may be worth trying a different Link Function back in the Options dialog. In this case, I found the Gompit link function to provide the best fit.

Measures of Association to Assess the Regression Model

Finally, we can assess our model using Measures of Association:

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs          Number   Percent    Summary Measures
Concordant     785712      74.2    Somers' D                0.49
Discordant     262124      24.7    Goodman-Kruskal Gamma    0.50
Ties            11554       1.1    Kendall's Tau-a          0.22
Total         1059390     100.0
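The Measures of Association table rests on pair counting. Every survivor is paired with every non-survivor, and the pair is concordant when the survivor has the higher predicted probability. The sketch below counts pairs on a small made-up example (the probabilities are hypothetical, not from the dataset) and then recomputes the Summary Measures from the pair counts in the table. The three formulas are the standard definitions and are assumed to match Minitab's; N = 2201 is the 470 females plus 1,731 males from the first post.

```python
def count_pairs(event_probs, nonevent_probs):
    """Compare every event (survivor) with every non-event by predicted probability."""
    concordant = discordant = ties = 0
    for p_event in event_probs:
        for p_nonevent in nonevent_probs:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event < p_nonevent:
                discordant += 1
            else:
                ties += 1
    return concordant, discordant, ties


def summary_measures(concordant, discordant, ties, n_obs):
    """Standard rank-association summaries built from the pair counts."""
    pairs = concordant + discordant + ties
    diff = concordant - discordant
    return {
        "somers_d": diff / pairs,
        "gamma": diff / (concordant + discordant),
        "tau_a": diff / (n_obs * (n_obs - 1) / 2),
    }


# Hypothetical predicted probabilities for 3 survivors and 4 non-survivors.
toy = count_pairs([0.9, 0.6, 0.3], [0.2, 0.3, 0.7, 0.1])

# Pair counts from the Measures of Association table above.
measures = summary_measures(785712, 262124, 11554, n_obs=2201)

print("toy (concordant, discordant, ties):", toy)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

All three summaries grow as concordant pairs outnumber discordant ones; they differ only in what they divide by.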
Measures of Association compares how often passengers who survived had a higher predicted probability of survival than passengers who did not survive. By comparing every surviving passenger with every passenger who died, Minitab determines how often the model correctly or incorrectly predicted which would survive. In our analysis, 74.2% of the time the surviving passenger had the higher predicted probability of survival, 24.7% of the time the lower, and 1.1% of the time the two were the same. With a good model you want a high percentage of concordant pairs and a low percentage of discordant pairs.

Using the Regression Model to Predict Survival

Last, back in the main Binary Logistic Regression dialog box, choose Prediction and choose to store the predicted probability of survival for each passenger (shown below) or for new data points, as well as confidence intervals:

Using this information, I created a graph demonstrating the odds of survival for passengers aboard the Titanic based on all of our significant factors:
Interestingly, there was only one female child in the first-class cabin on that voyage, so we could not model the survival odds for female children in first class. Otherwise, it is clear from the graph that if you were an adult female in first class, your odds of survival were quite high and increased slightly if you were older. Even for an 18-year-old female in first class, the probability of survival is estimated at 90.6%, compared to 32.3% for passengers in general!

Unlike females, whose odds of survival increased with age, a male's odds of survival decreased with age. (Remember that Gender*Age interaction?) So for an 80-year-old male passenger in coach, your estimated probability of survival was a mere 14.4%! In the dataset, of the 25 passengers meeting these criteria, just 3 survived, for a true rate of 12%, which is consistent with the model.

Had you been a male passenger who knew ahead of time about the impending tragedy, the cost of a first class ticket would have felt like a bargain. The same 80-year-old male would have enjoyed a relatively good 33.7% chance of survival had he booked in first class. Likewise, taking this voyage as a 17-year-old who would have been boarded on a lifeboat, instead of as an 18-year-old who would remain on the sinking ship, increases your chance of survival by 10-14 percentage points, depending on Gender and Class.

By looking at multiple factors at once, we are able to get a clear and accurate look at the odds of survival for any passenger based on just a few factors!
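The percentages quoted in this post can be recovered from the Logistic Regression Table. The sketch below rests on two assumptions that the post does not state outright: that Coach, Female, and Adult are the reference levels (coefficient 0), and that the Gompit link is inverted as p = 1 - exp(-exp(eta)), the complementary log-log form. Under those assumptions the three figures quoted above (90.6%, 14.4%, 33.7%) come out as stated. The function and dictionary names are hypothetical.

```python
import math

# Coefficients copied from the Logistic Regression Table.
COEF = {
    "constant": -0.191839,
    "class_first": 0.971320,
    "gender_male": -1.03799,
    "age": 0.0044963,
    "child": 0.387517,
    "male_age": -0.0123825,
}


def survival_probability(age, male, first_class):
    """Predicted probability of survival under the assumed Gompit (cloglog) link."""
    is_male = 1 if male else 0
    is_first = 1 if first_class else 0
    is_child = 1 if age < 18 else 0  # same cutoff as the ChildorAdult column
    eta = (
        COEF["constant"]
        + COEF["class_first"] * is_first
        + COEF["gender_male"] * is_male
        + COEF["age"] * age
        + COEF["child"] * is_child
        + COEF["male_age"] * is_male * age
    )
    # Inverse of the complementary log-log link.
    return 1.0 - math.exp(-math.exp(eta))


cases = {
    "18-year-old female, first class": survival_probability(18, male=False, first_class=True),
    "80-year-old male, coach": survival_probability(80, male=True, first_class=False),
    "80-year-old male, first class": survival_probability(80, male=True, first_class=True),
}
for label, p in cases.items():
    print(f"{label}: {100 * p:.1f}%")
```

Changing only `age` from 18 to 17 in a call shows the Child effect discussed above, since the ChildorAdult indicator switches on at that cutoff.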
User comments (taken from the Minitab Blog site)

(...) the menus and dialog boxes for regression are different between Minitab 16 (shown above) and Minitab 17, as you note. In 17, go to Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model... to get to the equivalent dialog box. You can then use the Model, Options, Results, Stepwise, and other buttons to control how Minitab performs the analysis and the information that it includes in the output. Minitab 17 doesn't automatically graph Delta Chi-Square vs. Leverage, but you can store the Delta Chi-Square and Leverage data using the "Storage" button, then plot them against each other manually.