Stat-285 Assignment Fall Term
- Frederick Horton
- 7 years ago
Transcription
1. Women and children first.

Have you ever watched a movie, or read a book, about a ship in trouble and, when the words "women and children first!" are shouted out, known that those words inevitably mean that the ship is doomed to sink? You can find the source of this gallant tradition at com/shiptraditionw_rrqb.htm.

This question deals with the sinking of the Titanic and examines the probability of survival as a function of age, sex, and class of passage. Visit to get a list of the passengers aboard the Titanic. Download the datafile and import it into JMP. The file contains 5 variables: the passenger name; the class of passage; the age; the sex; and an indicator variable for survival status.

(a) Several of the ages are missing. These could likely be reconstructed from the original sources. We will assume that the age values are MCAR. What does this mean, and what implications will this have for the analysis?

Solution: MCAR = Missing Completely at Random implies that the missingness is unrelated to the response value, i.e. missingness is unrelated to survival status. The only effect that MCAR has on the analysis is that the standard errors (se) are larger than if the data were not missing.

(b) Use the Analyze->Fit Y-by-X platform to look at the breakdown of sex by class of passage. What does the mosaic plot show you? Confirm this by looking at a suitable contingency table with the appropriate percentages.

Solution: The mosaic plot and contingency tables are:
The proportion of males seems to increase as the class of passage decreases, rising from 56% in first class to 70% in third class. [The chi-square test for equal proportions shows that there is strong evidence that the sex ratio is not constant across class of passage.]

(c) Use the Analyze->Fit Y-by-X platform to investigate the survival rates of the two sexes for each separate class of passage. [Hint: Use the By button.] Complete the following table (note that S is survival) [1]:

Class   Males            Females          Odds-ratio of S   c.i. for
        P(S)   ODDS(S)   P(S)   ODDS(S)   F vs M            odds-ratio
1st
2nd
3rd

So what do you conclude about "women and children first"?

Solution: The Analyze->Fit Y-by-X platform is completed as:

The estimated proportion of survival can be read off the contingency tables, as can the odds ratio (but the odds ratio needs to be inverted, as it is reported for males:females, not females:males). The odds of survival for each sex are computed by hand.

[1] If you use a By variable, you cannot save predictions directly to the data table as in previous assignments. However, saved columns are still accessible by using Red-Triangle->Script->Data Table Window. This will show a hidden data table that is created for each value of the By variable. You will have to do this for each value of the By variable. Here is the official FAQ from SAS: When by variables are used, JMP creates a new intermediate table for each level of the by variable. Statistics such as predicted values are saved to these intermediate tables rather than the original data table. To see the intermediate table you will need to click on the red triangle next to Generalized Linear Model Fit and choose Script->Data Table Window. You will have to do this for each level of the by variable. The new data table that appears will be for that specific level of the by variable and will contain the statistics such as predicted values that you have chosen.

© 2007 Carl James Schwarz
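The hand computation of the odds, and the inversion of the males:females odds ratio, can be sketched in Python. This is only an illustration and not part of the JMP workflow: the function names `odds` and `odds_ratio` are made up here, and the inputs are the rounded first-class survival proportions (33% for males, 94% for females) from the completed table, so the results agree with the tabled values only up to rounding.

```python
def odds(p):
    """Convert a probability p (0 < p < 1) into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of group A versus group B, given each group's probability."""
    return odds(p_a) / odds(p_b)


# Rounded first-class survival proportions taken from the completed table.
p_male, p_female = 0.33, 0.94

odds_male = odds(p_male)      # about 0.5, i.e. roughly 1:2
odds_female = odds(p_female)  # about 16, i.e. roughly 16:1

# JMP reports males:females; the table wants females:males, the reciprocal.
or_m_vs_f = odds_ratio(p_male, p_female)
or_f_vs_m = 1.0 / or_m_vs_f

print(f"odds(S) males   = {odds_male:.2f}")
print(f"odds(S) females = {odds_female:.2f}")
print(f"odds ratio F vs M = {or_f_vs_m:.1f}")
```

Because the inputs are rounded percentages, the odds ratio printed here is close to, but not exactly, the 30:1 that JMP computes from the raw counts.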
The completed table is:

Class   Males            Females          Odds-ratio    c.i. for
        P(S)   ODDS(S)   P(S)   ODDS(S)   F vs M (S)    odds-ratio
1st     33%    1:2       94%    16:1      30:1          (15:1, 64:1)
2nd     15%    1:6       88%    7:1       43:1          (21:1, 87:1)
3rd     12%    1:8       38%    1:2       5:1           (3:1, 7:1)

In all classes, females had a higher survival rate than males. The second class passengers appear to have heeded the call for "women and children first" most closely, as the odds ratio of survival for females versus males is the largest (43:1). If you look at the raw percentages, you see that the chance of survival for females among the first and second class passengers is roughly the same (around 90%), but the survival rate of second class males (15%) is less than half of that of first class males (33%).

(d) The above analysis ignored the age of the passengers. For each combination of sex and passenger class, fit a logistic regression to predict survival as a function of age. Complete the following table for predicting the SURVIVAL rates of passengers as a function of age [Hint: think carefully about what JMP produces. Is it predicting survival or death?]:

Class   Sex       Coefficient of age   SE   p-value
1st     Males
1st     Females
2nd     Males
2nd     Females
3rd     Males
3rd     Females

So what do you conclude about the adage of "women and children first"?
Solution: Use the Analyze->Fit Y-by-X platform as follows:

This gives the following summary output:
Notice that each of the above outputs is for the log-odds of DEATH (survival=0). Because the log-odds of survival is the negative of the log-odds of death, logit(1-p) = -logit(p), the coefficient for SURVIVAL is simply the negative of the reported coefficient. This gives the table:
Class   Sex       Coefficient of age   SE   p-value
1st     Males
1st     Females
2nd     Males                               <
2nd     Females
3rd     Males
3rd     Females

None of the female coefficients is statistically significantly different from zero. This implies that there is no evidence of a relationship between age and survival for females in any of the three classes. There is strong evidence of an effect of age for males in all three classes. The coefficients are negative, which implies that as age increases, the log-odds of survival (and hence the probability of survival) decrease. The effect of age appears to be strongest for the second class males, as their coefficient has the largest magnitude, while the effect of age in the first and third class male passengers is about equal.

A plot of the survival curves on both the ordinary and logit scale appears below:
Notice that the lines for females are almost flat (on the logit scale) with little change in survival by age, while the lines for males are very steep. If you compute the Range Odds Ratio (the change in the odds ratio as you go from the smallest to the largest age for each sex-class combination), you find that the range of odds of survival is quite large for males but very small for females. So yes, it appears that the adage should read "women and young males first".

In more advanced classes (e.g. Stat-302 or Stat-402), you would learn how to fit one model for the combined data over all sexes and classes of passage, and look at the effect of age upon survival after adjusting for the sex and class of passage.

2. Never underestimate the p-o-w-e-r of the Orange side

Many people find it annoying when a cell phone goes off at the exact climax of a film. [2] When I was visiting England in September 2005, I happened to go to a movie and noticed a series of ads that played before the movie started asking patrons to turn off their cell phones. The premise of these advertisements is pitches by various celebrities to the Orange Film Funding Board, a fictitious agency, for films they would like to produce. The ads were sponsored by the Orange Cell Phone company, one of the largest mobile phone companies in the United Kingdom. [3]

You can view some of the advertisements at (don't forget to press the Play button beneath each ad):
(a) - my favorite
(b)
(c) - my second favorite
(d)

These advertisements have made it into Wikipedia at org/wiki/orange_uk. But do these commercials actually work?

(a) Describe how you would perform an experiment as a completely randomized design. The four ads are to be compared (with a control of no ads). There are 10 screens, five showings per day (morning, early afternoon, late afternoon, early evening, and late evening, identified by the numbers 1 to 5), seven days per week (1=Sunday, 2=Monday, etc.), and a 4 week test period.

Solution: There are a total of 10 x 5 x 7 x 4 = 1400 possible showings. The five treatments (the 4 ads plus a control) should be randomly assigned to the showings, and the number of cell phones that ring at each showing should be recorded.

You can download some data from Stat-285/Assignments/cellphone.txt. The variables in the dataset are the week, day, showing, screen, ad used, number of tickets sold, and the number of cell phones that went off. Convert the number of cell phones that went off to a simple yes/no variable.

(b) Test the hypothesis that the probability of a cell phone interruption is the same for all ads (including the control).

Solution: This can be done with the Analyze->Fit Y-by-X platform or the Analyze->Fit Model platform or the Generalized Linear Model platform:

[2] See or http: // or html.
[3] More details at
In all cases, the p-value is <.0001 and so there is very strong evidence that the probability of being interrupted by a cell phone is not equal across all the treatment levels. Of course, at this stage, we don't know which treatment is best or worst.

(c) Estimate the probability of a cell phone interrupting the movie for each ad and complete the following table:

Ad     Estimate   se   95% ci
None
dh
dv
jc
ss

Solution: These probabilities were estimated using the Analyze->Fit Model platform and the Generalized Linear Modeling option. Notice that the models estimate the probability of NO interruption, so the estimates must be subtracted from 1 to get the probability of an interruption. The se were estimated by taking the range of the 95% confidence interval and dividing by 4, because a 95% confidence interval spans roughly 2 se on either side of the estimate.

Ad     Estimate   se   95% ci
None                   (.16, .25)
dh                     (.10, .20)
dv                     (.01, .05)
jc                     (.03, .10)
ss                     (.03, .08)
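The "range divided by 4" rule for the se can be sketched in Python. This is a rough illustration under the stated approximation (a 95% interval is about 4 se wide), applied to the confidence limits listed in the table above; the helper name `se_from_ci` and the dictionary layout are assumptions made for this example, and the numbers it prints are recomputed here rather than copied from the JMP output.

```python
def se_from_ci(lower, upper):
    """Approximate se from a 95% confidence interval: width divided by 4."""
    return (upper - lower) / 4.0


# 95% confidence limits for the probability of an interruption, by ad.
intervals = {
    "None": (0.16, 0.25),
    "dh": (0.10, 0.20),
    "dv": (0.01, 0.05),
    "jc": (0.03, 0.10),
    "ss": (0.03, 0.08),
}

approx_se = {ad: se_from_ci(lo, hi) for ad, (lo, hi) in intervals.items()}

for ad, se in approx_se.items():
    print(f"{ad:>4}: approximate se = {se:.4f}")
```

The rule is only approximate here because the intervals are computed on the logit scale and are therefore not symmetric about the estimated probability.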
(d) Draw a suitable graph (possibly by hand) showing the results from the previous table. What does this graph show? Which ad seems to be the most effective?

Solution: I used the graphing feature of JMP to create the following plot:

The probability of interruption appears to be smallest for the dv, jc, and ss commercials, followed by the dh ad, followed by the control screenings. The ads seem to work, but there appear to be some minor differences among the ads.

(e) Estimate the difference in the log-odds between cases with no ads and the Darth Vader ad, along with a se and an approximate 95% confidence interval. Convert this to an odds ratio along with a 95% confidence interval. Interpret this odds-ratio. What do you conclude?

Solution: Use the Contrast option of the Generalized Linear Model platform:
Again, be careful because JMP is measuring the log-odds of NO interruptions, and we want the log-odds of interruptions. The estimated difference in log-odds of interruptions between the control and dv ads is 2.17 (se .39). The approximate 95% confidence interval is (1.39, 2.95). Because this difference in log-odds is positive, the odds of an interruption in the control setting are HIGHER than the odds with the dv ad. The estimated odds-ratio is found as e^2.17 = 8.8, with an approximate 95% confidence interval from e^1.39 = 4 to e^2.95 = 19. This implies that the odds of an interruption are about 9 times higher in the control showings than when the dv ad is shown. Truly, the Phone is Strong Here.

In more advanced classes (e.g. Stat-302 and Stat-402) you will learn how to use the actual number of cell phone calls as the response variable and how to adjust it for the number of tickets sold for that showing.

Common errors made on this assignment (check your work!):

- Many students just attached all output and did not provide the table and conclusions.
  There are NO jobs for people who just bash numbers through a statistical package and provide "computer diarrhea" as a report! It is vitally important that you understand what output is produced and that you are able to write a coherent report. In many cases, output is badly labelled and the results are not obvious.
- In the experimental design, some students did not consider the control group (no ad).
- Some students just stated the null hypothesis.
- Many students did not notice that the models estimate the probability of NO interruption.
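As a numerical check on question 2(e), the conversion from a difference in log-odds to an odds ratio can be sketched in Python. The estimate (2.17) and se (.39) come from the solution; the function name `odds_ratio_with_ci` is hypothetical, and the multiplier of 2 se is an assumption chosen because it reproduces the interval endpoints 1.39 and 2.95 quoted in the solution (a multiplier of 1.96 would give a slightly narrower interval).

```python
import math


def odds_ratio_with_ci(log_odds_diff, se, multiplier=2.0):
    """Exponentiate a log-odds difference and its approximate 95% limits.

    The interval is built on the log-odds scale (estimate +/- multiplier * se)
    and then transformed, so it is not symmetric about the odds ratio.
    """
    lower = log_odds_diff - multiplier * se
    upper = log_odds_diff + multiplier * se
    return math.exp(log_odds_diff), (math.exp(lower), math.exp(upper))


# Control vs. dv ad: difference in log-odds of an interruption and its se.
or_est, (or_lo, or_hi) = odds_ratio_with_ci(2.17, 0.39)

print(f"odds ratio = {or_est:.1f}")
print(f"approximate 95% ci = ({or_lo:.1f}, {or_hi:.1f})")
```

If JMP reports the contrast for the log-odds of NO interruption, flip the sign of the estimate before calling the function; the se is unchanged.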
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationBA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394
BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 1. Does vigorous exercise affect concentration? In general, the time needed for people to complete
More informationAnalyzing Research Data Using Excel
Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Sample Practice problems - chapter 12-1 and 2 proportions for inference - Z Distributions Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide
More informationPart 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
More informationHow To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationStandard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationThis chapter discusses some of the basic concepts in inferential statistics.
Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details
More informationGuido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE
More informationAdverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data.
1 of 9 12/8/2014 12:57 PM (an On-Line Internet based application) Instructions: Please fill out the information into the form below. Once you have entered your data below, you may select the types of analysis
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationStatistics and Data Analysis
NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models
More informationStatCrunch and Nonparametric Statistics
StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that
More informationLogistic (RLOGIST) Example #1
Logistic (RLOGIST) Example #1 SUDAAN Statements and Results Illustrated EFFECTS RFORMAT, RLABEL REFLEVEL EXP option on MODEL statement Hosmer-Lemeshow Test Input Data Set(s): BRFWGT.SAS7bdat Example Using
More informationIs it statistically significant? The chi-square test
UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical
More informationWeek 3&4: Z tables and the Sampling Distribution of X
Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal
More informationStatistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.
Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red. 1. How to display English messages from IBM SPSS Statistics
More informationBivariate Statistics Session 2: Measuring Associations Chi-Square Test
Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More information6 3 The Standard Normal Distribution
290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since
More information