A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic


Report prepared for Brandon Slama, Department of Health Management and Informatics, University of Missouri, Columbia

By Tyler Cook, Chathuri Daluwatte

Under the direction of Lori Thombs, Ph.D., Director, Social Sciences Statistics Center, Department of Statistics, University of Missouri, Columbia
Executive Summary

The goal of this report is to identify variables and derive a model to predict the no show probability at a free health care clinic using an observed data set. To estimate the probability of a no show, we performed logistic regression, discriminant analysis and univariate analyses on the data set. However, the multivariate methods fell victim to the high number of missing values in the dataset. The large amount of incomplete data decreased the power of the analysis, so determining an accurate prediction model with reasonable error rates was almost impossible. Nevertheless, our results suggest that patients who did not show up for a scheduled appointment tended to have a larger average number of days between when the visit was scheduled and the date of the appointment. Also, the patients who confirmed their appointment during the reminder call were much more likely to actually arrive.
Goal of the Study

Health care provides diagnosis, treatment and prevention of diseases. Health care systems are organizations established to meet these health needs in target populations; they are owned and operated by different entities and to a variety of standards. Free clinics are health care systems whose services are provided to the public at no charge. The clinic of interest in this study is such a free clinic, run by volunteers and volunteer physicians. This is an observational study of no shows for scheduled appointments at this clinic. The goal of the study is to predict the probability of a no show using the information about scheduled patients that is stored in the clinic database. Such a model would help the clinic management to intervene and try to reduce the likelihood of no shows, thus saving the valuable time of the volunteer staff and physicians.

Data Set

Because the dataset came from an existing database, data cleaning was required prior to analysis. The data set contained one dependent variable, visit status, which can take three values: Arrived, Cancelled and No Show. Since the goal of the study was to model the probability of a no show, we removed the cancelled appointments from the data set, leaving a two-level outcome. The data set included three continuous independent variables: age, distance to the clinic, and days from when the appointment was set up to the appointment date. The data set also had six categorical variables: previous visit status, patient status, visit type, reminder call status, clinic type and scheduled by.

Previous visit status had five levels, as shown in table 1, but due to lack of data we removed the levels pending and rescheduled from the dataset.

Table 1

Reminder call status also had five levels; as with previous visit status, we removed the levels cancelled and rescheduled (table 2).

Table 2

Clinic type had four levels, but as shown in table 3, three of the clinic types (Diabetes care, Dermatology, MSK night) had very few observations compared to the MedZou Clinic. We therefore redefined the variable by combining the clinic types other than MedZou into one level named Non MedZou.

Table 3

Visit type had five levels but, like clinic type, had a very high frequency at one level, full visit, as shown in table 4. We therefore redefined the variable to have two levels, full visit and non full visit.

Table 4

The variable patient status had two levels, new and return, which we used as is (table 5). The scheduled by variable, which identifies by number the person who scheduled the appointment, had too many levels with low counts at each level, so we did not use it in our analysis.

Table 5
Logistic Regression

To predict the probability of a no show in the dependent variable visit status, we fit a logistic regression model using PROC GENMOD in SAS. We first fit a model containing all of the independent variables. This model identified Reminder Call Status, Days and Age as significant predictors of the probability of a no show. We assessed the predictive ability of the model by calculating error rates, using 0.5 as the threshold on the predicted probability: because the model predicts the probability of a no show, a predicted probability greater than 0.5 was taken as a predicted no show, and a probability less than or equal to 0.5 as a predicted arrival. The results are summarized in table 6. As table 6 shows, the misclassified no show rate is 73.68%, which is unacceptably high. PROC GENMOD deletes observations with missing values. Thus, although the dataset we provided had 771 observations, 497 of them were not used in fitting this model because of missing values, which drastically degraded the model's ability to predict.
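The thresholding rule and the row-wise error rates described above can be sketched as follows. This is a minimal illustration in Python rather than the SAS code used for the report; the function names, labels and probabilities are hypothetical and chosen only to show the mechanics.

```python
from collections import Counter

def classify(prob_no_show, threshold=0.5):
    """Predict 'No Show' only when the probability is strictly above the threshold."""
    return "No Show" if prob_no_show > threshold else "Arrived"

def row_percentages(true_labels, predicted_probs, threshold=0.5):
    """For each true class, the percent of its cases assigned to each predicted class."""
    counts = Counter(
        (truth, classify(p, threshold))
        for truth, p in zip(true_labels, predicted_probs)
    )
    classes = ("Arrived", "No Show")
    table = {}
    for truth in classes:
        row_total = sum(counts[(truth, pred)] for pred in classes)
        table[truth] = {
            pred: (100.0 * counts[(truth, pred)] / row_total) if row_total else 0.0
            for pred in classes
        }
    return table

# Hypothetical true outcomes and predicted no show probabilities (not study data).
truth = ["Arrived", "Arrived", "Arrived", "No Show", "No Show"]
probs = [0.10, 0.50, 0.70, 0.40, 0.80]
rates = row_percentages(truth, probs)

# The misclassified no show rate is the share of true no shows predicted to arrive.
misclassified_no_show = rates["No Show"]["Arrived"]
```

With this layout, each row sums to 100%, so a misclassified no show rate of 73.68% corresponds to a correctly classified rate of 26.32%.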
Logistic Regression Prediction Error Rates for the Full Model

True status    Predicted No Show
Arrived        8.94%
No Show        26.32%
Total          14.96%

Table 6

By trying various combinations of variables in the logistic regression, we identified the best-fitting model, which used the following seven variables: Age, Days to appointment, Distance, Patient Status, Visit Type, Reminder Call Status and Clinic Type. In this model, Patient Status, Reminder Call Status and Age were significant predictors of the probability of a no show. The error rates of the model are reported in table 7; its performance is not very different from that of the full model. Removing the variable Previous Visit Status increased the number of observations used in the model by 167, but 330 observations were still excluded because of missing values.
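The effect of dropping a sparsely recorded variable on the number of usable observations can be sketched as a complete-case count. The records and variable names below are hypothetical; only the totals 771, 497 and 167 come from the text.

```python
def complete_cases(rows, variables):
    """Count rows with no missing (None) value among the chosen variables."""
    return sum(all(row.get(v) is not None for v in variables) for row in rows)

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "days": 12, "previous_visit": "Arrived"},
    {"age": 51, "days": 3, "previous_visit": None},
    {"age": None, "days": 7, "previous_visit": "No Show"},
    {"age": 28, "days": 20, "previous_visit": None},
]
full = complete_cases(rows, ["age", "days", "previous_visit"])
reduced = complete_cases(rows, ["age", "days"])

# Counts reported in the text: 771 observations, 497 dropped by the full model,
# 167 recovered by leaving out Previous Visit Status.
used_full = 771 - 497            # observations used by the full model
used_best = used_full + 167      # observations used by the best fit model
dropped_best = 771 - used_best   # observations still excluded
```

On the reported figures this gives 274 observations used by the full model and 441 by the best fit model, which matches the 330 still excluded.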
Logistic Regression Prediction Error Rates for the Best Fit Model

True status    Predicted No Show
Arrived        10.53%
No Show        27.74%
Total          15.87%

Table 7

Prediction Ability of Individual Independent Variables

Since the logistic regression models did a poor job of predicting the probability of a no show, we examined the predictive ability of each independent variable by plotting visit status against it. We first report the plots for the significant predictors. Reminder call status was selected as a significant predictor by both logistic regression models. As shown in Fig. 1, when reminder call status is confirmed the no show rate is low, while for the other two values (especially no answer) the no show rate is high. Thus reminder call status shows moderate predictive ability.
Fig. 1

Fig. 2 shows the distribution of the days variable: the percentage of no shows is lower than that of arrivals toward the fewer-days end of the graph, and higher than that of arrivals toward the more-days end. This reflects the moderate predictive ability of the days to appointment variable.

Fig. 2
Visit status as a function of age is shown in Fig. 3: toward the younger end of the plot the no show rate is comparatively smaller than the arrival rate, but within a certain age range the no show percentage is higher than that of arrivals.

Fig. 3

Figs. 4 to 8 show the variation of visit status with the independent variables patient status, previous visit status, visit type, clinic type and distance. These graphs illustrate that the value of the independent variable does not explain the variation in visit status in the way we saw with the good predictor variables above (reminder call status, days and age). In other words, for all values of the independent variable, the no show percentage stays low.
Fig. 4  Fig. 5  Fig. 6  Fig. 7  Fig. 8
Discriminant Analysis

We also attempted a discriminant analysis in order to classify visit status. We investigated normal distribution based methods as well as nonparametric methods using PROC DISCRIM in SAS. The goal was to find the best subset of independent variables for accurately predicting whether or not a patient would fail to show up for a scheduled appointment. The predictive ability of each model was assessed using cross validation error rates; models with lower error rates are preferred. It is important to keep in mind that this procedure also deletes observations with any missing values. This limits the amount of available data and potentially harms the model's ability to predict when there is a large amount of missing data, as in this study.

Our first approach used the normal distribution method. This technique assumes that the independent variables follow a multivariate normal distribution. The independent variables in our analysis are a mix of continuous and categorical variables, so the assumption of multivariate normality might not be appropriate. Nevertheless, this method is attractive because the discriminant function is available in the output and could then be used to easily classify new observations and determine which patients are likely not to show up for their appointment. The best normal based discriminant analysis included five of the independent variables: days, reminder, distance, patient status, and age. The cross validation error rates for this model are in table 8.
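To make the idea of cross validation error rates for a normal based discriminant rule concrete, here is a minimal sketch in Python. It is not the PROC DISCRIM analysis from the report: it assumes a single continuous predictor, a pooled variance, priors proportional to class size, and leave-one-out cross validation, and the data values are hypothetical.

```python
import math

def fit(xs, ys):
    """Estimate class priors, class means, and a pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors, ss = {}, {}, 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
        priors[c] = len(vals) / n
        ss += sum((v - means[c]) ** 2 for v in vals)
    pooled_var = ss / (n - len(classes))
    return means, priors, pooled_var

def predict(x, model):
    """Assign x to the class with the largest normal-theory discriminant score."""
    means, priors, var = model
    return max(
        means,
        key=lambda c: math.log(priors[c]) - (x - means[c]) ** 2 / (2 * var),
    )

def loo_error_rates(xs, ys):
    """Leave-one-out: refit without observation i, then classify observation i."""
    wrong, total = {}, {}
    for i, (x, y) in enumerate(zip(xs, ys)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total[y] = total.get(y, 0) + 1
        wrong[y] = wrong.get(y, 0) + (predict(x, model) != y)
    return {c: wrong[c] / total[c] for c in total}

# Hypothetical days-to-appointment values (not study data).
days = [2, 3, 5, 6, 8, 20, 25, 28, 30, 4]
status = ["Arrived"] * 5 + ["No Show"] * 5
error_rates = loo_error_rates(days, status)
```

Reporting the error rate separately for each true class, as here, is what exposes a rule that classifies arrivals well but misses most no shows.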
Normal Based Discriminant Analysis Cross Validation Error Rates

True status    Predicted No Show
Arrived        8.75%
No Show        22.86%
Total          13.04%

Table 8

Several conclusions can be drawn from these results. This model does a good job of classifying those who arrived for their appointment, misclassifying only 8.75% of these patients. However, it does a very poor job of classifying the patients who failed to show up: it correctly classified only 22.86% of the patients whose true status was no show. Therefore, 77.14% of the no show patients were incorrectly predicted to arrive. This is the worst error we could make, since our goal is to identify patients who will not show up in order to target them for some kind of intervention. Unfortunately, these results indicate that the normal based discriminant analysis cannot adequately classify visit status.

Next we attempted a nonparametric discriminant analysis. This method is more flexible since it places no distributional assumptions on the independent variables. The downside is that it does not produce a discriminant function, so it is not very practical to use the results to classify new observations. The best nonparametric discriminant analysis used five of the independent variables: days, reminder, distance, patient status, and age. In this case, the age variable only marginally improves the model, but given the overall poor performance of the other methods we decided to include age even though it means we no longer have a parsimonious model. The cross validation error rates for this model are in table 9.

Nonparametric Discriminant Analysis Cross Validation Error Rates

True status    Predicted No Show
Arrived        9.06%
No Show        61.43%
Total          25.00%

Table 9

These results are a notable improvement over the normal based method. Once again the model does a good job of classifying the arrived patients, misclassifying only 9.06% of them. The nonparametric model correctly classified 61.43% of the no show patients. While this is a substantial improvement over the normal method, it is still not very useful. When the main goal is to predict no show patients, it is imperative to have a very low error rate for this category, and misclassifying about 4 out of every 10 no show patients is disappointing.

Univariate Analyses

Since the logistic regression and discriminant analysis results were unsatisfactory, we decided to examine the relationship of each independent variable individually with the status outcome. T tests were performed for the continuous variables to test whether the means differed between no show and arrived patients, and chi square tests of association were performed for each of the categorical variables. The results of these analyses can be found in table 10.
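The two kinds of test statistic used in the univariate analyses can be sketched as follows. This is an illustration only: the report does not say whether a pooled or unequal-variance t test was used, so the Welch form here is an assumption, and all numbers are hypothetical. Converting the statistics to p values requires the t and chi square distributions, which are left out of this sketch.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def chi_square(table):
    """Pearson chi square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical days-to-appointment samples for arrived and no show patients.
days_arrived = [2, 4, 6, 8]
days_no_show = [10, 12, 14, 16]
t_stat = welch_t(days_arrived, days_no_show)

# Hypothetical 3 x 2 table: rows are reminder call outcomes,
# columns are Arrived and No Show counts.
counts = [[30, 10], [20, 20], [10, 10]]
chi2, dof = chi_square(counts)
```

A negative t statistic in this setup means the arrived group has the smaller mean number of days, which is the direction reported in the text.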
Table 10 (t tests for Days, Distance and Age, reporting the mean and standard error for arrival and no show, the t statistic and the p value; chi square tests for Reminder, Previous Visit Status, Patient Status, Visit Type and MedZou Clinic, reporting the chi square statistic and the p value)

From table 10 we can see that four of the tests are significant at alpha = 0.05 (days, distance, reminder, and patient status). Age and visit type are marginally significant, with p values < 0.10. The tests for previous visit status and clinic type are not significant at any reasonable alpha level, so there is insufficient evidence to conclude that previous visit status and clinic type are related to status.

The test for the days variable is significant (see table 10 for the p value), so we reject the null hypothesis that the mean days for no show and arrived patients are the same, and conclude that there is a statistically significant difference between the means of the two groups. By examining the means we can see that patients who attend their appointments have a lower mean number of days from the date the appointment is scheduled until the date of the appointment. This is an intuitive result: the longer a patient has to wait, the more likely they are to forget their appointment or to have other important things arise that cause them not to show up.

Next we look more closely at the distance variable. This test was also significant, so we reject the null hypothesis that the mean distance is the same for the two groups of patients. The estimates of the means indicate that the patients who arrived for their appointments had a larger mean distance from the clinic, which is a slightly surprising result.

The final continuous variable is age. The test was marginally significant, so there is some evidence that the no show patients and arrived patients have different mean ages; the mean age for the arrived patients appears to be higher than the mean age for no show patients. It is important to note that this marginally significant difference might not be a practically significant difference. The difference between the means is less than 2 years, which raises some questions about whether age can really be used to distinguish between no show and arrived patients.

The first of the significant categorical variables is the reminder call. The chi square test indicates that there is sufficient evidence to conclude that the reminder call is associated with visit status. A look at the contingency table provides some insight. Table 11 gives the counts for each combination of visit status and reminder call.
Table 11 (rows: Left Message, Confirmed, No Answer, Total; columns: Arrived, No Show, Total)

One can see from the table that for left message and no answer the counts are about even between arrivals and no shows. The real difference is in the confirmed row: of the patients who confirmed their appointment, 77.7% did actually arrive. So a good indicator that a patient will arrive for their appointment is that they confirmed when given a reminder call.

Next we examine patient status. Once again the chi square test indicates that there is a statistical association between patient status and arrival status. Table 12 is the contingency table for these variables.

Table 12 (rows: New Patient, Return Patient, Total; columns: Arrived, No Show, Total)

Table 12 indicates that the majority of both new patients and return patients did attend their scheduled appointments. Interestingly, a lower percentage of new patients than of return patients failed to show up for their appointments.
Finally we consider the results for visit type. Here we examine whether a full visit versus a not full visit is associated with arrival status. The counts for each combination can be found in table 13.

Table 13 (rows: Full Visit, Not Full Visit, Total; columns: Arrived, No Show, Total)

The first thing to notice in table 13 is the low counts for the not full visit. With only 5 patients failing to arrive for a not full visit, it is difficult to draw any conclusions or to use these results for classification. Also, the majority of full visit patients did arrive for their appointment. So this appears to be another example of a statistically significant result without much practical application.

Conclusions

The statistical methods used in this study fell victim to the high number of missing values in the dataset. The large amount of incomplete data decreased power and made accurate prediction almost impossible. There are potential remedies that might be useful in a supplementary investigation. One possibility is to use imputation to fill in the missing values. The ideal solution would be to acquire a larger sample of complete observations. However, this might not be possible given that the clinic is staffed by volunteers with limited time and resources.
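As one illustration of what imputation could look like, the sketch below fills missing continuous values with the column mean and missing categorical values with the most frequent level. This simple single-imputation scheme is an assumption chosen for brevity, not a method used or endorsed in the report, and the records are hypothetical.

```python
from statistics import mean, mode

def impute(rows, numeric, categorical):
    """Return copies of rows with None replaced by the column mean or mode."""
    fills = {}
    for name in numeric:
        fills[name] = mean([r[name] for r in rows if r[name] is not None])
    for name in categorical:
        fills[name] = mode([r[name] for r in rows if r[name] is not None])
    return [
        {k: (fills[k] if (v is None and k in fills) else v) for k, v in r.items()}
        for r in rows
    ]

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "reminder": "Confirmed"},
    {"age": None, "reminder": "No Answer"},
    {"age": 50, "reminder": None},
    {"age": 40, "reminder": "Confirmed"},
]
filled = impute(rows, numeric=["age"], categorical=["reminder"])
```

Filling every gap with a single typical value understates the variability in the data, so multiple imputation is generally the more defensible choice when the filled data feed a formal analysis.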
We employed only two methods out of a wide variety of statistical tools. Future research on this issue might benefit from using additional methods. Classification and regression trees would provide another way to predict observations. Also, a multinomial logistic regression would be able to model a status outcome that has more than two categories.

Even though our methods did not perform as well as we would have liked, some useful information did come out of this study. Several of the independent variables had statistically significant relationships with the status outcome. In particular, patients who did not show up for a scheduled appointment tended to have a larger average number of days between when the visit was scheduled and the date of the appointment. Also, the patients who confirmed their appointment during the reminder call were much more likely to actually arrive. Therefore, we would recommend giving additional reminder calls to patients who have a large number of days between the scheduled date and the appointment date. Moreover, it might be beneficial to make repeated reminder calls until the patient either confirms or indicates that they will be canceling.

The aim of this analysis was to develop a procedure to predict which patients will fail to show up for a scheduled appointment. To accomplish this goal we first fit a logistic regression model and then performed discriminant analysis. Unfortunately, neither method was able to classify patients with reasonable error rates. Obtaining a larger sample of complete observations or handling the missing values in some other way might provide the needed power to get useful results in a future study.
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: ChiSquare and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
More informationMultiple Regression in SPSS STAT 314
Multiple Regression in SPSS STAT 314 I. The accompanying data is on y = profit margin of savings and loan companies in a given year, x 1 = net revenues in that year, and x 2 = number of savings and loan
More informationχ 2 = (O i E i ) 2 E i
Chapter 24 TwoWay Tables and the ChiSquare Test We look at twoway tables to determine association of paired qualitative data. We look at marginal distributions, conditional distributions and bar graphs.
More informationThe Effect of a Carveout Advanced Access Scheduling System on Noshow Rates
Practice Management Vol. 41, No. 1 51 The Effect of a Carveout Advanced Access Scheduling System on Noshow Rates Kevin J. Bennett, PhD; Elizabeth G. Baxley, MD Background and Objectives: The relationship
More informationOdds ratio, Odds ratio test for independence, chisquared statistic.
Odds ratio, Odds ratio test for independence, chisquared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review
More informationCOMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.
277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies
More informationStudents' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More informationChisquare test. More types of inference for nominal variables
Chisquare test FPP 28 More types of inference for nominal variables Nominal data is categorical with more than two categories Compare observed frequencies of nominal variable to hypothesized probabilities
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationCATEGORICAL DATA ChiSquare Tests for Univariate Data
CATEGORICAL DATA ChiSquare Tests For Univariate Data 1 CATEGORICAL DATA ChiSquare Tests for Univariate Data Recall that a categorical variable is one in which the possible values are categories or groupings.
More informationPaper Beyond BreslowDay: Homogeneity Across R x C Tables ABSTRACT INTRODUCTION SAMPLE DATA K 2 2 TABLES
Paper 74949 Beyond BreslowDay: Homogeneity Across R x C Tables Ginny P. Lai, David R. Mink, David J. Pasta, ICON Late Phase & Outcomes Research, San Francisco, CA ABSTRACT In the epidemiological world,
More informationVariable Selection and Transformation of Variables in SAS Enterprise Miner
Variable Selection and Transformation of Variables in SAS Enterprise Miner Kattamuri S. Sarma, Ph.D Ecostat Research Corp., White Plains NY kssarma@worldnet.att.net kssarma@ecostatresearch.com 2 Issues
More informationIs it statistically significant? The chisquare test
UAS Conference Series 2013/14 Is it statistically significant? The chisquare test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chisquare? Tests whether two categorical
More informationCRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm
CRJ Doctoral Comprehensive Exam Statistics Friday August 23, 23 2:pm 5:3pm Instructions: (Answer all questions below) Question I: Data Collection and Bivariate Hypothesis Testing. Answer the following
More informationCHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES
CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES The chisquare distribution was discussed in Chapter 4. We now turn to some applications of this distribution. As previously discussed, chisquare is
More informationAnalyzing Titanic Survival Rates Carly Barry 12 April, 2012
http://blog.minitab.com/blog/realworldqualityimprovement/analyzingtitanicsurvivalrates Analyzing Titanic Survival Rates Carly Barry 12 April, 2012 April 15, 2012 marks the 100th anniversary of the
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationInvestigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness
Investigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness Josh Tabor Canyon del Oro High School Journal of Statistics Education
More informationEpidemiologyBiostatistics Exam Exam 2, 2001 PRINT YOUR LEGAL NAME:
EpidemiologyBiostatistics Exam Exam 2, 2001 PRINT YOUR LEGAL NAME: Instructions: This exam is 30% of your course grade. The maximum number of points for the course is 1,000; hence this exam is worth 300
More information1/2/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors
More informationUSING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA
USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationOrganizing Your Approach to a Data Analysis
Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize
More information93.4 Likelihood ratio test. NeymanPearson lemma
93.4 Likelihood ratio test NeymanPearson lemma 91 Hypothesis Testing 91.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental
More informationChi Squared and Fisher's Exact Tests. Observed vs Expected Distributions
BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: ChiSquared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chisquared
More informationThe More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner
Paper 33612015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,
More informationInferential Statistics. What are they? When would you use them?
Inferential Statistics What are they? When would you use them? What are inferential statistics? Why learn about inferential statistics? Why use inferential statistics? When are inferential statistics utilized?
More informationHandling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationThe general form of the PROC GLM statement is
Linear Regression Analysis using PROC GLM Regression analysis is a statistical method of obtaining an equation that represents a linear relationship between two variables (simple linear regression), or
More informationDeath on the Titanic
Death on the Titanic Introduction On its maiden voyage, the cruise ship Titanic collided with an iceberg and sank. There was much loss of life. It is of interest to test how well sample proportions from
More informationSemester 1 Statistics Short courses
Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical
More informationNormality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationHYPOTHESIS TESTING WITH SPSS:
HYPOTHESIS TESTING WITH SPSS: A NONSTATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Webbased Analytics Table
More informationSOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures.
1 SOME NOTES ON STATISTICAL INTERPRETATION Below I provide some basic notes on statistical interpretation for some selected procedures. The information provided here is not exhaustive. There is more to
More informationAP Statistics 2002 Scoring Guidelines
AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationAmerican Journal Of Business Education July/August 2012 Volume 5, Number 4
The Impact Of The Principles Of Accounting Experience On Student Preparation For Intermediate Accounting Linda G. Carrington, Ph.D., Sam Houston State University, USA ABSTRACT Both students and instructors
More informationT adult = 96 T child = 114.
Homework Solutions Do all tests at the 5% level and quote pvalues when possible. When answering each question uses sentences and include the relevant JMP output and plots (do not include the data in your
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a stepbystep guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationContingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables
Contingency Tables and the Chi Square Statistic Interpreting Computer Printouts and Constructing Tables Contingency Tables/Chi Square Statistics What are they? A contingency table is a table that shows
More informationBA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420
BA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420 1. Which of the following will increase the value of the power in a statistical test
More informationStatistical Modeling Using SAS
Statistical Modeling Using SAS Xiangming Fang Department of Biostatistics East Carolina University SAS Code Workshop Series 2012 Xiangming Fang (Department of Biostatistics) Statistical Modeling Using
More information12.5: CHISQUARE GOODNESS OF FIT TESTS
125: ChiSquare Goodness of Fit Tests CD121 125: CHISQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationHatice Camgöz Akdağ. findings of previous research in which two independent firm clusters were
Innovative Culture and Total Quality Management as a Tool for Sustainable Competitiveness: A Case Study of Turkish Fruit and Vegetable Processing Industry SMEs, Sedef Akgüngör Hatice Camgöz Akdağ Aslı
More information