# SUMAN DUVVURU STAT 567 PROJECT REPORT

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 SUMAN DUVVURU STAT 567 PROJECT REPORT

3 We also run a parametric accelerated failure time (AFT) models besides proportional hazards (PH) models. Whereas the key assumption of a PH model is that hazard ratios are constant over time, the key assumption for AFT model is that survival time accelerates (or decelerates) by a constant factor when comparing different levels of covariates.the most common distribution for parametric modeling of survival data is the Weibull distribution which will be used for the estimation of parameters. We also estimate the CDF using both the parametric and non-parametric methods. Results: The results are divided in to two sections: 1. ADDICTS data exploration. 2. Parametric and non-parametric models.

4 1. EXPLORATION OF ADDICTS DATA This figure below is a life data event plot from SPLIDA that shows the survival times of all the patients in the addicts dataset.

5 Survival plots: Survival Plot Surviving SURVT Quantiles Group Median Time Lower95% Upper95% 25% Failures 75% Failures Combined The median survival time is 504 days with a 95% CI of [394, 550]. Samples of survival times are frequently skewed, therefore, the median is a better measure of central location than mean. Below is the survival plots drawn separately for the two clinics using JMP software. Survival Plot 1.0 Surviving SURVT

6 The above graph provides important results regarding the comparison of the two clinics. The curve for clinic 2 consistently lies above the curve for clinic 1, indicating that clinic 2 does better than clinic 1 in retaining its patients in methadone treatment. Further, because the two curves diverge after about a year, it appears that clinic 2 is vastly superior to clinic 1 after one year but only slightly better than clinic 1 prior to one year. Tests Between Groups Test ChiSquare DF Prob>ChiSq Log-Rank <.0001 Wilcoxon Both the Log-rank and wilcoxon tests yield significant p-values. Hence it suggests that there is significant difference in survival times between the two clinics. Then we explored the relationship between dose and survival time using bivariate fits; Oneway Analysis of SURVT By dose SURVT dose Means and Std Deviations Level Number Mean Std Dev Std Err Mean Lower 95% Upper 95% As we can see the survival times tend to be high at a dose of about 80mg.

7 Then to check if the effect of dose are different in the two clinics, we did the bivariate fits for the data from the two clinics independently. Bivariate Fit of SURVT By dose clinic= dose SURVT Bivariate Fit of SURVT By dose clinic= dose SURVT The trend of higher survival times at high doses of around 80 is still observed in both the clinics. Also doses above 80mg are administered only in clinic 2 which don t appear to any effective.

8 Estimation of F(t) using non-parametric methods Then we also investigated the distribution of survival times using the non-parametric methods. We estimated a CDF F(t) from the data without having to assume an underlying parametric distribution. Refer to Table-2 in the appendix for the CDF estimates at different survival times. Estimation of F(t) using parametric methods: Since weibull fits the best, we used it to estimate the CDF. Using addicts data Parametric ML CDF Estimates Pointwise Approximate 95% Confidence Intervals Weibull Distribution SURVT Fhat Std.Err. 95% Lower 95% Upper

9 Using addicts data Parametric ML Quantile Estimates Pointwise Approximate 95% Confidence Intervals Weibull Distribution p Quanhat Std.Err. 95% Lower 95% Upper The approximate 95% likelihood confidence interval for 0.1 Quantile SURVT is: [ ] The approximate 95% likelihood confidence interval for beta is: [ ] The 0.10 quantile is days meaning the 90% of them have survival times above 98.5 days.

10 Which distribution fits best? It is necessary to examine which distribution best fits the data and then make useful inference based on the distribution. Here in the plot below I have shown the fit of the weibull distribution with all other distributions such as exponential, normal, log normal etc., and the table below shows their log likelihood values. From these values we can see that Weibull distribution best fits the data. Maximum likelihood estimation results: Response units: SURVT Weibull Distribution Log likelihood mu se_mu sigma se_sigma 1 weibull exponential frechet lev

11 5 logistic loglogistic lognormal normal sev There are several ways to view this distribution, including probability plots, survival plots hazard plots etc.

12 Response units: SURVT Weibull Distribution Log likelihood at maximum point: Parameter Approx Conf. Interval MLE Std.Err. 95% Lower 95% Upper mu sigma weibull.eta weibull.beta The ML estimate of mean survival time for the addicts data is An approximate 95% confidence interval is [502.3, 663.7]. A beta less than 1 models a hazard rate that decreases with time, as in the infant mortality period. A beta equal to 1 models a constant hazard rate, as in the normal life period. And a beta greater than 1 models an increasing hazard rate. Because beta is > 1 and < 2 in our case, the hazard rate is rapid.

13 2. PARAMETRIC AND NON-PARAMETRIC MODELS Cox proportional hazards model: We then fit the cox proportional hazards model to the addicts data. The proportional hazard model is the most general of the regression models because it is not based on any assumptions concerning the nature or shape of the underlying survival distribution. The model assumes that the underlying hazard rate (rather than survival time) is a function of the independent variables (covariates). No assumptions are made about the nature or shape of the hazard function. Thus, in a sense, Cox's regression model may be considered to be a nonparametric method. Fitting a proportional hazards model using JMP Proportional Hazards Fit Censored By: Type Whole Model Number of Events 150 Number of Censorings 88 Total Number 238 Model -LogLikelihood ChiSquare DF Prob>Chisq Difference <.0001 Full Reduced Parameter Estimates Term Estimate Std Error Lower CL Upper CL clinic[1] prison dose Effect Likelihood Ratio Tests Source Nparm DF L-R ChiSquare Prob>ChiSq clinic <.0001 prison dose <.0001 There are three statistical objectives typically considered. One is to test for the significance of the treatment status variable, adjusted for other variables. Another is to obtain a point estimate of the effect of treatment status, adjusted for other variables. And a third is to obtain a confidence interval for this effect. To test for the significance of the treatment effects, we look at the log likelihood ratio test results for all the different explanatory variables. The Effect Likelihood-Ratio Test tests whether or not a variable is important given the other variable is already in the model. All the variables including clinic, prison and dose have significant p-values indicating that they have a significant effect on the survival time. However the 95% CI of the regression estimate for prison contains zero. So there is little statistical evidence

14 that prison record affects the survival time of the addicts. Hence prison record might not be a good predictor of the survival time. Risk Ratios Unit Risk Ratios Per unit change in regressor Term Risk Ratio Lower CL Upper CL Reciprocal prison dose Range Risk Ratios Per change in regressor over entire range Term Risk Ratio Lower CL Upper CL Reciprocal prison dose Risk Ratios for clinic Level1 /Level2 Risk Ratio Reciprocal Above is a table showing risk ratios for the effect of each variable adjusted for the other variables in the model. Also the 95% CI of risk ratio for prison record contains 1 which again suggests that hazard for prisoners is not very different from the non-prisoners. The risk ratio clinic2/clinic1 is 0.36 suggesting that clinic 1 is associated with higher hazard compared to clinic 2. Our analysis of the output has led us to conclude that the model is appropriate and that, using this model, we get statistically significant risk ratios for clinic and dose. Parametric survival models: A parametric survival model is one in which survival time (the outcome) is assumed to follow a known distribution. Examples of distributions that are commonly used for survival time are: the Weibull, the exponential (a special case of the Weibull), the loglogistic, the lognormal, and the generalized gamma. Linear regression, logistic regression, and Poisson regression are examples of parametric models that are commonly used in the health sciences. With these models, the outcome is assumed to follow some distribution The Cox proportional hazards model, by contrast, is not a fully parametric model. Rather it is a semiparametric model because even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown. The baseline survival (or hazard) function is not specified in a Cox model. Now we fit parametric models to the addicts data.

15 Parametric accelerated failure time (AFT) models We also run a parametric accelerated failure time (AFT) models besides proportional hazards models. Whereas the key assumption of a PH model is that hazard ratios are constant over time, the key assumption for AFT model is that survival time accelerates (or decelerates) by a constant factor when comparing different levels of covariates. The most common distribution for parametric modeling of survival data is the Weibull distribution which will be used for the estimation of parameters. ALT multiple regression using all the explanatory variables yielded the following results: Maximum likelihood estimation results: Response units: SURVT Weibull Distribution Variable: Relationship (g) 1 clinic: Linear 2 prison: Linear 3 dose: Linear Model formula: Location ~ clinic + prison + dose Log likelihood at maximum point: Parameter Approx Conf. Interval MLE Std.Err. 95% Lower 95% Upper (Intercept) clinic prison dose sigma weibull.beta We get similar results an in PH model. Clinic and dose are significant based on the 95% CI of the estimates which does not include 0. However the 95% CI includes 0 for the prison estimate which again suggests that there is not enough evidence to support that prison record affects the survival time.

16 Here is a probability plot (individual conditions) that shows separate regression analyses for the two clinics. Weibull Distribution Log likelihood eta se_eta beta se_beta 1 1clinic clinic As we can see, the weibull shape parameters are not the same between the same groups. Also there is a huge difference in the log likelihoods.

17 Here is a probability plot (individual conditions) that shows separate regression analyses for the all combinations of the levels of dose and prison record. Weibull Distribution Log likelihood eta se_eta beta se_beta 1 1clinic;0prison clinic;1prison clinic;1prison clinic;0prison It looks here again the major differentiating factor is clinic and within each clinic there is no difference in the likelihood across different prison records.

19 APPENDIX: Table 1: RAW DATA: clinic status SURVT prison

20 clinic status SURVT prison

21 clinic status SURVT prison

22 clinic status SURVT prison

23 Table 2: Nonparametric estimates from addicts data with approximate 95% simultaneous confidence intervals numeric matrix: 141 rows, 6 columns. SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper

24 SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper

25 SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper SURVT-lower SURVT-upper Fhat SE_Fhat 95% Lower 95% Upper

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Parametric survival models

Parametric survival models ST3242: Introduction to Survival Analysis Alex Cook October 2008 ST3242 : Parametric survival models 1/17 Last time in survival analysis Reintroduced parametric models Distinguished

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Objectives Introduce event history analysis Describe some common survival (hazard) distributions Introduce some useful Stata

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department

### 200609 - ATV - Lifetime Data Analysis

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Distribution (Weibull) Fitting

Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

### Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Parametric Survival Models

Parametric Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005, Summer 2010 We consider briefly the analysis of survival data when one is willing to assume a parametric

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Applied Reliability Page 1 APPLIED RELIABILITY. Techniques for Reliability Analysis

Applied Reliability Page 1 APPLIED RELIABILITY Techniques for Reliability Analysis with Applied Reliability Tools (ART) (an EXCEL Add-In) and JMP Software AM216 Class 4 Notes Santa Clara University Copyright

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

### Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Abder OULIDI 1-2 Jean-Marie MARION 1 Hérvé GANACHAUD 3 1 Institut de Mathématiques Appliquées (IMA) Angers France 2 Institut

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Lecture 4 PARAMETRIC SURVIVAL MODELS

Lecture 4 PARAMETRIC SURVIVAL MODELS Some Parametric Survival Distributions (defined on t 0): The Exponential distribution (1 parameter) f(t) = λe λt (λ > 0) S(t) = t = e λt f(u)du λ(t) = f(t) S(t) = λ

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Applying Statistics Recommended by Regulatory Documents

Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301-325 325-31293129 About the Speaker Mr. Steven

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA

Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA ABSTRACT This paper is intended for SAS programmers who are interested in understanding common statistical

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

### Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

### SPSS TUTORIAL & EXERCISE BOOK

UNIVERSITY OF MISKOLC Faculty of Economics Institute of Business Information and Methods Department of Business Statistics and Economic Forecasting PETRA PETROVICS SPSS TUTORIAL & EXERCISE BOOK FOR BUSINESS

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA Michael R. Middleton, McLaren School of Business, University of San Francisco 0 Fulton Street, San Francisco, CA -00 -- middleton@usfca.edu

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### HLM software has been one of the leading statistical packages for hierarchical

Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

### Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics

Paper SD-004 Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics ABSTRACT The credit crisis of 2008 has changed the climate in the investment and finance industry.

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Linear Regression in SPSS

Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### Standard Deviation Calculator

CSS.com Chapter 35 Standard Deviation Calculator Introduction The is a tool to calculate the standard deviation from the data, the standard error, the range, percentiles, the COV, confidence limits, or

### Introduction to Survival Analysis

John Fox Lecture Notes Introduction to Survival Analysis Copyright 2014 by John Fox Introduction to Survival Analysis 1 1. Introduction I Survival analysis encompasses a wide variety of methods for analyzing

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### How unemployment insurance savings accounts affect employment duration: evidence from Chile

Nagler IZA Journal of Labor & Development ORIGINAL ARTICLE Open Access How unemployment insurance savings accounts affect employment duration: evidence from Chile Paula Nagler Correspondence: paula.nagler@maastrichtuniversity.nl

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### Standard Operating Procedure for Statistical Analysis of Clinical Trial Data

Page 1 of 13 Standard Operating Procedure for Statistical Analysis of Clinical Trial Data SOP ID Number: Effective Date: 22/01/2010 Version Number & Date of Authorisation: V01, 15/01/2010 Review Date:

### SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list