Modelling and added value
1 Modelling and added value Course: Statistical Evaluation of Diagnostic and Predictive Models Thomas Alexander Gerds (University of Copenhagen) Summer School, Barcelona, June 30 1 / 53
2 Multiple regression Multiple regression can be used to exploit the joint predictive power of several or many variables, and also to assess the added value of new markers in the presence of conventional risk factors. Commonly used modelling techniques: logistic regression for binary outcome; Cox regression for time-to-event (survival) outcome. P-values testing the null hypothesis of no association are not a good measure of predictive power. 2 / 53
3 Example: epo study 1 Anaemia is a deficiency of red blood cells and/or hemoglobin and an additional risk factor for cancer patients. Randomized placebo-controlled trial: does treatment with epoetin beta "epo" (300 U/kg) enhance the hemoglobin concentration level and improve survival chances? Henke et al. identified the c20 expression (erythropoietin receptor status) as a new biomarker for the prognosis of locoregional progression-free survival. 1 Henke et al. Do erythropoietin receptors on cancer cells explain unexpected clinical findings? J Clin Oncol, 24(29). 3 / 53
4 Treatment The study includes head and neck cancer patients with a tumor located in the oropharynx (36%), the oral cavity (27%), the larynx (14%) or the hypopharynx (23%). One of the treatments was radiotherapy following resection. [Table: patient counts with non-missing blood values, by resection status (Complete / Incomplete / No) and treatment arm (Placebo / Epo)] 4 / 53
5 Outcome Blood hemoglobin levels were measured weekly during radiotherapy (7 weeks). Treatment with epoetin beta was defined as successful when the hemoglobin level increased sufficiently. For patient i set Y_i = 1 if the treatment was successful and Y_i = 0 if the treatment failed. 5 / 53
6 Target [Table: for each patient, the observed treatment status Y_i (1 = successful, 0 = failed) and the predicted probability P_i of treatment success] 6 / 53
7 Predictors Age min: 41 y, median: 59 y, max: 80 y. Gender male: 85%, female: 15%. Baseline hemoglobin mean: g/dl, std: 1.45. Treatment epo: 50%, placebo: 50%. Resection complete: 48%, incomplete: 19%, no resection: 34%. Epo receptor status neg: 32%, pos: 68%. 7 / 53
8 Logistic regression Response: treatment successful yes/no
Factor OddsRatio StandardError CI.95 pvalue
(Intercept) p <
Age [0.91; 1.03]
Sex:female [0.91; 26.02]
HbBase [1.99; 5.91] p <
Treatment:Epo [23.9; ] p <
Resection:Incompl [0.36; 9.03]
Resection:Compl [1.13; 17.36]
Receptor:positive [1.72; 23.39]
8 / 53
9 The model provides general information Treatment with epo significantly increases the chance (odds) of reaching the target hemoglobin level (95% CI for the odds ratio: [23.9; 493.4]) in the overall study population. Does that mean everyone should be treated? 9 / 53
10 The model provides information for a single patient For example: the predicted probability that a 51-year-old man with complete tumor resection and baseline hemoglobin level 12.6 g/dl reaches the target hemoglobin level (Y_i = 1) is 97.4% in the Epo group and 29.2% in the Placebo group. If a similar patient has baseline hemoglobin level 14.8 g/dl, then the model predicts 99.8% (Epo) and 84.7% (Placebo). 10 / 53
11 Predictions and Brier score for logistic regression [Table: for each patient, the treatment success Y_i, the predicted probability P_i (%), the residual Y_i − P_i, and the Brier contribution (Y_i − P_i)², with the column sum Σ in the last row] 11 / 53
12 The model behind the table log(P_i / (1 − P_i)) = β_0 + β_1 x_{1,i} + ... + β_k x_{k,i}, equivalently P_i = exp{β_0 + β_1 x_{1,i} + ... + β_k x_{k,i}} / (1 + exp{β_0 + β_1 x_{1,i} + ... + β_k x_{k,i}}). Here P_i is the probability of successful treatment; x_{1,i} is the first predictor for subject i (e.g. age = 50); x_{2,i} the second predictor (e.g. gender = male); ...; x_{k,i} the k'th predictor (e.g. eporeceptor = pos). β_0, ..., β_k are regression coefficients that are estimated based on the epo study. 12 / 53
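The arithmetic behind such predicted probabilities is just the inverse logit of the linear predictor. A minimal sketch (in Python rather than the course's R, with made-up coefficients that are not the fitted epo-study values):

```python
import math

def predict_prob(beta0, betas, x):
    """Inverse logit of the linear predictor:
    P = 1 / (1 + exp(-(beta0 + beta1*x1 + ... + betak*xk)))."""
    lp = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical coefficients; a patient with age 50 and treatment = 1:
p = predict_prob(-2.0, [0.03, 1.5], [50, 1])
print(round(p, 3))  # 0.731
```

With an empty covariate vector the function returns the inverse logit of the intercept alone, which is how the prevalence ("null") model on a later slide arises.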
13 Predicted treatment success probability (logistic regression) For a treated man with no resection possible and negative epo receptor status. [Figure: predicted risk (0%–100%) as a function of age (years, x-axis) and baseline hemoglobin (9–14 g/dl, y-axis)] 13 / 53
14 Nomogram [Figure: nomogram assigning points to age, sex (female/male), HbBase, Treat (Epo/Placebo), Resection (Incompl/No/Compl) and eporec; the total points are mapped via the linear predictor to the chance of treatment success] 14 / 53
15 Nomogram: R-code
library(rms)
f7 <- lrm(Y~age+sex+HbBase+Treat+Resection+eporec,data=Epo,x=TRUE,y=TRUE)
dd <- datadist(Epo)
options(datadist="dd")
nom7 <- nomogram(f7, fun=function(x)1/(1+exp(-x)),
                 fun.at=c(.001,.01,.05,0.25,0.75,.95,.99,.999),
                 funlabel="Chance of treatment success")
plot(nom7)
library(DynNom)
f7 <- glm(Y~age+sex+HbBase+Treat+Resection+eporec,data=Epo,family=binomial())
DynNom(f7,Epo,clevel=0.95)
15 / 53
16 Tools for evaluating prediction accuracy For each subject we have a predicted risk based on multiple predictors. To evaluate the prediction performance of the logistic regression model we consider the following tools: Prediction accuracy: Brier score (lack of calibration and lack of spread of predictions). Discrimination: ROC curve, c-index = AUC (lack of spread of predictions). Calibration plot (lack of calibration). Re-classification scatterplot/table (changes of risk predictions). Brier score: the squared difference between the observed status and the predicted risk, averaged over subjects. AUC: the fraction of randomly selected pairs of patients where the predicted risk was higher for the diseased subject than for the non-diseased subject. 16 / 53
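The two summary measures defined above can be computed in a few lines. A minimal Python illustration of the definitions (not the course's R tooling):

```python
def brier_score(y, p):
    """Average squared difference between observed status and predicted risk."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Fraction of (event, non-event) pairs in which the event subject
    received the higher predicted risk; ties count as 1/2."""
    pairs = [(pi, pj) for pi, yi in zip(p, y) if yi == 1
                      for pj, yj in zip(p, y) if yj == 0]
    wins = sum(1.0 if pi > pj else 0.5 if pi == pj else 0.0
               for pi, pj in pairs)
    return wins / len(pairs)

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(round(brier_score(y, p), 4), auc(y, p))  # 0.0925 1.0
```

A constant prediction (the prevalence model of the next slide) gives AUC = 50% by construction, since every event/non-event pair is tied.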
17 Brier score for null model in the Epo study [Table: as before, treatment success Y_i, predicted probability P_i (%), residual Y_i − P_i, and Brier contribution (Y_i − P_i)², summed in the last row] The predicted probability is the prevalence of patients with treatment success in the data set. 17 / 53
18 Prevalence model [Figure: calibration plot, observed proportion (0%–100%) against the predicted probability of treatment success; performance of the null model: Brier=24.7, AUC=50.0] 18 / 53
19 Univariate logistic regression models Categorical predictors
library(rms)
resecmodel <- lrm(Y~Resection,data=Epo,x=TRUE,y=TRUE)
sexmodel <- lrm(Y~sex,data=Epo,x=TRUE,y=TRUE)
treatmodel <- lrm(Y~Treat,data=Epo,x=TRUE,y=TRUE)
## or via glm
treatmodel <- glm(Y~Treat,data=Epo,family="binomial")
Continuous predictors
library(rms)
basehbmodel <- lrm(Y~HbBase,data=Epo,x=TRUE,y=TRUE)
agemodel <- glm(Y~age,data=Epo,family="binomial")
19 / 53
20 Categorical predictors [Tables: cross-tabulations of treatment success (0/1) against resection status (No / Incompl / Compl), gender (male / female), and treatment arm; for the Placebo arm, 66 failures vs 8 successes survive in the transcript] 20 / 53
21 Categorical predictors: Resection status, gender, treatment [Figure: calibration plot, observed proportion against predicted probability of treatment success. Null model: Brier=24.7, AUC=50.0; Gender model: Brier=24.7, AUC=50.3; Resection model: Brier=24.0, AUC=58.7; Treatment model: Brier=13.6] 21 / 53
22 Continuous predictors: Baseline hemoglobin, Age [Figure: scatter plot of age (years) against baseline hemoglobin (g/dl), marking treatment success vs treatment failure] 22 / 53
23 Continuous predictors: Baseline hemoglobin, Age [Figure: calibration plot. Null model: Brier=24.7, AUC=50.0; Age model: Brier=24.7, AUC=51.2; Baseline hemoglobin model: Brier=19.3] 23 / 53
24 Continuous predictors: Baseline hemoglobin, Age [Figure: ROC curves, sensitivity against 1 − specificity. Null model: Brier=24.7, AUC=50.0; Age model: Brier=24.7, AUC=51.2; Baseline hemoglobin model: Brier=19.3] 24 / 53
25 Continuous predictors: Baseline hemoglobin, Age [Figure: re-classification plot, predicted chance from the age model against predicted chance from the hemoglobin model, marking treatment success vs treatment failure] 25 / 53
26 Multiple logistic regression Model excluding epo receptor status
add <- lrm(Y~age+sex+HbBase+Treat+Resection,data=Epo,x=TRUE,y=TRUE)
Model including epo receptor status
add.epor <- lrm(Y~age+sex+HbBase+Treat+Resection+eporec,data=Epo,x=TRUE,y=TRUE)
26 / 53
27 Multiple logistic regression [Figure: re-classification plot, predicted chance excluding receptor status against predicted chance including receptor status, marking treatment success vs treatment failure] 27 / 53
28 Multiple logistic regression [Figure: calibration plot, observed proportion against predicted event probability. Null model: Brier=24.7, AUC=50.0; all variables: Brier=9.6, AUC=93.3; all variables + receptor status: Brier=8.7] 28 / 53
29 Multiple logistic regression [Figure: ROC curves. All variables: Brier=9.6, AUC=93.3; all variables + receptor status: Brier=8.7] 29 / 53
30 Exercises
1. Do the tutorial 'Added value of new marker'.
2. Split the IVF data (see link on course homepage) at random into two parts (60% for learning, 40% for evaluation). Then build a multiple logistic regression model to predict response. Include the following covariates: antfoll, smoking, fsh, ovolume, bmi.
3. Produce a table which shows the odds ratios with confidence limits (hint: Publish::publish.glm(t)) and write a caption which explains the table.
4. Produce a calibration plot and write a caption (hint: ModelGood::calPlot2).
5. Produce a ROC curve, add the Brier score and AUC as a legend, and write a caption.
6. Build a second logistic regression model where you include the above variables and add the variable cyclelen.
7. Evaluate the added value of cyclelen: re-classification table and plot (hint: ModelGood::reclass), difference in Brier scores and AUC with appropriate tests. Describe the underlying null hypotheses.
8. For each subject in the test data compute the difference of the predictions between the model which excludes cyclelen and the model that includes cyclelen. Consider this difference as a new continuous marker and produce the corresponding ROC curve and AUC. Describe the interpretation of AUC for this specific ROC curve in words and comment.
30 / 53
31 Model selection Very many different 'logistic regression models' can be constructed by selecting subsets of variables and transformations/groupings of variables. Standard multiple (logistic) regression works if the number of predictors is not too large, and substantially smaller than the sample size, and the decision maker has a priori knowledge about which variables to put into the model. Ad-hoc model selection algorithms, like automated backward elimination, do not lead to reproducible prediction models. 31 / 53
32 A Conversation of Richard Olshen with Leo Breiman 3 ... Olshen: What about arcing, bagging and boosting? Breiman: Okay. Yeah. This is fascinating stuff, Richard. In the last five years, there have been some really big breakthroughs in prediction. And I think combining predictors is one of the two big breakthroughs. And the idea of this was, okay, that suppose you take CART, which is a pretty good classifier, but not a great classifier. I mean, for instance, neural nets do a much better job. Olshen: Well, suitably trained? Breiman: Suitably trained. Olshen: Against an untrained CART? Breiman: Right. Exactly. And I think I was thinking about this. I had written an article on subset selection in linear regression. I had realized then that subset selection in linear regression is really a very unstable procedure. If you tamper with the data just a little bit, the first best five-variable regression may change to another set of five variables. And so I thought, Okay. We can stabilize this by just perturbing the data a little and get the best five-variable predictor. Perturb it again. Get the best five-variable predictor and then average all these five-variable predictors. And sure enough, that worked out beautifully. This was published in an article in the Annals (Breiman, 1996b). 3 Statist. Sci. Volume 16, Issue 2 (2001). 32 / 53
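Breiman's recipe — perturb the data, refit the unstable predictor, average — can be sketched on a toy scale. The Python code below is a hypothetical illustration (not from the course material): it bags a one-split "stump", one of the simplest unstable predictors.

```python
import random

def stump_fit(xs, ys):
    """One split x < c vs x >= c: pick the cutoff where the two group
    means of the outcome differ most; fall back to the overall mean."""
    best = None
    for c in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        if not left or not right:
            continue
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if best is None or gap > best[0]:
            best = (gap, c, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # degenerate resample: no valid split exists
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]  # (cutoff, mean below, mean at/above)

def bagged_predict(xs, ys, xnew, B=200, seed=1):
    """Bagging: refit the stump on B bootstrap resamples, average predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        c, mlo, mhi = stump_fit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(mlo if xnew < c else mhi)
    return sum(preds) / B

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
```

Averaging over resamples smooths out the jumpy single-stump fit — exactly the stabilization Breiman describes for subset selection and for CART.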
33 33 / 53
34 Backward elimination On full data (n=149):
library(rms)
data(Epo)
f7 <- lrm(Y~age+sex+HbBase+Treat+Resection+eporec,data=Epo,x=TRUE,y=TRUE)
fastbw(f7)
[Output: fastbw deletes age and Resection. Factors in the final model: sex, HbBase, Treat, eporec]
34 / 53
35 Backward elimination On reduced data (n=130):
library(rms)
data(Epo)
set.seed(1731)
f7a <- lrm(Y~age+sex+HbBase+Treat+Resection+eporec,data=Epo[sample(1:149,replace=FALSE,size=130),],x=TRUE,y=TRUE)
fastbw(f7a)
[Output: fastbw now deletes age, sex and Resection. Factors in the final model: HbBase, Treat, eporec]
35 / 53
36 Guided model selection The hope of conventional regression modelling is that the better the model fits, the better it predicts. But the model should predict new patients. Prostate Cancer Risk Calculator: We used multivariable logistic regression to model the risk of prostate cancer by considering all possible combinations of main effects and interactions. The models chosen were those that minimized the Bayesian information criterion (BIC) and maximized the average out-of-sample area under the receiver operating characteristic curve (via 4-fold cross-validation). 36 / 53
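The out-of-sample evaluation used by the risk calculator can be sketched generically. Below is a Python illustration of 4-fold cross-validation with the Brier score as the out-of-sample criterion (function and variable names are made up for the sketch):

```python
import random

def kfold(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_brier(xs, ys, fit, predict, k=4):
    """Average out-of-sample Brier score over k folds:
    fit on k-1 folds, score predictions on the held-out fold."""
    scores = []
    for fold in kfold(len(xs), k):
        train = [i for i in range(len(xs)) if i not in fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(sum((ys[i] - predict(model, xs[i])) ** 2
                          for i in fold) / len(fold))
    return sum(scores) / k

# Toy check with the prevalence ("null") model, which ignores x entirely:
fit_null = lambda xs, ys: sum(ys) / len(ys)
predict_null = lambda model, x: model
```

Guided model selection then compares this cross-validated score (or the cross-validated AUC) across candidate models instead of comparing their in-sample fit.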
37 The two cultures 4 4 L. Breiman. Statistical modeling: The two cultures. Statistical Science, 16(3), 2001. 37 / 53
38 The two cultures 38 / 53
39 Classification trees A tree model is a form of recursive partitioning. It lets the data decide which variables are important and where to place cut-offs in continuous variables. In general terms, the purpose of the analyses via tree-building algorithms is to determine a set of splits that permit accurate prediction or classification of cases. In other words: a tree is a combination of many medical tests. 39 / 53
40 Epo study [Figure: classification tree. Root split on treatment arm (Placebo vs Epo); the Placebo branch splits on Resection ({No, Incomplete} vs Complete), the Epo branch on HbBase (up to 11.3 vs > 11.3 g/dl). Terminal nodes: Node 3 (n=39), Node 4 (n=35), Node 6 (n=19), Node 7 (n=56)] 40 / 53
41 Roughly, the algorithm works as follows: 1. Find the predictor, and the split on that predictor, which optimize some statistical criterion over all possible splits on all predictors. 2. For ordinal and continuous predictors, the split is of the form X < c versus X >= c. 3. Repeat step 1 within each previously formed subset. 4. Proceed until fewer than k observations remain to be split, or until nothing is gained from further splitting, i.e. the tree is fully grown. 5. The tree is pruned according to some criterion. 41 / 53
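Steps 1–2 above amount to an exhaustive scan over predictors and cutoffs. A minimal Python sketch, using Gini impurity as the statistical criterion (one common choice; the slides do not commit to a particular criterion):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y):
    """Scan every predictor j and every cutoff c (split: X_j < c vs X_j >= c)
    and return the split with the smallest size-weighted impurity."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for c in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] < c]
            right = [yi for row, yi in zip(X, y) if row[j] >= c]
            if not left or not right:
                continue
            imp = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or imp < best[0]:
                best = (imp, j, c)
    return best  # (impurity, predictor index, cutoff)

X = [[1, 5], [2, 6], [3, 1], [4, 2]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0.0, 0, 3): split on predictor 0 at cutoff 3
```

Growing the full tree is then step 3: call best_split recursively on each of the two resulting subsets until one of the stopping rules in steps 4–5 applies.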
42 Characteristics of classification trees Trees are specifically designed for accurate classification/prediction. Results have a graphical representation and are easy to interpret. No model assumptions. Recursive partitioning can identify complex interactions. One can introduce different costs of misclassification in the tree. But: Trees are not robust against even small perturbations of the data. It is quite easy to overfit the data. 42 / 53
43 More complex tree (overfitting?) [Figure: a deeper classification tree splitting on treatment arm, Resection, HbBase (cut-offs 11.3 and 12.1 g/dl) and epo receptor status (positive/negative). Terminal nodes: Node 3 (n=39), Node 5 (n=25), Node 6 (n=10), Node 8 (n=19), Node 10 (n=18), Node 12 (n=27), Node 13 (n=11)] 43 / 53
44 Comparing the different predictions [Table: per-patient predicted probabilities (%) of treatment success from the logistic regression model (LRM), the simple tree and the complex tree] 44 / 53
45 Comparing the different predictions [Table: Brier score and AUC for the simple tree, logistic regression, the complex tree and a random forest] Note: These numbers are estimated by using the same data that were used to construct the models. 45 / 53
46 Dilemma: Both logistic regression with automated variable selection (e.g. backward elimination) and decision trees are notoriously unstable (they overfit). How shall we proceed? 46 / 53
47 In search of a solution Genuine algorithms to obtain a useful prediction model, mapping the covariates X_i to a prediction Fhat(y | X_i): Neural Nets; Support Vector Machines; Bump hunting and LASSO; Ridge regression and boosting; Random Forests; Logic regression. All these algorithms can be applied in high-dimensional settings, i.e., when there are more candidate predictor variables than subjects. 47 / 53
48 Penalized likelihood regression (works for the logistic and the Cox partial likelihood) Ridge regression: betahat_ridge = argmax{ likelihood(β) − λ Σ_j β_j² } — shrinks. LASSO regression: betahat_LASSO = argmax{ likelihood(β) − λ Σ_j |β_j| } — shrinks and selects. Elastic net: combines the L1 and L2 norms. 48 / 53
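To make the ridge objective concrete, here is a tiny gradient-ascent sketch of penalized logistic regression in Python. This is an illustration of the principle only, not how glmnet works internally (glmnet uses coordinate descent), and the data are made up:

```python
import math

def fit_ridge_logistic(X, y, lam, steps=3000, lr=0.1):
    """Maximize (1/n)*loglik(beta) - lam * sum(beta_j^2) by gradient
    ascent; the intercept b0 is left unpenalized."""
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    n = len(y)
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            lp = b0 + sum(bj * xj for bj, xj in zip(beta, row))
            r = yi - 1 / (1 + math.exp(-lp))  # residual on probability scale
            g0 += r
            for j in range(p):
                g[j] += r * row[j]
        b0 += lr * g0 / n
        for j in range(p):
            beta[j] += lr * (g[j] / n - 2 * lam * beta[j])
    return b0, beta

# Perfectly separable toy data: without a penalty the slope keeps growing,
# with lam > 0 it is shrunk toward zero.
X, y = [[0], [1], [2], [3]], [0, 0, 1, 1]
_, (b_free,) = fit_ridge_logistic(X, y, lam=0.0)
_, (b_pen,) = fit_ridge_logistic(X, y, lam=1.0)
```

Replacing the squared penalty by λ Σ|β_j| (with a subgradient or soft-thresholding step) gives the LASSO, which can set coefficients exactly to zero and hence also selects variables.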
49 Package glmnet
library(ModelGood)
library(glmnet)
g1a <- glmnet(y=as.numeric(Epo$Y)-1,x=model.matrix(~-1+age+HbBase+Treat+Resection+eporec+sex,data=Epo),alpha=0.1)
g1 <- ElasticNet(Y~age+HbBase+Treat+Resection+eporec+sex,data=Epo,alpha=0.1)
plot(g1a)
print(g1)
$call
ElasticNet(formula = Y ~ age + HbBase + Treat + Resection + eporec + sex, data = Epo, alpha = 0.1)
$enet
Call: glmnet(x = covariates, y = response, alpha = 0.1, lambda = optlambda)
Df %Dev Lambda
[1,]
$Lambda
[1]
attr(,"class")
[1] "ElasticNet"
49 / 53
50 Shrunken regression coefficients [Figure: coefficient paths plotted against the L1 norm of the coefficient vector] 50 / 53
51 A function of the penalization parameter λ [Figure: coefficient paths plotted against log λ] 51 / 53
52 Summary Predicted probabilities for the unknown current or future event status of a subject can be obtained from a penalized or unpenalized logistic regression model. Predictions can also be obtained from a decision tree or random forest. Re-classification plots, calibration plots, ROC curves, the Brier score and AUC can be used to assess and compare the performance of different models. The apparent comparison using the same data that were used to select and fit the models is not fair and may be grossly misleading. Advanced algorithmic methods have tuning parameters which are optimized for obtaining accurate predictions. 52 / 53
53 Exercise 2.2 Consider the results of Exercise 2.1. Change the seed used to split the IVF data several times and repeat the analysis. Report the Monte Carlo error in the AUC of the two models. Introduce a random normal noise variable into the IVF data set and analyse its added value. Repeat with 10 such variables to see if any of these random noise variables has higher added value than cyclelen. 53 / 53
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More information5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationCART 6.0 Feature Matrix
CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationPredicting Health Care Costs by Two-part Model with Sparse Regularization
Predicting Health Care Costs by Two-part Model with Sparse Regularization Atsuyuki Kogure Keio University, Japan July, 2015 Abstract We consider the problem of predicting health care costs using the two-part
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More information!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
More informationLinda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationMachine Learning Algorithms for Predicting Severe Crises of Sickle Cell Disease
Machine Learning Algorithms for Predicting Severe Crises of Sickle Cell Disease Clara Allayous Département de Biologie, Université des Antilles et de la guyane Stéphan Clémençon MODALX - Univesité Paris
More informationSome Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationHandling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node
Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous
More informationR 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models
Faculty of Health Sciences R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.
More informationRisk pricing for Australian Motor Insurance
Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationChapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -
Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create
More informationStudents' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
More informationEvent driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016
Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency
More information10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
More informationComparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationBiostatistics Short Course Introduction to Longitudinal Studies
Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationRidge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS
Ridge Regression Patrick Breheny September 1 Patrick Breheny BST 764: Applied Statistical Modeling 1/22 Ridge regression: Definition Definition and solution Properties As mentioned in the previous lecture,
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationCross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.
Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month
More informationCategorical Data Analysis
Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More information5.1 CHI-SQUARE TEST OF INDEPENDENCE
C H A P T E R 5 Inferential Statistics and Predictive Analytics Inferential statistics draws valid inferences about a population based on an analysis of a representative sample of that population. The
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationLearning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
More information