Stata logistic regression nomogram generator



Similar documents
Development of the nomolog program and its evolution

XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables

Generalized Linear Models

11. Analysis of Case-control Studies Logistic Regression

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

Descriptive Statistics

Gamma Distribution Fitting

Multivariate Logistic Regression

Simple Predictive Analytics Curtis Seare

Microsoft Excel Tutorial

Interaction effects and group comparisons Richard Williams, University of Notre Dame, Last revised February 20, 2015

Module 2 Basic Data Management, Graphs, and Log-Files

Probit Analysis By: Kim Vincent

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

SPSS: Getting Started. For Windows

USC Marshall School of Business Marshall Information Services

Scatter Plot, Correlation, and Regression on the TI-83/84

Competent Data Management - a key component

Statistics and Data Analysis

SPSS The Basics. Jennifer Thach RHS Assessment Office March 3 rd, 2014

Developing Risk Adjustment Techniques Using the System for Assessing Health Care Quality in the

Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS Instructor: Michael Gelbart

MA261-A Calculus III 2006 Fall Homework 3 Solutions Due 9/22/2006 8:00AM

Stata Walkthrough 4: Regression, Prediction, and Forecasting

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Everything You Wanted to Know about Moderation (but were afraid to ask) Jeremy F. Dawson University of Sheffield

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Introduction to Logistic Regression

Spreadsheet software for linear regression analysis

0 Introduction to Data Analysis Using an Excel Spreadsheet

Diploma of Business Unit Guide 2015

Overview Accounting for Business (MCD1010) Introductory Mathematics for Business (MCD1550) Introductory Economics (MCD1690)...

SEQUENCES ARITHMETIC SEQUENCES. Examples

WESTMORELAND COUNTY PUBLIC SCHOOLS Integrated Instructional Pacing Guide and Checklist Computer Math

From this it is not clear what sort of variable that insure is so list the first 10 observations.

Instructions for SPSS 21

Assignment objectives:

Straightening Data in a Scatterplot Selecting a Good Re-Expression Model

Interaction effects between continuous variables (Optional)

Binary Logistic Regression

Alexander Zlotnik. Ramon y Cajal University Hospital. Stata is a registered trademark of StataCorp LP, College Station, TX, USA.

Directions for using SPSS

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

TCP/IP Networking, Part 2: Web-Based Control

The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Category Management Software Solution

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

MULTIPLE REGRESSION EXAMPLE

AP Physics 1 and 2 Lab Investigations

Psychology 205: Research Methods in Psychology

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

Computer-Aided Multivariate Analysis

Introduction to SPSS 16.0

Using Your TI-89 in Elementary Statistics

Syntax Menu Description Options Remarks and examples Stored results References Also see

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

The programming language C. sws1 1

Visual Basic Programming. An Introduction

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

LOGISTIC REGRESSION ANALYSIS

Forecasting in STATA: Tools and Tricks

ACCUPLACER Arithmetic & Elementary Algebra Study Guide

Integrating Spreadsheet Templates and Data Analysis into Fluid Power Instruction

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Data Analysis Methodology 1

Break-even analysis. On page 256 of It s the Business textbook, the authors refer to an alternative approach to drawing a break-even chart.

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Multinomial and Ordinal Logistic Regression

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Elizabeth Comino Centre fo Primary Health Care and Equity 12-Aug-2015

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Introduction Course in SPSS - Evening 1

25 Working with categorical data and factor variables

Unit 6: Polynomials. 1 Polynomial Functions and End Behavior. 2 Polynomials and Linear Factors. 3 Dividing Polynomials

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Stata web services: Toward Stata-based healthcare informatics applications integrated

Tutorial for the TI-89 Titanium Calculator

Data analysis process

STATISTICA Formula Guide: Logistic Regression. Table of Contents

SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

How To: Analyse & Present Data

Nonlinear Regression Functions. SW Ch 8 1/54/

Using Excel for Handling, Graphing, and Analyzing Scientific Data:

Traditional Conjoint Analysis with Excel

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Visual basic string search function, download source code visual basic 6.0 gratis. > Visit Now <

THE CERN/SL XDATAVIEWER: AN INTERACTIVE GRAPHICAL TOOL FOR DATA VISUALIZATION AND EDITING

Nominal and Real U.S. GDP

The Circumference Function

Moderation. Moderation

Data analysis and regression in Stata

Formulas & Functions in Microsoft Excel

RELEVANT TO ACCA QUALIFICATION PAPER P3. Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam

Getting Started with R and RStudio 1

Introduction. Survival Analysis. Censoring. Plan of Talk

Transcription:

Stata logistic regression nomogram generator WORK IN PROGRESS PAPER Alexander Zlotnik, Telecom.Eng. Víctor Abraira Santos, PhD Ramón y Cajal University Hospital

NOTICE Nomogram generators for logistic and Cox regression models have been updated since this presentation. Download links to the latest program versions (nomolog & nomocox), examples, tutorials and methodological notes are available on this webpage: http://www.zlotnik.net/stata/nomograms/

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Introduction Nomograms are one of the simplest, easiest and cheapest methods of mechanical calculus. ( ) precision is similar to that of a logarithmic ruler ( ). Nomograms can be used for research purposes ( ) sometimes leading to new scientific results. Source: Nomography and its applications G.S.Jovanovsky, Ed. Nauka, 1977

Introduction Examples

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Logistic regression predictive models Logistic regression based predictive models are used in many fields, clinical research being one of them. Problems: Variable importance is not obvious for some clinicians. Calculating an output probability with a set of input variable values can be laborious for these models, which hinders their adoption.

Logistic regression predictive models Source: YanSun, 2011

Logistic regression nomograms A nomogram could make this calculation much easier. Ex: Kattan nomograms for prostate cancer. Source: prostate-cancer.org

Logistic regression nomograms Output probability calculations are much easier. Variable importance is clear at a glance (longer the line => more important variable).

Logistic regression nomograms Logistic regression nomogram generation Plot all possible scores/points (α 1 x i ) for each variable (X 1..N ). Get constant (α 0 ). Transform into probability of event given the formula p = α + TP) 1+ e 1 ( 0 Total points = TP = α 1 X 1 + α 2 X 2 +

Logistic regression nomograms Nomogram usage for a given case Get input variable values, i.e. X 1 =x 1, X 2 =x 2 Obtain scores for all variables. Add all scores. Get probability (on a scale adjusted by the constant α 0 ).

Dialysis time Risk of death for a 40 year old woman, LOS (length of stay) = 20 days, Dialysis time = 75 months

Dialysis time Risk of death for a 40 year old woman, LOS (length of stay) = 20 days, Dialysis time = 75 months 40 years old => ~ 2.6 points woman => ~ 0 points LOS = 20 days => ~ 0.3 points Dialisis time = 75 m => ~ 0.5 points -------------------------- Total score 3.4 Probability 0.05 5%

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Objectives Build a general purpose nomogram generator made entirely in Stata without external module dependence. Executable after arbitrary logistic or logit Stata commands. Automatic (or imposed) variable and data labeling. Automatic (or imposed) variable min/max, divisions, variable labels, dummy data labels.

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Stata programming Gotchas Macro string variables limited to 244 chars. Solution: strl? Stata 13? Hard to grasp macro nesting syntax. Lack of built in data structures (non numeric arrays, dictionaries, lists, etc).

Stata programming Gotchas Lack of decent debugger (set trace on/off is not enough). Steep learning curve for error interpretation. Unit testing is hard to implement.

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Programming techniques Time series graph hacks. logit & logistic outputs. Variable & data labels. Macro nesting.

Programming techniques Time series graph hacks Fictitious data points mytime Actual data point

Programming techniques Logit & logistic outputs

Programming techniques Custom data structures

Programming techniques Variable & data labels

Programming techniques Macro nesting

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Execution example

Variable labels are used if defined Variable labels are used if defined

Positive coefficients Usually, positive coefficients are required in these nomograms in order to ease calculations (no substractions to get total score). Due to the linear nature of the TP term, it is easy to make all coefficients positive. p = α + TP) 1+ e 1 ( 0 TP = α 1 X 1 + α 2 X 2 +

Positive coefficients This can be done manually. Let s see an example logit muerto edadr ib1.gtrata tpodial ib2.hbsagdon

(hbsagdon) (Gtrata) Some dummy coefficients are negative

Positive coefficients The most negative coefficient in Gtrata is (3) Tacro, i.e. data value 3. Hence, instead of logit muerto edadr ib1.gtrata tpodial ib2.hbsagdon we use logit muerto edadr ib3.gtrata tpodial ib2.hbsagdon

(hbsagdon) (Gtrata) Now all dummy coefficients are positive

Positive coefficients The program also can perform this operation automatically (work in progress). Since coefficients are linearly related, they do not need to be recalculated.

Results Supports any kind of variable ordering. Supports negative coefficients. Supports omitted variables due to collinearity. Works after an almost arbitrary regression command. Let s see execution modes & parameters

Remember the custom data structure

Execution modes Automatic: everything is determined automatically. Manual: define all parameters manually (laborious). Hybrid: get some stuff automatically and refine it manually. For example: variable range is 14 to 81 and we want to make it 10 to 100 leaving all other variables as they are.

Hybrid mode The edadr coefficient does not change

By the way Since the output is a standard Stata graph, it may be edited for further adjustments.

Other execution options imaxvarlabellen = 30 //Max N of chars to display in variable labels imaxdatalabellen = 30 //Max N of chars to display in data value labels ivarlabdescr = 0 //Use variable description as variable label when possible (0=no; 1=yes) idummylabwithvalues = 1 //Show data values on dummy data value labels (0=no; 1=yes) icoefforcepositive = 0 //Force positive coefs (0=no; 1=yes)

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Limitations Dummy syntax must be bx.var. No interaction operators ( #, ## ) allowed. Maximum of 15 variables (xtline command options overflow). Suggestions? Categorical variables have a limit of with 40 dummys each (easy to supersede if needed). Several performance improvements possible (max. string length).

Structure of Presentation Introduction Logistic regression nomograms Objectives Stata programming Gotchas Programming techniques Results Limitations Future work

Future work Overcoming Stata limits: string variables, xtline command. Better drawing adjustments (fonts, axis, etc). Interaction operators support. Cox regression nomograms.

Questions?

Backup slides

Dummy coefficient re adjustment Given a categorical variable A with N cátegories and a regression constant α 0 z = α 0 + α A1 D 1 + α A2 D 2 + + α AN D N If α Ai i=1..n < 0, we set as reference the most negative coefficient i.e. min(α Ai i=1..n )

Dummy coefficient re adjustment p = α + TP) 1+ e 1 ( 0 TP = α A1 D1 + α A2 D2+

Dummy coefficient re adjustment And then z = β 0 + β A1 D 1 + β A2 D 2 + + β AN D N where β 0 = α 0 min(α Ai i=1..n ) β 1 = α 1 min(α Ai i=1..n ) β N = α N min(α Ai i=1..n )

Normal execution Some dummy coefficients are negative

Execution with forced positive coefficients All coefficients are forced to be positive This causes a displacement in the Total score to Prob conversion (due to α 0 )