How To Use Logistic Regression On A Palm Handheld
|
|
|
- Tracey Burke
- 5 years ago
- Views:
Transcription
1 Decisions at Hand: A Decision Support System on Handhelds Blaz Zupan a,b,c, Ales Porenta a, Gaj Vidmar d, Noriaki Aoki c, Ivan Bratko a,b, J. Robert Beck c a Faculty of Computer and Information Science, University of Ljubljana, Slovenia b Jozef Stefan Institute, Ljubljana, Slovenia c Baylor College of Medicine, Houston, TX, USA d Faculty of Arts, University of Ljubljana, Slovenia Abstract One of the applications of clinical information systems is decision support. Although the advantages of utilizing such aids have never been theoretically disputed, they have been rarely used in practice. The factor that probably often limits the utility of clinical decision support systems is the need for computing power at the very site of decision making - at the place where the patient is interviewed, in discussion rooms, etc. The paper reports on a possible solution to this problem. A decision-support shell LogReg is presented, which runs on a handheld computer. A general schema for handheld-based decision support is also proposed, where decision models are developed on personal computers/workstations, encoded in XML and then transferred to handhelds, where the models are used within a decision support shell. A use case where LogReg has been applied to clinical outcome prediction in crush injury is presented. Keywords: Decision support; Handhelds; Logistic regression Introduction In the past clinical information systems have mainly focused on the management of clinical data. Besides the data management, modern clinical information systems are more ambitious and many include tools for decision support. Providing physicians with such tools can increase the quality of clinical decisions and decrease costs []. Clinical decision support systems can help in prognosis, diagnosis, treatment planning and other frequent and non-trivial tasks physicians have to face in daily practice. Decision support systems in medicine can be viewed as intelligent advisors, or sources of second opinion. Their typical life cycle often consists of defining a problem on which to focus, gathering the corresponding retrospective data, and constructing the predictive model. The predictive models form a core of decision support systems: a physician would enter the data on the patient under consideration, and then invoke the decision support model to compute the corresponding probabilities of, say, diagnosis or prognosis, success of treatment plans. For medical decision support systems to be accepted and used by physicians, they have to be available wherever decisions take place: in physicians offices, during visits at patient s beds, in meeting rooms when discussing their use with colleague physicians, In other words, the decision support system has to be immediately accessible best, the technology has to reside and be readily available in the physician s pocket. Recent technological advances can help us to solve the abovementioned problem. Namely, handheld computers are more and more becoming a standard physician s accessory to track dates, store contact information, or even manage other data on patients. They are computationally sufficient to carry decision support tasks of reasonable complexity. While there are an increasing number of medical applications available for handhelds, their use in clinical decision support is at best rare. Of the few systems that exist and are in routine use, we here mention the Prostogram by M. Kattan and P. Fearn [4,5,6] from Memorial Sloan Kettering Hospital in New York, NY. Using Prostogram, expert urologist can enter the preoperative or postoperative data of a patient with prostate cancer, and compute the probability of cancer recurrence after radical prostatectomy. The program is easy to use and is in routine use among physicians in the Department of Urology at the aforementioned institution. Its major weakness, however, is that the prognostic model in the Prostogram is hardcoded in the program. Any alternation of the prognostic model, like changing the coefficients used to compute probabilities, or adding other predictive variables, requires changes of the underlying program, its recompilation and reinstallation on the handheld not a trivial task that requires involvement of programmer with special skills. The work described in this paper was inspired by the approach of Kattan and Fearn, by extending their idea to the point where a physician that has a decision support model of a particular type available can use it on handhelds without the need to devise a special program. For this
2 purpose, we have developed a decision support shell called LogReg that runs on Palm handhelds, an often used handheld platform. LogReg takes a decision support model encoded in XML and provides for data entry and modelbased reasoning on a handheld. The current version of LogReg supports models obtained by logistic regression, a modern statistical technique that is often used for medical data analysis and modeling. In the paper, we first introduce basic concepts of logistic regression and present some of its aspects important for the proposed decision support shell. The discussion of LogReg s architecture is given next. We use the model that predicts the probability of crush syndrome, and present a series of LogReg s snapshots when using this model. Several ongoing developments that are using the proposed framework are discussed in the Conclusion. Logistic Regression The Logistic Model Logistic regression is one of the most frequently used multivariate statistical methods in the biomedical sciences and it is extensively treated in various textbooks [7,3]. It is use for predicting a binary dependent variable, i.e., an outcome coded as either 0 (e.g., NO or disease absence) or (e.g., YES or disease presence). It is based on the logistic function, which is of sigmoid shape and asymptotically approaches or - when z approaches positive or negative infinity, respectively: f ( z) = () z + The linear combination of the independent variables (the components of the predictors' vector x) e z = α + β X + h + β X (2) is included in the logistic model k k ( = ) = ( x) = ( ) P Y X X P f z (3),, k An increase in the sum of β i X i thus implies an increase in the conditional probability of the outcome. The model is basically a linear one, as the relationship between the predictors and the predicted value, subject to the logit transformation (i.e., taking the natural logarithm of the odds), is linear. The regression equation is P z = logit ( P) = ln P (4) As always within the general linear model, one can include binary, categorical and/or quantitative predictors in the logistic regression model, as well as higher-order terms and interactions, whereby each nominal (categorical) variable with n categories is encoded with n- binary ( dummy ) variables. Due to its prototype nature and in order to assure simplicity of display and interpretation, LogReg is presently limited to models without higher-order terms and interactions. The usual linear regression is less suitable for predicting a binary outcome (in which case the second classical alternative linear discriminant analysis is equivalent to linear regression) than the logistic model, even though for P between 0.2 and 0.8 the two models do not differ much. One reason is face validity, since linear regression can predict dependent-variable values above or below 0, while the major advantage of the logistic model lies in its robustness (its assumptions are weaker, the linear model's assumption of normally distributed residuals can not be met). Logistic regression procedures are built into numerous modern statistical software packages. The maximum likelihood method is almost exclusively used for estimating model parameters. It requires somewhat larger samples, but with a sufficiently high ratio of the number of cases compared vs. the number of variables, it produces much more reliable results than the least-squares method. A problem, which is particularly relevant for our application, is estimation of the baseline odds, i.e., the parameter α which equals logit P (0,,0). In order to be able to estimate it, one must collect the data with a followup study, or carry out a case-control study while knowing the sampling fractions of the population within cases and controls [3]. Relative influence (importance) of individual predictors is usually expressed in terms of odds ratio (OR), with which there is no estimation problem with case-control and cross-sectional data. The general formula OR i( Xi Xi2) = e β (5) reduces to e β in case of binary predictors. For numeric predictors, the formula represents the change in the odds if X i increases by one unit holding all other predictor values constant. Estimating Confidence Interval for Predicted Value The predicted value P for a case that was not present in the training set is obtained by plugging its X values into the logistic formula with the estimated regression coefficients. Henceforth we will assume that P>0.5 always implies classifying the case into the class and P<0.5 always implies class 0, because determination of optimal cut-off point for classification far exceeds the scope of our application, so as does the whole area of measures of predictive accuracy of the logistic regression model. The systems for supporting medical decision-making should provide the user at least with some idea of the reliability (certainty) of the decision. Therefore we implemented calculation of confidence interval (CI) for predicted value in LogReg. Approximating the binomial distribution of
3 proportion p with the normal distribution, i.e., obtaining CI by calculating the standard error as ( ) SE p ( p) p = (6) N would be a feasible procedure for sample size over 00 (which is a rule when developing logistic regression models) for P between 0.2 and 0.8, but such a solution does not work for extreme proportions (where the upper bound of the 95%-CI can exceed, and the lower bound can fall below 0) and it also completely ignores the information provided by the values of independent variables for the classified case. Hence, a modern procedure [9] was implemented, which is based on the linear nature of the logistic model: the standard inferential apparatus of linear regression is used to assess g(x) = logit(p), then the obtained CI bounds for g(x) are transformed into probabilities with the inverse transformation P = g( ) e (7) + x To implement this in LogReg, two problems had to be solved, arising from the standard expression for 95% (or with proper adjustment of the multiplicative factor applied to the standard error of the estimate any other narrower or wider) CI for predicted valued in linear regression [8]: ( ) ( ) CI = E Y ± MSE + h (8) ii The first problem is of statistical nature and regards the mean squared error (MSE): there is no real equivalent of this quantity in logistic regression, so in the definition ( Y E( Y) ) MSE = (9) number of predictors instead of using the difference between actual and predicted value, the difference between actual class (0 or ) and predicted probability P can be used, regardless of the fact that the calculation of standard error of estimate is carried out within the world of logit values. The leverages (h ii ), i.e., the diagonal values of the matrix H=X (X T X) - X T, represent a technical problem. Transferring the data matrix X (which has values in the first column, corresponding to the constant term in the regression equation) which has been used to develop the logistic regression model into the handheld, then transposing, inverting and multiplying matrices in the handheld is out of consideration because of the handheld's memory and speed limitations. An efficient solution is to transfer previously calculated matrix (X T X) - into the handheld and calculate leverage of a new case as T ( ) T h ii i i = x X X x (0) Explaining Decisions with Risk Odds Ratio In addition to the presentation of uncertainty of prediction, explaining the system's decision increases the quality of support for the user's decision. The issue of inter-relations and visualization of different parameters and characteristics of logistic regression analysis demands extensive and indepth coverage, such as provided by Harrell [2], while here we only mention the display of risk odds ratio (ROR), provided by LogReg for each of the predictors. Generally speaking, ROR provides a comparison between any two predictor configurations regarding the predicted value P, but it is usually calculated for a pair of configurations (cases) differing only in one predictor value. If the chosen predictor is binary, the expression is either computationally equivalent to the odds ratio (in the case of LogReg this applies to the cases with the value of the predictor of interest), or it equals (if the predictor value is 0). With numeric predictors, an individual case is compared with a value representative for the population from which the analyzed sample has been taken, which is usually the sample arithmetic mean of predictor (this is also the implementation in LogReg). If the predictor is a categorical one, the calculation (which, as already stressed, only applies to models without interaction terms) can best be illustrated by the following example. Four categories are coded as a group of three binary variables, whereby the baseline (first) category has been coded as X a =, X 2a =0 and X 3a =0, and the individual for whom we have predicted P by means of logistic regression belongs to the category coded as third (X b =X 2b =0, X 3b =): ( X X ) β + ( X X ) β + ( X X ) β β + β ROR e e a b 2a 2b 2 3a 3b 3 3 = = () 3: Architecture of Handheld-Based Decision Support System To use LogReg on a Palm handheld, one has first to develop a decision support model on a personal computer or a workstation (Figure ). The model has than to be encoded in XML and (with a help of a special program) the user has to mark the corresponding XML file to be transferred to the Palm handheld. The model is then loaded to Palm handheld with its first synchronization with a PC. Multiple models can be prepared and selected to be used on Palm handheld in this way. One can prepare and choose any number of such models. Once models are transferred to the Palm handheld, they are ready to be used. When implementing LogReg, we needed to solve three problems: () the transfer of XML-encoded models to the Palm handheld, (2) the data entry on a Palm handheld, and (3) the utility of the decision models. For synchronization, a special program is automatically run on a PC that opens the XML files that have been marked by the user, parses the XML files, and sends the data on the models to handheld s data base. When invoked, LogReg
4 consults the database to see which models reside on the Palm handheld. The most recently used model is selected, and an entry form with predictor variables is dynamically generated. After the user specifies the data, LogReg consults the database again to retrieve the logistic model s coefficients that are used for computation of the probabilities. E D workstation handheld statistical analysis data mining XML synchronization C clinical data decision support model data base decision support shell Figure Handheld-based decision support system Notice that the content of the LogReg s screen depends on the information encoded in XML files. The construction of a decision support system for some specific problem thus consists of defining the corresponding XML file that encodes the decision support model, and does not involve any programming. Use Case: Clinical Outcome Prediction in Crush Injury A B Figure 2 offers a series of snapshots that show how LogReg can be used in practice. For demonstration purposes, we here use the model developed by co-author Aoki and his colleagues. The model is used for clinical outcome prediction in crush injury and consists of three prognostic variables (delay of response by emergency medical service personnel, patient s urine color and pulse rate), and based on their values the model computes the probability the patient has severe crush syndrome. To use LogReg, the user has to press on its icon (Figure 2.A). The program displays the entry form of the model that was most recently used (B). The user can also see which other models were loaded into Palm handheld (D) and display additional information about each model (E). LogReg supports different modes for data entry (B, C), including pull-down menus and check boxes for nominal and entry fields for continuous variables. To engage the decision support model, the user has to press the Compute button (F): LogReg displays the probability of outcome and its confidence intervals (G). The influence of each of the prognostic variables on the outcome s probability can be observed through risk odds ratios (H). The XML file that encodes the crush injury syndrome model and that was used in our example can be obtained from the web site at H Figure 2 Snapshots from LogReg F G
5 Conclusion We have described the framework and the decision support shell LogReg that enable the use of logistic regression models on handheld computers. Our main assumption that such systems will be useful in clinical practice is the observation that handhelds, and in particular Palm handhelds, are increasingly used by physicians, who are becoming familiar with such devices and their easy-to-use operating systems. While LogReg at present only supports the reasoning with logistic regression models, the concept that we have described in this paper is general and can be extended to other modeling techniques. At the time of writing this paper, several applications of LogReg are being developed. With Baylor College of Medicine in Houston, TX, we are developing decision support systems for management of trauma patients. A treatment planning decision support system for patients with hip or wrist injury is being developed in collaboration with the Clinical Center in Ljubljana, Slovenia. With Memorial Sloan Kettering Hospital in New York we are constructing LogReg-compatible predictive models for cancer recurrence prognosis and treatment selection. Acknowledgments We would like to acknowledge the financial support of Ministry of Science and Technology of Republic of Slovenia and of American Cancer Society (grant no ). CHS, a company from Ljubljana, Slovenia, sponsored some of the Palm devices that were used in the project described. We are thankful to Michael W. Kattan, Begoña Campos Bonilla, Clint Cummins and Janez Stare for valuable suggestions and advice. References [] Anderson JD: Increasing the acceptance of clinical information systems. MD Computing 999; 6():62-5. [2] Harrell FE: Predicting outcomes: applied survival analysis and logistic regression. Charlotesville 999: School of Medicine, University of Virginia. [3] Hosmer DW, Lemeshow S: Applied logistic regression. New York 989: Wiley. [4] Kattan MW, Zelefsky MJ, Kupelian PA, Scardino PT, Fuks Z, Leibel SA. Pretreatment nomogram for predicting the outcome of three-dimensional conformal radiotherapy in prostate cancer. J Clin Oncol 2000; 8(9): [5] Kattan MW, Wheeler TM, Scardino PT. Postoperative nomogram for disease recurrence after radical prostatectomy for prostate cancer. J Clin Oncol 999; 7(5): [6] Kattan MW, Fearn P: Nomograms Web Page [7] Kleinbaum DG: Logistic regression: a self-learning text. New York 994: Springer. [8] Myers JL, Well AD: Research design and statistical analysis. New York 99: Harper Collins. [9] Piergorsch WW, Casella G: Confidence bands for logistic regression with restricted predictor variables. Biometrics 988; 44(3): Address for correspondence Blaz Zupan, Faculty of Computer and Information Science, University of Ljubljana, Trzaska 25, SI-000 Ljubljana, Slovenia [email protected] url:
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Individual Prediction
Individual Prediction Michael W. Kattan, Ph.D. Professor of Medicine, Epidemiology and Biostatistics, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University Chairman, Department
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Statistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Systematic Reviews and Meta-analyses
Systematic Reviews and Meta-analyses Introduction A systematic review (also called an overview) attempts to summarize the scientific evidence related to treatment, causation, diagnosis, or prognosis of
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
Multiple logistic regression analysis of cigarette use among high school students
Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph Adwere-Boamah Alliant International University A binary logistic regression analysis was performed to predict
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
Multinomial Logistic Regression
Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a
Binary Diagnostic Tests Two Independent Samples
Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to
Logistic regression modeling the probability of success
Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might
Regression III: Advanced Methods
Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Dealing with Missing Data
Dealing with Missing Data Roch Giorgi email: [email protected] UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January
Logistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015
1 Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015 Instructor: Joanne M. Garrett, PhD e-mail: [email protected] Class Notes: Copies of the class lecture slides
X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
LOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
Binary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:
1. COURSE DESCRIPTION Degree: Double Degree: Derecho y Finanzas y Contabilidad (English teaching) Course: STATISTICAL AND ECONOMETRIC METHODS FOR FINANCE (Métodos Estadísticos y Econométricos en Finanzas
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
Operation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
Guide to Biostatistics
MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
The Probit Link Function in Generalized Linear Models for Data Mining Applications
Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications
Applied Regression Analysis and Other Multivariable Methods
THIRD EDITION Applied Regression Analysis and Other Multivariable Methods David G. Kleinbaum Emory University Lawrence L. Kupper University of North Carolina, Chapel Hill Keith E. Muller University of
Yew May Martin Maureen Maclachlan Tom Karmel Higher Education Division, Department of Education, Training and Youth Affairs.
How is Australia s Higher Education Performing? An analysis of completion rates of a cohort of Australian Post Graduate Research Students in the 1990s. Yew May Martin Maureen Maclachlan Tom Karmel Higher
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS
ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS Michael Affenzeller (a), Stephan M. Winkler (b), Stefan Forstenlechner (c), Gabriel Kronberger (d), Michael Kommenda (e), Stefan
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Regression 3: Logistic Regression
Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing
Sun Li Centre for Academic Computing [email protected]
Sun Li Centre for Academic Computing [email protected] Elementary Data Analysis Group Comparison & One-way ANOVA Non-parametric Tests Correlations General Linear Regression Logistic Models Binary Logistic
Data Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
Statistics and Data Analysis
NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models
Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions
BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
International Statistical Institute, 56th Session, 2007: Phil Everson
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: [email protected] 1. Introduction
Linear Codes. Chapter 3. 3.1 Basics
Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
Data Mining & Data Stream Mining Open Source Tools
Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.
PARALLEL LINES ASSUMPTION IN ORDINAL LOGISTIC REGRESSION AND ANALYSIS APPROACHES
International Interdisciplinary Journal of Scientific Research ISSN: 2200-9833 www.iijsr.org PARALLEL LINES ASSUMPTION IN ORDINAL LOGISTIC REGRESSION AND ANALYSIS APPROACHES Erkan ARI 1 and Zeki YILDIZ
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
Validation of measurement procedures
Validation of measurement procedures R. Haeckel and I.Püntmann Zentralkrankenhaus Bremen The new ISO standard 15189 which has already been accepted by most nations will soon become the basis for accreditation
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
South Carolina College- and Career-Ready (SCCCR) Pre-Calculus
South Carolina College- and Career-Ready (SCCCR) Pre-Calculus Key Concepts Arithmetic with Polynomials and Rational Expressions PC.AAPR.2 PC.AAPR.3 PC.AAPR.4 PC.AAPR.5 PC.AAPR.6 PC.AAPR.7 Standards Know
Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.
Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables - logistic regression Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit
It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.
IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION
Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.
Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month
Health Care and Life Sciences
Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations Wen Zhu 1, Nancy Zeng 2, Ning Wang 2 1 K&L consulting services, Inc, Fort Washington,
A new score predicting the survival of patients with spinal cord compression from myeloma
A new score predicting the survival of patients with spinal cord compression from myeloma (1) Sarah Douglas, Department of Radiation Oncology, University of Lubeck, Germany; [email protected] (2) Steven
USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA
USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique
Statistics 305: Introduction to Biostatistical Methods for Health Sciences
Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser
Analysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
Machine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
Confidence Intervals in Public Health
Confidence Intervals in Public Health When public health practitioners use health statistics, sometimes they are interested in the actual number of health events, but more often they use the statistics
Application of discriminant analysis to predict the class of degree for graduating students in a university system
International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive
Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini
NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service
THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Better decision making under uncertain conditions using Monte Carlo Simulation
IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Research on the Factor Analysis and Logistic Regression with the Applications on the Listed Company Financial Modeling.
2nd International Conference on Social Science and Technology Education (ICSSTE 2016) Research on the Factor Analysis and Logistic Regression with the Applications on the Listed Company Financial Modeling
Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs
Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs Andrew Gelman Guido Imbens 2 Aug 2014 Abstract It is common in regression discontinuity analysis to control for high order
PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY
PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC. A gotcha
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Precalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES
Content Expectations for Precalculus Michigan Precalculus 2011 REVERSE CORRELATION CHAPTER/LESSON TITLES Chapter 0 Preparing for Precalculus 0-1 Sets There are no state-mandated Precalculus 0-2 Operations
DISCRIMINANT FUNCTION ANALYSIS (DA)
DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant
Question 2: How do you solve a matrix equation using the matrix inverse?
Question : How do you solve a matrix equation using the matrix inverse? In the previous question, we wrote systems of equations as a matrix equation AX B. In this format, the matrix A contains the coefficients
Lecture 25. December 19, 2007. Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Corporate Defaults and Large Macroeconomic Shocks
Corporate Defaults and Large Macroeconomic Shocks Mathias Drehmann Bank of England Andrew Patton London School of Economics and Bank of England Steffen Sorensen Bank of England The presentation expresses
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Probit Analysis By: Kim Vincent
Probit Analysis By: Kim Vincent Quick Overview Probit analysis is a type of regression used to analyze binomial response variables. It transforms the sigmoid dose-response curve to a straight line that
