Copyright 2006, SAS Institute Inc. All rights reserved. Predictive Modeling using SAS
1 Predictive Modeling using SAS
2 Purpose of Predictive Modeling
To predict the future.
x To identify statistically significant attributes or risk factors.
x To publish findings in Science, Nature, or the New England Journal of Medicine.
To enhance and enable rapid decision making at the level of the individual patient, client, customer, etc.
x To enable decision making and influence policy through publications and presentations.
3 Challenges: Opportunistic Data
4 Challenges: Data Deluge
5 Challenges: Errors, Outliers, and Missings
[Example data table with columns cking, #cking, ADB, NSF, dirdep, SVG, bal; cell values are not recoverable from the transcript]
6 Challenges: Rare Events
[Figure labels: OK, Rare Condition]
7 Methodology: Empirical Validation
8 Methodology: Diversity of Algorithms
9 Jargon
Target = Dependent Variable.
Inputs, Predictors = Independent Variables.
Supervised Classification = Predicting class membership with algorithms that use a target.
Scoring = The process of generating predictions on new data for decision making. This is not a re-running of models but an application of model results (e.g. equation and parameter estimates) to new data.
Scoring Code = Programming code that can be used to prepare and generate predictions on new data, including transformations, imputation results, and model parameter estimates and equations.
Data Scientist = What someone who used to be a data miner, and before that a statistician, calls themselves when looking for a job.
10 Binary Target Example: Predicting Low Birth Weight
North Carolina birth records from the North Carolina Center for Health Statistics.
7.2% low birth weight births (< 2500 grams), excluding multiple births.
An oversampled (50% LBWT) development set of 17,063 births from 2000 and a test set of 16,656 births from 2001.
Data contain information on parents' ethnicity, age, education level and marital status.
Data contain information on the mother's health condition and reproductive history.
11 Predicting the Future with Data Splitting
[Diagram: Training, Validation, Test]
Models are fit to Training data, compared and selected on Validation data, and tested on a future Test set.
12 Scenario: an early warning system for LBWT
PREDICTORS
Parent socio-, eco-, demographics, health and behaviour: age, education, race, medical conditions, smoking, etc.
Prior pregnancy related data: # pregnancies, last outcome, prior pregnancies, etc.
Medical history for pregnancy: hypertension, cardiac disease, etc.
Obstetric procedures: amniocentesis, ultrasound, etc.
Events of labor: breech, fetal distress, etc.
Method of delivery: vaginal, c-section, etc.
Newborn characteristics: congenital anomalies (spina bifida, heart), APGAR score, anemia.
13 Beware of Temporal Infidelity
[Diagram: predictor groups ordered along a time axis]
Parent socio-, eco-, demographics and behaviour; prior pregnancy related data; medical history for pregnancy; obstetric procedures; events of labor; method of delivery; newborn characteristics.
14 Model Assessments for Binary Targets

            Predicted** 1   Predicted** 0   Total
Actual 1    TP              FN              AP
Actual 0    FP              TN              AN
Total       PP              PN              n

Accuracy = (TP+TN)/n
Sensitivity = TP/AP
Specificity = TN/AN
Lift = (TP/PP)/π1
** Where Predicted 1 = (Pred Prob > Cutoff)
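The measures on this slide can be sketched in a few lines of Python. This is only an illustration of the formulas, not the SAS code used in the presentation; the function name `binary_assessment`, the toy data, and the reading of π1 as the proportion of events in the population are assumptions.

```python
def binary_assessment(actual, pred_prob, cutoff, prior_event):
    """Accuracy, sensitivity, specificity and lift at a single cutoff.

    prior_event plays the role of pi_1; treating it as the event
    proportion is an assumption, not something stated on the slide.
    """
    # Predicted 1 = (Pred Prob > Cutoff), as in the slide footnote.
    predicted = [1 if p > cutoff else 0 for p in pred_prob]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, y in pairs if a == 1 and y == 1)
    fn = sum(1 for a, y in pairs if a == 1 and y == 0)
    fp = sum(1 for a, y in pairs if a == 0 and y == 1)
    tn = sum(1 for a, y in pairs if a == 0 and y == 0)
    n = len(pairs)
    ap, an, pp = tp + fn, fp + tn, tp + fp
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / ap,
        "specificity": tn / an,
        # Lift is undefined when nothing is predicted positive.
        "lift": (tp / pp) / prior_event if pp else float("nan"),
    }


# Toy data (hypothetical): 3 events among 8 observations.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
pred_prob = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
result = binary_assessment(actual, pred_prob, cutoff=0.5, prior_event=3 / 8)
print(result)
```

Re-running the function over a grid of cutoffs gives the points needed for lift and ROC charts.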
15 Assessment Charts for Binary Targets
[Figure: Lift Charts (Lift vs. Depth) and ROC Charts (SE vs. 1-SP), each point corresponding to one TP/FN/FP/TN table]
Explore measures across a range of cutoffs.
16 Receiver Operating Characteristic (ROC) Curves
[Figure: ROC curves for a weak model and a strong model]
The area under the ROC curve (AuROC) is a measure of a model's predictive performance, that is, the model's ability to discriminate between target class levels. Areas under the curve range from 0.5 to 1.0.
AuROC is a concordance statistic: over every pair of observations with different outcomes (LBWT=1, LBWT=0), it measures the probability that the ordering of the predicted probabilities agrees with the ordering of the actual target values. Equivalently, it is the probability that a low birth weight baby (LBWT=1) has a higher predicted probability of low birth weight than a normal birth weight baby (LBWT=0).
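The concordance reading of the area under the ROC curve can be checked directly by counting pairs. The Python sketch below is illustrative only; the function name and toy data are hypothetical, and counting tied pairs as one half is an assumption (the usual convention for the c statistic).

```python
def concordance_auc(actual, pred_prob):
    """Fraction of (event, non-event) pairs ordered correctly by the model."""
    events = [p for a, p in zip(actual, pred_prob) if a == 1]
    nonevents = [p for a, p in zip(actual, pred_prob) if a == 0]
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0   # ordering agrees with the outcomes
            elif p_event == p_nonevent:
                concordant += 0.5   # tie: counted as half (assumption)
    return concordant / (len(events) * len(nonevents))


# Toy data (hypothetical): 3 events, 5 non-events, so 15 pairs.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
pred_prob = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
auc = concordance_auc(actual, pred_prob)
print(auc)
```

The double loop is quadratic in the number of observations, which is fine for illustration but not for large scoring sets.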
17 Key Features of SAS/STAT Code: Data Partition
SURVEYSELECT is used to partition data into Training (67%) and Validation (33%) sets. The OUTALL option provides one dataset with a variable, SELECTED, that indicates dataset membership. Stratification on the target, LBWT, ensures equal representation of low birth weight cases in the training and validation sets.
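The idea behind the stratified partition can be sketched outside SAS as follows. This Python illustration is not the SURVEYSELECT code from the presentation; the function name, the `selected` flag, the seed, and the toy records are assumptions chosen to mirror the description above.

```python
import random


def stratified_partition(records, target_key, train_fraction=0.67, seed=12345):
    """Flag each record as training (selected=1) or validation (selected=0),
    sampling separately within each level of the target."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(rec[target_key], []).append(rec)
    flagged = []
    for members in strata.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        for i, rec in enumerate(members):
            flagged.append({**rec, "selected": 1 if i < n_train else 0})
    return flagged


# Toy data (hypothetical): 100 births per target level.
records = [{"id": i, "lbwt": i % 2} for i in range(200)]
partitioned = stratified_partition(records, "lbwt")
train_events = sum(r["selected"] for r in partitioned if r["lbwt"] == 1)
train_nonevents = sum(r["selected"] for r in partitioned if r["lbwt"] == 0)
print(train_events, train_nonevents)
```

Because sampling happens within each target level, both levels contribute the same fraction of their cases to training.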
18 Key Features of SAS/STAT Code: Imputation
STDIZE performs missing value replacement (REPONLY) and is applied to the Training data. The OUTSTAT option saves a dataset that is used to insert the results into (score) the Validation and Test sets. METHOD=IN(MED) uses the imputation information from the Training data to score the Validation and Test data.
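The key point is that replacement values are estimated once on the Training data and then reused unchanged on Validation and Test data. A minimal Python sketch of that pattern follows, assuming median replacement; the function names and the variables `mage` and `wtgain` are hypothetical and do not come from the presentation.

```python
from statistics import median


def fit_medians(rows, columns):
    """Estimate one replacement value per column from the training rows only."""
    return {c: median(r[c] for r in rows if r[c] is not None) for c in columns}


def impute(rows, medians):
    """Replace missing values (None) with the stored training medians."""
    return [
        {k: (medians[k] if v is None and k in medians else v) for k, v in r.items()}
        for r in rows
    ]


# Toy data (hypothetical variable names).
train = [
    {"mage": 24, "wtgain": 30},
    {"mage": None, "wtgain": 25},
    {"mage": 31, "wtgain": None},
    {"mage": 28, "wtgain": 40},
]
validate = [{"mage": None, "wtgain": 12}]

medians = fit_medians(train, ["mage", "wtgain"])
train_imputed = impute(train, medians)
validate_imputed = impute(validate, medians)   # scored with training medians
print(medians, validate_imputed)
```

Recomputing medians on the Validation or Test data would leak information from those sets into the preparation step, which is why the stored values are reused.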
19 Key Features of SAS/STAT Code
After three final models are selected using stepwise methods, the three models are fit in LOGISTIC. The SCORE statement allows for scoring of new data and adjusts oversampled data back to the population prior (PRIOREVENT=0.072). The same dataset is re-scored (Sco_validate) so that predictions for all three models are in the same set for comparisons. The process is repeated using the Test set.
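The effect of adjusting for oversampling can be illustrated with the standard Bayes correction for prior probabilities. The Python sketch below shows that correction under the numbers given in this deck (50% events in the development set, 7.2% in the population); the function name is hypothetical, and that PROC LOGISTIC applies exactly this formula is an assumption rather than something stated in the slides.

```python
def adjust_for_prior(p_sample, sample_event_rate, prior_event):
    """Rescale a probability estimated on oversampled data to the population prior."""
    # Reweight the event and non-event odds by population / sample proportions.
    event_part = p_sample * prior_event / sample_event_rate
    nonevent_part = (1 - p_sample) * (1 - prior_event) / (1 - sample_event_rate)
    return event_part / (event_part + nonevent_part)


# A prediction of 0.5 on the 50% oversampled set maps back to the 7.2% prior.
baseline = adjust_for_prior(0.5, sample_event_rate=0.5, prior_event=0.072)
high_risk = adjust_for_prior(0.9, sample_event_rate=0.5, prior_event=0.072)
print(baseline, high_risk)
```

The adjustment is monotone, so it changes the scale of the predictions but not their rank order, and measures such as the area under the ROC curve are unaffected.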
20 Key Features of SAS/STAT Code
The dataset with all three predictions (Sco_validate) is supplied to PROC LOGISTIC. The ROCCONTRAST statement provides statistical significance tests for differences between the ROC curves for the model results specified in the three ROC statements. To generate ROC contrasts, all terms used in the ROC statements must be placed on the MODEL statement. The NOFIT option suppresses the fitting of the specified model. Because of the presence of the ROC and ROCCONTRAST statements, ROC plots are generated when ODS GRAPHICS is enabled. The process is repeated with the Test set.
21 Comparing ROC curves
22 Comparing ROC curves
23 DEMONSTRATION
24 Interval Target Example: Predicting Donation Amounts
A veterans' organization seeks continued contributions from lapsing donors. Use lapsing-donor donation amounts from an earlier campaign to predict future donations. Inputs include information on previous donation behavior by donors and solicitations by the charity. For example:
DEMVARS: socioeconomic/demographic information
GIFTVARS: donation amount attributes
CNTVARS: donation frequency information
PROMVARS: solicitation frequencies
25 Key Features of SAS/STAT Code
GLMSELECT fits interval target models and can process validation and test datasets, or perform cross validation for smaller datasets. It can also partition the data using the PARTITION statement. GLMSELECT supports a CLASS statement similar to PROC GLM but is designed for predictive modeling. Selection methods include Backward, Forward, Stepwise, LAR and LASSO. Models can be tuned with the CHOOSE= option, which selects the step in a selection routine using, for example, AIC, SBC, Mallows' Cp, or validation data error. CHOOSE=VALIDATE selects the step that minimizes Validation data error. SELECT= determines the order in which effects enter or leave the model. Options include, for example, ADJRSQ, AIC, SBC, CP, CV, RSQUARE and SL. SL uses the traditional approach of significance level.
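Tuning on validation error amounts to computing the average squared error (ASE) on the Validation data at each step of the selection path and keeping the step where it is smallest. The Python sketch below shows only that choice; the function names and the toy predictions are hypothetical, and it does not reproduce how GLMSELECT builds the sequence of models.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def choose_step(validation_actual, predictions_by_step):
    """Return the index of the step with the lowest validation ASE, plus all ASEs."""
    ases = [average_squared_error(validation_actual, preds)
            for preds in predictions_by_step]
    best = min(range(len(ases)), key=ases.__getitem__)
    return best, ases


# Toy data (hypothetical): validation targets and predictions at three steps.
validation_actual = [10, 20, 30]
predictions_by_step = [
    [20, 20, 20],   # step 0: intercept-only model
    [12, 19, 27],   # step 1: one effect added
    [11, 24, 33],   # step 2: a further effect that starts to overfit
]
best_step, ases = choose_step(validation_actual, predictions_by_step)
print(best_step, ases)
```

In this toy path the validation ASE falls and then rises again, so the middle step is kept even though later steps fit the training data more closely.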
26 Model Tuning using Validation ASE
27 Final Model Fitting and Score Code in GLM
GLMSELECT does not provide hypothesis test results and model diagnostics. The model selected by GLMSELECT can be refit in PROC GLM. PLOTS=DIAGNOSTICS requests diagnostic plots. The new CODE statement requests score code that can be applied to a new dataset with the %INCLUDE statement. SOURCE2 prints the scoring action to the log. The following procedures support a CODE statement as of V12.1: GENMOD, GLIMMIX, GLM, GLMSELECT, LOGISTIC, MIXED, PLM, and REG.
28 PROC GLM Statistical Graphics Diagnostics
ODS GRAPHICS ON and PLOTS=DIAGNOSTICS.
29 Predictive Modeling: Foundation SAS or Enterprise Miner
30 DEMONSTRATION
31 Thank You! Lorne Rothman, PhD, P.Stat. Principal Statistician
