# Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

## Transcription

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care

University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and Objective Methods

Dimitris Fouskakis, Department of Mathematics, School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Athens, Greece

Joint work with:

- Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece
- David Draper, Department of Applied Mathematics and Statistics, University of California, Santa Cruz, USA

### Synopsis

1. Motivation: Indirect Measurement of Quality of Health Care.
2. Model Specification.
3. Cost-Benefit Analysis.
4. Cost-Restriction-Benefit Analysis.
5. Discussion.

Presentation is available at: fouskakis/conferences/bms/bms.pdf.

### 1 Motivation: Indirect Measurement of Quality of Health Care

How should hospital quality of care be measured?

Indirect method (input-output approach): hospital outcomes (e.g., mortality within 30 days of admission) are compared after adjusting for differences in inputs (sickness at admission).

Patient sickness at admission is traditionally assessed by a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100), which is used to construct a sickness scale.

Benefit-Only Analysis: classical variable selection techniques can be employed to find an optimal subset of indicators. In a major U.S. study conducted by the RAND Corporation, such an approach was used to reduce the initial list of $p = 83$ sickness indicators, gathered on $n = 2{,}532$ pneumonia patients, to a core of 14 predictors (Keeler et al., 1990).
#### The 14-Variable RAND Pneumonia Scale

The RAND admission sickness scale for pneumonia ($p = 14$ variables), with the marginal data collection cost per patient for each variable (in minutes of abstraction time).

| Variable | Cost (minutes) |
|---|---|
| Blood Urea Nitrogen | 1.50 |
| Age | 0.50 |
| Systolic Blood Pressure Score (2-point scale) | 0.50 |
| Chest X-ray Congestive Heart Failure Score (3-point scale) | 2.50 |
| Total APACHE II Score (36-point scale) | not preserved in this transcription |
| APACHE II Coma Score (3-point scale) | 2.50 |
| Serum Albumin (3-point scale) | 1.50 |
| Shortness of Breath Day 1 | 1.00 |
| Respiratory Distress | 1.00 |
| Septic Complications | 3.00 |
| Prior Respiratory Failure | 2.00 |
| Recently Hospitalized | 2.00 |
| Ambulatory Score (3-point scale) | 2.50 |
| Initial Temperature | 0.50 |

### 2 Model Specification

Logistic regression model with $Y_i = 1$ if patient $i$ dies within 30 days of admission.

- $X_{ij}$: the $j$-th sickness predictor variable for the $i$-th patient.
- $\gamma = (\gamma_1, \dots, \gamma_p)^T$, where $\gamma_j$ is a binary indicator of the inclusion of variable $X_j$ in the model.
- Model space $\mathcal{M} = \{0, 1\}^p$; $p$ is the total number of variables considered.

Hence the model formulation can be summarized as

$$(Y_i \mid \gamma) \overset{\text{indep}}{\sim} \text{Bernoulli}\big(p_i(\gamma)\big),$$

$$\eta_i(\gamma) = \log\left(\frac{p_i(\gamma)}{1 - p_i(\gamma)}\right) = \sum_{j=0}^{p} \beta_j \gamma_j X_{ij},$$

$$\eta(\gamma) = X\,\mathrm{diag}(\gamma)\,\beta = X_\gamma \beta_\gamma.$$

#### Two different approaches

The RAND Benefit-Only approach is sub-optimal: it does not consider differences in the cost of data collection among the available predictors.

We propose a Cost-Benefit Analysis, in which variables are chosen only when they predict well enough given how much they cost to collect.

In problems such as this, in which two desirable criteria compete and a joint optimization must be achieved, there are two main ways to proceed:

- Strategy (a): both criteria are placed on a common scale, and optimization occurs on that scale.
- Strategy (b): one criterion is optimized, subject to a bound on the other.

#### Three methods for solving this problem

(1) (strategy (a)) Draper and Fouskakis (2000) and Fouskakis and Draper (2002, 2008) proposed an approach to this problem based on Bayesian decision theory. They used stochastic optimization methods to find (near-)optimal subsets of predictor variables that maximize an expected utility function which trades off data collection cost against predictive accuracy.

(2) (strategy (a)) In this work, as an alternative to (1), we propose a prior distribution that accounts for the cost of each variable and results in a set of posterior model probabilities which correspond to a generalized cost-adjusted version of the Bayesian Information Criterion (Fouskakis, Ntzoufras and Draper, 2007a).

(3) (strategy (b)) We also implement a Cost-Restriction-Benefit Analysis, where the search is conducted only among models whose cost does not exceed a budgetary restriction (Fouskakis, Ntzoufras and Draper, 2007b), using a population-based trans-dimensional RJMCMC method.

Here we present results from methods (2) (Cost-Benefit Analysis) and (3) (Cost-Restriction-Benefit Analysis).

### 3 Cost-Benefit Analysis

The aim is to identify well-fitting models after taking into account the cost of each variable. Therefore we need to estimate the posterior model probability

$$f(\gamma \mid y) = \frac{f(\gamma) \int f(y \mid \beta_\gamma, \gamma)\, f(\beta_\gamma \mid \gamma)\, d\beta_\gamma}{\sum_{\gamma' \in \{0,1\}^p} f(\gamma') \int f(y \mid \beta_{\gamma'}, \gamma')\, f(\beta_{\gamma'} \mid \gamma')\, d\beta_{\gamma'}}$$

after introducing a prior on model space $f(\gamma)$ that depends on the cost.

#### Prior on Model Parameters

$$f(\beta_\gamma \mid \gamma) = \text{Normal}\left(0,\; 4n \left(X_\gamma^T X_\gamma\right)^{-1}\right)$$

This is a low-information prior, since it gives the prior a weight equal to one data point (see Ntzoufras, Dellaportas and Forster, 2003).
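The model specification and the prior on the model parameters can be illustrated with a small NumPy sketch. It assumes a design matrix whose first column is the intercept (always included), and it uses simulated data; the function names `linear_predictor` and `prior_covariance` and all numerical values are hypothetical and are not taken from the talk.

```python
import numpy as np

def linear_predictor(X, gamma, beta):
    """Compute eta(gamma) = X diag(gamma) beta for an inclusion vector gamma."""
    return X @ (gamma * beta)

def prior_covariance(X, gamma):
    """Covariance 4n (X_gamma^T X_gamma)^{-1} of the Normal prior on beta_gamma."""
    n = X.shape[0]
    X_g = X[:, gamma.astype(bool)]          # keep only the included columns
    return 4 * n * np.linalg.inv(X_g.T @ X_g)

# Simulated illustration (hypothetical sizes and values).
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # column 0 = intercept
gamma = np.array([1, 1, 0, 1, 0])            # intercept plus X_1 and X_3 included
beta = rng.normal(size=p + 1)

eta_full = linear_predictor(X, gamma, beta)  # X diag(gamma) beta
mask = gamma.astype(bool)
eta_sub = X[:, mask] @ beta[mask]            # X_gamma beta_gamma
prob = 1.0 / (1.0 + np.exp(-eta_full))       # p_i(gamma) via the inverse logit
cov = prior_covariance(X, gamma)
```

The two ways of writing the linear predictor give the same vector, which is why the slides can move freely between $X\,\mathrm{diag}(\gamma)\,\beta$ and $X_\gamma \beta_\gamma$.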

#### A Cost-Penalized Prior on Model Space

$$f(\gamma_j) \propto \exp\left(-\frac{\gamma_j}{2}\,\frac{c_j - c_0}{c_0}\,\log n\right) \quad \text{for } j = 1, \dots, p.$$

- $c_j$: cost per observation for variable $X_j$.
- $c_0$: baseline cost (default choice: $c_0 = \min\{c_j\}$, $j = 1, \dots, p$).

When comparing models $\gamma^{(k)}$ and $\gamma^{(l)}$, the penalty imposed on the log-likelihood ratio is given by

$$-2 \log \frac{f(\gamma^{(k)})}{f(\gamma^{(l)})} = \sum_{j=1}^{p} \left(\gamma_j^{(k)} - \gamma_j^{(l)}\right) \frac{c_j}{c_0} \log n - \left(d_{\gamma^{(k)}} - d_{\gamma^{(l)}}\right) \log n.$$

Indifference concerning the cost ($c_j = c_0$ for $j = 1, \dots, p$) gives a uniform prior on model space ($f(\gamma) \propto 1$), so that the posterior model odds equal the Bayes factor.

#### Approximations of the Posterior Model Odds

Using the Laplace approximation in our model formulation we obtain

$$-2 \log f(\gamma \mid y) = -2 \log f(y \mid \tilde\beta_\gamma, \gamma) + \underbrace{\varphi(\gamma) \overbrace{-\,2 \log f(\gamma)}^{\text{prior model prob.}}}_{\text{Penalty Term}} + O(n^{-1}),$$

with

$$\varphi(\gamma) = \frac{1}{4n}\, \tilde\beta_\gamma^T X_\gamma^T X_\gamma \tilde\beta_\gamma + d_\gamma \log(4n) + \log \frac{\left|\Psi_\gamma^{-1}\right|}{\left|X_\gamma^T X_\gamma\right|},$$

where

- $\tilde\beta_\gamma$ is the posterior mode of $f(\beta_\gamma \mid y, \gamma)$;
- $d_\gamma = \sum_{j=1}^{p} \gamma_j$ is the dimension of model $\gamma$;
- the first term of $\varphi(\gamma)$ can be thought of as a measure of discrepancy between the data and the prior information on the model parameters;
- $\Psi_\gamma$ is minus the inverse of the Hessian matrix of $h(\beta_\gamma) = \log f(y \mid \beta_\gamma, \gamma) + \log f(\beta_\gamma \mid \gamma)$, evaluated at the posterior mode $\tilde\beta_\gamma$.

#### Penalty Interpretation: A Generalized Cost-Adjusted BIC

Since $\varphi(\gamma) = d_\gamma \log n + O(1)$, the $d_\gamma \log n$ terms contributed by $\varphi(\gamma)$ and by the prior cancel, leaving

$$-2 \log f(\gamma \mid y) = -2 \log f(y \mid \hat\beta_\gamma, \gamma) + \sum_{j=1}^{p} \gamma_j \frac{c_j}{c_0} \log n + O(1) = -2 \log f(y \mid \hat\beta_\gamma, \gamma) + \frac{C_\gamma}{c_0} \log n + O(1).$$

- $C_\gamma = \sum_{j=1}^{p} \gamma_j c_j$ is the cost of model $\gamma$.
- $\hat\beta_\gamma$ is the MLE of the parameters $\beta_\gamma$ of model $\gamma$.
- If $c_j = c_0$ for all $j$, the criterion reduces to $\text{BIC} = -2 \log f(y \mid \hat\beta_\gamma, \gamma) + d_\gamma \log n$.

#### Implementation and Results

- Run RJMCMC (Green, 1995) for 100K iterations in the full model space.
- Eliminate non-important variables (those with marginal probabilities < 0.30), forming a new reduced model space.
- Run RJMCMC for 100K iterations in the reduced model space to estimate posterior model odds and the best models.

Two setups:

1. Benefit-Only Analysis (uniform prior on model space).
2. Cost-Benefit Analysis (cost-penalized prior on model space).
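The generalized cost-adjusted BIC described above can be sketched in a few lines of Python. This is only an illustration: the function name `cost_adjusted_bic` is hypothetical, the maximized log-likelihood is assumed to come from a separately fitted logistic regression, the $O(1)$ remainder is dropped, and the log-likelihood and cost values below are made up for the example.

```python
import numpy as np

def cost_adjusted_bic(max_loglik, gamma, costs, n, c0=None):
    """Return -2 log f(y | beta_hat) + (C_gamma / c0) log n, ignoring the O(1) term.

    max_loglik : maximized log-likelihood of model gamma (assumed given)
    gamma      : 0/1 inclusion indicators for the p candidate variables
    costs      : per-observation cost c_j of each variable
    n          : sample size
    c0         : baseline cost; defaults to min(c_j) as on the slides
    """
    gamma = np.asarray(gamma, dtype=float)
    costs = np.asarray(costs, dtype=float)
    if c0 is None:
        c0 = costs.min()
    model_cost = float(gamma @ costs)        # C_gamma = sum_j gamma_j c_j
    return -2.0 * max_loglik + (model_cost / c0) * np.log(n)

n = 2532                                     # sample size quoted in the talk
loglik = -700.0                              # hypothetical maximized log-likelihood

# Equal costs: the criterion reduces to the ordinary BIC with d_gamma = 2.
equal_costs_value = cost_adjusted_bic(loglik, [1, 0, 1], [2.0, 2.0, 2.0], n)

# Unequal costs: C_gamma = 0.5 + 2.5 = 3.0 and c0 = 0.5, so the penalty is 6 log n.
weighted_value = cost_adjusted_bic(loglik, [1, 0, 1], [0.5, 1.5, 2.5], n)
```

With unequal costs, an expensive variable is penalized as if it contributed $c_j / c_0$ parameters rather than one, so it needs a correspondingly larger gain in fit to be selected.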

#### Preliminary Results: Marginal Probabilities $f(\gamma_j = 1 \mid y)$

Table columns: variable index, variable name, cost, and marginal posterior probability under the Benefit-Only Analysis and under the Cost-Benefit Analysis. Only the variable names survive in this transcription (day numbers and all numeric entries were lost):

- Systolic Blood Pressure (SBP) Score
- Age
- Blood Urea Nitrogen
- Apache II Coma Score
- Shortness of Breath Day
- Septic Complications
- Initial Temperature
- Heart Rate Day
- Chest Pain Day
- Cardiomegaly Score
- Hematologic History Score
- Apache Respiratory Rate Score
- Admission SBP
- Respiratory Rate Day
- Confusion Day
- Apache pH Score
- Morbid + Comorbid Score
- Musculoskeletal Score

#### Reduced Model Space: Posterior Model Probabilities/Odds

Common variables in both analyses: $X_1 + X_2 + X_3 + X_5 + X_{12} + X_{70}$.

Table columns for each analysis: model rank $k$, common variables within the analysis, additional variables, model cost, posterior probability, and $PO_{1k}$. The row-level entries are not recoverable from this transcription; what survives is:

- Benefit-Only Analysis: common variables within the analysis $X_4 + X_{15} + X_{37} + X_{73}$; the additional variables include $X_8$ and $X_{27}$.
- Cost-Benefit Analysis: common variables within the analysis $X_{46} + X_{51}$; the additional variables include $X_{13}$, $X_{14}$, $X_{37}$ and $X_{49}$.

Notes: the tables list models with posterior probability above 3%. $PO_{1k}$ denotes the posterior odds of the best model within each analysis versus the current model $k$.

#### Reduced Model Space: Comparisons

Comparison of measures of fit, cost and dimensionality between the best models in the reduced model space of the benefit-only and cost-benefit analyses; the percentage difference is in relation to benefit-only.

Table columns: Benefit-Only, Cost-Benefit, Difference (%). Table rows: Minimum Deviance, Median Deviance, Cost, Dimension. The numeric entries are not preserved in this transcription.

### 4 Cost-Restriction-Benefit Analysis

We implement a Cost-Restriction-Benefit Analysis, in which the practical relevance of the selected variable subsets is ensured by enforcing an overall limit on the total data collection cost of each subset: the search is conducted only among models whose cost does not exceed this budgetary restriction $C$.

Therefore, we exclude a priori all models $\gamma$ with total cost larger than $C$, resulting in a significantly reduced model space,

$$\mathcal{M} = \left\{\gamma \in \{0, 1\}^p : \sum_{i=1}^{p} c_i \gamma_i \le C\right\}.$$

AIM: Estimate posterior model probabilities in the cost-restricted model space.
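The cost-restricted model space can be made concrete with a brute-force sketch for a toy problem. The function name `restricted_model_space`, the five costs and the budget below are hypothetical; exhaustive enumeration is shown only for illustration, since with $p = 83$ candidate variables the $2^{83}$ models cannot be enumerated, which is what motivates an MCMC search.

```python
from itertools import product

def restricted_model_space(costs, budget):
    """List every inclusion vector gamma whose total cost sum_i c_i gamma_i <= budget."""
    models = []
    for gamma in product((0, 1), repeat=len(costs)):
        total_cost = sum(c * g for c, g in zip(costs, gamma))
        if total_cost <= budget:
            models.append(gamma)
    return models

# Toy example with five hypothetical per-observation costs (in minutes).
costs = [0.5, 0.5, 1.5, 2.5, 3.0]
budget = 4.0
space = restricted_model_space(costs, budget)
full_size = 2 ** len(costs)   # size of the unrestricted model space {0,1}^p
```

In this toy case the budget keeps 17 of the 32 possible models. Models sitting exactly at the cost limit can only be reached from each other by swapping variables, which is one way to see why a sampler restricted to single additions or deletions may get trapped.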

PROBLEM: Because of the cost limit, the model space contains areas of local maxima. We therefore need to change the definition of the neighborhood structure of the proposed models and construct more advanced proposal jumps, possibly between models of the same cost, in order to avoid getting trapped in local maxima.

SOLUTION: Intelligent trans-dimensional MCMC methods that can move across areas of local maxima even when these are distinct.

#### Proposed Algorithm

We have developed a Population-Based Trans-Dimensional Reversible-Jump Markov Chain Monte Carlo algorithm (Population RJMCMC), combining ideas from the Population-Based MCMC (Jasra, Stephens and Holmes, 2007) and Simulated Tempering (Geyer and Thompson, 1995) algorithms.

#### Population RJMCMC

- Use 3 chains: the actual one, plus two auxiliary ones.
- In the auxiliary chains the posterior distributions are raised to a power $t_k$ (temperature), $k = 1, 2$.
- 1st auxiliary chain: $t_1 > 1$, increasing the differences between the posterior probabilities (this makes the distribution steeper, allowing the MCMC to move closer to locally best models).
- 2nd auxiliary chain: $0 < t_2 < 1$, reducing the differences between the posterior probabilities (this makes the distribution flatter, allowing the MCMC to move easily across different models).
- The temperatures $t_k$ change stochastically. In this way the need for a large number of chains is avoided.

The incorporation of stochastic temperatures can be done using pseudo-priors $g_k(t_k)$. In this case the posterior distribution is expanded to

$$f(\beta, \gamma, \beta^{(k)}, \gamma^{(k)}, t_1, t_2 \mid y) \propto f(y \mid \beta, \gamma)\, f(\beta \mid \gamma)\, f(\gamma) \prod_{k=1}^{2} \left\{ f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)}) \right\}^{t_k} g_k(t_k),$$

where $\gamma^{(k)}$ and $\beta^{(k)}$ are the model indicator and parameter vector of chain $k$.

Model indicators and parameters can be updated using RJMCMC steps, while the temperature $t_k$ can be generated from the conditional posterior distribution

$$f(t_k \mid \beta, \gamma, \beta^{(k)}, \gamma^{(k)}, t_{\setminus k}, y) \propto \left\{ f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)}) \right\}^{t_k} g_k(t_k).$$

The desired posterior marginal distribution for the temperatures $t_k$ is given by

$$f(t_k \mid y) \propto \sum_{\gamma^{(k)} \in \mathcal{M}} \int \left( f(y \mid \beta^{(k)}, \gamma^{(k)})\, f(\beta^{(k)} \mid \gamma^{(k)})\, f(\gamma^{(k)}) \right)^{t_k} g_k(t_k)\, d\beta^{(k)} = Z_k(y, t_k)\, g_k(t_k),$$

where $Z_k(y, t_k)$ is the marginal likelihood over all possible models for chain $k$.

Since the $g_k(t_k)$ are pseudo-priors, we can set

$$g_k(t_k) \propto \frac{h_k(t_k)}{Z_k(y, t_k)},$$

where the $h_k(t_k)$ are convenient density functions that are easy to simulate from, resulting in

$$f(t_k \mid y) = h_k(t_k).$$

For the selection of $h_k(t_k)$ we propose to use

$$h_1(t_1) = \text{Gamma}(t_1 - 1;\, a_2, b_2) \quad \text{and} \quad h_2(t_2) = \text{Beta}(t_2;\, a_1, b_1).$$

#### Prior Distributions

We use the same prior on the model parameters as in the Cost-Benefit Analysis and a uniform prior on the cost-restricted model space, i.e.

$$f(\gamma) \propto I\left(\gamma \in \mathcal{M} : c(\gamma) = \sum_{j} \gamma_j c_j \le C\right),$$

where $c_j$ is the differential cost per observation for variable $X_j$ and $C$ is the budgetary restriction.
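The effect of the two temperatures can be illustrated on a small discrete distribution. The sketch below is not the Population RJMCMC algorithm itself: it only shows how raising probabilities to a power $t$ steepens or flattens them, and how temperatures could be drawn from densities of the form $h_1$ and $h_2$. The posterior probabilities, the hyperparameter values, and the reading of $b_2$ as a rate parameter are all assumptions made for the example.

```python
import numpy as np

def temper(probs, t):
    """Raise a discrete distribution to the power t and renormalize."""
    probs = np.asarray(probs, dtype=float)
    powered = probs ** t
    return powered / powered.sum()

# Hypothetical posterior probabilities of four models.
posterior = np.array([0.50, 0.30, 0.15, 0.05])

steep = temper(posterior, 2.0)   # t_1 > 1: differences between models increase
flat = temper(posterior, 0.5)    # 0 < t_2 < 1: differences between models shrink

# Drawing stochastic temperatures (hypothetical hyperparameters).
rng = np.random.default_rng(1)
a1, b1, a2, b2 = 2.0, 2.0, 2.0, 2.0
t1 = 1.0 + rng.gamma(shape=a2, scale=1.0 / b2)  # t_1 - 1 ~ Gamma, so t_1 > 1
t2 = rng.beta(a1, b1)                           # t_2 ~ Beta, so 0 < t_2 < 1
```

Shifting the Gamma variate by one and using a Beta variate keeps each temperature in the range required of its auxiliary chain by construction.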

#### Implementation and Results

COST LIMIT: $C = 10$ minutes of abstraction time.

- Run Population RJMCMC for 100K iterations in the full model space, twice, starting each time from a different model.
- Eliminate non-important variables (those with marginal probabilities < 0.30 in both runs), forming a new reduced model space.
- Run Population RJMCMC in the reduced space, twice.
- Compare the results and performance of Population RJMCMC with simple RJMCMC.

#### Preliminary Results: Marginal Probabilities $f(\gamma_j = 1 \mid y)$

Variables with marginal posterior probabilities $f(\gamma_j = 1 \mid y)$ above 0.30 in at least one run. Table columns: variable index, variable name, cost, and marginal posterior probability in the first and second run. Only the variable names survive in this transcription (day numbers and all numeric entries were lost):

- Systolic Blood Pressure (SBP) Score
- Age
- Blood Urea Nitrogen
- Apache II Coma Score
- Shortness of Breath Day
- Serum Albumin
- Initial Temperature
- Apache Respiratory Rate Score
- Admission SBP
- Respiratory Rate Day
- Confusion Day
- Body System Count
- Apache pH Score

#### Reduced Model Space: Posterior Model Probabilities/Odds

Common variables in both analyses: $X_2 + X_4$.

Table columns: rank $k$, model $m$, common variables, additional variables, and posterior probability and $PO_{1k}$ for the 1st and 2nd run. The probability and odds entries are mostly lost; "…" marks a variable index that did not survive extraction.

Population RJMCMC, 500K iterations (common variables: $X_1 + X_{12} + X_{37}$):

| Model | Additional variables |
|---|---|
| $m_1$ | $+X_3 + X_5 + X_{…}$ |
| $m_2$ | $+X_5 + X_{46} + X_{62} + X_{…}$ |
| $m_3$ | $+X_3 + X_{62} + X_{…}$ |
| $m_4$ | $+X_3 + X_5 + X_6 + X_{…}$ |

Simple RJMCMC, 500K iterations (common variable: $X_{62}$):

| Model | Additional variables |
|---|---|
| $m_1$ | $+X_1 + X_3 + X_5 + X_{12} + X_{…}$ |
| $m_3$ | $+X_1 + X_3 + X_{12} + X_{37} + X_{…}$ |
| $m_2$ | $+X_1 + X_5 + X_{12} + X_{37} + X_{46} + X_{…}$ |
| $m_5$ | $+X_3 + X_5 + X_{46} + X_{49} + X_{…}$ |
| $m_6$ | $+X_1 + X_3 + X_5 + X_{49} + X_{…}$ |

For $m_5$ and $m_6$ the surviving entries include a posterior probability of "< 0.03", with $PO_{1k}$ "> 19.9" for $m_6$.

Notes: $PO_{1k}$ denotes the posterior odds of the best model within each analysis versus the current model $k$. All models appearing in the table have a total cost of 10 minutes (the cost limit).

#### Reduced Model Space: Monte Carlo Errors

Monte Carlo errors (%). Table columns: RJMCMC type, run, iterations, $m_1$, $m_2$, $m_3$, $m_4$. Table rows: six Population RJMCMC (POP) rows whose run and iteration labels are lost, followed by SIMPLE run 1 (500K) and SIMPLE run 2 (500K). A second block, "Relative Comparisons", compares SIMPLE (500K) with POP at 500K, 200K and one further iteration count, separately for the first and the second run. The numeric entries are not preserved in this transcription.

### 5 Discussion

- Cost-Benefit Analysis: the resulting models achieve dramatic gains in cost and a noticeable improvement in model simplicity, at the price of a small loss in predictive accuracy, when compared with the results of a more traditional benefit-only analysis.
- Cost-Restriction-Benefit Analysis: the Population RJMCMC algorithm explores the model space efficiently and converges faster than simple RJMCMC (it has lower Monte Carlo errors).

### References

- Draper D, Fouskakis D (2000). A case study of stochastic optimization in health policy: problem formulation and preliminary results. Journal of Global Optimization, 18.
- Fouskakis D, Draper D (2002). Stochastic optimization: a review. International Statistical Review, 70.
- Fouskakis D, Draper D (2008). Comparing stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. Journal of the American Statistical Association, 103, forthcoming.
- Fouskakis D, Ntzoufras I, Draper D (2007a). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care. (submitted).
- Fouskakis D, Ntzoufras I, Draper D (2007b). Population based reversible jump MCMC for Bayesian variable selection and evaluation under cost limit restrictions. (submitted).
- Geyer CJ, Thompson EA (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association, 90.
- Green P (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82.
- Jasra A, Stephens DA, Holmes CC (2007). Population-based reversible jump MCMC. Biometrika, forthcoming.
- Keeler E, Kahn K, Draper D, Sherwood M, Rubenstein L, Reinisch E, Kosecoff J, Brook R (1990). Changes in sickness at admission following the introduction of the Prospective Payment System. Journal of the American Medical Association, 264.
- Ntzoufras I, Dellaportas P, Forster JJ (2003). Bayesian variable and link determination for generalized linear models. Journal of Statistical Planning and Inference, 111.


Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Bayes and Naïve Bayes. cs534-machine Learning

Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

### Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

### Likelihood: Frequentist vs Bayesian Reasoning

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

### The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1

### CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

### An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Dealing with large datasets

Dealing with large datasets (by throwing away most of the data) Alan Heavens Institute for Astronomy, University of Edinburgh with Ben Panter, Rob Tweedie, Mark Bastin, Will Hossack, Keith McKellar, Trevor

### Examining credit card consumption pattern

Examining credit card consumption pattern Yuhao Fan (Economics Department, Washington University in St. Louis) Abstract: In this paper, I analyze the consumer s credit card consumption data from a commercial

### STAT3016 Introduction to Bayesian Data Analysis

STAT3016 Introduction to Bayesian Data Analysis Course Description The Bayesian approach to statistics assigns probability distributions to both the data and unknown parameters in the problem. This way,

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

### CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

### i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Pricing and calibration in local volatility models via fast quantization

Pricing and calibration in local volatility models via fast quantization Parma, 29 th January 2015. Joint work with Giorgia Callegaro and Martino Grasselli Quantization: a brief history Birth: back to

### Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Program description for the Master s Degree Program in Mathematics and Finance

Program description for the Master s Degree Program in Mathematics and Finance : English: Master s Degree in Mathematics and Finance Norwegian, bokmål: Master i matematikk og finans Norwegian, nynorsk:

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Statistics in Applications III. Distribution Theory and Inference

2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied

### Lecture 6: Logistic Regression

Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

### Hedge fund pricing and model uncertainty

Hedge fund pricing and model uncertainty Spyridon D.Vrontos a, Ioannis D.Vrontos b, Daniel Giamouridis c, a Department of Statistics and Actuarial-Financial Mathematics, University of Aegean, Samos, Greece

### Parameter Estimation: A Deterministic Approach using the Levenburg-Marquardt Algorithm

Parameter Estimation: A Deterministic Approach using the Levenburg-Marquardt Algorithm John Bardsley Department of Mathematical Sciences University of Montana Applied Math Seminar-Feb. 2005 p.1/14 Outline

### Bayesian Statistical Analysis in Medical Research

Bayesian Statistical Analysis in Medical Research David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ams.ucsc.edu www.ams.ucsc.edu/ draper ROLE Steering

### A Stochastic Model For Critical Illness Insurance

A Stochastic Model For Critical Illness Insurance Erengul Ozkok Submitted for the degree of Doctor of Philosophy on completion of research in the Department of Actuarial Mathematics & Statistics, School

### Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation

### Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

### GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Credit Risk Models. August 24 26, 2010

Credit Risk Models August 24 26, 2010 AGENDA 1 st Case Study : Credit Rating Model Borrowers and Factoring (Accounts Receivable Financing) pages 3 10 2 nd Case Study : Credit Scoring Model Automobile Leasing

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Bayesian Phylogeny and Measures of Branch Support

Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The

### Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar