Bayesian Penalized Methods for High Dimensional Data




Bayesian Penalized Methods for High Dimensional Data Joseph G. Ibrahim Joint with Hongtu Zhu and Zakaria Khondker

What is Covered? Motivation GLRR: Bayesian Generalized Low Rank Regression L2R2: Bayesian Longitudinal Low Rank Regression ADNI data analysis

Alzheimer's Disease
Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease. The incidence of AD doubles every five years after the age of 65, and the number of AD patients has recently increased dramatically, imposing a heavy socioeconomic burden. AD is the sixth leading cause of death in the United States, and there is no means to prevent, cure, or even slow its progression.

ADNI Database
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first "Big Data" project for AD; it collects imaging, genetic, clinical, and cognitive data for measuring the progression of AD and the effects of treatment. ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2. Efficiently integrating big ADNI data may lead to (AD1) detecting AD at the earliest stage possible and marking its progress through biomarkers; (AD2) developing new diagnostic methods for AD intervention, prevention, and treatment.

ADNI Database: ADNI 1
Integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories.
Model: Brain volume = f(SNP, age, gender, ...)
Data:
  Genotype: SNPs (X) (600,000+)
  MRI: ROI (region-of-interest) volumes (Y) (93)
  Prognostic factors: age, gender, education, etc.
  Disease status

Magnetic Resonance Imaging (MRI)
A voxel is the 3-D analogue of a pixel: the MRI machine reads a signal at each voxel and stores it in a 3-D array.
sMRI = brain structure; fMRI = brain activity from blood flow.
At the voxel level, n subjects yield an n x 6 million matrix; ROIs reduce the dimension to 93 and may be more clinically meaningful.
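The voxel-to-ROI reduction described above can be sketched as a simple aggregation. This is a hypothetical illustration, not the ADNI pipeline: `labels`, the voxel counts, and the use of intensity sums as "volumes" are all assumptions.

```python
import numpy as np

# Hypothetical sketch: collapse a voxel-level image to ROI-level features.
# 'labels' assigns each voxel to one of 93 ROIs (0 = background).
rng = np.random.default_rng(0)
n_voxels, n_rois = 10_000, 93
labels = rng.integers(0, n_rois + 1, size=n_voxels)   # values 0..93
intensity = rng.random(n_voxels)                      # voxel signal

# ROI "volume" here = sum of voxel intensities within each region,
# reducing a length-n_voxels vector to a length-93 feature vector.
roi_volumes = np.array([intensity[labels == r].sum()
                        for r in range(1, n_rois + 1)])
print(roi_volumes.shape)  # (93,)
```

Stacking one such vector per subject turns the n x 6 million voxel matrix into the n x 93 ROI matrix used as the response Y.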

Single Nucleotide Polymorphism (SNP)
A normal (not rare) variation: different nucleotides at the same location. SNPs may affect gene function.
ADNI: 600,000 SNPs, with n = 750 << 600,000 SNPs.
Select only SNPs on the top 40 genes reported by the AlzGene database (about 1,000 SNPs).

Bayesian Shrinkage and Selection
Prior: -log(prior) = penalty function.
Posterior proportional to likelihood x prior.
Frequentist penalized estimation = maximum a posteriori (MAP) estimation.
MLE sets the penalty to 0 (MAP with noninformative priors).
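The MAP/penalization equivalence can be checked numerically. This is a hedged sketch, not the authors' code: it uses a Gaussian prior, so -log(prior) is the ridge penalty 0.5*lam*||beta||^2 up to a constant, and the MAP estimate found by gradient ascent on the log posterior coincides with the penalized least-squares solution.

```python
import numpy as np

# Sketch: MAP with a Gaussian prior == ridge (penalized least squares).
rng = np.random.default_rng(1)
n, p, lam = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + 0.1 * rng.standard_normal(n)

# Frequentist penalized solution: argmin ||y - Xb||^2/2 + lam*||b||^2/2
beta_pen = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# MAP solution found numerically: gradient ascent on
# log posterior(b) = -||y - Xb||^2/2 - lam*||b||^2/2 + const
beta_map = np.zeros(p)
step = 1.0 / np.linalg.norm(X.T @ X + lam * np.eye(p), 2)
for _ in range(5000):
    beta_map += step * (X.T @ (y - X @ beta_map) - lam * beta_map)

print(np.allclose(beta_pen, beta_map))  # True: the same estimator
```

Setting lam = 0 in the penalized solution recovers the MLE, matching the "MAP with noninformative priors" remark.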

Bayesian Shrinkage and Selection
Popular choice: prior proportional to exp(-lambda * sum_j |beta_j|^alpha).
alpha <= 1: shrinkage and selection; creates a singularity at 0, a "black hole" that pulls smaller elements to 0.
  Bridge regression: alpha < 1.
  L1 priors (lasso, adaptive lasso): alpha = 1.
alpha > 1: no selection, shrinkage only.
  Ridge regression: alpha = 2.

Black Hole Priors: alpha <= 1
The prior creates a singularity at the origin; MAP estimation allows selection and shrinkage.
Unstable around the boundary.

Distributional Perspective
Want a huge spike (gravity) at the origin: gravity should pull the smaller coefficients to 0.
Huge spike/strong gravity (singularity/discontinuity at the origin): smaller coefficients shrink more.
Smaller spike/weak gravity (no singularity): smaller coefficients shrink less.

Distributional Perspective
Want heavy tails/minimum gravity (flat density) far from the origin: gravity should not affect the larger coefficients.
Flatter tail/weaker gravity: larger coefficients shrink less.
Steeper slope/stronger gravity: larger coefficients shrink more.

Commonly Used Priors Larger spike at the origin and heavier tails

GLRR: Why Low Rank Regression?
Do SNPs act alone or work together? Do the ROIs also act together? Does the joint action of ROIs and SNPs suggest some underlying structure in the regression coefficients? We try to exploit this structure to reduce dimension.

GLRR: Low Rank Regression
Number of parameters: r x (p + d) << p x d.
Example: r = 5, p = d = 1,000 gives 5 x (1,000 + 1,000) = 10,000 << 1,000 x 1,000 = 1 million.
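The parameter-count arithmetic above can be written out directly; this trivial helper is just an illustration of the rank-r factorization B = U * Delta * V^T storing r*(p + d) numbers instead of p*d:

```python
# Parameter count for a rank-r coefficient matrix B = U Delta V^T:
# r*(p + d) parameters versus p*d for the unrestricted matrix.
def low_rank_params(r, p, d):
    return r * (p + d)

p = d = 1_000
full = p * d                      # 1,000,000 parameters
low = low_rank_params(5, p, d)    # 5 * (1,000 + 1,000) = 10,000
print(low, full)                  # 10000 1000000
```

The ratio (here 100x) is what makes MCMC over B feasible at ADNI scale.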

GLRR: Generalization of SVD
U and V need not be unitary (orthonormal); otherwise we would need a matrix von Mises-Fisher (VMF) prior and Metropolis steps.
No ordering restriction on the elements of Delta; otherwise we would need truncated normals and Metropolis steps.
Many Bayesian applications do not require identifiability.
This allows closed-form full conditionals, so the Gibbs sampler applies: it scales to larger dimensions and is computationally efficient.
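To illustrate why closed-form full conditionals matter, here is a hedged sketch of a Gibbs sampler for plain conjugate Bayesian linear regression (not the authors' GLRR sampler; the priors and dimensions are made up). Each update is an exact draw from a known distribution, so there is no Metropolis proposal to tune:

```python
import numpy as np

# Toy Gibbs sampler: normal prior on beta, inverse-gamma on sigma^2.
rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)

tau2 = 100.0          # prior variance: beta ~ N(0, tau2 * I)
a0, b0 = 2.0, 2.0     # inverse-gamma(a0, b0) prior on sigma^2
sigma2 = 1.0
XtX, Xty = X.T @ X, X.T @ y
draws = []
for it in range(2000):
    # beta | sigma2, y  ~  N(mu, Sigma)   -- closed form
    Sigma = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
    mu = Sigma @ (Xty / sigma2)
    beta = rng.multivariate_normal(mu, Sigma)
    # sigma2 | beta, y  ~  inverse-gamma  -- closed form
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + resid @ resid / 2.0))
    if it >= 500:                 # discard burn-in
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
print(post_mean)  # close to beta_true
```

The GLRR conditionals for U, V, and Delta have the same flavor (multivariate normal draws), which is what lets the sampler scale.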

GLRR: Model and Priors

GLRR: Model and Priors
Cov(Y_i) = [equation not recovered in transcription]
Priors on covariance parameters.

GLRR: Why L2 Priors
If covariates are correlated, L2 tends to push their estimates towards each other (more correlated estimates; ridge), which is the reason for our choice.
L1 tends to pick one and force the rest to 0: the least absolute shrinkage and selection operator (lasso).

True beta: 1    1    1    1    1    1    1    1    1    1
OLS:       2.95 1.09 1.11 1.24 0.98 0.98 1.57 1.14 1.33 0.66
Ridge:     1.13 1.02 0.75 1.19 0.86 0.99 1.46 1.03 1.21 0.62
Lasso:     0    0    0    2.95 0    0.07 0.97 0    0.23 0

n = 30, p = 10; blue = highly correlated x's, black = independent x's.
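The contrast in the table can be reproduced in an idealized simulation. This is not the slide's data: it uses a perfectly duplicated predictor (the extreme of high correlation) with equal true effects, ridge in closed form, and lasso solved by a simple cyclic coordinate descent.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam = 200, 50.0
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1])                 # perfectly correlated pair
y = X @ np.array([1.0, 1.0]) + 0.5 * rng.standard_normal(n)

# Ridge splits the shared effect evenly across the correlated pair.
ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

def lasso_cd(X, y, lam, iters=200):
    """Cyclic coordinate descent for argmin 0.5*||y - Xb||^2 + lam*||b||_1."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j])
    return beta

lasso = lasso_cd(X, y, lam)
print(ridge)   # two equal coefficients: estimates pushed together
print(lasso)   # one active coefficient; the duplicate driven to 0
```

This mirrors the table: ridge keeps all correlated coefficients near their common value, while lasso concentrates the effect on one of them.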

GLRR: Comparison Criteria for Determining the Rank of B MEN used by Yuan (JRSSB, 2007)

GLRR: Finding Rank, (p,d,n) = (200,100,100)

GLRR: Simulated Performance

GLRR: Simulated ROC
Legend: Blue = GLRR5; Red = GLRR3; Black = LASSO; --- = BLASSO; [dotted] = G-SMuRFS.

GLRR: Simulated Image Recovery
Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5. n = 1,000.
GLRR is better for low rank; lasso and GLRR are similar for high rank.

GLRR: ADNI Application
ADNI database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on the top 40 genes from the AlzGene database. ROI volumes and SNPs standardized.
Smallest BIC was at r = 3 (checked r = 1 to 10).
Compute a binary B (say, B_bin) by thresholding at p-value < 0.001.
Columns of U correspond to SNPs; columns of V correspond to ROIs.
Compute B_bin^T B_bin (ROI) and B_bin B_bin^T (SNP).
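The thresholding-and-counting step can be sketched with a made-up p-value matrix (toy 20 x 8 sizes rather than ADNI's 1,072 x 93, and a looser cutoff so the toy matrix is not empty):

```python
import numpy as np

rng = np.random.default_rng(4)
p_snps, d_rois = 20, 8
pvals = rng.random((p_snps, d_rois))        # hypothetical p-values for B
B_bin = (pvals < 0.05).astype(int)          # slides use p < 0.001 on real data

# diag(B_bin^T B_bin): per-ROI count of significant SNPs
roi_counts = np.diag(B_bin.T @ B_bin)
# diag(B_bin B_bin^T): per-SNP count of significant ROIs
snp_counts = np.diag(B_bin @ B_bin.T)
top_roi = int(np.argmax(roi_counts))        # "top ROI" by largest diagonal
print(top_roi, roi_counts[top_roi])
```

The off-diagonal entries of B_bin^T B_bin count SNPs shared by pairs of ROIs, which is what the column-sum criterion on the next slide exploits.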

GLRR: Using B_bin^T B_bin
Largest diagonals -> top ROI: highest number of significant SNPs.
Largest column sums -> top ROI: highest number of significant SNPs, plus the highest number of significant SNPs that also affect other ROIs.
(Slide analogy: 7.1 g protein/ounce vs. 0.81 g protein/ounce; 0.10 g protein/calorie vs. 0.12 g protein/calorie.)

GLRR: ADNI Results
[Figure: heatmaps of -log10(p) for B, U, and V; B_bin^T B_bin and B_bin B_bin^T]

GLRR: ADNI ROI Network
Top 20 ROIs based on B_bin^T B_bin and 3 layers of V. ROIs most highly correlated with rs10792821 (PICALM), rs9791189 (NEDD9), rs9376660 (LOC651924), and rs17310467 (PRNP), respectively. Dot size = size of coefficient (element of B).

L2R2: Model Setup

L2R2: Priors
q* = number of random effects. Covariance estimation is the same as in GLRR. The Gibbs sampler applies.

L2R2: Simulated Results

L2R2: Simulated ROC
L2R2 and G-SMuRFS are the same for prognostic factors; L2R2 is better than G-SMuRFS for SNPs.

L2R2: Simulated Image Recovery
Panels: True, G-SMuRFS, L2R2; moderately sparse and extremely sparse cases.

Closing Remarks
GLRR outperforms LASSO, BLASSO, and G-SMuRFS in a great many settings.
Gibbs: scales to larger dimensions; the only feasible choice for HD data.
Metropolis: doesn't scale. Single-try works only on small dimensions; multiple-try only on tiny dimensions.
Selection with p >> n is unstable.

Closing Remarks
Computer code written in MATLAB. For r = 3 in GLRR, 30 minutes for 10K samples (1,500 parameters); for r = 5, 40 minutes for 10K samples (2,500 parameters). BLASSO takes 3 hours (40K parameters).