Bayesian Penalized Methods for High Dimensional Data
Joseph G. Ibrahim
Joint with Hongtu Zhu and Zakaria Khondker
What is Covered?
- Motivation
- GLRR: Bayesian Generalized Low Rank Regression
- L2R2: Bayesian Longitudinal Low Rank Regression
- ADNI data analysis
Alzheimer's Disease
Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease. The incidence of AD doubles every five years after the age of 65, and the number of AD patients has recently increased dramatically, causing a heavy socioeconomic burden. AD is the sixth leading cause of death in the United States, and there is no means to prevent, cure, or even slow its progression.
ADNI Database
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first "Big Data" project for AD, collecting imaging, genetic, clinical, and cognitive data for measuring the progression of AD and the effects of treatment. ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2. Efficiently integrating big ADNI data may lead to:
(AD1) detecting AD at the earliest stage possible and marking its progress through biomarkers;
(AD2) developing new diagnostic methods for AD intervention, prevention, and treatment.
ADNI Database
ADNI 1: Integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories.
Model: Brain volume = f(SNP, age, gender, ...)
Data:
- Genotype: SNPs (X) (600,000+)
- MRI ROIs (region-of-interest volumes, Y) (93)
- Prognostic factors: age, gender, education, etc.
- Disease status
Magnetic Resonance Imaging (MRI)
- A voxel is the 3-D version of a pixel
- The MRI machine reads a signal at each voxel and stores it in a 3-D array
- sMRI = brain structure; fMRI = brain activity from blood flow
- Voxel level: n subjects yield an n x 6 million matrix
- ROIs reduce the dimension to 93
- ROIs may be more clinically meaningful
Single Nucleotide Polymorphism (SNP)
- A common (not rare) variation: different nucleotides at the same genomic location
- SNPs may affect gene function
- ADNI: 600,000 SNPs, but n = 750 << 600,000
- Select SNPs only on the top 40 genes reported by the AlzGene database (about 1,000 SNPs)
Bayesian Shrinkage and Selection
- Prior: -log(prior) = penalty function
- Posterior proportional to likelihood x prior
- Frequentist penalized estimation = maximum a posteriori (MAP) estimation
- MLE sets the penalty to 0 (MAP with a noninformative prior)
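The penalty/prior correspondence above can be checked numerically. A minimal sketch (our illustration, not the authors' code; all data here is simulated): MAP estimation under a Gaussian prior on the coefficients is exactly ridge (L2-penalized) estimation with penalty sigma2/tau2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
sigma2, tau2 = 1.0, 4.0                     # noise and prior variances
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# MAP: maximize log-likelihood + log-prior, i.e. minimize
# ||y - X b||^2 / (2 sigma2) + ||b||^2 / (2 tau2),
# which is ridge regression with lam = sigma2 / tau2.
lam = sigma2 / tau2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The gradient of the penalized objective vanishes at the MAP estimate.
grad = -X.T @ (y - X @ beta_map) / sigma2 + beta_map / tau2
print(np.max(np.abs(grad)))                 # numerically ~0
```

Setting tau2 very large (a noninformative prior) drives lam toward 0 and recovers the MLE, as the slide states.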
Bayesian Shrinkage and Selection
Popular choice: penalty proportional to the sum of |beta_j|^alpha
- alpha <= 1: shrinkage and selection. Creates a singularity at 0, a "black hole" that pulls smaller elements to 0
  - Bridge regression: alpha < 1
  - L1 priors (lasso, adaptive lasso): alpha = 1
- alpha > 1: no selection, shrinkage only
  - Ridge regression: alpha = 2
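The alpha = 1 vs alpha = 2 contrast can be made concrete in the simplest case. A hedged sketch (our illustration, not from the talk): for a single coefficient with an orthonormal design, the lasso MAP estimate is the soft-thresholding rule, which sets small values exactly to 0, while the ridge MAP estimate only rescales and never produces exact zeros.

```python
import numpy as np

def lasso_map(z, lam):
    """Soft threshold: argmin_b 0.5*(z - b)^2 + lam*|b|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_map(z, lam):
    """argmin_b 0.5*(z - b)^2 + 0.5*lam*b^2 -- shrinks, never hits 0."""
    return z / (1.0 + lam)

z = np.array([-3.0, -0.4, 0.2, 0.9, 2.5])   # least-squares estimates
print(lasso_map(z, 1.0))   # small entries pulled exactly to 0 (selection)
print(ridge_map(z, 1.0))   # all entries shrunk, none exactly 0
```

This is the "black hole" in action: the non-differentiable spike of the L1 prior at the origin is what makes exact zeros (selection) possible.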
Black Hole Priors: alpha <= 1
- The prior creates a singularity at the origin
- MAP estimation allows both selection and shrinkage
- Unstable around the boundary
Distributional Perspective
- Want a huge spike (gravity) at the origin: the gravity should pull the smaller coefficients to 0
- Huge spike / strong gravity implies smaller coefficients shrink more; this requires a singularity/discontinuity at the origin
- No singularity (smaller spike / weaker gravity) implies smaller coefficients shrink less
Distributional Perspective
- Want heavy tails / minimal gravity / a flat density far from the origin: the gravity should not affect the larger coefficients
- Flatter tail / weaker gravity implies larger coefficients shrink less
- Steeper slope / stronger gravity implies larger coefficients shrink more
Commonly Used Priors: larger spike at the origin and heavier tails
GLRR: Why Low Rank Regression?
- Do SNPs act alone or work together?
- Do the ROIs also act together?
- Do ROIs and SNPs acting together imply some underlying structure in the regression coefficients?
- We try to exploit this structure to reduce dimension
GLRR: Low Rank Regression
Number of parameters: p* = r(p + d) << p x d; e.g., with r = 5 and p = d = 1K, 5(1K + 1K) = 10K << 1K x 1K = 1 million.
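The parameter-count arithmetic on this slide is easy to verify directly:

```python
# Rank-r factorization B = U V^T with U (p x r) and V (d x r):
# r*(p + d) parameters instead of p*d for the full coefficient matrix.
p, d, r = 1000, 1000, 5
full = p * d            # unconstrained B
low_rank = r * (p + d)  # rank-r B
print(low_rank, full)   # 10000 vs 1000000, a 100-fold reduction
```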
GLRR: Generalization of SVD
- U and V need not be unitary (orthonormal); otherwise we would need matrix von Mises-Fisher (VMF) priors and Metropolis steps
- No ordering restriction on the elements of Delta; otherwise we would need truncated normals and Metropolis steps
- Many Bayesian applications do not require identifiability
- Allows closed-form full conditionals, so the Gibbs sampler applies: scales to larger dimensions, computational efficiency
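A small sketch of the non-identifiability point (our illustration, not the authors' sampler): any rank-r coefficient matrix can be written B = U V^T without requiring U and V to be orthonormal. Here we build one such factorization from a truncated SVD by folding the singular values into U, and confirm it reproduces B exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d, r = 30, 20, 3
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, d))  # rank-3 truth

u, s, vt = np.linalg.svd(B, full_matrices=False)
U = u[:, :r] * s[:r]    # absorb singular values into U: the factorization
V = vt[:r].T            # is no longer unique, and U is not orthonormal
print(np.allclose(U @ V.T, B))   # True
```

Because only the product U V^T enters the likelihood, the posterior can be sampled over unconstrained U and V with normal full conditionals, which is what makes a pure Gibbs sampler possible.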
GLRR: Model and Priors
GLRR: Model and Priors
Cov(Y_i): (equation not recovered from the slide)
Priors on covariance parameters
GLRR: Why L2 Priors?
- If covariates are correlated, L2 tends to push their estimates toward each other (ridge); this is the reason for our choice
- L1 tends to pick one covariate and force the rest to 0: the least absolute shrinkage and selection operator (lasso)

True beta:  1     1     1     1     1     1     1     1     1     1
OLS:        2.95  1.09  1.11  1.24  0.98  0.98  1.57  1.14  1.33  0.66
Ridge:      1.13  1.02  0.75  1.19  0.86  0.99  1.46  1.03  1.21  0.62
Lasso:      0     0     0     2.95  0     0.07  0.97  0     0.23  0

(n = 30, p = 10; blue = highly correlated x's, black = independent x's)
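The qualitative contrast in the table can be reproduced on a toy problem. A sketch under our own simulated data (not the slide's simulation): with two nearly identical covariates that both have true coefficient 1, ridge returns two nearly equal estimates, while the lasso concentrates the signal in one coefficient and zeroes out the other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)     # nearly identical covariates
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

# Ridge: closed form (X'X + lam I)^{-1} X'y
b_ridge = np.linalg.solve(X.T @ X + 5.0 * np.eye(2), X.T @ y)

# Lasso: plain coordinate descent with soft thresholding
def lasso_cd(X, y, lam, iters=200):
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]  # partial residual
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j])
    return b

b_lasso = lasso_cd(X, y, lam=50.0)
print(b_ridge)   # two nearly equal coefficients, each near 1
print(b_lasso)   # one coefficient carries the signal, the other is ~0
```

When correlated predictors all carry signal, as SNPs in the same gene often do, the L2 behavior is the desirable one, which motivates the L2 priors in GLRR.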
GLRR: Comparison Criteria for Determining the Rank of B
MEN criterion, as used by Yuan (JRSSB, 2007)
GLRR: Finding the Rank, (p, d, n) = (200, 100, 100)
GLRR: Simulated Performance
GLRR: Simulated ROC
Legend: blue = GLRR5; red = GLRR3; black: solid = LASSO, dashed (---) = BLASSO, remaining line style = G-SMuRFS
GLRR: Simulated Image Recovery
Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5. n = 1,000.
GLRR is better for low rank; lasso and GLRR are similar for high rank.
GLRR: ADNI Application
- ADNI database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on the top 40 genes from the AlzGene database
- Standardized ROI volumes and SNPs
- Smallest BIC was at r = 3 (checked r = 1 to 10)
- Compute a binary B (say, B_bin) by thresholding at p-value < 0.001
- Columns of U correspond to SNPs; columns of V correspond to ROIs
- Compute B_bin^T B_bin (ROI) and B_bin B_bin^T (SNP)
GLRR: Using B_bin^T B_bin
- Largest diagonals -> top ROIs: highest number of significant SNPs
- Largest column sums -> top ROIs: highest number of significant SNPs, including significant SNPs that also affect other ROIs
- (Analogy: a food with 7.1 g protein/ounce outranks one with 0.81 g protein/ounce by weight, but at 0.10 vs 0.12 g protein/calorie the ordering reverses; the choice of summary measure changes the ranking.)
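A tiny worked example of these summaries (hypothetical data, not the ADNI results): with B_bin[i, j] = 1 if SNP i is significant for ROI j, the diagonal of B_bin^T B_bin counts the significant SNPs per ROI, the off-diagonals count SNPs shared by pairs of ROIs, and the column sums credit ROIs whose significant SNPs also affect other ROIs.

```python
import numpy as np

B_bin = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [0, 1, 1]])          # 3 SNPs (rows) x 3 ROIs (columns)

roi_gram = B_bin.T @ B_bin             # 3 x 3 ROI-by-ROI count matrix
print(np.diag(roi_gram))               # significant SNPs per ROI: [2 2 1]
print(roi_gram.sum(axis=0))            # column sums: [3 4 2]
```

Here ROIs 1 and 2 tie on the diagonal (two significant SNPs each), but ROI 2 wins on the column sum because both of its SNPs also hit other ROIs.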
GLRR: ADNI Results
Panels: -log10(p) of B, U, and V; B; B_bin^T B_bin; B_bin B_bin^T.
GLRR: ADNI ROI Network
Top 20 ROIs based on B_bin^T B_bin and 3 layers of V: ROIs most highly correlated with rs10792821 (PICALM), rs9791189 (NEDD9), rs9376660 (LOC651924), and rs17310467 (PRNP), respectively. Dot size = magnitude of the coefficient (element of B).
L2R2: Model Setup
L2R2: Priors
- q* = number of random effects
- Covariance estimation is the same as in GLRR
- Closed-form full conditionals: can apply the Gibbs sampler
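To show why closed-form full conditionals matter, here is a minimal Gibbs sketch for a conjugate normal regression (our illustration on simulated data, not the L2R2 sampler): each conditional draw is an exact normal or inverse-gamma sample, so no Metropolis accept/reject step is needed and the chain mixes without tuning.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)

tau2 = 10.0                  # fixed prior variance for beta (assumed)
a0, b0 = 2.0, 2.0            # inverse-gamma hyperparameters for sigma2
beta, sigma2 = np.zeros(p), 1.0
draws = []
for it in range(2000):
    # beta | sigma2, y ~ Normal (ridge-type posterior), in closed form
    prec = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(prec)
    beta = rng.multivariate_normal(cov @ (X.T @ y / sigma2), cov)
    # sigma2 | beta, y ~ Inverse-Gamma, also in closed form
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
    if it >= 500:            # discard burn-in
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
print(np.round(post_mean, 1))   # near the true [2, 0, -1, 0.5]
```

GLRR and L2R2 exploit the same conjugacy slide by slide: every block (U, V, covariance parameters) has a closed-form conditional, which is what lets the samplers scale to thousands of parameters.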
L2R2: Simulated Results
L2R2: Simulated ROC
- L2R2 and G-SMuRFS perform the same for prognostic factors
- L2R2 is better than G-SMuRFS for SNPs
L2R2: Simulated Image Recovery
Panels: True, G-SMuRFS, L2R2; Moderately Sparse and Extremely Sparse cases.
Closing Remarks
- GLRR outperforms LASSO, BLASSO, and G-SMuRFS in a great many settings
- Gibbs sampling scales to larger dimensions: the only feasible choice for high-dimensional data
- Metropolis samplers don't scale: single-try works only in small dimensions, multiple-try only in tiny dimensions
- Selection with p >> n is unstable
Closing Remarks
- Computer code written in MATLAB
- For r = 3 in GLRR: 30 minutes for 10K samples (1,500 parameters)
- For r = 5 in GLRR: 40 minutes for 10K samples (2,500 parameters)
- BLASSO takes 3 hours (40K parameters)