Bayesian Penalized Methods for High Dimensional Data

1 Bayesian Penalized Methods for High Dimensional Data. Joseph G. Ibrahim, joint with Hongtu Zhu and Zakaria Khondker.

2 What is Covered? Motivation; GLRR: Bayesian Generalized Low Rank Regression; L2R2: Bayesian Longitudinal Low Rank Regression; ADNI data analysis.

4 Alzheimer's Disease. Alzheimer's disease (AD) is an escalating national epidemic and a genetically complex, progressive, and fatal neurodegenerative disease. The incidence of AD doubles every five years after the age of 65, and the number of AD patients has recently increased dramatically, causing a heavy socioeconomic burden. AD is the sixth leading cause of death in the United States, and there is no means to prevent, cure, or even slow its progression.

5 ADNI Database. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is the first "Big Data" project for AD and is collecting imaging, genetic, clinical, and cognitive data for measuring the progress of AD or the effects of treatment. ADNI began in 2004 and has three phases: ADNI 1, ADNI GO, and ADNI 2. Efficiently integrating big ADNI data may lead to (AD1) detecting AD at the earliest stage possible and marking its progress through biomarkers; (AD2) developing new diagnostic methods for AD intervention, prevention, and treatment.

6 ADNI Database. ADNI 1: integrating imaging and genetic data to identify genetic and environmental contributions to brain baseline data and brain development trajectories. Model: brain volume = f(SNP, age, gender, ...). Data: genotype: SNPs (X) (600,000+); MRI ROI volumes (region of interest volumes = Y) (93); prognostic factors: age, gender, education, etc.; disease status.

7 Magnetic Resonance Imaging (MRI). A voxel is the 3-D version of a pixel. The MRI machine reads a signal on each voxel and stores it in a 3-D array. sMRI = structure of the brain; fMRI = brain activity from blood flow. Voxel level: n subjects yield an n × 6 million matrix. ROIs reduce the dimension to 93 ROIs. ROIs may be more clinically meaningful.

8 Single Nucleotide Polymorphism (SNP). Normal (not rare) different nucleotides in the same location. SNPs may affect gene function. ADNI: 600,000 SNPs; n = 750 << 600,000 SNPs. Select SNPs only on the top 40 genes reported by the AlzGene database (about 1,000 SNPs).

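9 Bayesian Shrinkage and Selection. Prior: $p(\beta) \propto \exp\{-\lambda \sum_j |\beta_j|^{\alpha}\}$; $-\log(\text{prior})$ = penalty function = $\lambda \sum_j |\beta_j|^{\alpha}$ (up to an additive constant). Posterior: $p(\beta \mid \text{data}) \propto \text{likelihood} \times \text{prior}$. Frequentist penalized estimation = maximum a posteriori (MAP) estimation. MLE sets the penalty to 0 (MAP with noninformative priors).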

10 Bayesian Shrinkage and Selection. Popular choices: α ≤ 1 gives shrinkage and selection; it creates a singularity at 0 and a "black hole" that pulls smaller elements to 0. Bridge regression: α < 1. L1 priors (lasso, adaptive lasso): α = 1. α > 1 gives no selection, shrinkage only. Ridge regression: α = 2.
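
A minimal sketch of the α = 1 versus α = 2 contrast, assuming a single coefficient with least-squares estimate z and the objective 0.5(z - b)^2 + λ|b|^α. This is an illustrative setup with hypothetical values and function names, not the GLRR model. With α = 1 the minimizer is soft thresholding, which sets small estimates exactly to 0; with α = 2 it is proportional shrinkage, which never does.

```python
import numpy as np

def penalty(beta, lam, alpha):
    """Bridge-type penalty lam * sum_j |beta_j|^alpha
    (the negative log-prior up to an additive constant)."""
    return lam * np.sum(np.abs(beta) ** alpha)

def map_lasso(z, lam):
    """Minimizer of 0.5*(z - b)^2 + lam*|b| (alpha = 1): soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def map_ridge(z, lam):
    """Minimizer of 0.5*(z - b)^2 + lam*b^2 (alpha = 2): proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

# Hypothetical least-squares estimates and penalty weight.
z = np.array([-3.0, -0.4, 0.2, 1.5])
lam = 0.5

b_lasso = map_lasso(z, lam)   # small entries become exactly 0 (selection)
b_ridge = map_ridge(z, lam)   # every entry shrinks, none becomes 0

print("alpha = 1:", b_lasso)
print("alpha = 2:", b_ridge)
```

With these values the two small entries of z are set to 0 under α = 1, while under α = 2 every entry is halved and none is removed.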

11 Prior creates a singularity at the origin. MAP estimation allows selection and shrinkage. Black hole priors: α ≤ 1. Unstable around the boundary.

12 Distributional Perspective. Want a huge spike (gravity) at the origin; gravity should pull the smaller coefficients to 0. Huge spike/gravity (singularity/discontinuity at the origin) implies smaller coefficients shrink more. Smaller spike/gravity (no singularity) implies smaller coefficients shrink less.

13 Distributional Perspective. Want heavy tails/minimum gravity/flat density far from the origin; gravity should not affect the larger coefficients. Flatter tail/weaker gravity implies larger coefficients shrink less. Steeper slope/stronger gravity implies larger coefficients shrink more.

14 Commonly Used Priors Larger spike at the origin and heavier tails

16 GLRR: Why Low Rank Regression? Do SNPs act alone or work together? Do the ROIs also act together? Do ROIs and SNPs acting together support some underlying structure in the regression coefficients? We try to exploit this structure to reduce dimension.

17 GLRR: Low Rank Regression. Number of parameters: p* = r(p + d) << pd; e.g., 5 × (1K + 1K) = 10K << 1K × 1K = 1 million.
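
The parameter count can be checked numerically. The sketch below assumes the coefficient matrix is written as B = U Δ V^T with U of size p × r, V of size d × r, and Δ diagonal; the function names, the diagonal values, and the small dimensions used for the rank check are hypothetical.

```python
import numpy as np

def n_params_full(p, d):
    """Free entries in an unrestricted p x d coefficient matrix."""
    return p * d

def n_params_low_rank(p, d, r):
    """Entries of U (p x r) and V (d x r), as counted on the slide."""
    return r * (p + d)

# Slide example: r = 5, p = d = 1K.
print(n_params_low_rank(1000, 1000, 5), "<<", n_params_full(1000, 1000))

# Build a small rank-r coefficient matrix B = U diag(delta) V^T (assumed form).
rng = np.random.default_rng(1)
p, d, r = 200, 100, 5
U = rng.standard_normal((p, r))
V = rng.standard_normal((d, r))
delta = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # hypothetical diagonal of Delta
B = U @ np.diag(delta) @ V.T                  # p x d matrix of rank at most r

rank_B = np.linalg.matrix_rank(B)
print("shape:", B.shape, "rank:", rank_B)
```

B has p × d = 20,000 entries here, but it is fully determined by the r(p + d) = 1,500 entries of U and V plus the r diagonal values.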

18 GLRR: Generalization of SVD. U and V need not be unitary (orthonormal); otherwise a matrix von Mises-Fisher (VMF) prior and Metropolis steps are needed. No ordering restriction on the elements of Δ; otherwise a truncated normal prior and Metropolis steps are needed. Many Bayesian applications do not require identifiability. This allows closed-form full conditionals, so a Gibbs sampler applies: it scales to larger dimensions and is computationally efficient.
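
A small sketch of why dropping the orthonormality and ordering constraints leaves (U, Δ, V) non-identified while B itself is unaffected. It assumes the factorization B = U Δ V^T with diagonal Δ; all values are hypothetical. Rescaling the columns of U and compensating in Δ, or permuting the r layers, gives a different triple with the same B.

```python
import numpy as np

rng = np.random.default_rng(2)
p, d, r = 6, 4, 2
U = rng.standard_normal((p, r))
V = rng.standard_normal((d, r))
delta = np.array([2.0, 0.5])          # hypothetical diagonal of Delta
B = U @ np.diag(delta) @ V.T

# 1) Rescale each column of U by c_k and divide delta_k by c_k.
c = np.array([3.0, -0.25])
U_scaled = U * c                      # broadcasts over columns
delta_scaled = delta / c
B_scaled = U_scaled @ np.diag(delta_scaled) @ V.T

# 2) Permute the layers; the elements of delta are then no longer ordered.
perm = [1, 0]
B_perm = U[:, perm] @ np.diag(delta[perm]) @ V[:, perm].T

same_after_scaling = np.allclose(B, B_scaled)
same_after_permutation = np.allclose(B, B_perm)
print(same_after_scaling, same_after_permutation)
```

Inference on B is therefore unaffected by which of these equivalent triples a sampler visits, which is the sense in which identifiability of U, Δ, and V is not required.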

19 GLRR: Model and Priors

20 GLRR: Model and Priors. Cov(Y_i) = [expression not recovered]. Priors on covariance parameters.

21 GLRR: Why L2 Priors. If covariates are correlated, L2 tends to push their estimates towards each other, giving more correlated estimates (ridge); this is the reason for our choice. L1 tends to pick one and force the rest to 0 (lasso: least absolute shrinkage and selection operator). Figure panels: true β, OLS, ridge, lasso; n = 30, p = 10; blue = highly correlated x's, black = independent x's.
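
A minimal sketch of this contrast in the extreme case of two identical covariates (an assumption made here so the behavior is exact; the slide's figure uses highly correlated covariates). Ridge is computed in closed form and the lasso by plain cyclic coordinate descent started at zero; the data, penalty weights, and function names are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimizer of ||y - Xb||^2 + lam*||b||^2: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1, started at 0."""
    p = X.shape[1]
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves covariate j out of the fit.
            r_j = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r_j, lam) / col_ss[j]
    return b

rng = np.random.default_rng(0)
n = 30
z = rng.standard_normal(n)
w = rng.standard_normal(n)
X = np.column_stack([z, z, w])        # columns 0 and 1 are identical
y = 2.0 * z + 0.5 * w + 0.1 * rng.standard_normal(n)

b_ridge = ridge(X, y, lam=1.0)
b_lasso = lasso_cd(X, y, lam=1.0)
print("ridge:", b_ridge)
print("lasso:", b_lasso)
```

Ridge splits the shared effect equally between the two identical columns, while this coordinate-descent lasso puts it all on the first column it visits and leaves the other at 0.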

22 GLRR: Comparison Criteria for Determining the Rank of B MEN used by Yuan (JRSSB, 2007)

23 GLRR: Finding Rank, (p,d,n) = (200,100,100)

24 GLRR: Simulated Performance

25 GLRR: Simulated ROC. Legend: blue = GLRR5; red = GLRR3; black = LASSO, BLASSO (dashed), and G-SMuRFS, distinguished by line style.

26 GLRR: Simulated Image Recovery. Rows: True, LASSO, BLASSO, G-SMuRFS, GLRR3, GLRR5, respectively. Columns: Cases 1-5. n = 1,000. GLRR is better for low rank; lasso and GLRR are similar for high rank.

27 GLRR: ADNI Application. ADNI database: n = 749 subjects, d = 93 ROI volumes, p = 1,072 SNPs on the top 40 genes from the AlzGene database. Standardized ROI volumes and SNPs. Smallest BIC was at r = 3 (checked r = 1 to 10). Compute binary B (say, B_bin) by thresholding p-values. Columns of U correspond to SNPs and columns of V correspond to ROIs. Compute B_bin^T B_bin (ROI) and B_bin B_bin^T (SNP).

28 GLRR: Using B_bin^T B_bin. Largest diagonals: the top ROI has the highest number of significant SNPs. Largest column sum: the top ROI has the highest number of significant SNPs and the highest number of significant SNPs that also affect other ROIs.
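
A toy sketch of these summaries, assuming B is arranged as SNPs (rows) by ROIs (columns). The p-values and the 0.05 cutoff are hypothetical; the source does not state the threshold that was used.

```python
import numpy as np

# Hypothetical p-values for 4 SNPs (rows) x 3 ROIs (columns).
pvals = np.array([
    [0.001, 0.020, 0.400],
    [0.010, 0.300, 0.900],
    [0.700, 0.004, 0.030],
    [0.002, 0.015, 0.008],
])
threshold = 0.05                      # assumed cutoff, not from the source

B_bin = (pvals < threshold).astype(int)   # 1 = significant SNP-ROI pair

roi_mat = B_bin.T @ B_bin             # ROI x ROI: shared significant SNPs
snp_mat = B_bin @ B_bin.T             # SNP x SNP: shared affected ROIs

roi_diag = np.diag(roi_mat)           # significant SNPs per ROI
roi_colsum = roi_mat.sum(axis=0)      # also counts SNPs shared with other ROIs
snp_diag = np.diag(snp_mat)           # ROIs affected per SNP

print("ROI diagonals:  ", roi_diag)
print("ROI column sums:", roi_colsum)
print("SNP diagonals:  ", snp_diag)
```

In this toy case the first two ROIs tie on the diagonal (three significant SNPs each), and the column sum separates them because the second ROI's SNPs are shared with more of the other ROIs.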

29 GLRR: ADNI Results. Figure panels: -log10(p) of B; -log10(p) of U; -log10(p) of V; B; B_bin^T B_bin; B_bin B_bin^T.

30 GLRR: ADNI ROI Network. Top 20 ROIs based on B_bin^T B_bin and 3 layers of V. ROIs most highly correlated with rs (picalm), rs (nedd9), rs (loc651924), rs (prnp), respectively. Dot size = size of coefficient (element of B).

32 L2R2: Model Setup

33 L2R2: Priors. q* = number of random effects. Covariance estimation is the same as in GLRR. Can apply a Gibbs sampler.

34 L2R2: Simulated Results

35 L2R2: Simulated ROC. L2R2 and G-SMuRFS perform the same for prognostic factors. L2R2 is better than G-SMuRFS for SNPs.

36 L2R2: Simulated Image Recovery. Panels: True, G-SMuRFS, L2R2; cases: Mod. Sparse, Ext. Sparse.

37 Closing Remarks. GLRR outperforms LASSO, BLASSO, and G-SMuRFS in a great many settings. Gibbs: scales to larger dimensions; the only feasible choice for HD data. Metropolis: does not scale. Single try: works on small dimensions. Multiple try: only on tiny dimensions. Selection with p >> n is unstable.

38 Closing Remarks. Computer code written in MATLAB. For r = 3 in GLRR, 30 minutes for 10K samples (1,500 parameters). For r = 5 in GLRR, 40 minutes for 10K samples (2,500 parameters). BLASSO takes 3 hours (40K parameters).