Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis


1 Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis Hulin Wu, PhD, Professor (with Dr. Shuang Wu) Department of Biostatistics & Computational Biology University of Rochester Medical Center October, 1 Hulin Wu, PhD, Professor (with Dr. Shuang Wu) FDA (UR) and NPME for Longitudinal Data Analysis October, 1 1 / 31

2 Table of contents: 1 Introduction; 2 Comparisons: NPME vs. fpca-pace; 3 Comparisons: Individual Smoothing vs. fpca-integration Method; 4 Summary and Conclusion

3 Question to address. Nonparametric longitudinal data analysis methods: nonparametric mixed-effects models; functional PCA analysis.

4 Analysis of longitudinal studies. Parametric mixed-effects models (LME and NLME models), e.g. $y_i = X_i\beta + Z_i b_i + \epsilon_i$, $b_i \sim N(0, D)$, $\epsilon_i \sim N(0, R_i)$, $i = 1, 2, \ldots, n$. Parametric models are restrictive.

5 Nonparametric mixed-effects (NPME) model: $y_i(t) = \mu(t) + \nu_i(t) + \epsilon_i(t) = \sum_{j=1}^{p} \beta_j B_j(t) + \sum_{k=1}^{q} b_{ik} \tilde{B}_k(t) + \epsilon_i(t)$. Regression splines: various choices of basis functions, which are known. Mixed-effects modeling: borrows information across subjects (curves) and shrinks individual estimates toward the mean. Estimation: MLE or REML (SAS, R).
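The shrinkage in the NPME model can be sketched in Python with NumPy. This is a minimal sketch under assumptions the slide does not state: a truncated power basis stands in for the regression splines, and the variance components D and $\sigma^2$ are treated as known. With those fixed, $\beta$ is the generalized least squares estimate and each $b_i$ is the BLUP $\hat b_i = D Z_i^T V_i^{-1}(y_i - X_i\hat\beta)$ with $V_i = Z_i D Z_i^T + \sigma^2 I$. In practice D and $\sigma^2$ would be estimated by ML or REML. The names `truncated_power_basis` and `fit_npme` are hypothetical.

```python
import numpy as np


def truncated_power_basis(t, knots, degree=3):
    """Columns 1, t, ..., t^degree, then (t - k)_+^degree for each knot k."""
    t = np.asarray(t, dtype=float)
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def fit_npme(times, ys, knots_mean, knots_random, D, sigma2,
             degree_mean=3, degree_random=1):
    """GLS estimate of beta and BLUPs of b_i, treating D and sigma2 as known.

    times, ys: lists with one array per subject (unbalanced designs allowed).
    D: covariance of b_i; its size must match the random-effects basis.
    """
    Xs = [truncated_power_basis(t, knots_mean, degree_mean) for t in times]
    Zs = [truncated_power_basis(t, knots_random, degree_random) for t in times]
    # V_i^{-1} for the marginal covariance V_i = Z_i D Z_i^T + sigma2 I
    Vinvs = [np.linalg.inv(Z @ D @ Z.T + sigma2 * np.eye(Z.shape[0])) for Z in Zs]
    # GLS normal equations: (sum X_i^T V_i^{-1} X_i) beta = sum X_i^T V_i^{-1} y_i
    A = sum(X.T @ Vi @ X for X, Vi in zip(Xs, Vinvs))
    c = sum(X.T @ Vi @ y for X, Vi, y in zip(Xs, Vinvs, ys))
    beta = np.linalg.solve(A, c)
    # BLUP of each subject's random coefficients, shrunk toward zero
    bs = [D @ Z.T @ Vi @ (y - X @ beta) for X, Z, Vi, y in zip(Xs, Zs, Vinvs, ys)]
    return beta, bs
```

The fitted mean curve is `truncated_power_basis(grid, knots_mean) @ beta`, and subject i's curve adds `truncated_power_basis(grid, knots_random, 1) @ bs[i]`. Subjects with few observations get BLUPs pulled more strongly toward zero, which is the "borrowing information" described on the slide.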

6 Functional approach based on principal component analysis: $Y_{ij} = X_i(t_{ij}) + \epsilon_{ij} = \mu(t_{ij}) + \sum_{k=1}^{K} \xi_{ik}\phi_k(t_{ij}) + \epsilon_{ij}$. Mean function $\mu(t)$: any nonparametric smoothing method. Between-subject (curve) variation $\sum_{k=1}^{K} \xi_{ik}\phi_k(t_{ij})$: Karhunen-Loeve approximation. Both the PC scores ($\xi_{ik}$) and the basis functions (eigenfunctions $\phi_k(t)$) need to be estimated from the data. PC scores (coefficients) can be estimated by PACE, which uses the mixed-effects idea of borrowing information across subjects (curves), or by the integration method, which gives an individual estimate for each subject (curve).
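For curves observed on a common, equally spaced grid, the two ways of estimating the PC scores can be contrasted in a short NumPy sketch. It assumes equally spaced grid points, uses the cross-sectional mean in place of a smoothed $\mu(t)$, and leaves the noise variance on the diagonal of the covariance (the PACE method smooths the off-diagonal covariance and estimates $\sigma^2$ separately). `scores_integration` approximates $\xi_{ik} = \int (Y_i(t) - \mu(t))\phi_k(t)\,dt$ by a Riemann sum; `scores_pace` computes the conditional expectation $\hat\xi_i = \Lambda\Phi_i^T\Sigma_i^{-1}(Y_i - \mu_i)$ with $\Sigma_i = \Phi_i\Lambda\Phi_i^T + \sigma^2 I$, using only the observed points of one curve. All function names are hypothetical.

```python
import numpy as np


def fpca_dense(Y, t, K):
    """Mean, top-K eigenvalues and eigenfunctions for curves on a common equally spaced grid t."""
    n = Y.shape[0]
    h = t[1] - t[0]                        # grid spacing, used as the quadrature weight
    mu = Y.mean(axis=0)                    # cross-sectional mean in place of a smoothed mu(t)
    R = Y - mu
    G = R.T @ R / n                        # sample covariance G(s, t) on the grid (noise left on diagonal)
    evals, evecs = np.linalg.eigh(G * h)   # eigen-decomposition of the discretized covariance operator
    idx = np.argsort(evals)[::-1][:K]      # indices of the K largest eigenvalues
    lam = evals[idx]
    phi = evecs[:, idx] / np.sqrt(h)       # rescale so that sum(phi_k ** 2) * h == 1
    return mu, lam, phi


def scores_integration(Y, mu, phi, h):
    """Integration method: xi_ik = int (Y_i(t) - mu(t)) phi_k(t) dt, by a Riemann sum."""
    return (Y - mu) @ phi * h


def scores_pace(y_obs, idx_obs, mu, lam, phi, sigma2):
    """PACE-style conditional expectation of the K scores for one curve seen at grid indices idx_obs."""
    Phi = phi[idx_obs, :]                  # eigenfunctions evaluated at the observed times
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2 * np.eye(len(idx_obs))
    return Lam @ Phi.T @ np.linalg.solve(Sigma, y_obs - mu[idx_obs])
```

When every grid point is observed and $\sigma^2$ is small, the PACE scores are close to the integration scores; the two differ mainly for sparsely observed curves, where PACE shrinks the scores toward zero through $\Lambda$ instead of relying on a poorly resolved integral.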

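7 Simulation comparisons: NPME and fpca-pace. Model: $y_i(t) = a_{i0} + a_{i1}\cos(\pi t) + a_{i2}\sin(\pi t) + \epsilon_i(t)$, where $a_i = [a_{i0}, a_{i1}, a_{i2}]^T$ is normal with covariance $\mathrm{diag}(\sigma_0^2, \sigma_1^2, \sigma_2^2)$ and a mean vector whose first and third entries equal 1, $\epsilon_i(t) \sim N[0, \sigma_\epsilon^2(1 + t)]$, and $i = 1, 2, \ldots, n$. Design points: $t_j = j/(m+1)$, $j = 1, 2, \ldots, m$. Unbalanced data: three missing rates $r_{miss}$, the largest being 0.8. Criteria: $\mathrm{ISE} = \int (\hat\mu(t) - \mu(t))^2\,dt$ for the mean function and $\mathrm{MISE} = \frac{1}{n}\sum_{i=1}^{n} \int (\hat y_i(t) - y_i(t))^2\,dt$ for the individual fits.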
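The simulation design can be sketched as follows. Settings the slide does not make explicit are function arguments with hypothetical values: the mean vector of $a_i$ (`mean_a`, whose middle entry is an assumption), the standard deviations, n, m and the missing rate. Missingness is assumed to be completely at random (each observation deleted independently with probability r_miss), which the slide does not specify. ISE and MISE are approximated with the trapezoidal rule. The names `simulate_curves`, `integrate`, `ise` and `mise` are hypothetical.

```python
import numpy as np


def simulate_curves(n, m, sd_a, sd_eps, r_miss, mean_a=(1.0, 0.0, 1.0), seed=0):
    """y_i(t_j) = a_i0 + a_i1 cos(pi t_j) + a_i2 sin(pi t_j) + eps_i(t_j), some entries set to NaN."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, m + 1) / (m + 1)                      # t_j = j / (m + 1)
    basis = np.column_stack([np.ones(m), np.cos(np.pi * t), np.sin(np.pi * t)])
    a = rng.normal(mean_a, sd_a, size=(n, 3))              # a_i ~ N(mean_a, diag(sd_a ** 2))
    truth = a @ basis.T                                    # noise-free individual curves y_i(t)
    eps = rng.normal(0.0, 1.0, size=(n, m)) * sd_eps * np.sqrt(1.0 + t)  # Var = sd_eps^2 (1 + t)
    y = truth + eps
    y[rng.random((n, m)) < r_miss] = np.nan                # missing completely at random (assumption)
    return t, y, truth


def integrate(f, x):
    """Trapezoidal rule for samples f on grid x."""
    return float(np.sum(np.diff(x) * (f[1:] + f[:-1]) / 2.0))


def ise(f_hat, f_true, x):
    """Integrated squared error of one estimated curve."""
    return integrate((f_hat - f_true) ** 2, x)


def mise(y_hat, y_true, x):
    """Average over subjects of the ISE of each fitted individual curve."""
    return float(np.mean([ise(a, b, x) for a, b in zip(y_hat, y_true)]))
```

Evaluating the fitted curves on the design grid only integrates over $[1/(m+1), m/(m+1)]$; a finer grid on $[0, 1]$ can be passed as `x` if the fits can be evaluated there.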

8 Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2)$ with $\sigma_1 = \sigma_2 = 1$. Figure: simulated curves $y_i(t)$ plotted against t.

9 Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2)$ with $\sigma_1 = \sigma_2 = 1$. Columns: $r_{miss}$; model; mean function; individual fits.
First $r_{miss}$ level: LPME .1 (.19), .3733 (.88); RSME .13 (.118), .3733 (.88); PACE .177 (.133), .38 (.118).
Second $r_{miss}$ level: LPME .169 (.116), .618 (.813); RSME .19 (.98), .618 (.813); PACE .177 (.18), .693 (.18).
$r_{miss}$ = 80%: LPME . (.19), 1.3 (.76); RSME .131 (.11), 1.3 (.76); PACE .1 (.189), (.691).
Winner: nonparametric mixed-effects (NPME) models.

10 Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$. Figure: simulated curves $y_i(t)$ plotted against t.

11 Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$. Columns: $r_{miss}$; model; mean function; individual fits.
First $r_{miss}$ level: LPME .31 (.77), 1.96 (.31); RSME .36 (.797), 1.96 (.31); PACE .3639 (.31), .11 (.67).
Second $r_{miss}$ level: LPME .31 (.6), (.66); RSME .397 (.7), (.66); PACE .388 (.97), (.6).
$r_{miss}$ = 80%: LPME .16 (.3), (1.36); RSME .69 (.6), (1.36); PACE .616 (.36), (.31).
Winner for the mean function estimate: NPME model. Winner for the individual function estimates: fpca-pace.

12 Example 1: Viral load in AIDS clinical trials. Figure: viral load versus time (day) for n patients, with mean function estimates from RSME (blue) and FPCA (red). The number of measurements per patient, $n_i$, has a median of 8.

13 Viral load: individual fits. Figure: fitted viral-load curves for nine selected patients.

14 Example 2: Yeast cell cycle gene expressions. Figure: gene expression versus time (min). Expressions are measured at $t_j = 7(j - 1)$ minutes, $j = 1, 2, \ldots, 18$. Gene expressions are centered by the mean of each gene, and the data contain missing values.

15 Yeast gene expressions: individual fits. Figure: fitted expression curves for nine selected genes.

16 Time-course microarray gene expressions. Independent sampling: one measurement from each subject, e.g. mice. Longitudinal sampling: repeated measurements from the same subject, e.g. humans. Features of the data: the number of genes n is very large, usually several thousand; the number of time points m is small; there are very few replications at each time point, usually 2 or 3; the data are noisy, possibly with missing values.

17 Time-course microarray gene expressions. Problem of interest: identify differentially expressed genes. With one group: difference from baseline, i.e. variation over time. With two or more groups: differences between groups. Methods: the ANOVA approach treats the time variable as an experimental factor (a direct extension from static microarray experiments); the continuous approach treats gene expressions as noisy measurements of an underlying function and estimates that function nonparametrically (possibly with random effects).

18 Time-course microarray gene expressions. Model: $y_{ijk} = x_i(t_j) + \epsilon_{ijk}$, $i = 1, \ldots, n$; $j = 1, 2, \ldots, m$; $k = 1, \ldots, K$, with $x_i(t) = \sum_{l=0}^{L} \beta_{il}\phi_l(t)$ and $\epsilon_{ijk} \sim (0, \sigma^2)$. Hypothesis: $H_0: x_i(t) = 0$, $i = 1, \ldots, n$. The $\phi_l(t)$ form a spline basis or a PC basis. In real data there is no clear cut-off, so we need statistics that provide a good ranking, plus a multiple-testing adjustment to control the error rate, e.g. the false discovery rate (FDR).
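As one concrete example of FDR control, the Benjamini-Hochberg step-up procedure is sketched below. This is a swapped-in standard method for illustration: EDGE (Storey et al.) works with q-values, and the slides do not say which FDR procedure produced the reported cut-offs. The function name `benjamini_hochberg` is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ranks from smallest to largest p-value
    thresholds = q * np.arange(1, n + 1) / n       # q * i / n for rank i
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank i with p_(i) <= q * i / n
        reject[order[: k + 1]] = True              # step-up: reject every hypothesis up to that rank
    return reject
```

Because it is a step-up rule, a p-value above its own threshold is still rejected when a larger p-value further down the ranking passes its threshold; for example, three p-values of 0.04 are all rejected at q = 0.05.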

19 Methods. Individual nonparametric smoothing (EDGE): $\phi_l(t)$ is a fixed basis; statistics: goodness of fit (F statistic) and area under the curve (AUC). fpca-integration method (individual estimates of PC scores): $\phi_l(t)$ are eigenfunctions estimated from the entire sample; statistic: area under the curve (AUC). Both methods use the bootstrap to approximate the null distribution of the statistics, and the significance cut-off is set by controlling the FDR. Both are applicable to the independent and the longitudinal sampling cases.
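The goodness-of-fit statistic and its bootstrap null can be sketched for a single gene. Assumptions: $H_0$ is $x(t) = 0$ as in the model above, a cubic polynomial basis stands in for the fixed spline basis $\phi_l(t)$, the statistic is $F = (RSS_0 - RSS_1)/RSS_1$, and the null distribution is obtained by resampling the residuals of the $H_1$ fit and adding them to the $H_0$ fit (which is zero). EDGE itself pools bootstrap statistics across genes; this per-gene version is a simplification, and the function names are hypothetical.

```python
import numpy as np


def poly_basis(t, degree=3):
    """Hypothetical stand-in for the fixed spline basis phi_l(t): columns 1, t, ..., t^degree."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t ** d for d in range(degree + 1)])


def f_statistic(y, B):
    """F = (RSS0 - RSS1) / RSS1 for H0: x(t) = 0 versus H1: x(t) = B @ beta."""
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    fitted = B @ beta
    rss1 = float(np.sum((y - fitted) ** 2))
    rss0 = float(np.sum(y ** 2))           # the H0 fit is identically zero
    return (rss0 - rss1) / rss1, fitted


def bootstrap_pvalue(y, t, n_boot=500, degree=3, seed=0):
    """Residual-bootstrap p-value for one gene; y and t hold all replicates (length m * K)."""
    rng = np.random.default_rng(seed)
    B = poly_basis(t, degree)
    f_obs, fitted = f_statistic(y, B)
    resid = y - fitted
    exceed = 0
    for _ in range(n_boot):
        # Null data = H0 fit (zero) + H1 residuals resampled with replacement
        y_star = rng.choice(resid, size=resid.size, replace=True)
        f_star, _ = f_statistic(y_star, B)
        exceed += int(f_star >= f_obs)
    return (exceed + 1) / (n_boot + 1)
```

The resulting per-gene p-values would then be passed to an FDR procedure to choose the significance cut-off.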

20 Simulation study: n genes, with K = 3 replicates at each of m time points equidistant in [0, 1], and a proportion p of significant genes. Under $H_0$: $y_{ijk} = \epsilon_{ijk}$, $\epsilon_{ijk} \sim N(0, \sigma^2)$. Under $H_1$: $y_{ijk} = a_i \sin(\omega_i \pi (t_j - b_i)) + \epsilon_{ijk}$, where $a_i$ and $\omega_i$ are drawn from a uniform distribution and $b_i \sim U(0, 1)$. Results are based on repeated simulations.
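The gene-level simulation and the error rates reported for each method can be sketched as below. Hypothetical choices: the noise standard deviation `sd`, the ranges of the uniform distributions for $a_i$ and $\omega_i$, and drawing each gene's $H_1$ status independently with probability p. FNR is taken here as the fraction of non-rejected genes that are truly significant, a common definition that the slides do not state. The function names are hypothetical.

```python
import numpy as np


def simulate_genes(n_genes, m, K, p_sig, sd, a_range, w_range, seed=0):
    """y_ijk = a_i sin(w_i pi (t_j - b_i)) + eps_ijk for H1 genes, y_ijk = eps_ijk for H0 genes."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)                      # m equidistant time points in [0, 1]
    is_sig = rng.random(n_genes) < p_sig              # H1 status of each gene
    a = rng.uniform(*a_range, n_genes)
    w = rng.uniform(*w_range, n_genes)
    b = rng.uniform(0.0, 1.0, n_genes)
    signal = a[:, None] * np.sin(w[:, None] * np.pi * (t[None, :] - b[:, None]))
    signal[~is_sig] = 0.0                             # H0 genes: pure noise
    y = signal[:, :, None] + rng.normal(0.0, sd, (n_genes, m, K))  # K replicates per time point
    return t, y, is_sig


def error_rates(rejected, is_sig):
    """Counts and empirical FDR / FNR for one set of rejections."""
    rejected = np.asarray(rejected, dtype=bool)
    is_sig = np.asarray(is_sig, dtype=bool)
    n_rej = int(rejected.sum())
    n_acc = rejected.size - n_rej
    false_disc = int((rejected & ~is_sig).sum())      # rejected but truly null
    false_nondisc = int((~rejected & is_sig).sum())   # not rejected but truly significant
    return {
        "num_rejected": n_rej,
        "corr_rejected": n_rej - false_disc,
        "FDR": false_disc / max(n_rej, 1),
        "FNR": false_nondisc / max(n_acc, 1),
    }
```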

21 Simulation I. Error under $H_1$: $\epsilon_{ijk} \sim N(0, \sigma^2)$. Results table for EDGE and PCA: number of genes rejected, number correctly rejected, FDR and FNR, at three nominal FDR levels.

22 Simulation II. Error under $H_1$: $\epsilon_{ijk} \sim N(0, (\sigma v_i)^2)$, where $v_i$ is a gene-specific dispersion factor. Results table for EDGE and PCA: number of genes rejected, number correctly rejected, FDR and FNR, at three nominal FDR levels.

23 Gene data from lungs of mice. Number of probes: n. Days post infection (DPI): 0, 1, ..., 10 (m = 11). Replication: 3 for DPI = 1, ..., 10 and 6 for DPI = 0 (3 with no flu virus, 3 killed immediately after receiving the flu virus). Data were normalized by the Welle lab using the PLIER normalization method and log-transformed. Hypothesis: $H_0: x_i(t) = \text{baseline}$ for all t. Baseline 1: gene expression at DPI = 0 with no flu virus. Baseline 2: gene expression at DPI = 0 immediately after receiving the flu virus.

24 Gene data from lungs of mice: Baseline 1. Methods compared: EDGE (F), EDGE (AUC) and PCA (AUC); at their chosen FDR levels, EDGE (F) flagged 397 genes and PCA (AUC) flagged 7133. EDGE fails here because it oversmooths. We observe an increase in gene expression between DPI = 0 with no flu virus and DPI = 0 immediately after receiving the flu virus, which points to stress genes.

25 Baseline 1: top 9 genes selected by PCA but not by EDGE (AUC). Figure: expression profiles of the nine genes.

26 Baseline 1: top 9 genes selected by EDGE (AUC) but not by PCA. Figure: expression profiles of the nine genes.

27 Gene data from lungs of mice: Baseline 2. Methods compared: EDGE (F), EDGE (AUC) and PCA (AUC); EDGE (F) flagged 119 genes at its chosen FDR level. Figure: p-values by PCA plotted against p-values by EDGE (AUC).

28 Baseline 2: top 9 genes selected by PCA but not by EDGE (AUC). Figure: expression profiles of the nine genes.

29 Baseline 2: top 9 genes selected by EDGE (AUC) but not by PCA. Figure: expression profiles of the nine genes.

30 Summary. Nonparametric longitudinal data analysis methods, ordered by how much information they borrow across subjects. Individual nonparametric smoothing: does not borrow information across subjects (curves) at all, so it can handle completely different curves for different subjects. FPCA with individual estimates of PC scores: weakly borrows information across subjects through the PC basis estimate; the PC basis adapts to some between-subject (curve) variation. FPCA-PACE: borrows information across subjects through mixed-effects estimation of the PC scores; the PC basis adapts to large between-subject (curve) variation. Nonparametric mixed-effects (NPME) modeling: strongly borrows information across subjects (curves); suited to longitudinal data in which subjects follow similar patterns.

31 References
Storey et al. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102.
Wu, H. and Zhang, J.-T. (2006) Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. John Wiley & Sons, New York.
Yao, F., Müller, H.-G., and Wang, J.-L. (2005) Functional linear regression analysis for longitudinal data. The Annals of Statistics, 33.