Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis

Hulin Wu, PhD, Professor (with Dr. Shuang Wu)
Department of Biostatistics & Computational Biology
University of Rochester Medical Center
Email: hwu@bst.rochester.edu
October, 1

Hulin Wu, PhD, Professor (with Dr. Shuang Wu) FDA (UR) and NPME for Longitudinal Data Analysis October, 1
Table of contents
1 Introduction
2 Comparisons: NPME vs. fpca-pace
3 Comparisons: Individual Smoothing vs. fpca-integration Method
4 Summary and Conclusion
Question to Address
Nonparametric longitudinal data analysis methods:
- Nonparametric mixed-effects models
- Functional PCA analysis
Analysis of longitudinal studies
Parametric mixed-effects models: LME and NLME models, e.g.
$y_i = X_i \beta + Z_i b_i + \epsilon_i, \quad b_i \sim N(0, D), \quad \epsilon_i \sim N(0, R_i), \quad i = 1, 2, \ldots, n$
- Parametric
- Restrictive
Nonparametric mixed-effects (NPME) model
$y_i(t) = \mu(t) + \nu_i(t) + \epsilon_i(t) = \sum_{j=1}^{p} \beta_j B_j(t) + \sum_{k=1}^{q} b_{ik} B_k(t) + \epsilon_i(t)$
- Regression splines: various choices of basis functions, all known in advance
- Mixed-effects modeling: borrows information across subjects (curves) and shrinks individual curves toward the mean
- Estimation: MLE or REML (SAS, R)
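The "shrink to the mean" behaviour can be illustrated with a minimal sketch. It assumes a truncated power basis shared by the fixed and random parts, and it treats the fixed coefficients `beta`, the random-effects covariance `D` and the noise variance `sigma2` as already estimated (in practice by MLE or REML). The function names are hypothetical and the slides do not specify this basis. The random coefficients are then predicted with the standard linear mixed-model BLUP formula.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=1):
    """Regression spline basis: 1, t, ..., t^degree, then (t - knot)_+^degree."""
    t = np.asarray(t, dtype=float)
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def blup_random_effects(B, y, beta, D, sigma2):
    """BLUP of one subject's random coefficients b_i.

    Assumes (for illustration only) that the same basis matrix B serves the
    fixed part B @ beta and the random part B @ b_i, with b_i ~ N(0, D) and
    independent errors of variance sigma2.
    """
    resid = y - B @ beta                          # deviation from the population mean
    V = B @ D @ B.T + sigma2 * np.eye(len(y))     # marginal covariance of y_i
    return D @ B.T @ np.linalg.solve(V, resid)
```

When `D` is large relative to `sigma2` the prediction approaches the subject's own least-squares fit. When `D` is small the random coefficients shrink toward zero, so the fitted curve collapses onto the population mean.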
Functional approach based on principal component analysis
$Y_{ij} = X_i(t_{ij}) + \epsilon_{ij} = \mu(t_{ij}) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij}) + \epsilon_{ij}$
- Mean function $\mu(t)$: any nonparametric smoothing method
- Between-subject (curve) variation $\sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij})$: Karhunen-Loève approximation
- Both the PC scores ($\xi_{ik}$) and the basis functions (eigenfunctions $\phi_k(t)$) need to be estimated from the data
- PC scores (coefficients) can be estimated in two ways:
  - PACE: uses the mixed-effects modeling idea to borrow information across subjects (curves)
  - Integration method: an individual estimate for each subject (curve)
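A minimal sketch of the integration method, under assumptions the slides do not make: every curve is observed on the same equally spaced grid, and measurement noise is ignored. The function name `fpca_dense` is hypothetical. The eigenfunctions come from an eigendecomposition of the discretised sample covariance, and each PC score is a Riemann-sum approximation of $\int (Y_i(t) - \mu(t)) \phi_k(t) dt$. PACE, which replaces this integral with a conditional expectation for sparse data, is not implemented here.

```python
import numpy as np

def fpca_dense(Y, t, n_components):
    """FPCA for curves Y (n x m) on a common, equally spaced grid t (length m)."""
    dt = t[1] - t[0]                           # grid spacing (assumed constant)
    mu = Y.mean(axis=0)                        # pointwise mean function
    Yc = Y - mu                                # centered curves
    G = Yc.T @ Yc / (Y.shape[0] - 1)           # sample covariance surface G(s, t)
    evals, evecs = np.linalg.eigh(G * dt)      # discretised covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order]                         # leading eigenvalues
    phi = evecs[:, order].T / np.sqrt(dt)      # rescaled so that sum(phi^2) * dt = 1
    scores = Yc @ phi.T * dt                   # xi_ik by numerical integration
    return mu, lam, phi, scores
```

A fitted curve for subject i is then `mu + scores[i] @ phi`. Because each score uses only that subject's own observations, information is shared across subjects only through the estimated mean and eigenfunctions.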
Simulation Comparisons: NPME and fpca-pace
$y_i(t) = a_{i0} + a_{i1} \cos(\pi t) + a_{i2} \sin(\pi t) + \epsilon_i(t)$
$a_i = [a_{i0}, a_{i1}, a_{i2}]^T \sim N[(1, , 1), \mathrm{diag}(\sigma_0, \sigma_1, \sigma_2)], \quad \epsilon_i(t) \sim N[0, \sigma_\epsilon (1 + t)], \quad i = 1, 2, \ldots, n$
$t_j = j/(m + 1), \quad j = 1, 2, \ldots, m$
- n = , m =
- Unbalanced data: r_miss = ., ., .8
$\mathrm{ISE} = \int (\hat{\mu}(t) - \mu(t))^2 \, dt$
$\mathrm{MISE} = \frac{1}{n} \sum_{i=1}^{n} \int (\hat{y}_i(t) - y_i(t))^2 \, dt$
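The two error criteria can be computed numerically once the true and estimated curves are evaluated on a common grid. The sketch below assumes such a grid and uses the trapezoidal rule. The function names are hypothetical and the slides do not say which quadrature was used.

```python
import numpy as np

def trapezoid(values, t):
    """Trapezoidal rule along the last axis."""
    values = np.asarray(values, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.sum((values[..., 1:] + values[..., :-1]) * np.diff(t) / 2.0, axis=-1)

def ise(mu_hat, mu, t):
    """Integrated squared error of a mean-function estimate."""
    return float(trapezoid((np.asarray(mu_hat) - np.asarray(mu)) ** 2, t))

def mise(Y_hat, Y, t):
    """Average over subjects of the integrated squared error of individual fits.

    Y_hat and Y are (n, m) arrays: one row per subject, one column per grid point.
    """
    return float(np.mean(trapezoid((np.asarray(Y_hat) - np.asarray(Y)) ** 2, t)))
```

For example, with true mean 0 and estimate `mu_hat = t` on [0, 1], `ise` returns approximately 1/3.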
Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2) = ( , 1, 1)$
[Figure: simulated curves $y_i$ plotted against $t$]
Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2) = ( , 1, 1)$

r_miss   Model   Mean function    Individual fits
%        LPME    .1 (.19)         .3733 (.88)
         RSME    .13 (.118)       .3733 (.88)
         PACE    .177 (.133)      .38 (.118)
%        LPME    .169 (.116)      .618 (.813)
         RSME    .19 (.98)        .618 (.813)
         PACE    .177 (.18)       .693 (.18)
8%       LPME    . (.19)          1.3 (.76)
         RSME    .131 (.11)       1.3 (.76)
         PACE    .1 (.189)        1.987 (.691)

Winner: Nonparametric mixed-effects (NPME) models
Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$
[Figure: simulated curves $y_i$ plotted against $t$]
Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$

r_miss   Model   Mean function    Individual fits
%        LPME    .31 (.77)        1.96 (.31)
         RSME    .36 (.797)       1.96 (.31)
         PACE    .3639 (.31)      .11 (.67)
%        LPME    .31 (.6)         3.671 (.66)
         RSME    .397 (.7)        3.671 (.66)
         PACE    .388 (.97)       1.1166 (.6)
8%       LPME    .16 (.3)         8.689 (1.36)
         RSME    .69 (.6)         8.689 (1.36)
         PACE    .616 (.36)       6.7611 (.31)

Mean function estimate winner: NPME model
Individual function estimate winner: fpca-pace
Example 1: Viral load in AIDS clinical trials
[Figure: viral load versus time (day)]
- n = 6 patients, n_i is 1, with a median of 8.
- Mean function estimates: RSME (blue), FPCA (red).
Viral load: individual fits
[Figure: individual fits for nine patients, one panel per patient]
Example 2: Yeast cell cycle gene expressions
[Figure: gene expression versus time (min)]
- 67 genes, $t_j = 7(j - 1)$ (minutes), $j = 1, 2, \ldots, 18$.
- Gene expressions are centered by the mean of each gene; the data contain missing values.
Yeast gene expressions: individual fits
[Figure: individual fits for nine genes, one panel per gene]
Time-course microarray gene expressions
- Independent sampling: one measurement from each subject, e.g. mice
- Longitudinal sampling: repeated measurements from the same subject, e.g. humans
Features of the data:
- number of genes n very large, usually several thousand
- number of time points m small (m 1)
- very few replications at each time point, usually 2 or 3
- noisy, possibly with missing data
Time-course microarray gene expressions
Problem of interest: identify differentially expressed genes
- One group: difference from baseline; variation over time
- Two or more groups: difference between groups
Methods:
- ANOVA approach: treat the time variable as an experimental factor (a direct extension of static microarray experiments)
- Continuous approach: treat gene expressions as noisy measurements of an underlying function, and estimate that function nonparametrically (possibly with random effects)
Time-course microarray gene expressions
$y_{ijk} = x_i(t_j) + \epsilon_{ijk}, \quad i = 1, \ldots, n; \; j = 1, 2, \ldots, m; \; k = 1, \ldots, K$
$x_i(t) = \sum_{l=0}^{L} \beta_{il} \phi_l(t), \quad \epsilon_{ijk} \sim (0, \sigma^2)$
$H_0: x_i(t) = 0, \quad i = 1, \ldots, n$
- $\phi_l(t)$: spline basis or PC basis
- In real data there is no clear-cut separation between null and non-null genes
- Need statistics that provide a good ranking
- Multiple testing adjustment to control the error rate, e.g. the False Discovery Rate (FDR)
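As one concrete way to control the FDR given a p-value per gene, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the slides name FDR control but not the specific procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank                                  # keep the largest passing rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

Because the rule is step-up, a p-value that fails its own threshold is still rejected when a larger p-value passes at a higher rank.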
Methods
- Individual nonparametric smoothing (EDGE)
  - $\phi_l(t)$ as a fixed basis
  - statistics: goodness-of-fit (F statistic); area under curve (AUC)
- fpca-integration method (individual estimate of PC scores)
  - $\phi_l(t)$ as eigenfunctions, estimated from the entire sample
  - statistics: area under curve (AUC)
- Both use the bootstrap to calculate the null distribution of the statistics
- Significance cut-off by controlling the FDR
- Applicable to both the independent and the longitudinal sampling cases
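The bootstrap step can be sketched for a single gene under several assumptions that the slides do not confirm: the AUC statistic is taken as the integral of the squared fitted curve, the fitted curve is simply the mean of the replicates at each time point (a stand-in for a spline or PC fit), and the null distribution comes from resampling residuals with replacement. All names here are hypothetical.

```python
import numpy as np

def auc_statistic(curve, t):
    """Area under the squared curve by the trapezoidal rule (assumed form)."""
    sq = np.asarray(curve, dtype=float) ** 2
    return float(np.sum((sq[1:] + sq[:-1]) * np.diff(t) / 2.0))

def bootstrap_pvalue(y, t, n_boot=199, seed=0):
    """Bootstrap p-value for H0: x_i(t) = 0 for one gene.

    y is an (m, K) array holding K replicates at each of m time points.
    """
    rng = np.random.default_rng(seed)
    fitted = y.mean(axis=1)                    # stand-in for a smoothed curve
    resid = (y - fitted[:, None]).ravel()      # estimated noise
    observed = auc_statistic(fitted, t)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 the data are pure noise, so rebuild a data set from residuals.
        y_null = rng.choice(resid, size=y.shape, replace=True)
        if auc_statistic(y_null.mean(axis=1), t) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_boot)         # add-one correction keeps p > 0
```

The resulting per-gene p-values would then be passed to an FDR procedure to choose the significance cut-off.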
Simulation study
- n=1, m = 1, K = 3
- observations equidistant in [0, 1]
- proportion of significant genes p = .1
- Under $H_0$: $y_{ijk} = \epsilon_{ijk}$, $\epsilon_{ijk} \sim N(0, . )$
- Under $H_1$: $y_{ijk} = a_i \sin(\omega_i \pi (t_j - b_i)) + \epsilon_{ijk}$, where $a_i, \omega_i \sim U(., )$, $b_i \sim U(, 1)$
- simulations
Simulation I
Error under $H_1$: $\epsilon_{ijk} \sim N(0, . )$

EDGE (columns: num rejected, corr rejected, FDR, FNR)
FDR=.    91. 87.3.1.139
FDR=.1   1.1 9.9.99.1
FDR=.    116. 93.98.191.68

PCA (columns: num rejected, corr rejected, FDR, FNR)
FDR=.    96. 9...6
FDR=.1   1. 96.1.619.3
FDR=.    11.8 97.66.17.6
Simulation II
Error under $H_1$: $\epsilon_{ijk} \sim N(0, (. v_i))$, where $v_i$ is a dispersion factor

EDGE (columns: num rejected, corr rejected, FDR, FNR)
FDR=.    66. 63.3..393
FDR=.1   81.99 7.1.9.78
FDR=.    1.1 8.1.187.17

PCA (columns: num rejected, corr rejected, FDR, FNR)
FDR=.    8.8 8.8.16
FDR=.1   86. 86..13.18
FDR=.    9.78 91.7.11.91
Gene data from lungs of mice
- number of probes: n = 37
- days post infection (DPI): 0, 1, ..., 10 (m = 11)
- repetition: 3 for DPI = 1, ..., 10; 6 for DPI = 0 (3 with no flu virus, 3 killed immediately after receiving flu virus)
- normalized by the Welle lab using the PLIER normalization method; log-transformation
$H_0: x_i(t) = \text{baseline}, \quad \forall t$
- Baseline 1: gene expression for DPI = 0, no flu virus
- Baseline 2: gene expression for DPI = 0, immediately after receiving flu virus
Gene data from lungs of mice: Baseline 1

EDGE (F)        EDGE (AUC)     PCA (AUC)
397 (FDR=.1)    (FDR=.)        7133 (FDR=.)

- EDGE fails: oversmoothed
- An increase in gene expression is observed between DPI = 0 with no flu virus and DPI = 0 immediately after receiving flu virus => stress genes
Baseline 1: top 9 genes selected by PCA, not by EDGE (AUC)
[Figure: expression profiles of the nine genes, one panel per gene]
Baseline 1: top 9 genes selected by EDGE (AUC), not by PCA
[Figure: expression profiles of the nine genes, one panel per gene]
Gene data from lungs of mice: Baseline 2

EDGE (F)        EDGE (AUC)     PCA (AUC)
119 (FDR=.1)    1 (FDR=.)      3 (FDR=.)

[Figure: histograms of p-values by PCA and p-values by EDGE (AUC)]
Baseline 2: top 9 genes selected by PCA, not by EDGE (AUC)
[Figure: expression profiles of the nine genes, one panel per gene]
Baseline 2: top 9 genes selected by EDGE (AUC), not by PCA
[Figure: expression profiles of the nine genes, one panel per gene]
Summary
Nonparametric longitudinal data analysis methods:
- Individual nonparametric smoothing
  - Does not borrow information across subjects (curves) at all
  - Handles completely different curves for different subjects
- FPCA with individual estimates of PC scores
  - Weakly borrows information across subjects via the PC basis estimate
  - PC basis: adaptive to some between-subject (curve) variation
- FPCA-PACE
  - Borrows information across subjects via the mixed-effects PC score estimate
  - PC basis: adaptive to large between-subject (curve) variation
- Nonparametric mixed-effects (NPME) modeling
  - Strongly borrows information across subjects (curves)
  - Suited to longitudinal data with similar patterns across subjects
References
- Storey et al. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102, 12837-12842.
- Wu, H. and Zhang, J.-T. (2006) Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. John Wiley & Sons, New York.
- Yao, F., Müller, H.-G., and Wang, J.-L. (2005) Functional linear regression analysis for longitudinal data. The Annals of Statistics, 33, 2873-2903.