Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis

Hulin Wu, PhD, Professor (with Dr. Shuang Wu)
Department of Biostatistics & Computational Biology
University of Rochester Medical Center
Email: hwu@bst.rochester.edu
October, 1

Table of contents
1. Introduction
2. Comparisons: NPME vs. fpca-pace
3. Comparisons: Individual Smoothing vs. fpca-integration Method
4. Summary and Conclusion

Question to Address
Nonparametric longitudinal data analysis methods:
- Nonparametric mixed-effects models
- Functional PCA

Analysis of longitudinal studies
Parametric mixed-effects models (LME and NLME), e.g.
$$y_i = X_i\beta + Z_i b_i + \epsilon_i, \quad b_i \sim N(0, D), \quad \epsilon_i \sim N(0, R_i), \quad i = 1, 2, \ldots, n$$
Limitation: parametric, hence restrictive.

Nonparametric mixed-effects (NPME) model
$$y_i(t) = \mu(t) + \nu_i(t) + \epsilon_i(t) = \sum_{j=1}^{p} \beta_j B_j(t) + \sum_{k=1}^{q} b_{ik} B_k(t) + \epsilon_i(t)$$
- Regression splines: various choices of basis functions, known
- Mixed-effects modeling: borrow information across subjects (curves), shrink to the mean
- Estimation: MLE or REML (SAS, R)
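The "shrink to the mean" behaviour can be illustrated with the simplest special case of a mixed-effects model, a random intercept $y_{ij} = \mu + b_i + \epsilon_{ij}$ with known variances. This is only a sketch of the shrinkage idea, not the spline-based NPME fit from the slides; the function name `blup_intercepts`, the toy data and the variance values are assumptions made for illustration.

```python
from statistics import fmean

def blup_intercepts(groups, var_b, var_e):
    """Return (mu_hat, [b_hat_i]) for y_ij = mu + b_i + e_ij with known variances.

    mu_hat is the precision-weighted (GLS) mean of the subject means; each
    b_hat_i is the subject's deviation from mu_hat multiplied by a shrinkage
    factor in (0, 1) that grows with the number of observations n_i.
    """
    means = [fmean(g) for g in groups]
    # Var(ybar_i) = var_b + var_e / n_i, so each subject mean is weighted by its inverse.
    weights = [1.0 / (var_b + var_e / len(g)) for g in groups]
    mu_hat = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    b_hat = []
    for g, m in zip(groups, means):
        shrink = var_b / (var_b + var_e / len(g))
        b_hat.append(shrink * (m - mu_hat))
    return mu_hat, b_hat

# Hypothetical subjects with 4, 1 and 2 observations (assumed values).
groups = [[2.0, 2.4, 1.9, 2.1], [3.5], [0.8, 1.2]]
mu_hat, b_hat = blup_intercepts(groups, var_b=1.0, var_e=1.0)
for g, b in zip(groups, b_hat):
    # n_i, raw deviation of the subject mean, shrunken random-effect estimate
    print(len(g), round(fmean(g) - mu_hat, 3), round(b, 3))
```

Subjects with few observations are pulled more strongly toward the population mean, which is the sense in which a mixed-effects model borrows information across curves.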

Functional approach based on principal component analysis
$$Y_{ij} = X_i(t_{ij}) + \epsilon_{ij} = \mu(t_{ij}) + \sum_{k=1}^{K} \xi_{ik}\phi_k(t_{ij}) + \epsilon_{ij}$$
- Mean function $\mu(t)$: any nonparametric smoothing method
- Between-subject (curve) variation $\sum_{k=1}^{K} \xi_{ik}\phi_k(t_{ij})$: Karhunen-Loève approximation
- Both PC scores ($\xi_{ik}$) and basis functions (eigenfunctions $\phi_k(t)$) need to be estimated from data
- PC scores (coefficients) can be estimated by:
  - PACE: mixed-effects modeling idea to borrow information across subjects (curves)
  - Integration method: individual estimate for each subject (curve)
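The integration method for PC scores can be sketched for the easy case of curves observed on a common, equally spaced dense grid. This is a minimal illustration under that assumption, not the authors' implementation and not PACE; the function name `fpca_integration`, the simulated coefficients and the noise level are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def fpca_integration(Y, t, n_components):
    """FPCA for curves on a common equally spaced grid.

    Y: (n_curves, n_times) array, t: (n_times,) grid.
    Returns the mean curve, eigenfunctions (n_components, n_times) and
    PC scores (n_curves, n_components) computed by numerical integration.
    """
    dt = t[1] - t[0]
    mu = Y.mean(axis=0)                              # pointwise mean as a simple mean estimate
    centered = Y - mu
    cov = centered.T @ centered / (Y.shape[0] - 1)   # sample covariance surface G(s, t)
    # Discretised eigenproblem: integral of G(s, t) phi(s) ds = lambda * phi(t)
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1][:n_components]
    phi = eigvecs[:, order].T / np.sqrt(dt)          # scaled so that sum(phi_k**2) * dt = 1
    scores = centered @ phi.T * dt                   # xi_ik ~ integral of (X_i - mu) * phi_k
    return mu, phi, scores

# Toy curves with three random coefficients plus noise (all values assumed).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
coef = rng.normal([1.0, 0.0, 1.0], 1.0, size=(100, 3))
basis = np.vstack([np.ones_like(t), np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
Y = coef @ basis + rng.normal(0.0, 0.1, size=(100, 50))

mu, phi, scores = fpca_integration(Y, t, n_components=3)
recon = mu + scores @ phi                            # truncated Karhunen-Loeve reconstruction
print("mean squared reconstruction error:", float(np.mean((recon - Y) ** 2)))
```

Each score here uses only that curve's own observations, so no information is borrowed across curves beyond the shared mean and eigenfunctions. With sparse or irregular observations the integral is poorly approximated, which is the situation PACE is designed for.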

Simulation Comparisons: NPME and fpca-pace
$$y_i(t) = a_{i0} + a_{i1}\cos(\pi t) + a_{i2}\sin(\pi t) + \epsilon_i(t)$$
$$a_i = [a_{i0}, a_{i1}, a_{i2}]^T \sim N[(1, , 1), \mathrm{diag}(\sigma_0^2, \sigma_1^2, \sigma_2^2)], \quad \epsilon_i(t) \sim N[0, \sigma_\epsilon^2 (1 + t)], \quad i = 1, 2, \ldots, n$$
- $t_j = j/(m + 1)$, $j = 1, 2, \ldots, m$
- n =, m =
- Unbalanced data: r miss =.,.,.8
- $\mathrm{ISE} = \int (\hat{\mu}(t) - \mu(t))^2 \, dt$
- $\mathrm{MISE} = \frac{1}{n} \sum_{i=1}^{n} \int (\hat{y}_i(t) - y_i(t))^2 \, dt$
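A simulation of this form and the ISE criterion can be sketched as follows. Several design values are not legible in this transcript, so the frequency $2\pi$, the coefficient mean `(1, 0, 1)`, `n = 50`, `m = 20`, `sigma_eps = 0.5` and `r_miss = 0.5` below are all assumptions, as are the function names. The pointwise mean is used as a stand-in estimator, not the NPME or PACE fits that the slides compare.

```python
import numpy as np

def simulate_curves(n, m, mean_a, var_a, sigma_eps, r_miss, rng):
    """Simulate y_i(t_j) = a_i0 + a_i1 cos(2 pi t_j) + a_i2 sin(2 pi t_j) + eps_i(t_j)."""
    t = np.arange(1, m + 1) / (m + 1)                  # t_j = j / (m + 1)
    basis = np.vstack([np.ones(m), np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
    a = rng.normal(mean_a, np.sqrt(var_a), size=(n, 3))
    # Heteroscedastic noise with variance sigma_eps^2 * (1 + t)
    noise = rng.normal(0.0, sigma_eps * np.sqrt(1 + t), size=(n, m))
    y = a @ basis + noise
    y[rng.random((n, m)) < r_miss] = np.nan            # drop points to get unbalanced data
    return t, y, np.asarray(mean_a) @ basis            # grid, data, true mean function

def ise(est, truth, t):
    """Integrated squared error by the trapezoid rule on the grid t."""
    sq = (est - truth) ** 2
    return float(np.sum((sq[1:] + sq[:-1]) * np.diff(t)) / 2)

rng = np.random.default_rng(0)
t, y, mu_true = simulate_curves(n=50, m=20, mean_a=[1.0, 0.0, 1.0],
                                var_a=[1.0, 1.0, 1.0], sigma_eps=0.5,
                                r_miss=0.5, rng=rng)
mu_hat = np.nanmean(y, axis=0)                         # stand-in mean estimate
print("ISE of the pointwise-mean estimate:", ise(mu_hat, mu_true, t))
```

The MISE for individual fits follows the same pattern: compute `ise` for each fitted curve against its noise-free truth and average over subjects.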

Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2) = (, 1, 1)$
[Figure: simulated curves $y_i$ versus $t$]

Simulation I: small variation, $(\sigma_0, \sigma_1, \sigma_2) = (, 1, 1)$

r_miss  Model  Mean function  Individual fits
%       LPME   .1 (.19)       .3733 (.88)
        RSME   .13 (.118)     .3733 (.88)
        PACE   .177 (.133)    .38 (.118)
%       LPME   .169 (.116)    .618 (.813)
        RSME   .19 (.98)      .618 (.813)
        PACE   .177 (.18)     .693 (.18)
8%      LPME   . (.19)        1.3 (.76)
        RSME   .131 (.11)     1.3 (.76)
        PACE   .1 (.189)      1.987 (.691)

Winner: nonparametric mixed-effects (NPME) models

Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$
[Figure: simulated curves $y_i$ versus $t$]

Simulation II: large variation, $(\sigma_0, \sigma_1, \sigma_2) = (3, 3, 3)$

r_miss  Model  Mean function  Individual fits
%       LPME   .31 (.77)      1.96 (.31)
        RSME   .36 (.797)     1.96 (.31)
        PACE   .3639 (.31)    .11 (.67)
%       LPME   .31 (.6)       3.671 (.66)
        RSME   .397 (.7)      3.671 (.66)
        PACE   .388 (.97)     1.1166 (.6)
8%      LPME   .16 (.3)       8.689 (1.36)
        RSME   .69 (.6)       8.689 (1.36)
        PACE   .616 (.36)     6.7611 (.31)

- Mean function estimate winner: NPME model
- Individual function estimate winner: fpca-pace

Example 1: Viral load in AIDS clinical trials
[Figure: viral load versus time (day)]
- n = 6 patients, $n_i$ is 1, with a median of 8.
- Mean function estimates: RSME (blue), FPCA (red).

Viral load: individual fits
[Figure: individual fits, nine panels, one per patient]

Example 2: Yeast cell cycle gene expressions
[Figure: gene expression versus time (min)]
- 67 genes, $t_j = 7(j - 1)$ (minutes), $j = 1, 2, \ldots, 18$.
- Gene expressions are centered by the mean of each gene; the data contain missing values.

Yeast gene expressions: individual fits
[Figure: individual fits, nine panels, one per gene]

Time-course microarray gene expressions
- Independent sampling: one measurement from each subject, e.g. mice
- Longitudinal sampling: repeated measurements from the same subject, e.g. human
Features of data:
- number of genes n very large, usually several thousands
- number of time points m small (m 1)
- very few replications at each time point, usually 2 or 3
- noisy, possibly with missing data

Time-course microarray gene expressions
Problem of interest: identify differentially expressed genes
- One group: difference from baseline; variation over time
- Two or more groups: difference between groups
Methods:
- ANOVA approach: treat the time variable as a particular experimental factor (a direct extension from static microarray experiments)
- Continuous approach: treat gene expressions as noisy measurements from an underlying function; nonparametric estimation of the underlying function (possibly with random effects)

Time-course microarray gene expressions
$$y_{ijk} = x_i(t_j) + \epsilon_{ijk}, \quad i = 1, \ldots, n; \; j = 1, 2, \ldots, m; \; k = 1, \ldots, K$$
$$x_i(t) = \sum_{l=0}^{L} \beta_{il}\phi_l(t), \quad \epsilon_{ijk} \sim (0, \sigma^2)$$
$$H_0: x_i(t) = 0, \quad i = 1, \ldots, n$$
- $\phi_l(t)$: spline basis or PC basis
In real data there is no clear cut, so we need:
- Statistics that provide a good ranking
- Multiple testing adjustment to control the error rate, e.g. the False Discovery Rate (FDR)
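FDR control over a list of per-gene p-values can be sketched with the Benjamini-Hochberg step-up procedure. Note that this is a swapped-in, standard technique chosen for illustration; the slides do not state which FDR procedure was used, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by increasing p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank with p_(rank) <= alpha * rank / m
        if p_values[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True                                 # reject the k smallest p-values
    return rejected

# Hypothetical p-values for four genes.
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.5], alpha=0.05))
```

Because the rule is step-up, a gene can be rejected even when its own p-value exceeds its own threshold, as long as a larger-ranked p-value passes.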

Methods
Individual nonparametric smoothing (EDGE)
- $\phi_l(t)$ as fixed basis
- statistics: goodness-of-fit (F statistics); area under curve (AUC)
fpca-integration method (individual estimate of PC scores)
- $\phi_l(t)$ as eigenfunctions, estimated from the entire sample
- statistics: area under curve (AUC)
Both methods:
- use the bootstrap to calculate the null distribution of the statistics
- set the significance cut-off by controlling the FDR
- are applicable to both independent and longitudinal sampling
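A bootstrap null distribution for an AUC-type statistic can be sketched for a single gene as follows. This is a simplified residual bootstrap under stated assumptions, not the EDGE procedure or the authors' implementation: the statistic is taken to be the trapezoid area under the squared time-point means (the slides do not define the AUC statistic precisely), the time-point means stand in for a smooth fit, and all names and numeric values are hypothetical.

```python
import math
import random

def auc_statistic(y, t):
    """y[j][k] is replicate k at time t[j]; area under the squared time-point means."""
    sq = [(sum(reps) / len(reps)) ** 2 for reps in y]
    return sum((sq[j] + sq[j + 1]) * (t[j + 1] - t[j]) / 2 for j in range(len(t) - 1))

def bootstrap_p_value(y, t, n_boot, rng):
    """P-value for H0: x(t) = 0, using resampled residuals as null data."""
    observed = auc_statistic(y, t)
    residuals = []
    for reps in y:
        mean = sum(reps) / len(reps)
        # Centred residuals are too small by sqrt((K - 1) / K); rescale them (needs K >= 2).
        scale = math.sqrt(len(reps) / (len(reps) - 1))
        residuals.extend(scale * (v - mean) for v in reps)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 the curve is zero, so a null data set consists of noise only.
        y_null = [[rng.choice(residuals) for _ in reps] for reps in y]
        if auc_statistic(y_null, t) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

rng = random.Random(1)
t = [j / 9 for j in range(10)]                       # 10 equidistant points in [0, 1]
signal = [[math.sin(2 * math.pi * tj) + rng.gauss(0, 0.2) for _ in range(3)] for tj in t]
noise = [[rng.gauss(0, 0.2) for _ in range(3)] for _ in t]
p_signal = bootstrap_p_value(signal, t, n_boot=200, rng=rng)
p_noise = bootstrap_p_value(noise, t, n_boot=200, rng=rng)
print(p_signal, p_noise)
```

The per-gene p-values produced this way would then be passed to an FDR procedure to choose the significance cut-off.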

Simulation study
- n=1, m = 1, K = 3
- observations equidistant in [0, 1]
- proportion of significant genes p =.1
- Under $H_0$: $y_{ijk} = \epsilon_{ijk}$, ɛ ijk N (,. )
- Under $H_1$: $y_{ijk} = a_i \sin(\omega_i \pi (t_j - b_i)) + \epsilon_{ijk}$, where a i, ω i U(., ), b i U(, 1)
- simulations

Simulation I
Error under $H_1$: ɛ ijk N (,. )
(columns: nominal level, num rejected, corr rejected, FDR, FNR)

EDGE
FDR=.    91. 87.3.1.139
FDR=.1   1.1 9.9.99.1
FDR=.    116. 93.98.191.68

PCA
FDR=.    96. 9...6
FDR=.1   1. 96.1.619.3
FDR=.    11.8 97.66.17.6

Simulation II
Error under $H_1$: ɛ ijk N (, (. v i ) ), where $v_i$ is a dispersion factor
(columns: nominal level, num rejected, corr rejected, FDR, FNR)

EDGE
FDR=.    66. 63.3..393
FDR=.1   81.99 7.1.9.78
FDR=.    1.1 8.1.187.17

PCA
FDR=.    8.8 8.8.16
FDR=.1   86. 86..13.18
FDR=.    9.78 91.7.11.91

Gene data from lungs of mice
- number of probes: n = 37
- days post infection (DPI): 0, 1, ..., 10 (m = 11)
- repetition: 3 for DPI = 1, ..., 10; 6 for DPI = 0 (3 no flu virus, 3 killed immediately after receiving flu virus)
- normalized by Welle lab using the PLIER normalization method; log-transformation
- $H_0: x_i(t) = \text{baseline}, \; \forall t$
  - Baseline 1: gene expression for DPI = 0, no flu virus
  - Baseline 2: gene expression for DPI = 0, immediately after receiving flu virus

Gene data from lungs of mice: Baseline 1

Method       Genes selected (FDR level)
EDGE (F)     397 (FDR=.1)
EDGE (AUC)   (FDR=.)
PCA (AUC)    7133 (FDR=.)

- EDGE fails: oversmoothed
- An increase in gene expression is observed between DPI = 0, no flu virus, and DPI = 0, immediately after receiving flu virus, which points to stress genes

Baseline 1: top 9 genes selected by PCA, not by EDGE (AUC)
[Figure: nine panels, one per gene]

Baseline 1: top 9 genes selected by EDGE (AUC), not by PCA
[Figure: nine panels, one per gene]

Gene data from lungs of mice: Baseline 2

Method       Genes selected (FDR level)
EDGE (F)     119 (FDR=.1)
EDGE (AUC)   1 (FDR=.)
PCA (AUC)    3 (FDR=.)

[Figure: histograms of p-values by PCA and by EDGE (AUC)]

Baseline 2: top 9 genes selected by PCA, not by EDGE (AUC)
[Figure: nine panels, one per gene]

Baseline 2: top 9 genes selected by EDGE (AUC), not by PCA
[Figure: nine panels, one per gene]

Summary
Nonparametric longitudinal data analysis methods:
- Individual nonparametric smoothing
  - Does not borrow information across subjects (curves) at all
  - Handles completely different curves for different subjects
- FPCA with individual estimates of PC scores
  - Weakly borrows information across subjects via the PC basis estimate
  - PC basis: adaptive for some between-subject (curve) variation
- FPCA-PACE
  - Borrows information across subjects via the mixed-effects PC score estimate
  - PC basis: adaptive for large between-subject (curve) variation
- Nonparametric mixed-effects (NPME) modeling
  - Strongly borrows information across subjects (curves)
  - Handles longitudinal data with similar patterns

References
- Storey et al. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102, 12837-12842.
- Wu, H. and Zhang, J.-T. (2006) Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. John Wiley & Sons, New York.
- Yao, F., Müller, H.-G., and Wang, J.-L. (2005) Functional linear regression analysis for longitudinal data. The Annals of Statistics, 33, 2873-2903.