Package 'metafuse'

November 7, 2015

Type              Package
Title             Fused Lasso Approach in Regression Coefficient Clustering
Version           1.0-1
Date              2015-11-06
Author            Lu Tang, Peter X.K. Song
Maintainer        Lu Tang <lutang@umich.edu>
Description       For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.
License           GPL-2
Imports           glmnet, Matrix, MASS
Repository        CRAN
NeedsCompilation  no
Date/Publication  2015-11-07 07:24:15

R topics documented:

    metafuse-package
    datagenerator
    metafuse
    metafuse.l
    Index


metafuse-package    Fused Lasso Approach in Regression Coefficient Clustering

Description

    For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.

Details

    Very simple to use. Accepts X, y, and sid (study ID) for regression models.

    Index of help topics:

    datagenerator       simulate data
    metafuse            fit a GLM with fusion penalty for data integration
    metafuse-package    Fused Lasso Approach in Regression Coefficient Clustering
    metafuse.l          fit a GLM with fusion penalty for data integration

Author(s)

    Lu Tang, Peter X.K. Song

    Maintainer: Lu Tang <lutang@umich.edu>

References

    Lu Tang, Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. In preparation.

    Fei Wang. Development of Joint Estimating Equation Approaches to Merging Clustered or Longitudinal Datasets from Multiple Biomedical Studies. PhD Thesis, The University of Michigan, 2012. (A Biometrics paper is tentatively accepted. We will revise this at a later time.)

    Jiahua Chen and Zehua Chen. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759-771, 2008.

    Xin Gao and Peter X-K Song. Composite likelihood Bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105(492):1531-1540, 2010.

Examples

    ########### Generate Data ###########
    n <- 200    # sample size in each study
    K <- 10     # number of studies
    p <- 3      # number of covariates in X (including intercept)
    N <- n*K    # total sample size

    # the coefficient matrix, use this to set the desired heterogeneous pattern (depends on p and K)
    beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                      0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1, etc.
                      0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

    # generate a data set, family=c("gaussian", "binomial", "poisson")
    data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

    # prepare the input (y, X, studyid)
    y <- data$y
    sid <- data$group
    X <- data[,-c(1,ncol(data))]

    ########### metafuse runs ###########
    # fuse slopes of X1 (it is heterogeneous with 2 groups)
    metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0)

    # fuse slopes of X2 (it is heterogeneous with 3 groups)
    metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0)

    # fuse all three covariates
    metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0)

    # fuse all three covariates, with sparsity penalty
    metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1)

    # fit metafuse at a given lambda
    metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE,
               alpha=1, lambda=0.5)


datagenerator    simulate data

Description

    Simulate data for demonstration of flarcc.

Usage

    datagenerator(n = n, beta0 = beta0, family = "gaussian", seed = seed)

Arguments

    n         sample size for each study, a vector of length K, the number of studies; can also be a scalar, to specify equal sample size
    beta0     coefficient matrix, with dimension K * p, where K is the number of studies and p is the number of covariates
    family    "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
    seed      set random seed for data generation

Details

    These data sets are artificial, and used to test out some features of flarcc.
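
    To make the K * p layout of beta0 concrete, the following is a minimal base-R sketch of how multi-study data of this shape could be simulated. It is not the package's datagenerator: the helper name simulate_studies, the standard normal covariates, the unit error variance, and the column names X1, X2 are all assumptions made for illustration. The output layout (y first, group last) mirrors what the examples in this manual rely on.

```r
# Hypothetical helper, NOT the package's datagenerator():
# simulate K studies whose coefficients are the rows of beta0 (K x p).
simulate_studies <- function(n, beta0, family = "gaussian", seed = 1) {
  set.seed(seed)
  K <- nrow(beta0)
  p <- ncol(beta0)
  if (length(n) == 1) n <- rep(n, K)   # a scalar n means equal study sizes
  pieces <- lapply(seq_len(K), function(k) {
    # Assumption: non-intercept covariates are independent standard normals
    X <- matrix(rnorm(n[k] * (p - 1)), n[k], p - 1,
                dimnames = list(NULL, paste0("X", seq_len(p - 1))))
    eta <- drop(cbind(1, X) %*% beta0[k, ])   # linear predictor for study k
    y <- switch(family,
                gaussian = eta + rnorm(n[k]),              # unit error variance (assumed)
                binomial = rbinom(n[k], 1, plogis(eta)),   # logit link
                poisson  = rpois(n[k], exp(eta)),          # log link
                stop("unsupported family"))
    data.frame(y = y, X, group = k)
  })
  do.call(rbind, pieces)
}

# Same heterogeneity pattern as the manual's examples: column 1 is the intercept
beta0 <- matrix(c(rep(0, 10),
                  rep(0, 5), rep(1, 5),
                  rep(0, 4), rep(0.5, 3), rep(1, 3)), 10, 3)
sim <- simulate_studies(n = 200, beta0 = beta0, family = "gaussian", seed = 123)
dim(sim)
table(sim$group)
```

    Because matrix() fills column by column, each run of ten values in beta0 holds one covariate's coefficients across the ten studies.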

Value

    a simulated data frame will be returned, containing y, X, and study ID sid

Examples

    n <- 200    # sample size in each study
    K <- 10     # number of studies
    p <- 3      # number of covariates in X (including intercept)
    N <- n*K    # total sample size

    # the coefficient matrix, use this to set the desired heterogeneous pattern (depends on p and K)
    beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                      0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1, etc.
                      0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

    # generate a data set, family=c("gaussian", "binomial", "poisson")
    data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)


metafuse    fit a GLM with fusion penalty for data integration

Description

    Fit a GLM with a fusion penalty on coefficients within each covariate, and generate the solution path for model selection.

Usage

    metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)), family = "gaussian",
             intercept = TRUE, alpha = 0, criterion = "EBIC", verbose = TRUE,
             plots = TRUE, loglambda = TRUE)

Arguments

    X             a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies
    y             a vector of response, with length N, the total sample size of all studies
    sid           study id, numbered from 1 to K
    fuse.which    a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept
    family        "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
    intercept     if TRUE, intercept will be included in the model
    alpha         the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity)
    criterion     "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC
    verbose       if TRUE, output fusion events and tuning parameter lambda
    plots         if TRUE, create plots of solution paths and clustering trees
    loglambda     if TRUE, lambda will be plotted on a log-10 transformed scale

Details

    Adaptive lasso penalty is used. See Zou (2006) for detail.

Value

    a list containing the following items will be returned:

    family        the model type
    criterion     model selection criterion used
    alpha         the ratio of sparsity penalty to fusion penalty
    if.fuse       whether the covariate is fused (1) or not (0)
    betahat       the estimated coefficients
    betainfo      additional information about the fit, including degree of freedom, lambda optimal, lambda fuse, friction of fusion for each covariate

Examples

    n <- 200    # sample size in each study
    K <- 10     # number of studies
    p <- 3      # number of covariates in X (including intercept)
    N <- n*K    # total sample size

    # the coefficient matrix, use this to set the desired heterogeneous pattern (depends on p and K)
    beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                      0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1, etc.
                      0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

    # generate a data set, family=c("gaussian", "binomial", "poisson")
    data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

    # prepare the input (y, X, studyid)
    y <- data$y
    sid <- data$group
    X <- data[,-c(1,ncol(data))]

    # fuse slopes of X1 (it is heterogeneous with 2 groups)
    metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0)

    # fuse slopes of X2 (it is heterogeneous with 3 groups)
    metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0)

    # fuse all three covariates
    metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0)

    # fuse all three covariates, with sparsity penalty
    metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1)


metafuse.l    fit a GLM with fusion penalty for data integration

Description

    Fit a GLM with a fusion penalty on coefficients within each covariate at a given lambda.

Usage

    metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)), family = "gaussian",
               intercept = TRUE, alpha = 0, lambda = lambda)

Arguments

    X             a matrix (or vector) of predictor(s), with dimensions of N*p, where N is the total sample size of all studies
    y             a vector of response, with length N, the total sample size of all studies
    sid           study id, numbered from 1 to K
    fuse.which    a vector of a subset of integers from 0 to p, indicating which covariates to be considered for fusion; 0 corresponds to intercept
    family        "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response
    intercept     if TRUE, intercept will be included in the model
    alpha         the ratio of sparsity penalty to fusion penalty, default is 0 (no penalty on sparsity)
    lambda        tuning parameter for fusion penalty

Details

    Adaptive lasso penalty is used. See Zou (2006) for detail.

Value

    a list containing the following items will be returned:

    family        the model type
    alpha         the ratio of sparsity penalty to fusion penalty
    if.fuse       whether the covariate is fused (1) or not (0)
    betahat       the estimated coefficients
    betainfo      additional information about the fit, including degree of freedom
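
    Once coefficients have been fused, studies sharing the same estimate for a covariate form one cluster. The sketch below shows one way to read cluster labels off a coefficient table. The matrix used here is made up, and its layout (one row per covariate, one column per study) is an assumption for illustration rather than the documented structure of betahat; cluster_labels is a hypothetical helper, not a package function.

```r
# Made-up fused estimates; rows = covariates, columns = studies.
# ASSUMPTION: this layout is illustrative, not the documented shape of betahat.
betahat <- rbind(
  intercept = rep(0.02, 10),
  X1        = c(rep(0.01, 5), rep(0.98, 5)),
  X2        = c(rep(0.00, 4), rep(0.51, 3), rep(1.03, 3))
)

# Hypothetical helper: studies whose estimates agree (after rounding)
# receive the same integer label, numbered in order of first appearance.
cluster_labels <- function(b, digits = 6) {
  r <- round(b, digits)
  match(r, unique(r))
}

# apply() returns one column per covariate, so transpose back
clusters <- t(apply(betahat, 1, cluster_labels))
clusters
n_clusters <- apply(clusters, 1, max)   # number of clusters per covariate
n_clusters
```

    Rounding before matching guards against tiny floating-point differences between estimates that the fit treats as equal; the number of digits is a judgment call.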

Examples

    n <- 200    # sample size in each study
    K <- 10     # number of studies
    p <- 3      # number of covariates in X (including intercept)
    N <- n*K    # total sample size

    # the coefficient matrix, use this to set the desired heterogeneous pattern (depends on p and K)
    beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                      0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1, etc.
                      0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

    # generate a data set, family=c("gaussian", "binomial", "poisson")
    data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

    # prepare the input (y, X, studyid)
    y <- data$y
    sid <- data$group
    X <- data[,-c(1,ncol(data))]

    # fit metafuse at a given lambda
    metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE,
               alpha=1, lambda=0.5)
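
    Before applying a fusion penalty, it can be useful to look at the unpenalized, study-by-study estimates, since these show the heterogeneity that fusion is meant to summarize. The following base-R sketch fits one ordinary least squares model per study on freshly simulated data. It does not call metafuse, and the variable names (dat, x1, per_study) and the two-group slope pattern are assumptions chosen for illustration.

```r
set.seed(123)
K <- 10                                  # number of studies
n <- 200                                 # sample size in each study
slope <- rep(c(0, 1), each = 5)          # true slope per study: two groups
sid <- rep(seq_len(K), each = n)         # study id, numbered from 1 to K
x1 <- rnorm(K * n)
y <- slope[sid] * x1 + rnorm(K * n)      # Gaussian response, zero intercept
dat <- data.frame(y = y, x1 = x1, sid = sid)

# One ordinary least squares fit per study; result has one row per study
per_study <- t(sapply(split(dat, dat$sid),
                      function(d) coef(lm(y ~ x1, data = d))))
round(per_study, 2)
```

    With this design the slope estimates for studies 1 to 5 should scatter around 0 and those for studies 6 to 10 around 1, which is the kind of grouping a fusion penalty on the slope would be expected to recover.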

Index

    Topic data
        datagenerator
    Topic generator
        datagenerator
    Topic package
        metafuse-package

    datagenerator
    metafuse
    metafuse-package
    metafuse.l