Package metafuse. November 7, 2015



Type: Package
Title: Fused Lasso Approach in Regression Coefficient Clustering
Version: 1.0-1
Date: 2015-11-06
Author: Lu Tang, Peter X.K. Song
Maintainer: Lu Tang <lutang@umich.edu>
Description: For each covariate, cluster its coefficient effects across
    different data sets during data integration. Supports Gaussian,
    binomial and Poisson regression models.
License: GPL-2
Imports: glmnet, Matrix, MASS
Repository: CRAN
NeedsCompilation: no
Date/Publication: 2015-11-07 07:24:15

R topics documented:
    metafuse-package
    datagenerator
    metafuse
    metafuse.l

metafuse-package    Fused Lasso Approach in Regression Coefficient Clustering

Description

For each covariate, cluster its coefficient effects across different data sets during data integration. Supports Gaussian, binomial and Poisson regression models.

Details

Very simple to use. Accepts X, y, and sid (study ID) for regression models.

Index of help topics:

    datagenerator       simulate data
    metafuse            fit a GLM with fusion penalty for data integration
    metafuse-package    Fused Lasso Approach in Regression Coefficient Clustering
    metafuse.l          fit a GLM with fusion penalty for data integration

Author(s)

Lu Tang, Peter X.K. Song
Maintainer: Lu Tang <lutang@umich.edu>

References

Lu Tang, Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering: Learning Parameter Heterogeneity in Data Integration. In preparation.

Fei Wang. Development of Joint Estimating Equation Approaches to Merging Clustered or Longitudinal Datasets from Multiple Biomedical Studies. PhD thesis, The University of Michigan, 2012. (A Biometrics paper is tentatively accepted; we will revise this at a later time.)

Jiahua Chen and Zehua Chen. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759-771, 2008.

Xin Gao and Peter X.-K. Song. Composite likelihood Bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105(492):1531-1540, 2010.

Examples

########### Generate Data ###########
n <- 200  # sample size in each study
K <- 10   # number of studies
p <- 3    # number of covariates in X (including intercept)
N <- n*K  # total sample size

# the coefficient matrix; use this to set the desired heterogeneous
# pattern (depends on p and K)
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,  # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,  # beta_1, etc.
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set; family = c("gaussian", "binomial", "poisson")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input (y, X, study ID)
y   <- data$y
sid <- data$group
X   <- data[, -c(1, ncol(data))]

########### metafuse runs ###########
# fuse slopes of X1 (it is heterogeneous with 2 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse slopes of X2 (it is heterogeneous with 3 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse all three covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
         intercept=TRUE, alpha=1)

# fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
           intercept=TRUE, alpha=1, lambda=0.5)

datagenerator    simulate data

Description

Simulate data for demonstration of flarcc.

Usage

datagenerator(n = n, beta0 = beta0, family = "gaussian", seed = seed)

Arguments

n        sample size for each study; a vector of length K, the number of
         studies, or a scalar to specify equal sample sizes
beta0    coefficient matrix of dimension K * p, where K is the number of
         studies and p is the number of covariates
family   "gaussian" for continuous response, "binomial" for binary
         response, "poisson" for count response
seed     random seed for data generation

Details

These data sets are artificial, and are used to test out some features of flarcc.

Value

A simulated data frame is returned, containing y, X, and the study ID sid.

Examples

n <- 200  # sample size in each study
K <- 10   # number of studies
p <- 3    # number of covariates in X (including intercept)
N <- n*K  # total sample size

# the coefficient matrix; use this to set the desired heterogeneous
# pattern (depends on p and K)
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,  # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,  # beta_1, etc.
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set; family = c("gaussian", "binomial", "poisson")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

metafuse    fit a GLM with fusion penalty for data integration

Description

Fit a GLM with a fusion penalty on the coefficients within each covariate, and generate the solution path for model selection.

Usage

metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
         family = "gaussian", intercept = TRUE, alpha = 0,
         criterion = "EBIC", verbose = TRUE, plots = TRUE,
         loglambda = TRUE)

Arguments

X           a matrix (or vector) of predictor(s), with dimensions N*p,
            where N is the total sample size of all studies
y           a vector of responses, with length N, the total sample size
            of all studies
sid         study ID, numbered from 1 to K
fuse.which  a vector of a subset of the integers from 0 to p, indicating
            which covariates are considered for fusion; 0 corresponds to
            the intercept
family      "gaussian" for continuous response, "binomial" for binary
            response, "poisson" for count response
intercept   if TRUE, an intercept is included in the model
alpha       the ratio of sparsity penalty to fusion penalty; the default
            is 0 (no penalty on sparsity)
criterion   "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC

verbose     if TRUE, print fusion events and the tuning parameter lambda
plots       if TRUE, create plots of the solution paths and clustering
            trees
loglambda   if TRUE, lambda is plotted on a log-10 scale

Details

An adaptive lasso penalty is used; see Zou (2006) for details.

Value

A list containing the following items is returned:

family      the model type
criterion   the model selection criterion used
alpha       the ratio of sparsity penalty to fusion penalty
if.fuse     whether each covariate is fused (1) or not (0)
betahat     the estimated coefficients
betainfo    additional information about the fit, including the degrees
            of freedom, the optimal lambda, the lambda at which fusion
            occurs, and the fraction of fusion for each covariate

Examples

n <- 200  # sample size in each study
K <- 10   # number of studies
p <- 3    # number of covariates in X (including intercept)
N <- n*K  # total sample size

# the coefficient matrix; use this to set the desired heterogeneous
# pattern (depends on p and K)
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,  # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,  # beta_1, etc.
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set; family = c("gaussian", "binomial", "poisson")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input (y, X, study ID)
y   <- data$y
sid <- data$group
X   <- data[, -c(1, ncol(data))]

# fuse slopes of X1 (it is heterogeneous with 2 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse slopes of X2 (it is heterogeneous with 3 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse all three covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
         intercept=TRUE, alpha=0)

# fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
         intercept=TRUE, alpha=1)

metafuse.l    fit a GLM with fusion penalty for data integration

Description

Fit a GLM with a fusion penalty on the coefficients within each covariate, at a given lambda.

Usage

metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
           family = "gaussian", intercept = TRUE, alpha = 0,
           lambda = lambda)

Arguments

X           a matrix (or vector) of predictor(s), with dimensions N*p,
            where N is the total sample size of all studies
y           a vector of responses, with length N, the total sample size
            of all studies
sid         study ID, numbered from 1 to K
fuse.which  a vector of a subset of the integers from 0 to p, indicating
            which covariates are considered for fusion; 0 corresponds to
            the intercept
family      "gaussian" for continuous response, "binomial" for binary
            response, "poisson" for count response
intercept   if TRUE, an intercept is included in the model
alpha       the ratio of sparsity penalty to fusion penalty; the default
            is 0 (no penalty on sparsity)
lambda      the tuning parameter for the fusion penalty

Details

An adaptive lasso penalty is used; see Zou (2006) for details.

Value

A list containing the following items is returned:

family      the model type
alpha       the ratio of sparsity penalty to fusion penalty
if.fuse     whether each covariate is fused (1) or not (0)
betahat     the estimated coefficients
betainfo    additional information about the fit, including the degrees
            of freedom
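The manual names the ingredients of the penalty (a fusion penalty within each covariate, a sparsity penalty scaled by alpha, and adaptive weights in the sense of Zou, 2006) but never writes the objective out. A plausible sketch, assuming pairwise fusion across studies with adaptive weights w and v (the exact weighting and normalization are assumptions, not taken from the package source):

```latex
\min_{\beta}\; \sum_{k=1}^{K} \ell_k(\beta_k)
  + \lambda \sum_{j \in \texttt{fuse.which}} \left(
      \sum_{k < k'} w_{j,kk'} \bigl|\beta_{kj} - \beta_{k'j}\bigr|
      + \alpha \sum_{k=1}^{K} v_{kj} \bigl|\beta_{kj}\bigr| \right)
```

Here \(\ell_k\) is the negative log-likelihood of study \(k\) under the chosen family, the pairwise term pulls the coefficients of covariate \(j\) toward common values across studies (coefficient clustering), and the second term, scaled by alpha, shrinks them toward zero. Under this reading, metafuse.l evaluates the fit at one fixed \(\lambda\), while metafuse traces the solution path in \(\lambda\).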

Examples

n <- 200  # sample size in each study
K <- 10   # number of studies
p <- 3    # number of covariates in X (including intercept)
N <- n*K  # total sample size

# the coefficient matrix; use this to set the desired heterogeneous
# pattern (depends on p and K)
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,  # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,  # beta_1, etc.
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set; family = c("gaussian", "binomial", "poisson")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input (y, X, study ID)
y   <- data$y
sid <- data$group
X   <- data[, -c(1, ncol(data))]

# fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian",
           intercept=TRUE, alpha=1, lambda=0.5)
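For readers prototyping outside R, the simulation setup used throughout these examples can be sketched in Python. This is a hedged analogue of datagenerator, not the package's code: the function name is invented, and the covariate distribution (standard normal) and unit error variance are assumptions, since the manual does not specify them.

```python
import numpy as np

def generate_studies(n, beta0, seed=123):
    """Simulate K Gaussian-response studies and stack them row-wise.

    n     -- per-study sample size (scalar)
    beta0 -- K x p coefficient matrix; column 0 is the intercept
    Returns y (N,), X (N, p-1), and study IDs sid (N,), with N = n*K.
    """
    rng = np.random.default_rng(seed)
    K, p = beta0.shape
    ys, Xs, sids = [], [], []
    for k in range(K):
        X = rng.normal(size=(n, p - 1))        # covariates (no intercept column)
        eta = beta0[k, 0] + X @ beta0[k, 1:]   # study-specific linear predictor
        y = eta + rng.normal(size=n)           # Gaussian response, unit noise
        ys.append(y)
        Xs.append(X)
        sids.append(np.full(n, k + 1))         # study IDs numbered 1..K
    return np.concatenate(ys), np.vstack(Xs), np.concatenate(sids)

# heterogeneous pattern mirroring the manual's beta0 (K=10 studies, p=3)
beta0 = np.column_stack([
    np.zeros(10),                                      # intercept
    np.r_[np.zeros(5), np.ones(5)],                    # beta_1: 2 groups
    np.r_[np.zeros(4), 0.5 * np.ones(3), np.ones(3)],  # beta_2: 3 groups
])
y, X, sid = generate_studies(200, beta0)
```

The heterogeneous beta_1 and beta_2 columns are exactly what the fusion penalty is meant to recover: two coefficient clusters for the first covariate and three for the second.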

Index

  Topic data
    datagenerator
  Topic generator
    datagenerator
  Topic package
    metafuse-package

  datagenerator
  metafuse
  metafuse-package
  metafuse.l