Package msma. January 1, 2016

Transcription

1 Type Package Title Multiblock Sparse Multivariable Analysis Version 0.7 Date Author Atsushi Kawaguchi Package msma January 1, 2016 Maintainer Atsushi Kawaguchi Depends mvtnorm There are several functions to implement the method for analysis in a multiblock multivariable data. If the input is only a matrix, then the principal components analysis (PCA) is implemented. If the input is a list of matrices, then the multiblock PCA is implemented. If the input is two matrices for exploratory and objective variables, then the partial least squares (PLS) analysis is implemented. If the input is two list of matrices for exploratory and objective variables, then the multiblock PLS analysis is implemented. Moreover, if the extra outcome variable is specified, then the supervised version for the methods above is implemented. For each methods, the sparse modeling is also incorporated. Functions to select the number of components and the regularized parameters are also provided. License GPL (>= 2) NeedsCompilation no Repository CRAN Date/Publication :47:02 R topics documented: msma-package cvmsma msma ncompsearch predict.msma regparasearch simdata summary.msma Index 13 1

2 2 cvmsma msma-package Multiblock Sparse Multivariable Analysis Package A Package for implementation of multiblock multivariable data analysis. Author(s) Atsushi Kawaguchi. <kawa_a24@yahoo.co.jp> See Also msma cvmsma Cross-Validation cross-validated method to evaluate the fit of msma. cvmsma(, = NULL, Z = NULL, comp = 1, lambda, lambda = NULL, eta = 1, type = "lasso", in = NULL, in = NULL, mu = 0, mu = 0, nfold = 5, seed = 1) Z comp lambda lambda eta type a (list of) matrix, explanatory variable(s) which is required. a (list of) matrix, objective variable(s). This is optional. If no input for, then the PCA method is implemented. a vector, response variable(s). This is optional. The length is the number of subjects. If no input for Z, then the unsupervised PLS/PCA is implemented. numeric scalar for the number of components to be considered. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric scalar, the parameter indexing the penalty family. This version has only the choice 1. a character, the penalty family. This version has only the choice "lasso".

3 msma 3 in in mu mu a (list of) numeric vector to specify the variables of which are always in the a (list of) numeric vector to specify the variables of which are always in the a numeric scalar for the weight of for the supervised. 0<=mu<=1. a numeric scalar for the weight of for the supervised. 0<=mu<=1. nfold number of folds - default is 5. seed number of seed for the random number. k-fold cross-validation for msma. The evaluation is based on the matrix element wise errors. Value err The mean cross-validated errors which has three elements consisting of the mean of errors for and, the errors for and for in the PLS and only the errors for in the PCA. Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### One Component CV ##### cv1 = cvmsma(,, comp = 1, lambda=2, lambda=1:3, nfold=5, seed=1) cv1 ##### Two Component CV ##### cv2 = cvmsma(,, comp = 2, lambda=2, lambda=1:3, nfold=5, seed=1) cv2 msma Multiblock Sparse Multivariable Analysis This is a function for a matrix decomposition method incorporating sparse and supervised modeling for a multiblock multivariable data analysis

4 4 msma msma(,...) ## Default S3 method: msma(, = NULL, Z = NULL, comp = 2, lambda = NULL, lambda = NULL, eta = 1, type = "lasso", in = NULL, in = NULL, mu = 0, mu = 0, defmethod = "canonical", scaling = TRUE, verbose = FALSE,...) ## S3 method for class msma print(x,...) ## S3 method for class msma plot(x,...) a (list of) matrix, explanatory variable(s) which is required.... further arguments passed to or from other methods. Z comp lambda lambda eta type in in mu mu defmethod scaling verbose x a (list of) matrix, objective variable(s). This is optional. If no input for, then the PCA method is implemented. a vector, response variable(s). This is optional. The length is the number of subjects. If no input for Z, then the unsupervised PLS/PCA is implemented. numeric scalar for the number of components to be considered. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric scalar, the parameter indexing the penalty family. This version has only the choice 1. a character, the penalty family. This version has only the choice "lasso". a (list of) numeric vector to specify the variables of which are always in the a (list of) numeric vector to specify the variables of which are always in the a numeric scalar for the weight of for the supervised. 0<=mu<=1. a numeric scalar for the weight of for the supervised. 0<=mu<=1. a character, the deflation method, this version has only the choice "canonical". a logical, whether the scaling data is done, the default is TRUE. information an object of class "msma", usually, a result of a call to msma

5 msma 5 Value msma requires at least one input as the matrix or the list. In this case, the (multiblock) principal components analysis is conducted. If is also specified, the partial least squares with as explanatory variables and as objective variables. This function scaled each data matrix to mean 0 and variance 1 in the default. The block structure can be represented as the list. If Z is also specified, the supervised version is implemented and the degree is controlled by mu or mu where 0<=mu<=1, 0<=mu<=1, and 0<=mu+mu<1. If the positive lambda or lambda is specified, the sparse estimation based on L1 penalty is implemented. dmode scale scale comp wb sb wb sb ss ws ss ws nzwb nzwb selectnames selectnames Which modes "PLS" or "PCA" Scaled which has a list form. Scaled which has a list form. Scaling information for. The means and standard deviations for each block of are returned. Scaling information for. The means and standard deviations for each block of are returned. the number of components block loading for. The list has same length as that of the input list (the number of blocks) and consists of the matrix with the number of variables in the row and the number of components in the column. block score for. The list has same length as that of the input list (the number of blocks) and consists of the matrix with the number of subjects in the row and the number of components in the column. block loading for. The list has same length as that of the input list (the number of blocks) and consists of the matrix with the number of variables in the row and the number of components in the column. block score for. The list has same length as that of the input list (the number of blocks) and consists of the matrix with the number of subjects in the row and the number of components in the column. super score for. The matrix has the number of subjects in the row and the number of components in the column. super loading for. The matrix has the number of blocks in the row and the number of components in the column. super score for. The matrix has the number of subjects in the row and the number of components in the column. super loading for. The matrix has the number of blocks in the row and the number of components in the column. number of nonzeros in block loading for number of nonzeros in block loading for names of selected variables for. This returns the names of names of selected variables for. This returns the names of

6 6 ncompsearch Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### One Component ##### fit1 = msma(,, comp=1, lambda=2, lambda=1:3) fit1 ##### Two Component ##### fit2 = msma(,, comp=2, lambda=2, lambda=1:3) fit2 ##### Matrix data ##### sigma = matrix(0.8, 10, 10) diag(sigma) = 1 2 = rmvnorm(50, rep(0, 10), sigma) 2 = rmvnorm(50, rep(0, 10), sigma) fit3 = msma(2, 2, comp=1, lambda=2, lambda=2) fit3 ##### Sparse Principal Component Analysis ##### fit5 = msma(2, comp=5, lambda=2.5) summary(fit5) ncompsearch Search for Number of Components Determination for the number of components based on cross-validated method or the Bayesian information criterion (BIC) ncompsearch(, = NULL, Z = NULL, comps = 1:3, lambda = NULL, lambda = NULL, eta = 1, type = "lasso", in = NULL, in = NULL, mu = 0, mu = 0, nfold = 5, regpara = FALSE, maxrep = 3, method = c("bic", "CV")[1]) ## S3 method for class ncompsearch print(x,...) ## S3 method for class ncompsearch plot(x,...)

7 ncompsearch 7 Z comps lambda lambda eta type in in mu mu a (list of) matrix, explanatory variable(s) which is required. a (list of) matrix, objective variable(s). This is optional. If no input for, then the PCA method is implemented. a vector, response variable(s). This is optional. The length is the number of subjects. If no input for Z, then the unsupervised PLS/PCA is implemented. numeric vector for the candidates of the numbers of components to be selected. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric vector of regularized parameters for with length equal to the number of blocks. If omitted, no regularization is conducted. numeric scalar, the parameter indexing the penalty family. This version has only the choice 1. a character, the penalty family. This version has only the choice "lasso". a (list of) numeric vector to specify the variables of which are always in the a (list of) numeric vector to specify the variables of which are always in the a numeric scalar for the weight of for the supervised. 0<=mu<=1. a numeric scalar for the weight of for the supervised. 0<=mu<=1. nfold number of folds - default is 5. regpara maxrep method x logical, If TRUE, the regularized parameters search is also conducted simultaneously. numeric scalar for the number of iteration. a character, the evaluation method, "CV" for cross-validation based on matrix element-wise error, and "BIC" for Bayesian information criteria. The default is the BIC. an object of class "ncompsearch", usually, a result of a call to ncompsearch... further arguments passed to or from other methods. This function provide an implementation of a search the optimal number of components. Value comps mincriterion criterions optncomp numbers of components minimum criterion value criterion values optimal number of components with the minimum criteria value

8 8 predict.msma Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### number of components search ##### ncomp1 = ncompsearch(,, comps = c(1, 5, 10*(1:5)), nfold=5) plot(ncomp1) predict.msma Prediction predict method for class "msma". ## S3 method for class msma predict(object, new, new = NULL,...) object new Value an object of class "msma", usually, a result of a call to msma a matrix in which to look for variables with which to predict required. new a matrix in which to look for variables with which to predict.... further arguments passed to or from other methods. This function provide the prediction from new data based on the msma fit and is mainly used in the cross-validation predicted predicted Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### Two Component ##### fit2 = msma(,, comp=2, lambda=2, lambda=1:3)

9 regparasearch 9 summary(fit2) ##### Predict ##### test = predict(fit2, new=, new=) regparasearch Regularized Parameters Search Regularized parameters search method for "msma". regparasearch(, = NULL, Z = NULL, eta = 1, type = "lasso", in = NULL, in = NULL, mu = 0, mu = 0, comp = 1, nfold = 5, maxrep = 3, minpct = 0, maxpct = 1, method = c("bic", "CV")[1]) ## S3 method for class regparasearch print(x,...) Z eta type in in mu mu comp a (list of) matrix, explanatory variable(s) which is required. a (list of) matrix, objective variable(s). This is optional. If no input for, then the PCA method is implemented. a vector, response variable(s). This is optional. The length is the number of subjects. If no input for Z, then the unsupervised PLS/PCA is implemented. numeric scalar, the parameter indexing the penalty family. This version has only the choice 1. a character, the penalty family. This version has only the choice "lasso". a (list of) numeric vector to specify the variables of which are always in the a (list of) numeric vector to specify the variables of which are always in the a numeric scalar for the weight of for the supervised. 0<=mu<=1. a numeric scalar for the weight of for the supervised. 0<=mu<=1. numeric scalar for the number of components to be considered. nfold number of folds - default is 5. maxrep minpct maxpct numeric scalar for the number of iteration. percent of minimum candidate parameters. percent of maximum candidate parameters.

10 10 regparasearch method x a character, the evaluation method, "CV" for cross-validation based on matrix element-wise error, and "BIC" for Bayesian information criteria. The default is the BIC. an object of class "regparasearch", usually, a result of a call to regparasearch... further arguments passed to or from other methods. This is a function to search regularized parameters of sparseness lambda and lambda for msma. The initial range of candidates are computed based on the fit with the values of regularized parameters of 0. The binary search is conducted for the divided parameter range into two regions. The representative value for the region is a median and the optimal region is selected with the minimum criteria obtained from the fit with the value. The CV error or BIC can be used as criteria. The selected region is also divided into two region and the same process is iterated by maxrep times. Thus, the final median value in the selected region is set to be the optimal regularized parameter. The search is conducted with combinations of parameters for and. The range of candidates for regularized parameters can be restricted with the percentile of the limit (minimum or maximum) of the range. Value optlambda optlambda mincriterion criterions pararange Optimal parameters for Optimal parameters for Minimum criterion value All resulting criterion values in the process Range of candidates parameters Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### Regularized parameters search ##### opt1 = regparasearch(,, comp=1, nfold=5, maxrep=2) opt1 fit4 = msma(,, comp=1, lambda=opt1$optlambda, lambda=opt1$optlambda) fit4 summary(fit4) ##### Restrict search range ##### opt2 = regparasearch(,, comp=1, nfold=5, maxrep=2, minpct=0.5) opt2

11 simdata 11 simdata Generate Test Data Sets This is a function for generating multiblock data based on the multivariable normal distribution simdata(n = 100, rho = 0.8, ps = c(100, 120, 150), ps = 500, seed = 1) n rho ps ps seed a numeric scalar, sample size. a numeric scalar, correlation coefficient for all matrices. a numeric vector, numbers of columns for. The length of vector corresponds to the number of blocks. a numeric vector, numbers of columns for. The length of vector corresponds to the number of blocks. a seed number for generating random numbers for the reproductivity and should be changed in the iterative study. The output is a list of matrics. Value Simulated which has a list form Simulated which has a list form summary.msma Summarizing Fits summary method for class "msma". ## S3 method for class msma summary(object,...) ## S3 method for class summary.msma print(x,...)

12 12 summary.msma object, x an object of class "msma", usually, a result of a call to msma... further arguments passed to or from other methods. This function provide the summary of results. Examples ##### data ##### tmpdata = simdata(n = 50, rho = 0.8, ps = c(10, 12, 15), ps = 20, seed=1) = tmpdata$; = tmpdata$ ##### One Component ##### fit1 = msma(,, comp=1, lambda=2, lambda=1:3) summary(fit1)

13 Index Topic documentation msma-package, 2 cvmsma, 2 msma, 2, 3, 4, 12 msma-package, 2 ncompsearch, 6 plot.msma (msma), 3 plot.ncompsearch (ncompsearch), 6 predict.msma, 8 print.msma (msma), 3 print.ncompsearch (ncompsearch), 6 print.regparasearch (regparasearch), 9 print.summary.msma (summary.msma), 11 regparasearch, 9 simdata, 11 summary.msma, 11 13