Package bigdata. R topics documented: February 19, 2015



Similar documents
Package metafuse. November 7, 2015

Package empiricalfdr.deseq2

Lasso on Categorical Data

Machine Learning Big Data using Map Reduce

Package tagcloud. R topics documented: July 3, 2015

Package neuralnet. February 20, 2015

Package cpm. July 28, 2015

Regularized Logistic Regression for Mind Reading with Parallel Validation

Package hoarder. June 30, 2015

Package missforest. February 20, 2015

Package smoothhr. November 9, 2015

Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013

Package MBA. February 19, Index 7. Canopy LIDAR data

Package MDM. February 19, 2015

Package plan. R topics documented: February 20, 2015

Package RIGHT. March 30, 2015

Package ATE. R topics documented: February 19, Type Package Title Inference for Average Treatment Effects using Covariate. balancing.

Package polynom. R topics documented: June 24, Version 1.3-8

Package EstCRM. July 13, 2015

Penalized Logistic Regression and Classification of Microarray Data

Package sendmailr. February 20, 2015

Package AMORE. February 19, 2015

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Package ChannelAttribution

Package acrm. R topics documented: February 19, 2015

Machine Learning Logistic Regression

Package multivator. R topics documented: February 20, Type Package Title A multivariate emulator Version Depends R(>= 2.10.

Package TSfame. February 15, 2013

Package PRISMA. February 15, 2013

Least Squares Estimation

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

Package biganalytics

Package changepoint. R topics documented: November 9, Type Package Title Methods for Changepoint Detection Version 2.

Package png. February 20, 2015

Package treemap. February 15, 2013

Package HHG. July 14, 2015

Package MixGHD. June 26, 2015

Package xtal. December 29, 2015

Package dunn.test. January 6, 2016

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Package fastghquad. R topics documented: February 19, 2015

TOWARD BIG DATA ANALYSIS WORKSHOP

Package GSA. R topics documented: February 19, 2015

Package ERP. December 14, 2015

Package SHELF. February 5, 2016

Package retrosheet. April 13, 2015

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Package trimtrees. February 20, 2015

Cluster Analysis using R

Package cgdsr. August 27, 2015

Lecture 3: Linear methods for classification

Package ISLR. R topics documented: February 19, Type Package

Package forensic. February 19, 2015

Predict the Popularity of YouTube Videos Using Early View Data

Statistical Machine Learning

Package hazus. February 20, 2015

Package hive. January 10, 2011

Package wordcloud. R topics documented: February 20, 2015

Implementation of Logistic Regression with Truncated IRLS

Package uptimerobot. October 22, 2015

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Package CoImp. February 19, 2015

Package bigrf. February 19, 2015

Package httprequest. R topics documented: February 20, 2015

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Package lmertest. July 16, 2015


Package hybridensemble

Package erp.easy. September 26, 2015

Statistical issues in the analysis of microarray data

Package benford.analysis

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel

Oracle Data Miner (Extension of SQL Developer 4.0)

Package NHEMOtree. February 19, 2015

Simple Linear Regression Inference

Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

Cyber-Security Analysis of State Estimators in Power Systems

Faculty of Science School of Mathematics and Statistics

An Introduction to Machine Learning

Profile Data Mining with PerfExplorer. Sameer Shende Performance Reseaerch Lab, University of Oregon

Package fimport. February 19, 2015

The degrees of freedom of the Lasso in underdetermined linear regression models

Package bcrm. September 24, 2015

Introduction to General and Generalized Linear Models

Regression III: Advanced Methods


Package colortools. R topics documented: February 19, Type Package

Package EasyHTMLReport

Package survpresmooth

[3] Big Data: Model Selection

Transcription:

Type Package Title Big Data Analytics Version 0.1 Date 2011-02-12 Author Han Liu, Tuo Zhao Maintainer Han Liu <hanliu@cs.jhu.edu> Depends glmnet, Matrix, lattice, Package bigdata February 19, 2015 The big data package is a collection of scalable methods for large-scale data analysis. License GPL-2 Repository CRAN Date/Publication 2012-04-06 15:55:53 NeedsCompilation no R topics documented: bigdata-package....................................... 1.......................................... 2 plot.stars........................................... 4 print.stars.......................................... 5 Index 6 bigdata-package Big Data Analytics Details a collection of scalable methods for large-scale data analysis. 1

2 Package: bigdata Type: Package Version: 0.1 Date: 2012-04-06 License: GPL-2 LazyLoad: yes Han Liu, Tuo Zhao Maintainers: Han Liu <hanliu@cs.jhu.edu> References 1. Han Liu, Kathryn Roeder and Larry Wasserman. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. Advances in Neural Information Processing Systems(NIPS), 2010. Stability Approach to Regularization Selection for Lasso Implements the Stability Approach to Regularization Selection (StARS) for Lasso Usage (x, y, rep.num = 20, lambda = NULL, nlambda = 100, lambda.min.ratio = 0.001, stars.thresh = 0.1, sample.ratio = NULL, alpha = 1, verbose = TRUE) Arguments x The n by d data matrix representing n observations in d dimensions y The n-dimensional response vector rep.num The number of subsampling for StARS. The default value is 20.

3 Details Value lambda A sequence of decresing positive numbers to control regularization. Typical usage is to leave the input lambda = NULL and have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Users can also specify a sequence to override this. Use with care - it is better to supply a decreasing sequence values than a single (small) value. nlambda The number of regularization paramters. The default value is 100. lambda.min.ratio The smallest value for lambda, as a fraction of the uppperbound (MAX) of the regularization parameter which makes all estimates equal to 0. The program can automatically generate lambda as a sequence of length = nlambda starting from MAX to lambda.min.ratio*max in log scale. The default value is 0.001. stars.thresh sample.ratio The threshold of the variability in StARS. The default value is 0.1. The alternative value is 0.05. Only applicable when criterion = "stars" The subsampling ratio. The default value is 10*sqrt(n)/n when n>144 and 0.8 when n<=144, where n is the sample size. alpha The tuning parameter for the elastic-net regression. The default value is 1 (lasso). verbose If verbose = FALSE, tracing information printing is disabled. The default value is TRUE. StARS selects the optimal regularization parameter based on the variability of the solution path. It chooses the least sparse graph among all solutions with the same variability. An alternative threshold 0.05 is chosen under the assumption that the model is correctly specified. In applications, the model is usually an approximation of the true model, 0.1 is a safer choice. The implementation is based on the popular package "glmnet". An object with S3 class "stars" is returned: path lambda opt.index opt.beta opt.lambda Variability The solution path of regression coefficients (in an d by nlambda matrix) The regularization parameters used in Lasso The index of the optimal regularization parameter. The optimal regression coefficients. The optimal regularization parameter. The variability along the solution path. Note This function can only work under the setting when d>1 Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, and Larry Wasserman Maintainers: Tuo Zhao<tourzhao@andrew.cmu.edu>; Han Liu <hanliu@cs.jhu.edu>

4 plot.stars References 1.Han Liu, Kathryn Roeder and Larry Wasserman. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. Advances in Neural Information Processing Systems, 2010. 2.Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol.33, No.1, 2008. bigdata-package Examples #generate data x = matrix(rnorm(50*80),50,80) beta = c(3,2,1.5,rep(0,77)) y = rnorm(50) + x%*%beta #StARS for Lasso z1 = (x,y) summary(z1) plot(z1) #StARS for Lasso z2 = (x,y, stars.thresh = 0.05) summary(z2) plot(z2) #StARS for Lasso z3 = (x,y,rep.num = 50) summary(z3) plot(z3) plot.stars Plot function for S3 class "stars" Usage Visualize the solution path and plot the optimal solution by model selection ## S3 method for class stars plot(x,...) Arguments x An object with S3 class "stars"... System reserved (No specific usage)

print.stars 5 Han Liu and Tuo Zhao Maintainers: Han Liu <hanliu@cs.jhu.edu> print.stars Print function for S3 class "stars" Usage Print the information about the solution path length and the degree of freedom s along the solution path. ## S3 method for class stars print(x,...) Arguments x An object with S3 class "stars"... System reserved (No specific usage) Han Liu and Tuo Zhao Maintainers: Han Liu <hanliu@cs.jhu.edu>

Index bigdata-package, 1, 2, 2, 5 plot.stars, 4 print.stars, 5 6