Introduction

This note documents the transition from creating R functions from C++ code via Rcpp helper functions to creating R functions by building and installing an R package. In both cases, many of the low-level details (e.g. compiling C++ code, linking binary files, and loading shared library objects into R) are handled for us. This is a good thing. However, the R package approach is more flexible and powerful than what we can do with evalCpp(), cppFunction(), or sourceCpp(). By the end of this note, we will have created an R package that provides an R function interface to our C++ code for running a linear regression.

Getting Started

Recall from the sessions our use of sourceCpp() to pass a snippet of C++ code through Rcpp's helper functions. In particular, we used sourceCpp(file = "bootstrap_functions.cpp"), which led the Rcpp R package to read the C++ snippet file. It would create a valid C++ file, compile it, link it, load the shared library, and then create the appropriate R functions.

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
/* this is called an Rcpp attribute; it is a comment from the
   perspective of C++, but sourceCpp() knows to look at them
   and act in a certain way when it sees them

   the intervention here is making sure the
   RcppArmadillo.h file can be found
*/

#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>

using namespace Rcpp;

// ----------------------------------------------------------------------------

// [[Rcpp::export]]
/* this intervention makes a corresponding runreg R function visible */

Rcpp::NumericVector runreg(NumericMatrix Y, NumericMatrix X) {
  arma::mat y = as<arma::mat>(Y);
  arma::mat x = as<arma::mat>(X);
  arma::mat tmp1;
  tmp1 = inv(x.t() * x);
  arma::mat tmp = x.t() * y;
  arma::mat betahat = tmp1 * tmp;
  NumericVector Betahat(X.cols());
  for (int i = 0; i < X.cols(); i++) {
    Betahat(i) = betahat(i);
  }
  return Betahat; /* We are returning a C++ class defined by the Rcpp
                     C++ code */
}

// ----------------------------------------------------------------------------

// [[Rcpp::export]]
Rcpp::List getbsdata(NumericMatrix Y, NumericMatrix X) {
  int N = Y.nrow();
  IntegerVector indices = seq_len(N) - 1;

  // Draw a sample from the sampling frame
  indices = RcppArmadillo::sample(indices, N,
                                  TRUE,
                                  NumericVector::create()
                                  ); /* looks and behaves like
                                        the sample() in R */

  // Construct BS data based on sampling from the sampling frame.
  const arma::mat Ymat = as<arma::mat>(Y); // convert once, outside the loop
  const arma::mat Xmat = as<arma::mat>(X);
  arma::mat Ybs(Ymat); // BS sample Y
  arma::mat Xbs(Xmat); // BS sample X
  for (int r = 0; r < N; r++) {
    // note, Xbs[1, ] == X[indices[1], ]
    //       Xbs[r, ] == X[indices[r], ]
    // and note that the above brackets are R style and not valid C++
    Ybs.row(r) = Ymat.row(indices(r));
    Xbs.row(r) = Xmat.row(indices(r));
  }

  return Rcpp::List::create(_["Y"] = wrap(Ybs),
                            _["X"] = wrap(Xbs) /* single quotes are
                                                  for single chars only */
                            );
}

// ----------------------------------------------------------------------------

// [[Rcpp::export]]
NumericMatrix runbs(NumericMatrix YY, NumericMatrix XX, int M) {
  arma::mat betas(M, XX.ncol()); // an arma::mat because of the member functions
  betas.fill(0);

  arma::vec tmpbeta;
  List tmpdata;
  for (int m = 0; m < M; m++) {
    tmpdata = getbsdata(YY, XX);
    /* extract elements from the list in R-ish fashion */
    tmpbeta = as<arma::vec>(runreg(tmpdata["Y"], tmpdata["X"]));
    /* be careful with what runreg did to tmpbeta:
       it is a column vector, so transpose it to fill a row */
    betas.row(m) = tmpbeta.t();

    /* if we really wanted to optimize this code, we
       would access columns only -- more efficient. */
  }
  return as<NumericMatrix>(wrap(betas));
}
```

./bootstrap_functions.cpp

We can control which C++ functions get corresponding R functions by using the // [[Rcpp::export]] directive. From the perspective of C++, this is just a comment, but the Rcpp helper functions look for these comments to decide what C++ code to generate from the snippet we write.
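To make the algorithm in bootstrap_functions.cpp concrete outside of R, here is a minimal plain C++ sketch of the same case-resampling bootstrap for regression coefficients. It is an illustration under stated assumptions, not the code Rcpp generates: it uses only the standard library so it compiles without R, std::mt19937 stands in for RcppArmadillo::sample() (which draws from R's random number generator), and it solves the normal equations $(X^\top X)\beta = X^\top y$ by Gaussian elimination instead of forming inv(x.t() * x). The names Matrix, ols, bootstrap_indices, and run_bootstrap are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical minimal row-major matrix: element (r, c) is data[r * cols + c].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// OLS coefficients: solve (X'X) b = X'y by Gaussian elimination with partial
// pivoting (a swapped-in technique; the note's runreg inverts X'X instead).
std::vector<double> ols(const Matrix& X, const std::vector<double>& y) {
    const std::size_t n = X.rows, p = X.cols;
    assert(y.size() == n && n >= p);
    Matrix A(p, p + 1);  // augmented system [X'X | X'y]
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t r = 0; r < n; ++r) {
            for (std::size_t j = 0; j < p; ++j) A(i, j) += X(r, i) * X(r, j);
            A(i, p) += X(r, i) * y[r];
        }
    }
    // Forward elimination.
    for (std::size_t k = 0; k < p; ++k) {
        std::size_t piv = k;  // row with the largest |A(i, k)| among rows k..p-1
        for (std::size_t i = k + 1; i < p; ++i)
            if (std::abs(A(i, k)) > std::abs(A(piv, k))) piv = i;
        for (std::size_t j = 0; j <= p; ++j) std::swap(A(k, j), A(piv, j));
        assert(A(k, k) != 0.0 && "X'X is singular");
        for (std::size_t i = k + 1; i < p; ++i) {
            const double f = A(i, k) / A(k, k);
            for (std::size_t j = k; j <= p; ++j) A(i, j) -= f * A(k, j);
        }
    }
    // Back substitution.
    std::vector<double> b(p);
    for (std::size_t k = p; k-- > 0;) {
        double s = A(k, p);
        for (std::size_t j = k + 1; j < p; ++j) s -= A(k, j) * b[j];
        b[k] = s / A(k, k);
    }
    return b;
}

// Draw n row indices uniformly with replacement from {0, ..., n - 1},
// the counterpart of sample(indices, N, TRUE) in getbsdata.
std::vector<std::size_t> bootstrap_indices(std::size_t n, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<std::size_t> idx(n);
    for (auto& i : idx) i = pick(rng);
    return idx;
}

// M bootstrap replicates (the counterpart of runbs): row m of the result
// holds the coefficients fitted on the m-th resampled data set.
Matrix run_bootstrap(const Matrix& X, const std::vector<double>& y,
                     std::size_t M, unsigned seed) {
    std::mt19937 rng(seed);
    Matrix betas(M, X.cols);
    Matrix Xbs(X.rows, X.cols);
    std::vector<double> ybs(X.rows);
    for (std::size_t m = 0; m < M; ++m) {
        const std::vector<std::size_t> idx = bootstrap_indices(X.rows, rng);
        for (std::size_t r = 0; r < X.rows; ++r) {  // row r of the resample
            ybs[r] = y[idx[r]];
            for (std::size_t c = 0; c < X.cols; ++c) Xbs(r, c) = X(idx[r], c);
        }
        const std::vector<double> b = ols(Xbs, ybs);
        for (std::size_t c = 0; c < X.cols; ++c) betas(m, c) = b[c];
    }
    return betas;
}
```

With noise-free data such as y = 1 + 2x and an intercept column in X, every resample that contains at least two distinct x values reproduces the coefficients (1, 2) up to rounding, which makes a convenient sanity check.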
As it stands, bootstrap_functions.cpp creates three R functions, and these R functions have the same names in R as in C++.

Creating a Minimal R Package

R has long had a package.skeleton() function, which creates a minimal working package. Although the created package does not provide any useful functionality, it automatically sets up the structure of the files inside the top-level directory. It even populates some of them with useful default values and helpful hints. If we were going to use only the C++ code provided by Rcpp (i.e. #include <Rcpp.h>), we'd only need to use the Rcpp.package.skeleton() function. However, because we will use the Armadillo C++ library and access it through the RcppArmadillo package, we will use RcppArmadillo.package.skeleton().

At a minimum, we should call this function with a package name and a path where R should set up this dummy package. For example, one might use RcppArmadillo.package.skeleton(name = "RcppLM", path = "~/desktop"). Of course, the path argument should be a character string giving a valid location on the machine it is run on. After running this R function, navigate to the directory that was created for this package. Its contents are:

- man (a directory for help files)
- R (a directory for R code)
- src (a directory for code to be compiled)
- DESCRIPTION
- NAMESPACE
- Read-and-delete-me

Before using or distributing an R package, you should check its validity. In our case, this can be done with R CMD check RcppLM. As it stands, this package won't pass the check, and we will make a small change to rectify this. Open the only Rd file in the man directory. The contents of \examples{} must be valid R code, and clearly this isn't the case. Add # at the beginning of the line and save the changes. This package, though it does nothing useful, will now check successfully.

Creating R Functions

Create an R script in the R subdirectory of the package directory. It can be named anything, but we'll want a name that is meaningful to us. For now, we can create a file runreg.r, because we'll define the R function runreg.r in it. For now, we can put the following contents in this file:

```r
runreg.r <- function(Y, X) {
  ## .Call("runreg", as.matrix(Y), as.matrix(X), PACKAGE = "RcppLM")
}
```

Ultimately, we will uncomment the .Call() line. However, that code would not presently work (without the comment), because we haven't yet defined any function named runreg that can be accessed from the package's shared library.

Creating C++ Functions

There are two kinds of C++ functions we will create.
The first corresponds to the C++ functions we were writing in our C++ snippets like bootstrap_functions.cpp. These take arguments of C++ classes and return values of C++ classes. We can create the file runreg.cpp in the src subdirectory.

```cpp
#include <RcppArmadillo.h>

using namespace Rcpp;

Rcpp::NumericVector runreg(NumericMatrix Y, NumericMatrix X) {
  arma::mat y = as<arma::mat>(Y);
  arma::mat x = as<arma::mat>(X);
  arma::mat tmp1;
  tmp1 = inv(x.t() * x);
  arma::mat tmp = x.t() * y;
  arma::mat betahat = tmp1 * tmp;
  NumericVector Betahat(X.cols());
  for (int i = 0; i < X.cols(); i++) {
    Betahat(i) = betahat(i);
  }
  return Betahat; /* We are returning a C++ class defined by the Rcpp
                     C++ code */
}
```

./RcppLM/src/runreg.cpp

This is simply our code to calculate the regression coefficients using the linear algebra expression $(X^\top X)^{-1} X^\top y$. The function is declared to return a NumericVector, and we can see that this is the actual class of Betahat. After creating this file, we can check the R package, and it will pass. However, nothing useful happens yet: while we have defined an R function and a C++ function, they are not connected in any way.

Creating R-callable C++ Functions

One of the steps that the Rcpp helper functions performed for us invisibly was the creation of C++ functions that are actually callable from R. The C++ functions we've written so far, which provide our useful functionality, are not. And, while it isn't required, we will walk through the process of creating an R-callable C++ function that in turn calls our C++ function, instead of making the C++ function runreg itself callable from R.

R-callable C++ functions are a bit different from what we've seen so far. All of their arguments are of class SEXP, and so is their return value. We move between SEXP and the Rcpp classes exactly as we do for other classes, with as<>() and wrap(). Additionally, we must put RcppExport at the beginning of the declaration; it expands to extern "C", which gives the function an unmangled name that R can look up. We can create the file _runreg.cpp in the src subdirectory with the following contents.

```cpp
#include <RcppArmadillo.h>
// #include "runreg.h"

using namespace Rcpp;

RcppExport SEXP _runreg(SEXP Y, SEXP X) {
  NumericMatrix y = as<NumericMatrix>(Y);
  NumericMatrix x = as<NumericMatrix>(X);
  NumericVector output;
  // output = runreg(y, x);
  return wrap(output);
}
```

Here we are just using a leading underscore to indicate that this is a wrapper, accessed from R, around our C++ code. Notice that two lines are commented out. Commenting these out allows us to check that the package we've made so far is valid. Still, no useful functionality has been provided.
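The split between a worker function and an R-callable wrapper is not specific to R. As a hedged, plain C++ sketch (no R headers, so it compiles on its own), the following shows the same pattern: an ordinary C++ function does the work with C++ types, and an extern "C" wrapper with an unmangled name and only C-compatible parameter types converts its inputs, delegates, and converts the result back. In _runreg, RcppExport supplies the extern "C" part and SEXP plays the role that raw pointers play here. The names simple_ols and simple_ols_wrapper, the integer error-code convention, and the choice of a one-predictor regression are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Worker (plays the role of runreg): C++ types in, C++ types out.
// Hypothetical one-predictor regression y = b0 + b1 * x, closed-form least squares.
std::vector<double> simple_ols(const std::vector<double>& y,
                               const std::vector<double>& x) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double xbar = 0.0, ybar = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xbar += x[i];
        ybar += y[i];
    }
    xbar /= n;
    ybar /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - xbar) * (y[i] - ybar);
        sxx += (x[i] - xbar) * (x[i] - xbar);
    }
    assert(sxx > 0.0 && "x must not be constant");
    const double b1 = sxy / sxx;
    return {ybar - b1 * xbar, b1};  // {intercept, slope}
}

// Wrapper (plays the role of _runreg): C linkage, so its symbol name is not
// mangled, and only C-compatible parameter types. It converts raw inputs to
// C++ types, delegates to the worker, and writes the result back.
extern "C" int simple_ols_wrapper(const double* y, const double* x, int n,
                                  double* out) {
    if (y == nullptr || x == nullptr || out == nullptr || n < 2) return 1;  // error
    const std::vector<double> yv(y, y + n), xv(x, x + n);
    const std::vector<double> b = simple_ols(yv, xv);
    out[0] = b[0];
    out[1] = b[1];
    return 0;  // success
}
```

For instance, calling simple_ols_wrapper with y = {3, 2, 1, 0} and x = {0, 1, 2, 3} writes the intercept 3 and the slope -1 into out and returns 0. Keeping the wrapper thin means the worker can be tested and reused from other C++ code without knowing anything about the calling convention.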
Before we can have _runreg (the wrapper function) call runreg (the function doing the computation), we must define a local header. This tells the compiler what runreg looks like; the linker later connects the call to the full definition in runreg.cpp. Still, _runreg as it stands is a valid (but useless) C++ function. Because it is now defined, we can update our R function definition to the following:

```r
runreg.r <- function(Y, X) {
  .Call("_runreg", as.matrix(Y), as.matrix(X), PACKAGE = "RcppLM")
}
```

./RcppLM/R/runreg.r

Be careful to .Call() the C++ function _runreg: it is callable. Using .Call() on the C++ function runreg will, on the other hand, throw an error, because it is not callable from R.

Creating Local Headers to Bridge R Functions

Our _runreg C++ function must ultimately call our runreg C++ function. However, if we tried to compile the package without commenting out the output = runreg(y, x); line, we'd get an error saying that runreg was not declared in this scope. Although runreg is defined in another file, we have to explicitly provide a summary of that definition (a declaration) in the file that uses it. We do that with a local header. Here, we create the following runreg.h in the src directory.

```cpp
#ifndef _RcppLM_RUNREG_H
#define _RcppLM_RUNREG_H

#include <RcppArmadillo.h>

using namespace Rcpp;

NumericVector runreg(NumericMatrix Y, NumericMatrix X);

#endif
```

./RcppLM/src/runreg.h

This gives us access to the signature of the function: the types and classes of its inputs and outputs. In order to check the basic structure of our _runreg() function, this is all we need. The preprocessor directives #ifndef, #define, and #endif ensure that the declaration is only provided once, even if the header is included more than once.¹

In practice we would need to do this for other functions as well, but the structure is the same. We would just have to change _RcppLM_RUNREG_H to a different name and update the declaration for whatever function we needed.

Now that we have this header file, we can #include it in _runreg.cpp and uncomment our use of runreg(). After this, its contents are as follows:

```cpp
#include <RcppArmadillo.h>
#include "runreg.h"

using namespace Rcpp;

RcppExport SEXP _runreg(SEXP Y, SEXP X) {
  NumericMatrix y = as<NumericMatrix>(Y);
  NumericMatrix x = as<NumericMatrix>(X);
  NumericVector output;
  output = runreg(y, x);
  return wrap(output);
}
```

./RcppLM/src/_runreg.cpp

Finishing Touches

This note walked through the inclusion of a single C++ function into an R package. In order to access the functionality we coded in C++ from R, we had to wrap it in another C++ function that was R-callable and then create the R function that actually did the calling. For real projects, there is more work to do in creating a package: writing documentation, writing unit tests, and writing vignettes. However, as our package stands, it checks. And, after we install it (e.g. with R CMD INSTALL RcppLM), we can test it:
```
user@machine$ Rscript -e "library(RcppLM); runreg.r(rnorm(), rnorm())"
Loading required package: Rcpp
Loading required package: RcppArmadillo
[1] -0.08689
```

¹ These are called header guards or include guards.

See http://cran.r-project.org/doc/manuals/r-exts.html (Writing R Extensions) for all of this.