Packing C++ with R Advanced Statistical Programming Camp


Introduction

This note documents the transition from creating R functions from C++ code via Rcpp helper functions to creating R functions by building and installing an R package. In both cases, many of the low-level details (e.g. compiling C++ code, linking binary files, and loading shared library objects into R) are handled for us. This is a good thing. However, the R package approach is more flexible and powerful than what we can do with evalCpp(), cppFunction(), or sourceCpp(). By the end of this note, we will have created an R package that provides an R function interface to our C++ code for running a linear regression.

Getting Started

Recall from the sessions our use of sourceCpp() to pass a snippet of C++ code through Rcpp's helper functions. In particular, we used sourceCpp(file = "bootstrap_functions.cpp"), which leads the Rcpp R package to read the C++ snippet file. It creates a valid C++ file, compiles it, links it, loads the shared library, and then creates the appropriate R functions.

    // [[Rcpp::depends(RcppArmadillo)]]
    /* this is called an Rcpp attribute; it is a comment from the
       perspective of C++, but sourceCpp() knows to look for them
       and act in a certain way when it sees them

       the intervention here is making sure the
       RcppArmadillo.h file can be found
    */

    #include <RcppArmadillo.h>
    #include <RcppArmadilloExtensions/sample.h>

    using namespace Rcpp;

    // ----------------------------------------------------------------------------

    // [[Rcpp::export]]
    /* this intervention makes a corresponding
       runreg R function visible
    */
    Rcpp::NumericVector runreg(NumericMatrix Y, NumericMatrix X) {
      arma::mat y = as<arma::mat>(Y);
      arma::mat x = as<arma::mat>(X);
      arma::mat tmp1;
      tmp1 = inv(x.t() * x);
      arma::mat tmp = x.t() * y;
      arma::mat betahat = tmp1 * tmp;
      NumericVector Betahat(X.cols());
      for (int i = 0; i < X.cols(); i++) {
        Betahat(i) = betahat(i);
      }
      return Betahat; /* We are returning a C++ class defined by the Rcpp
                         C++ code
                      */
    }

    // ----------------------------------------------------------------------------

    // [[Rcpp::export]]
    Rcpp::List getbsdata(NumericMatrix Y, NumericMatrix X) {
      int N = Y.nrow();
      IntegerVector indices = seq_len(N) - 1;

      // Draw a sample (with replacement) from the sampling frame
      indices = RcppArmadillo::sample(indices, N,
                                      TRUE,
                                      NumericVector::create()
                                      ); /* looks and behaves like
                                            the sample() in R
                                         */

      // Construct BS data based on sampling from the sampling frame.
      arma::mat Ybs(as<arma::mat>(Y)); // BS sample Y
      arma::mat Xbs(as<arma::mat>(X)); // BS sample X
      for (int r = 0; r < N; r++) {
        // note, Xbs[1, ] == X[indices[1], ]
        //       Xbs[r, ] == X[indices[r], ]
        // and note that the above brackets are R style and not valid C++
        Ybs.row(r) = (as<arma::mat>(Y)).row(indices(r));
        Xbs.row(r) = (as<arma::mat>(X)).row(indices(r));
      }

      return Rcpp::List::create(_["Y"] = wrap(Ybs),
                                _["X"] = wrap(Xbs) /* single quotes are
                                                      for single chars only
                                                   */
                                );
    }

    // ----------------------------------------------------------------------------

    // [[Rcpp::export]]
    NumericMatrix runbs(NumericMatrix YY, NumericMatrix XX, int M) {
      arma::mat betas(M, XX.ncol()); // an arma::mat b/c of the member functions
      betas.fill(0);

      arma::mat tmpbeta(M, XX.nrow());
      List tmpdata;
      for (int m = 0; m < M; m++) {
        tmpdata = getbsdata(YY, XX);
        tmpbeta = runreg(tmpdata["Y"], tmpdata["X"]);
        /* extract
           elements from list in
           R-ish fashion
        */
        betas.row(m) = tmpbeta.t();
        /* be careful with what
           runreg did to tmpbeta
        */

        /* if we really wanted to
           optimize this code, we
           would access columns only --
           more efficient.
        */
      }
      return as<NumericMatrix>(wrap(betas));
    }

    ./bootstrap_functions.cpp

We can control which C++ functions generate corresponding R functions by using the // [[Rcpp::export]] directive. From the perspective of C++ code, this is just a comment, but the Rcpp helper functions look for these directives to determine what C++ code to generate from the snippet we write.
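The resampling step inside getbsdata can be hard to see among the Rcpp conversions. The following is a minimal sketch of the same idea in plain standard C++, with no Rcpp or Armadillo involved. The names resample_indices and bootstrap_rows are hypothetical, and std::uniform_int_distribution is used here as an assumed stand-in for RcppArmadillo::sample with replacement; it is not what the package code calls.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw n row indices from 0..n-1 with replacement,
// standing in for RcppArmadillo::sample(indices, N, TRUE, ...).
std::vector<std::size_t> resample_indices(std::size_t n, std::mt19937& rng) {
    assert(n > 0);  // an empty sampling frame cannot be resampled
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<std::size_t> idx(n);
    for (std::size_t r = 0; r < n; ++r) {
        idx[r] = pick(rng);
    }
    return idx;
}

// Hypothetical helper: row r of the result is row idx[r] of the input,
// mirroring Xbs.row(r) = X.row(indices(r)) in getbsdata.
std::vector<std::vector<double> > bootstrap_rows(
        const std::vector<std::vector<double> >& rows,
        const std::vector<std::size_t>& idx) {
    std::vector<std::vector<double> > out;
    out.reserve(idx.size());
    for (std::size_t r = 0; r < idx.size(); ++r) {
        out.push_back(rows.at(idx[r]));  // at() throws on a bad index
    }
    return out;
}
```

Applying the same index vector to both Y and X is what keeps each resampled observation's response and covariates together, which is why getbsdata draws indices once and reuses them for both matrices.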
As it stands, bootstrap_functions.cpp creates three R functions, and these R functions have names in R that are identical to the names in C++.

Creating a Minimal R Package

R has long had a package.skeleton() function which creates a minimal working package. Although the created package does not provide any useful functionality, it sets up the structure of the files inside the top-level directory automatically. It even populates some of them with useful default values and helpful hints. If we were going to be using just the C++ code provided by Rcpp (i.e. #include <Rcpp.h>), we'd only need to use the Rcpp.package.skeleton() function. However, because we will use the Armadillo C++ code and access it by using the RcppArmadillo package, we will use RcppArmadillo.package.skeleton(). At a minimum, we should call this function with a package name and a path where R should set up this dummy package. For example, one might use RcppArmadillo.package.skeleton(name = "RcppLM", path = "~/desktop"). Of course, the path argument should be a character string giving a valid location on the machine it is being run on. After running this R function, navigate to the directory that was created for this package. The contents are:

    man (a directory for help files)
    R (a directory for R code)
    src (a directory for code to be compiled)
    DESCRIPTION
    NAMESPACE
    Read-and-delete-me

Before using or distributing an R package, you should check its validity. This can be done with R CMD check RcppLM (in our case). As it stands, the check won't succeed, and we will make a small change to rectify this. Open the only Rd file in the man directory. The contents of \examples{} must be valid R code. Clearly this isn't the case. Go ahead and add # at the beginning of the line. Save the changes. This package, though it does nothing useful, will now check successfully.

Creating R Functions

Create an R script in the R sub-directory of the package directory. It can be named anything, but we'll want to use a name that will be meaningful for us. For now, we can create a file runreg.r because we'll define the R function runreg in it, with the following contents:

    runreg.r <- function(Y, X) {
      ## .Call("runreg", as.matrix(Y), as.matrix(X), PACKAGE = "RcppLM")
    }

Ultimately, we will uncomment the .Call line. However, that code would not presently work (without the comment) given that we haven't defined any function named "runreg" that we can access from a shared library object.

Creating C++ Functions

There are two kinds of C++ functions we will create.
The first corresponds to the C++ functions we were writing in our C++ snippets like bootstrap_functions.cpp. These take C++ class arguments and map to C++ class return values. We can create the file runreg.cpp in the src subdirectory.

    #include <RcppArmadillo.h>

    using namespace Rcpp;

    Rcpp::NumericVector runreg(NumericMatrix Y, NumericMatrix X) {
      arma::mat y = as<arma::mat>(Y);
      arma::mat x = as<arma::mat>(X);
      arma::mat tmp1;
      tmp1 = inv(x.t() * x);
      arma::mat tmp = x.t() * y;
      arma::mat betahat = tmp1 * tmp;
      NumericVector Betahat(X.cols());
      for (int i = 0; i < X.cols(); i++) {
        Betahat(i) = betahat(i);
      }
      return Betahat; /* We are returning a C++ class defined by the Rcpp
                         C++ code
                      */
    }

    ./RcppLM/src/runReg.cpp

This is simply our code to calculate the regression coefficients using the linear algebra expression $(X^\top X)^{-1} X^\top y$. The function is declared such that it will return NumericVector output, and we can see that this is the actual class of Betahat. After creating this file, we can check the R package. It will pass. However, nothing useful is done yet: while we have defined an R function and a C++ function, they are not connected in any way.

Creating R-callable C++ Functions

One of the steps that the Rcpp helper functions performed for us invisibly is the creation of C++ functions which are actually callable from R. The C++ functions we've written so far, which provide our useful functionality, are not. And, while it isn't required, we will walk through the process of creating an R-callable C++ function that subsequently calls a C++ function, instead of making the C++ function runreg itself callable from R. R-callable C++ functions are a bit different from what we've seen so far. Their arguments are all of the class SEXP and their output is of the class SEXP. The way we declare this is no different from other classes, however. Additionally, we must include RcppExport at the beginning of the declaration. We can create the file _runreg.cpp in the src subdirectory with the following contents.

    #include <RcppArmadillo.h>
    // #include "runreg.h"

    using namespace Rcpp;

    RcppExport SEXP _runreg(SEXP Y, SEXP X) {
      NumericMatrix y = as<NumericMatrix>(Y);
      NumericMatrix x = as<NumericMatrix>(X);
      NumericVector output;
      // output = runreg(y, x);
      return wrap(output);
    }

In this case, we are just using an underscore to indicate that this is a wrapper around C++ code to be accessed in R. Notice that two lines are commented out. Commenting these out allows us to check that the package we've made so far is valid. Still, no useful functionality has been provided.
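The division of labour between _runreg and runreg (generic arguments in, convert to typed values, call the typed worker, convert the result back) can be sketched without R at all. The following is an analogy in plain C++, not Rcpp code: the Generic struct is a hypothetical stand-in for SEXP, as_vector and wrap_scalar are hypothetical stand-ins for as<>() and wrap(), and mean_of is an arbitrary typed worker chosen only to keep the example short.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for SEXP: one opaque type used for every
// argument and every return value at the language boundary.
struct Generic {
    std::vector<double> values;
};

// Hypothetical conversions, playing the roles of Rcpp's as<>() and wrap().
std::vector<double> as_vector(const Generic& g) { return g.values; }

Generic wrap_scalar(double x) {
    Generic g;
    g.values.push_back(x);
    return g;
}

// Typed worker: knows nothing about the generic boundary (like runreg).
double mean_of(const std::vector<double>& x) {
    assert(!x.empty());
    return std::accumulate(x.begin(), x.end(), 0.0) /
           static_cast<double>(x.size());
}

// Wrapper: generic in, generic out (like _runreg). Its only job is to
// convert, delegate to the typed worker, and convert back.
Generic mean_of_wrapper(const Generic& x) {
    std::vector<double> typed = as_vector(x);
    return wrap_scalar(mean_of(typed));
}
```

Keeping the worker free of boundary types means it can be reused and tested from other C++ code, while only the thin wrapper has to change if the calling convention does.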
Before we can have _runreg (the wrapper function) call runreg (the function doing the computation), we must define a local header. This will tell the compiler what runreg looks like before it goes and finds the full definition. Still, _runreg is a valid (but useless) C++ function. Because it is now defined, we can update our R function definition to the following:

    runreg.r <- function(Y, X) {
      .Call("_runreg", as.matrix(Y), as.matrix(X), PACKAGE = "RcppLM")
    }

    ./RcppLM/R/runReg.R

Be careful to .Call the C++ function _runreg; it is callable. .Call-ing the C++ function runreg will, on the other hand, throw an error. It is not callable.

Creating Local Headers to Bridge R Functions

Our _runreg C++ function must ultimately call our runreg C++ function. However, we'd get an error about runreg being undefined in the current scope if we tried to compile the package without commenting out the output = runreg(y, x) line. Although runreg is defined in another file, we have to explicitly provide a summary of that definition. We do that by using a local header. Here, we create the following runreg.h in the src directory.

    #ifndef _RcppLM_RUNREG_H
    #define _RcppLM_RUNREG_H

    #include <RcppArmadillo.h>

    using namespace Rcpp;

    NumericVector runreg(NumericMatrix Y, NumericMatrix X);

    #endif

    ./RcppLM/src/runreg.h

This gives us access to the signature of the function: the types and classes of its inputs and outputs. In order to check the basic structure of our _runreg() function, this is all we need. The pre-processor directives #ifndef, #define, and #endif ensure that we only provide the signature once.[1] In practice we will need to do this for other functions, but the structure is the same. We would just have to change _RcppLM_RUNREG_H to a different name and update the function signature for whatever we needed. Now that we have this header file, we can #include it in _runreg.cpp and uncomment our use of runreg(). After this, its contents are as follows:

    #include <RcppArmadillo.h>
    #include "runreg.h"

    using namespace Rcpp;

    RcppExport SEXP _runreg(SEXP Y, SEXP X) {
      NumericMatrix y = as<NumericMatrix>(Y);
      NumericMatrix x = as<NumericMatrix>(X);
      NumericVector output;
      output = runreg(y, x);
      return wrap(output);
    }

    ./RcppLM/src/_runReg.cpp

Finishing Touches

This note walked through the inclusion of a single C++ function in an R package. In order to access from R the functionality we coded in C++, we had to wrap it in another C++ function that was R-callable and then create the R function which actually did the calling. For real projects, there is more to do in creating a package: writing documentation, writing unit tests, and writing vignettes. However, as our package stands, it checks. And, after we install it, we can test it.
    user@machine$ Rscript -e "library(RcppLM); runreg.r(rnorm(), rnorm())"
    Loading required package: Rcpp
    Loading required package: RcppArmadillo
    [1] -0.08689

[1] These are called header guards or include guards. See http://cran.r-project.org/doc/manuals/r-exts.html for all of this.
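As a rough cross-check on the kind of value the test call returns: when Y and X are each a single column and no intercept column is supplied, the expression $(X^\top X)^{-1} X^\top y$ reduces to $\sum_i x_i y_i / \sum_i x_i^2$. The following is a minimal sketch of that special case in plain C++, assuming no Rcpp or Armadillo; the function name ols_slope_no_intercept is hypothetical and is not part of the package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: least-squares coefficient for y = beta * x + error,
// i.e. (x'x)^{-1} x'y when X has a single column and no intercept.
double ols_slope_no_intercept(const std::vector<double>& x,
                              const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    double xtx = 0.0;  // accumulates x'x
    double xty = 0.0;  // accumulates x'y
    for (std::size_t i = 0; i < x.size(); ++i) {
        xtx += x[i] * x[i];
        xty += x[i] * y[i];
    }
    assert(xtx > 0.0);  // x must not be all zeros, or x'x is not invertible
    return xty / xtx;
}
```

For instance, x = {1, 2, 3} and y = {2, 4, 6} give 28 / 14 = 2, the slope one would expect for data lying exactly on the line y = 2x.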