Integrating R with the Go programming language using interprocess communication



Similar documents
Apache Thrift and Ruby

Programming Languages CIS 443

SUMMER SCHOOL ON ADVANCES IN GIS

DSC 2003 Working Papers (Draft Versions) Rserve

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

General Introduction

Resource Utilization of Middleware Components in Embedded Systems

Software Engineering

Bug hunting. Vulnerability finding methods in Windows 32 environments compared. FX of Phenoelit

FAQs. This material is built based on. Lambda Architecture. Scaling with a queue. 8/27/2015 Sangmi Pallickara

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

Rserve A Fast Way to Provide R Functionality to Applications

Crash Course in Java

Review from last time. CS 537 Lecture 3 OS Structure. OS structure. What you should learn from this lecture

AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping

VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR

AUTOMATED CONFERENCE CD-ROM BUILDER AN OPEN SOURCE APPROACH Stefan Karastanev

Kernel Types System Calls. Operating Systems. Autumn 2013 CS4023

MADOCA II Data Logging System Using NoSQL Database for SPring-8

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Distributed Network Management Using SNMP, Java, WWW and CORBA

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Computational Mathematics with Python

opalang - Rapid & Secure Web Development

Moving from CS 61A Scheme to CS 61B Java

Integrating VoltDB with Hadoop

Hadoop and Map-Reduce. Swati Gore

HelenOS IPC and Behavior Protocols

Sources: On the Web: Slides will be available on:

.NET and J2EE Intro to Software Engineering

Outline Basic concepts of Python language

Computer Programming I

Computational Mathematics with Python

Overview of CS 282 & Android

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

Streaming Data, Concurrency And R

Lecture 17: Mobile Computing Platforms: Android. Mythili Vutukuru CS 653 Spring 2014 March 24, Monday

CSC230 Getting Starting in C. Tyler Bletsch

Driving force. What future software needs. Potential research topics

CS 378 Big Data Programming. Lecture 9 Complex Writable Types

Oracle Solaris Studio Code Analyzer

Jonathan Worthington Scarborough Linux User Group

Security Testing For RESTful Applications

Java in Education. Choosing appropriate tool for creating multimedia is the first step in multimedia design

Continuous Delivery and Test Automation in Agile SW projects with Robot Framework Antti Pohjonen

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche

Invitation to Ezhil : A Tamil Programming Language for Early Computer-Science Education 07/10/13

Calling R Externally : from command line, Python, Java, and C++

Ambient-oriented Programming & AmbientTalk

Example of Standard API

#820 Computer Programming 1A

Introduction to the. Barracuda Embedded Web-Server

Chapter 6: Programming Languages

Limi Kalita / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, Socket Programming

Introduction to Python

Infrastructure that supports (distributed) componentbased application development

Layering a computing infrastructure. Middleware. The new infrastructure: middleware. Spanning layer. Middleware objectives. The new infrastructure

APPLICATION VIRTUALIZATION TECHNOLOGIES WHITEPAPER

Application Note: AN00121 Using XMOS TCP/IP Library for UDP-based Networking


AUTOMATIC EVALUATION OF COMPUTER PROGRAMS USING MOODLE S VIRTUAL PROGRAMMING LAB (VPL) PLUG- IN

Computer Science 217

How To Write A Web Server In Javascript

CONTROL MICROSYSTEMS DNP3. User and Reference Manual

What is An Introduction

Agile Business Process Automation

presentation Contact information:

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Developing a Web Server Platform with SAPI Support for AJAX RPC using JSON

UDR: UDT + RSYNC. Open Source Fast File Transfer. Allison Heath University of Chicago

Introduction to Virtual Machines

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

Agent Languages. Overview. Requirements. Java. Tcl/Tk. Telescript. Evaluation. Artificial Intelligence Intelligent Agents

The C Programming Language course syllabus associate level

Parrot in a Nutshell. Dan Sugalski dan@sidhe.org. Parrot in a nutshell 1

CSE 130 Programming Language Principles & Paradigms

Lecture 15 - Web Security

This is DEEPerent: Tracking App behaviors with (Nothing changed) phone for Evasive android malware

A Test Suite for Basic CWE Effectiveness. Paul E. Black.

Kodo - Cross-platform Network Coding Software Library. Morten V. Pedersen - Aalborg University / Steinwurf ApS mvp@es.aau.dk

Google Cloud Data Platform & Services. Gregor Hohpe

CS 106 Introduction to Computer Science I

CS3813 Performance Monitoring Project

The K 2 System: Lisp at the Core of the ISP Business

Informatica e Sistemi in Tempo Reale

Chapter 15 Windows Operating Systems


Efficiency of Web Based SAX XML Distributed Processing

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

Announcements. Lab 2 now on web site

MODEL DRIVEN DEVELOPMENT OF BUSINESS PROCESS MONITORING AND CONTROL SYSTEMS

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

Intro to scientific programming (with Python) Pietro Berkes, Brandeis University

Middleware and Distributed Systems. Introduction. Dr. Martin v. Löwis

Development of complex KNX Devices

Synthesis for Developing Apps on Mobile Platforms

3.5. cmsg Developer s Guide. Data Acquisition Group JEFFERSON LAB. Version

Eclipse Help

How To Protect Your Source Code From Reverse Engineering

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Transcription:

Integrating R with the Go programming language using interprocess communication Christoph Best, Karl Millar, Google Inc. chbest@google.com

Statistical software in practice & production Production environments!!!= R development environment Scale: machines, people, tools, lines of code discipline of software engineering Maintainable code, common standards and processes Central problem: The programming language to use How do you integrate statistical software in production? Rewrite everything in your canonical language? Patch things together with scripts, dedicated servers,...?

Everybody should just write Java!

Programming language diversity Programming language diversity is hard Friction, maintenance, tooling, bugs, but sometimes you need to have it Many statistics problems can only be solved in R * How do you integrate R code with production code? without breaking production * though my colleagues keep pointing out that any Turing-complete language can solve any problem

The Go programming language Open-source language, developed by small team at Google Aims to put the fun back in (systems) programming Fast compilation and development cycle, little baggage Made to feel like C (before C++) Made not to feel like Java or C++ (enterprise languages) Growing user base (inside and outside Google)

Integration: Intra-process vs inter-process Intra-process: Link different languages through C ABI smallest common denominator issues: stability, ABI evolution, memory management, threads, Can we do better? Or at least differently? Idea: Sick of crashes? Execute R in a separate process Runs alongside main process, closely integrated: lamprey Provide communication layer between R and host process A well-defined compact interface surface

Integration: Intra-process vs inter-process C runtime Go C++ Java Python... R runtime (library) R code RPC client IPC Messages RPC server R single process shared memory shared crashes two processes memory isolation

How it works Host process starts R subprocess Tightly coupled on same machine/container Lamprey R subprocess loads required packages R executes executionservice::runexecutionservice() listens for connections, executes incoming requests, returns results leverages existing RPC package Communication layer: grpc(-like) / Protocol buffers All messages are proto buffers R subprocess is server, host language process is client

Data model Host sees R subprocess as REPL Sends R commands and R values, reads results Only R values, no references handled on this level READ-EVALUATE-PRINT LOOP R values encoded as proto buffers on wire Only basic R types go on the wire: vectors of elementary data types lists everything else must be expressed by basic types

Four simple requests from Go to R CreateContext() returns Context: create an execution context (isolation) Set(ctx, variablename, Rvalue) Assign a value to a named variable Do(ctx, Rexpression) returns RValue Evaluate an expression (a string) in R Expression refers to previously set variabkes Return result value CloseContext(ctx): free resources in context (e.g. variables)

Wire representation for R values message REXP { required RClass rclass = 1; STRING, INTEGER, REAL, LOGICAL, NULLTYPE, LIST repeated double realvalue = 2 [packed=true]; repeated sint32 intvalue = 3 [packed=true]; repeated RBOOLEAN booleanvalue = 4; repeated STRING stringvalue = 5; repeated REXP rexpvalue = 8; }basic R vectors list of R values only one present } repeated string attrname = 11; repeated REXP attrvalue = 12; Object attributes from RProtoBuf package, Originally written by Saptarshi Guha for RHIPE (http://www.rhipe.org)

Wire representation for R values enum RBOOLEAN { F=0; T=1; Boolean is an enum with three values NA=2; } String contains a flag to indicate NA value message STRING { optional string strval = 1; optional bool isna = 2 [default=false]; }

Set request: wire representation message SetRequest { optional Context context = 1; Context in which to assign the variable optional string variable_name = 2; Variable name to assign to } optional rexp.rexp value = 3; Value in wire encoding message SetResponse { } No response necessary Error conditions are transmitted separately

Evaluate request: wire representation message EvaluateRequest { optional Context context = 1; repeated string expression = 2; Context in which to assign the variable R expression as string Can refer to variables } optional bool return_result = 3 [default=true]; Indicates whether a result is expected message EvaluateResponse { optional rexp.rexp result = 1; } Result value in wire representation

A quick example service, err := rexp.newservice(context.background())) x := []float64{1, 2, 3} Set up input data y := []float64{2, 4, 6} Execute R code (magically sets up context etc.) r, err := service.do( rexp.set("x", x), Transfer input data to R process rexp.set("y", y), "d <- data.frame(x=x, y=y)", Make input data into a data frame "m <- lm(x ~ y, d)", Do statistics here "list(coef=m$coefficients, res=m$residuals)") Prepare results coefficients := r.get("coef").toany().([]float64) residuals := r.get("res").toany().([]float64) Extract results

Strategies Problem: You can only transfer basic R values Solution: Construct higher types explicitly (e.g. data frames) In the future, we can hide this complexity using improvements to the Go libraries Problem: Only values can be transferred, no references Solution: You can keep references as variables on the R side Go library code can allocate variable names, etc, automate a lot of things This library only provides the bottom layer.

Does it work? Yes. Used in several experimental projects. Statisticians/analysts able to deal with interface. Is it fast enough? Yes, for reasonably sized datasets (10-100 MBytes) About 3ms for CreateContext/Set/Evaluate/CloseContext sequence About 50-100 MByte/s for transferring data Speed more dominated by R runtime than wire protocol

Future work Better data types on the Go side Data frames natively in Go Automatic construction of data.frame in R Callbacks and inverted server Callbacks: Allow R to make calls to Go Inverted server: Run Go as a subprocess of R Could be used to extend R with Go code Open sourcing

Summary Inter-process communication is a (surprisingly) effective way to couple two programming languages Simplicity Robustness Clarity