Packrat: A Dependency Management System for R




J.J. Allaire
June 27, 2014

Reproducible Research

Reproducibility is foundational as a basis for scientific claims:

"The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified." (CRAN Task View on Reproducible Research)

There is a crisis of confidence in the results of data analysis due to a lack of reproducibility:
- Across time (running the same analysis again years later)
- Across space (moving code from a desktop to a server, or between the systems of collaborators)

In R we do better than in many environments, but we don't do well enough.

Tools for Reproducibility

Computation: Can we execute again and get the same results? Yes, because we preserve our analysis in R scripts.
Output: Can we produce the same end-user output consistently? Yes, because we have tools like Sweave and knitr.
Configuration: Can we run our computations and create our output with the same configuration across time and space? No (or yes only with a lot of effort and bother).

Configuration Rot

As packages evolve over the years they inevitably:
- At best, change behavior in subtle ways
- At worst, outright break previous code

As a result, an analysis or report that works today against, e.g., R 3.1 is unlikely to work without modification in 5 years' time. This is already a widely observed problem with Sweave and knitr documents that users attempt to update with new data and assumptions (or even just rerun with code and data unchanged).

R-devel: [RFC] A case for freezing CRAN (http://goo.gl/k77z6f)

A proposal to freeze CRAN along with R releases. Projects built against a given version of R/CRAN would be able to rely on stable package versions, and could therefore be expected to continue to work in the future. This is an attractive notion because it's simple and requires no extra effort from users.

What if we could freeze CRAN?

That would solve part of the problem, but wouldn't account for:
- Packages obtained from other repositories
- Development versions of packages installed from R-Forge and GitHub
- Internally developed packages
- Users (inevitably) needing one more feature or bug fix and requiring the very latest version of a package

What would a frozen CRAN not have?

- Bug fixes delivered in a timely fashion
- The vitality and dynamism associated with making work available immediately to the community
- The ability to use older versions of R with newer versions of packages

"To me it boils down to one simple question: is an update to a package on CRAN more likely to (1) fix a bug, (2) introduce a bug or downward incompatibility, or (3) add a new feature or fix a compatibility problem without introducing a bug? I think the probability of (1) + (3) is much greater than the probability of (2), hence the current approach maximizes user benefit." (Frank Harrell)

"People then will start finding ways around these limitations and then we're back to square one of having people use a set of R packages and R versions that could potentially be all over the place." (Gavin Simpson)

Freezing is the answer, but what to freeze?

Freezing CRAN solves only a subset of the problem, and introduces its own problems. The only complete answer to this problem is freezing projects: individual projects should be able to freeze arbitrary combinations of R packages with a guarantee of being able to use them in the future. Even if we froze CRAN we would still need this, so why create the bother of freezing CRAN? Let's just do project freezing right!

How do other environments handle these concerns?

Most have some variation of:
- A per-project private library
- The specification of explicit versions (or version ranges) of each dependency
- The ability to programmatically reconstruct the library based on the specifications

Some examples:
- Ruby: Bundler (http://bundler.io/)
- Node.js: NPM (https://www.npmjs.org/)
- Python: Virtualenv (https://virtualenv.pypa.io/en/latest/)
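R already expresses version ranges in package DESCRIPTION files, for example "plyr (>= 1.8.1)". That covers the second ingredient listed above, explicit versions or version ranges. The sketch below assumes constraints written in that "operator version" form and uses base R's package_version() to check whether a given version satisfies a declared range. The helper name satisfies() is hypothetical and is not part of packrat.

```r
# Minimal sketch: does a version satisfy a DESCRIPTION-style constraint
# such as ">= 1.8.1"? satisfies() is a hypothetical helper.
satisfies <- function(version, constraint) {
  pattern <- "^[[:space:]]*(>=|<=|==|!=|>|<)[[:space:]]*([0-9][0-9.-]*)[[:space:]]*$"
  parts <- regmatches(constraint, regexec(pattern, constraint))[[1]]
  if (length(parts) == 0) {
    stop("unrecognised constraint: ", constraint)
  }
  # parts[2] is the operator, parts[3] the version to compare against
  compare <- match.fun(parts[2])
  # package_version() compares component by component, so "0.11.10" > "0.11.2"
  compare(package_version(version), package_version(parts[3]))
}

satisfies("1.8.1", ">= 1.8.0")      # TRUE
satisfies("0.11.2", ">= 0.11.2.1")  # FALSE
```

Comparing parsed versions rather than raw strings matters because string comparison would put "0.11.10" before "0.11.2".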

How might a solution tailored to R users look?

There is a fundamental difference at work: R users do not self-identify as software developers and therefore have little tolerance for additional workflow overhead. Any solution must therefore be highly automated, and must work both with existing projects created without packrat and with new projects.

We want the same benefits (a private library and capturing of dependencies), with none of the following required:
- Hand editing of dependency declarations
- Retrieval and management of package source code

Packrat as a Possible Solution

Packrat is an R package that implements a dependency management system for R:
- GitHub: https://github.com/rstudio/packrat
- Will be submitted to CRAN later this year
- Creates a private package library for a given R project (i.e., working directory)
- A snapshot function records the package versions used by a project and downloads their source code for storage with the project
- A restore function applies the snapshot to a directory (building packages from source as necessary)

Packrat Fundamentals

> packrat::init()
Creates a packrat project within a directory, giving the project its own private package library.

> packrat::snapshot()
Finds the packages in use in the project and stores a list of those packages, their current versions, and their source code.

> packrat::restore()
Restores the directory to the last snapshotted state (building packages from source as necessary).
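snapshot() is described as finding the packages in use in the project. One plausible way to do that without hand-edited declarations is to scan the project's R code statically. The sketch below works under that assumption and is not packrat's actual implementation. It uses base R's parser (utils::getParseData()) to collect packages named in library(), require() and requireNamespace() calls and in pkg::fn references. find_dependencies() is a hypothetical name.

```r
# Minimal sketch (hypothetical helper, not packrat's code): statically find
# the packages a piece of R code refers to, without evaluating it.
find_dependencies <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  # Keep only terminal tokens, in source order
  tokens <- tokens[tokens$terminal, ]
  tokens <- tokens[order(tokens$line1, tokens$col1), ]

  # In pkg::fn and pkg:::fn the parser tags "pkg" as SYMBOL_PACKAGE
  pkgs <- tokens$text[tokens$token == "SYMBOL_PACKAGE"]

  # In library(pkg) or require("pkg") the package name is two tokens after
  # the function name (function name, "(", argument)
  loaders <- c("library", "require", "requireNamespace")
  calls <- which(tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text %in% loaders)
  for (i in calls) {
    if (i + 2 <= nrow(tokens) && tokens$token[i + 2] %in% c("SYMBOL", "STR_CONST")) {
      # Strip the quotes from string arguments such as 'stringr'
      pkgs <- c(pkgs, gsub("^[\"']|[\"']$", "", tokens$text[i + 2]))
    }
  }
  unique(pkgs)
}

code <- c(
  "library(plyr)",
  "require('stringr')",
  "long <- reshape2::melt(wide)",
  "# library(ggplot2) is only a comment"
)
find_dependencies(code)
```

Because the code is only parsed and never run, the commented-out library() call and undefined objects such as wide do no harm. The trade-off is that dynamically constructed calls (for example library(name, character.only = TRUE)) are missed.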

Initializing a Project

    > packrat::init()
    Adding these packages to packrat:
                _
        packrat   0.2.0.130

    Fetching sources for packrat (0.2.0.130) ... OK (github)
    Snapshot written to '~/projects/reshape/packrat/packrat.lock'
    Installing packrat (0.2.0.130) ... OK (built source)
    Bootstrap complete!
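packrat::init() gives the project its own private package library. The sketch below is a rough illustration of what such an initialization involves. It assumes a simplified version of the packrat/lib and packrat/src layout, plus a project .Rprofile that puts the private library first on the library search path. The base R function creates that skeleton. init_project_sketch() and the generated .Rprofile are hypothetical. The real packrat::init() also bootstraps packrat itself and takes an initial snapshot, as the output above shows.

```r
# Rough sketch of project initialization (hypothetical, not packrat's code).
init_project_sketch <- function(project_dir) {
  lib <- file.path(project_dir, "packrat", "lib")
  src <- file.path(project_dir, "packrat", "src")
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  dir.create(src, recursive = TRUE, showWarnings = FALSE)

  # Hypothetical .Rprofile: when R is started in the project directory,
  # put the private library in front of the user and site libraries.
  rprofile <- c(
    "local({",
    "  lib <- file.path(getwd(), 'packrat', 'lib')",
    "  if (dir.exists(lib)) .libPaths(c(lib, .libPaths()))",
    "})"
  )
  writeLines(rprofile, file.path(project_dir, ".Rprofile"))
  invisible(project_dir)
}

project <- file.path(tempdir(), "reshape")
init_project_sketch(project)
list.files(project, recursive = TRUE, all.files = TRUE, include.dirs = TRUE)
```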

Snapshotting Installed Packages

    > packrat::snapshot()
    Adding these packages to packrat:
                  _
        plyr      1.8.1
        Rcpp      0.11.2
        reshape2  1.4
        stringr   0.6.2

    Fetching sources for plyr (1.8.1) ... OK (CRAN current)
    Fetching sources for Rcpp (0.11.2) ... OK (CRAN current)
    Fetching sources for reshape2 (1.4) ... OK (CRAN current)
    Fetching sources for stringr (0.6.2) ... OK (CRAN current)
    Snapshot written to '~/projects/reshape/packrat/packrat.lock'
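The snapshot records each package together with the version currently installed. The sketch below covers only that recording step. It assumes a lockfile in the Debian Control File (DCF) format that R uses for DESCRIPTION files, written with utils::installed.packages() and write.dcf(). snapshot_versions() is a hypothetical helper. Unlike packrat::snapshot(), it does not fetch source packages or record where each package came from.

```r
# Minimal sketch (hypothetical helper): record the installed versions of
# selected packages from one library into a DCF-format lockfile.
snapshot_versions <- function(lib, lockfile, packages = NULL) {
  installed <- utils::installed.packages(lib.loc = lib)
  records <- installed[, c("Package", "Version"), drop = FALSE]
  if (!is.null(packages)) {
    records <- records[records[, "Package"] %in% packages, , drop = FALSE]
  }
  # Sort so the lockfile is stable and diffs cleanly under version control
  records <- records[order(records[, "Package"]), , drop = FALSE]
  write.dcf(records, file = lockfile)
  invisible(records)
}

# Example against R's own base library, so it runs without extra packages
lockfile <- tempfile(fileext = ".lock")
snapshot_versions(.Library, lockfile, packages = c("stats", "utils"))
cat(readLines(lockfile), sep = "\n")
```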

Restoring the State of the Library

    > packrat::restore()
    Installing Rcpp (0.11.2) ... OK (downloaded binary)
    Installing stringr (0.6.2) ... OK (downloaded binary)
    Installing plyr (1.8.1) ... OK (downloaded binary)
    Installing reshape2 (1.4) ... OK (downloaded binary)
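restore() brings the private library back in line with the snapshot. Deciding what to (re)install amounts to comparing the recorded versions with what is currently installed. The sketch below shows that comparison, assuming both sides are given as named character vectors of versions. In practice these could be built from read.dcf() on the lockfile and installed.packages() on the private library. restore_plan() is a hypothetical name, and the download, build and installation steps are left out.

```r
# Minimal sketch (hypothetical helper): which locked packages must be
# installed so that the library matches the snapshot?
restore_plan <- function(locked, installed) {
  # locked, installed: named character vectors (names = package, values = version)
  needed <- character()
  for (pkg in names(locked)) {
    have <- installed[pkg]  # NA when the package is not installed at all
    if (is.na(have) || package_version(have) != package_version(locked[[pkg]])) {
      needed <- c(needed, pkg)
    }
  }
  needed
}

locked    <- c(Rcpp = "0.11.2", stringr = "0.6.2", plyr = "1.8.1", reshape2 = "1.4")
installed <- c(Rcpp = "0.11.2", plyr = "1.8.0")
restore_plan(locked, installed)  # returns c("stringr", "plyr", "reshape2")
```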

Updating a Package from GitHub

    > packrat::install_github("rcppcore/rcpp")
    > packrat::snapshot()
    Upgrading these packages already present in packrat:
                 from      to
        Rcpp   0.11.2  0.11.2.1

    Snapshot written to '~/projects/reshape/packrat/packrat.lock'
    > packrat::restore()
    Installing Rcpp (0.11.2.1) ... OK (built source)
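After the GitHub install, snapshot() reports Rcpp moving from 0.11.2 to 0.11.2.1. Detecting such a change means comparing two snapshots. The sketch below assumes each snapshot is a named character vector of versions and reports changed packages with their direction. diff_snapshots() is hypothetical, and it ignores packages that were added or removed between snapshots.

```r
# Minimal sketch (hypothetical helper): report packages whose version
# differs between two snapshots, and in which direction.
diff_snapshots <- function(old, new) {
  common <- intersect(names(old), names(new))
  from <- package_version(unname(old[common]))
  to   <- package_version(unname(new[common]))
  changed <- from != to
  data.frame(
    package   = common[changed],
    from      = as.character(from[changed]),
    to        = as.character(to[changed]),
    direction = ifelse(to[changed] > from[changed], "upgrade", "downgrade"),
    stringsAsFactors = FALSE
  )
}

old <- c(plyr = "1.8.1", Rcpp = "0.11.2",   reshape2 = "1.4", stringr = "0.6.2")
new <- c(plyr = "1.8.1", Rcpp = "0.11.2.1", reshape2 = "1.4", stringr = "0.6.2")
diff_snapshots(old, new)
```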

Bundling and Unbundling

    > packrat::bundle()
    The packrat project has been bundled at:
    "~/projects/reshape/packrat/bundles/reshape20140624.tar.gz"

    > packrat::unbundle("reshape20140624.tar.gz", where = "~/Desktop")
    Untarring 'reshape20140624.tar.gz' in directory '~/Desktop'...
    Restoring project library...
    Installing packrat (0.2.0.130) ... OK (built source)
    Installing Rcpp (0.11.2.1) ... OK (built source)
    Installing stringr (0.6.2) ... OK (downloaded binary)
    Installing plyr (1.8.1) ... OK (downloaded binary)
    Installing reshape2 (1.4) ... OK (downloaded binary)
    Done! The project has been unbundled and restored at:
    "~/Desktop/reshape"
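A bundle is essentially a compressed archive of the project directory that can be moved elsewhere and unpacked. The sketch below shows that round trip using base R's utils::tar() and utils::untar() with R's internal tar implementation, assuming the whole project lives in a single directory. bundle_project() and unbundle_project() are hypothetical helpers. Unlike packrat::unbundle(), the sketch does not restore the project library after unpacking.

```r
# Minimal sketch (hypothetical helpers): archive a project directory and
# unpack it somewhere else.
bundle_project <- function(project_dir, tarfile) {
  # Resolve the archive path before changing directory
  tarfile <- file.path(normalizePath(dirname(tarfile)), basename(tarfile))
  # Archive relative to the parent so the bundle unpacks into "<project>/"
  old <- setwd(dirname(normalizePath(project_dir)))
  on.exit(setwd(old), add = TRUE)
  utils::tar(tarfile, files = basename(project_dir),
             compression = "gzip", tar = "internal")
  invisible(tarfile)
}

unbundle_project <- function(tarfile, where) {
  dir.create(where, recursive = TRUE, showWarnings = FALSE)
  utils::untar(tarfile, exdir = where, tar = "internal")
  invisible(where)
}

project <- file.path(tempdir(), "bundle-demo")
dir.create(file.path(project, "packrat"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("Package: plyr", "Version: 1.8.1"),
           file.path(project, "packrat", "packrat.lock"))
bundle <- bundle_project(project, file.path(tempdir(), "bundle-demo.tar.gz"))
unbundle_project(bundle, file.path(tempdir(), "unpacked"))
```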

Anatomy of a Packrat Project

- .Rprofile: Directs R to use the private package library (when it is started from the project directory).
- packrat/lib/: Private package library for this project.
- packrat/src/: Source packages of all the dependencies that packrat has been made aware of.
- packrat/packrat.lock: Lists the precise package versions that were used to satisfy dependencies, including dependencies of dependencies.
- packrat/packrat.opts: Project-specific packrat options.
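The lockfile lists not only the packages a project uses directly but also their dependencies, recursively. Computing that set is a graph traversal over each package's requirements. The sketch below assumes a lockfile in DCF format with Package, Version and a comma-separated Requires field. The records are illustrative, built from the versions shown in the earlier output; they are not a copy of a real packrat.lock. lockfile_closure() is a hypothetical helper.

```r
# Illustrative lockfile records (assumed format, not copied from packrat)
lock_lines <- c(
  "Package: plyr", "Version: 1.8.1", "Requires: Rcpp", "",
  "Package: Rcpp", "Version: 0.11.2", "",
  "Package: reshape2", "Version: 1.4", "Requires: plyr, Rcpp, stringr", "",
  "Package: stringr", "Version: 0.6.2"
)
con <- textConnection(lock_lines)
lock <- read.dcf(con)  # character matrix, one row per package record
close(con)

# Breadth-first walk from the directly used packages through Requires
lockfile_closure <- function(lock, roots) {
  requires <- if ("Requires" %in% colnames(lock)) lock[, "Requires"] else rep(NA_character_, nrow(lock))
  names(requires) <- lock[, "Package"]
  seen <- character()
  queue <- roots
  while (length(queue) > 0) {
    pkg <- queue[1]
    queue <- queue[-1]
    if (pkg %in% seen) next
    seen <- c(seen, pkg)
    req <- unname(requires[pkg])  # NA if no Requires field or no record
    if (!is.na(req)) {
      queue <- c(queue, trimws(strsplit(req, ",")[[1]]))
    }
  }
  seen
}

lockfile_closure(lock, "reshape2")
```

Tracking already-visited packages in seen means shared dependencies such as Rcpp are listed once. It also keeps the walk from looping on circular requirements.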

Packrat and Version Control

Packrat Objectives

- Isolated, portable, and reproducible environment for R projects
- Capture all source code required to reproduce configurations
- Requires no changes to CRAN and is capable of working with arbitrary other repositories
- Flexible and easy-to-use solution to the problem of reproducibility:
  - "One button" snapshot/restore
  - Simple and convenient archiving (bundle/unbundle)
  - Optional integration with version control

Questions?

Packrat website: http://rstudio.github.io/packrat
Packrat source: https://github.com/rstudio/packrat