An Introduction to the Use of R for Clinical Research



Similar documents
Interactive Applications for Modeling and Analysis with Shiny

R: A Free Software Project in Statistical Computing

R for Clinical Trial Reporting: Reproducible Research, Quality and Validation

ECON 424/CFRM 462 Introduction to Computational Finance and Financial Econometrics

R: A Free Software Project in Statistical Computing

The R Environment. A high-level overview. Deepayan Sarkar. 22 July Indian Statistical Institute, Delhi

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Interaction between quantitative predictors

Shiny Server Pro: Regulatory Compliance and Validation Issues

Object systems available in R. Why use classes? Information hiding. Statistics 771. R Object Systems Managing R Projects Creating R Packages

Statistical Models in R

Multiple Linear Regression

Guidelines for Data Collection & Data Entry

Introduction to R software for statistical computing

Data Analytics and Business Intelligence (8696/8697)

FileMaker Pro and Microsoft Office Integration

The Not Quite R (NQR) Project: Explorations Using the Parrot Virtual Machine

The power of IBM SPSS Statistics and R together

Psychology 205: Research Methods in Psychology

Time Series Analysis of Aviation Data

WSL Course Introduction to Statistics Software, Installation and Use

R Language Fundamentals

Using R for Linear Regression

Computational Assignment 4: Discriminant Analysis

National Frozen Foods Case Study

Polynomial Neural Network Discovery Client User Guide

Writing reproducible reports

GETTING STARTED WITH R AND DATA ANALYSIS

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

MULTIPLE REGRESSION EXAMPLE

Sweave = R LATEX 2. A brief tutorial. Nicola Sartori. Università Ca Foscari Venezia.

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Time Series Analysis AMS 316

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Links. Blog. Great Images for Papers and Presentations 5/24/2011. Overview. Find help for entire process Quick link Theses and Dissertations

Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories

Moderator and Mediator Analysis

Python for Data Analysis and Visualiza4on. Fang (Cherry) Liu, Ph.D PACE Gatech July 2013

Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

FreeFem++-cs, the FreeFem++ Graphical Interface

(La)TeX Support for manuscript preparation. Conference Paper Management System

Package acrm. R topics documented: February 19, 2015

Examining a Fitted Logistic Model

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Simple Predictive Analytics Curtis Seare

Scatter Plot, Correlation, and Regression on the TI-83/84

TIBCO Fulfillment Provisioning Session Layer for FTP Installation

Embedded SQL. Unit 5.1. Dr Gordon Russell, Napier University

SAP HANA Client Installation and Update Guide

R Commander Tutorial

To be productive in today s graphic s industry, a designer, artist, or. photographer needs to have some basic knowledge of various file

Financial Risk Models in R: Factor Models for Asset Returns. Workshop Overview

A short course in Longitudinal Data Analysis ESRC Research Methods and Short Course Material for Practicals with the joiner package.

10. Analysis of Longitudinal Studies Repeat-measures analysis

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Modelling with R and MySQL. - Manual - Gesine Bökenkamp, Frauke Wiese, Clemens Wingenbach

Packrat: A Dependency Management System for R

Computational Mathematics with Python

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Lab 13: Logistic Regression

JMulTi/JStatCom - A Data Analysis Toolkit for End-users and Developers

Statistical Operations: The Other Half of Good Statistical Practice

Librarian. Integrating Secure Workflow and Revision Control into Your Production Environment WHITE PAPER

The Clinical Research Center

5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.

Interactive Data Visualization for the Web Scott Murray


Software, Shareware and Opensource CSCU9B2

PIC 10A. Lecture 7: Graphics II and intro to the if statement

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

A Short Guide to R with RStudio

Tutorial 2 Online and offline Ship Visualization tool Table of Contents

Compliance Response Edition 07/2009. SIMATIC WinCC V7.0 Compliance Response Electronic Records / Electronic Signatures. simatic wincc DOKUMENTATION

Computational Mathematics with Python

TIBCO ActiveMatrix BusinessWorks Plug-in for TIBCO Managed File Transfer Software Installation

R software tutorial: Random Forest Clustering Applied to Renal Cell Carcinoma Steve Horvath and Tao Shi

Getting Started With Author-it

Introduction to RStudio

Transcription:

An Introduction to the Use of R for Clinical Research Dimitris Rizopoulos Department of Biostatistics, Erasmus Medical Center d.rizopoulos@erasmusmc.nl PSDM Event: Open Source Software in Clinical Research June 19th, 2012

Outline What is R and how to obtain in it Features of R Using R R and clinical research PSDM Event: Open Source Software in Clinical Research June 19th, 2012 1/28

What is R? R is a free software environment for statistical computing and graphics. it was initiated in 1992 by Ross Ihaka and Robert Gentleman at University of Auckland, New Zealand in 1997 the R Core Team was established with renowned members of the statistical computing community nowadays, the R Core Team has grown and consists of about 20 members, experts in computing Free Software the source code is available users are allowed to modify and redistribute the code PSDM Event: Open Source Software in Clinical Research June 19th, 2012 2/28

How to Install R? Download R from the CRAN web site http://cran.r-project.org choose your platform, e.g., Windows, Linux e.g., for Windows: Windows base Download R 2.15.0 for Windows Install... Download R packages from the CRAN web site within R Packages Install package(s)... make your choice(s) load the package using library() (note: install does not mean load) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 3/28

Features of R Why R because is free it compiles and runs on a wide variety of UNIX platforms as well as Windows and MacOS R has extensive and powerful graphics & data manipulation capabilities it can easily interface with low-level programming languages, e.g., C/C++ or Fortran it can be easily extended via R packages PSDM Event: Open Source Software in Clinical Research June 19th, 2012 4/28

Features of R (cont d) Disadvantages of R steep learning curve (some might say) output is not so nice looking (but there are some alternatives) * Sweave, odfweave exporting output is more difficult cannot easily handle very very big data sets (depends on the installed RAM) * use 64bit OSs a lot of things are available but it is sometimes hard to find your way the quality of the available packages is greatly varying PSDM Event: Open Source Software in Clinical Research June 19th, 2012 5/28

Examples using R R is a command-based functional language write and execute commands use and define functions You may write the commands in the R console (Windows) or in a shell (Linux) Strongly advisable to use a suitable text editor Some available options: Tinn-R (for Windows; http://sciviews.org/tinn-r/) Rstudio (all major platforms; http://www.rstudio.org/) for more check http://www.sciviews.org/ rgui/projects/editors.html PSDM Event: Open Source Software in Clinical Research June 19th, 2012 6/28

Examples using R (cont d) R has very flexible and compact syntax Example: Calculate the coefficient of variation (sample std. dev. / sample mean), for blood pressure separately for males and females, in the age groups (20, 40) and (40, 60), and conditionally of being obese (BMI > 30) or not with(bpdata, tapply(bp, list(cut(age, c(20, 40, 60)), sex, weight / height^2 > 30), function (x) sd(x)/mean(x)) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 7/28

Examples using R (cont d) Example: Fit a linear model for blood pressure levels taking as explanatory variables the linear and quadratic effects of age the main effect of gender and their interaction fm <- lm(bp ~ poly(age, 2) * sex, data = BPdata) summary(fm) # parameter estimates, standard errors, etc. plot(fm) # all basic residuals plots fitted(fm) # extract fitted values predict(fm, newdata) # make predictions for new patients PSDM Event: Open Source Software in Clinical Research June 19th, 2012 8/28

Examples using R (cont d) Plotting Predicted values with 95% CIs per operation type xyplot(pred + low + upp ~ time TypeOp, data = Preds, type = "l", col = "black", lty = c(1,2,2), ylab = "Aortic Gradient (mmhg)", xlab = "Time (years)") PSDM Event: Open Source Software in Clinical Research June 19th, 2012 9/28

Examples using R (cont d) Subcoronary Implantation 0 5 10 15 20 Root Replacement Aortic Gradient (mmhg) 30 20 10 0 5 10 15 20 Time (years) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 10/28

Reporting Results in R : The Standard Communicating the results of a statistical analysis perform the analysis using your preferred statistical software results from this analysis constitute the basis for a statistical report Usually, this is a two-stage procedure, first do the analysis, and then write the report Statistician s hope: I won t have to change the analysis after I have finalized the report unfortunately, this is seldom the case PSDM Event: Open Source Software in Clinical Research June 19th, 2012 11/28

Reporting Results in R : An Alternative Embed the analysis into the report end up with only the report and data files Advantages reproducible reports dynamic reports Example: The client asks you to redo the whole analysis excluding some patients you just subset your original data and the report gets automatically updated!! PSDM Event: Open Source Software in Clinical Research June 19th, 2012 12/28

Dynamic Reports with Sweave What is Sweave Sweave is a tool that allows to embed the output of R code in L A TEX documents How it works your report file will contain both documentation parts (written in L A TEX) and code parts (written in R) the code is evaluated in R the results as plain output or tables and/or figures are embedded into a final.tex file you can then run pdflatex or latex to produce a pdf of your report PSDM Event: Open Source Software in Clinical Research June 19th, 2012 13/28

Dynamic Reports with Sweave (cont d) Requirements: if you know how to use R and L A TEX * no need to learn something new * Sweave ships directly with R * it is relatively straightforward to start using it if you do not know L A TEX * alternative: odfweave (open document format) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 14/28

Dynamic Reports with Sweave (cont d) Assuming basic R and L A TEX knowledge How does it work write your L A TEXfile as usual, but with extension.rnw instead of.tex e.g., myfile.rnw the file will also contain R code segments suitably separated from L A TEX from R execute Sweave("...\myfile.Rnw") this will create myfile.tex run L A TEX to obtain your report PSDM Event: Open Source Software in Clinical Research June 19th, 2012 15/28

Dynamic Reports with Sweave (cont d) How do we combine the R and L A TEX source code using the Noweb syntax we separate between different segments (chunks) of source code, i.e., * << options >>= denotes the start of an R code chunk * @ denotes the start of a documentation L A TEX chunk Basic options for code chunks label: an optional name for the chunk useful for locating errors echo: if TRUE, the commands are included in the document fig: if TRUE, it includes the plot created in the code PSDM Event: Open Source Software in Clinical Research June 19th, 2012 16/28

Dynamic Reports with Sweave (cont d) Basic options for code chunks eval: if TRUE, the R code is evaluated results: * if hide, all output is completely suppressed * if tex, the output is taken to be already proper L A TEX markup and included as is * if verbatim, the output of R commands is included in a verbatim-like R output environment... (check?rweavelatex) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 17/28

Dynamic Reports with Sweave (cont d) More info/material for Sweave available online: http://www.stat.uni-muenchen.de/~leisch/sweave/sweave-manual.pdf http://www.stat.uni-muenchen.de/~leisch/sweave/faq.html http://www.stat.umn.edu/~charlie/sweave/ http://www.biostat.jhsph.edu/~rpeng/enar2009/lecture-slides.pdf http://biostat.mc.vanderbilt.edu/wiki/pub/main/sweavelatex/fhsweave.pdf many more Google it PSDM Event: Open Source Software in Clinical Research June 19th, 2012 18/28

Getting Help in R Within R help.search("topic") or??"topic" (depends on the installed packages) RSiteSearch("topic") (requires internet connection) help() or? invoke the on-line help file for the specified function checking the FAQ On the internet R-help (https://stat.ethz.ch/mailman/listinfo/r-help mailing list) R-seek (http://www.rseek.org Google-like searched engine) R-wiki (http://rwiki.sciviews.org/doku.php) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 19/28

Getting Help in R (cont d) On the internet CRAN Task Views (http://cran.r-project.org/web/views/ categorization of packages) Crantastic (http://crantastic.org/ categorization of packages + reviews) Equalis (http://www.equalis.com/forums/ R forum) R4stats (http://www.r4stats.com/) examples of basic R programs PSDM Event: Open Source Software in Clinical Research June 19th, 2012 20/28

Getting Help in R (cont d) Intro with applications in statistics Dalgaard, P. (2008) Introductory Statistics with R, 2nd Ed. New York: Springer-Verlag. (moderate) Venables, W. and Ripley, B. (2002) Modern Applied Statistics with S. New York: Springer-Verlag. (advanced) Programming Venables, W. and Ripley, B. (2000) S Programming. New York: Springer-Verlag. Chambers, J. (2008) Software for Data Analysis Programming with R. New York: Springer-Verlag. PSDM Event: Open Source Software in Clinical Research June 19th, 2012 21/28

Getting Help in R (cont d) Clinical research Peace, K. and Chen, D.-G. (2010) Clinical Trial Data Analysis Using R. Boca Raton: Chapman and Hall/CRC. More books that use R (or S) can be found at: http://www.r-project.org/doc/bib/r-books.html, or http://www.r-project.org/doc/bib/r-jabref.html PSDM Event: Open Source Software in Clinical Research June 19th, 2012 22/28

R For Clinical Research The Is R Validated? saga: There is the PERCEPTION that a certain three-lettered statistical analysis system is the Gold Standard and, worse, is perhaps the only one accepted by the FDA This is not TRUE! A key aspect of the CT regulatory framework is 21 CFR 11 with respect to digital signatures, audit trails, etc. Questions regarding the applicability of 21 CFR 11 to stand-alone statistical applications as opposed to databases that acquire, store and manage source electronic records PSDM Event: Open Source Software in Clinical Research June 19th, 2012 23/28

R For Clinical Research (cont d) Most decision makers want to see documentation of compliance with applicable aspects of the regulations Efforts to create a guidance document for R began in earnest at user! 2006 conference in Vienna Working Group began drafting a document with the goal of addressing key issues as they specifically pertain to R Marc Schwartz (Vice President, Biostatistics, MedNet Study Solutions) Frank Harrell (Chair at Dept. Biostatistics, Vanderbilt University School of Medicine) Tony Rossini (Group Head, Novartis Pharma AG) PSDM Event: Open Source Software in Clinical Research June 19th, 2012 24/28

R For Clinical Research Leverage existing information on development, version control, testing, maintenance, bug reporting/resolution, stable release cycles, updates, documentation, end user support, etc. received constructive criticism from multiple parties Document submitted to The R Foundation for approval on June 15, 2007 Notified of approval by The R Foundation on July 27, 2007 Available at: http://www.r-project.org/doc/r-fda.pdf PSDM Event: Open Source Software in Clinical Research June 19th, 2012 25/28

R For Clinical Research Covers explicitly listed packages from Base R and the Recommended Packages Does NOT cover other CRAN and non-cran R packages Qualification and Validation Specifically addresses 21 CFR 11.10 (a-i) and 11.30 functional requirements PSDM Event: Open Source Software in Clinical Research June 19th, 2012 26/28

R For Clinical Research Changing from the standard to R is possible for clinical research However, it will require time ( learning curve) Time is money! The relevant question is: How much money compared to the money payed annually for licences? PSDM Event: Open Source Software in Clinical Research June 19th, 2012 27/28

Thank you for your attention! PSDM Event: Open Source Software in Clinical Research June 19th, 2012 28/28