Introduction to R software for statistical computing



Similar documents
Importing Data into R

What is R? R s Advantages R s Disadvantages Installing and Maintaining R Ways of Running R An Example Program Where to Learn More

Time Series Analysis AMS 316

Using R for Windows and Macintosh

Installing R and the psych package

Biostatistics: Types of Data Analysis

R Language Fundamentals

R FOR SAS AND SPSS USERS. Bob Muenchen

Racial and Ethnic Differences in Health Insurance Coverage Among Adult Workers in Florida. Jacky LaGrace Mentor: Dr. Allyson Hall

An Introduction to the Use of R for Clinical Research

Additional sources Compilation of sources:

Dataframes. Lecture 8. Nicholas Christian BIOST 2094 Spring 2011

The Role of Insurance in Providing Access to Cardiac Care in Maryland. Samuel L. Brown, Ph.D. University of Baltimore College of Public Affairs

Jeff Schiff MD MBA Medical Director Minnesota Health Care Programs, DHS 23 April 2015

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

UNSOM Health Policy Report

Interactive Applications for Modeling and Analysis with Shiny

Graphics in R. Biostatistics 615/815

Racial Disparities and Barrier to Statin Utilization in Patients with Diabetes in the U.S. School of Pharmacy Virginia Commonwealth University

By: Latarsha Chisholm, MSW, Ph.D. Department of Health Management & Informatics University of Central Florida

Software Solutions Appendix B. B.1 The R Software

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Reading and writing files

Cancer Data for South Florida: A Tool for Identifying Communities in Need

5 Correlation and Data Exploration

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Student Placement in Mathematics Courses by Demographic Group Background

Getting your data into R

Jay Weiss Institute for Health Equity Sylvester Comprehensive Cancer Center University of Miami. COMMUNITY PROFILE Liberty City, Florida

High-Frequency Data Modelling with R software

R data import and export

Disparities in Individuals Access and Use of Health IT in 2013

Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller

TRAINING PROGRAM INFORMATICS

CS1112 Spring 2014 Project 4. Objectives. 3 Pixelation for Identity Protection. due Thursday, 3/27, at 11pm

Population Percent C.I. All Hennepin County adults aged 18 and older 11.9% ± 1.1

Multivariate Logistic Regression

College of Medicine Enrollment MD and MD/MPH Fall 2002 to Fall 2006

GETTING STARTED WITH R AND DATA ANALYSIS

Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D

GeoLytics. User Guide for Business Demographics & Historical Business Demographics

RUTLAND COUNTY TREATMENT COURT

Demographics of Atlanta, Georgia:

Data Storage STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Logan City. Analysis of Impediments to Fair Housing

Machine Learning Logistic Regression

INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

Medical Care Costs for Diabetes Associated with Health Disparities Among Adults Enrolled in Medicaid in North Carolina

MASSACHUSETTS RESIDENTS CENTRAL MA. Acute Care Hospital Utilization Trends in Massachusetts FY

Lost in Translation: The use of in-person interpretation vs. telephone interpretation services in the clinic setting with Spanish speaking patients

R2MLwiN Using the multilevel modelling software package MLwiN from R

Education and Wage Differential by Race: Convergence or Divergence? *

Seton Medical Center Hepatocellular Carcinoma Patterns of Care Study Rate of Treatment with Chemoembolization N = 50

National Cancer Institute

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Big data in R EPIC 2015

A Short Guide to R with RStudio

A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.

R: A Free Software Project in Statistical Computing

4. Are you satisfied with the outcome? Why or why not? Offer a solution and make a new graph (Figure 2).

Health Disparities in New Orleans

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Package tagcloud. R topics documented: July 3, 2015

Access to Health Services

Top 5 Leading Causes of Death

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Hands-On Data Science with R Dealing with Big Data. Graham.Williams@togaware.com. 27th November 2014 DRAFT

Using Names To Check Accuracy of Race and Gender Coding in NAEP

Results from the 2010 National Survey on Drug Use and Health: Mental Health Findings

CITY OF MILWAUKEE POLICE SATISFACTION SURVEY

Notes on computer programmes for statistical analysis

Report to Congress. Improving the Identification of Health Care Disparities in. Medicaid and CHIP

Access Provided by your local institution at 02/06/13 5:22PM GMT

ADVANCED PRACTICE NURSES MEANINGFUL USE OF ELECTRONIC HEALTH RECORDS

Basics of using the R software

Section 4: Home Mortgage Disclosure Act (HMDA) Data Analysis

Transcription:

to R software for statistical computing JoAnn Rudd Alvarez, MA joann.alvarez@vanderbilt.edu biostat.mc.vanderbilt.edu/joannalvarez Department of Biostatistics Division of Cancer Biostatistics Center for Quantitative Sciences Vanderbilt University School of Medicine 2014 February 13 Epidemiology Community of Practice Tennessee Department of Health

Overview 1 Introduction 2 3 4 5

Introduction R Statistical programming language Software environment Developed by Ihaka and Gentleman, R Development Core Team Appeared in 1993 Decendent of S, influenced by C http://www.r-project.org

Introduction Capabilities of R Data management Analysis Computation comparable to MATLAB Graphics

Introduction Characteristics of R Command line Functions, objects Data stuctures: (rectangular) data frame, list, vector, matrix

Introduction Advantages of R Freely available Can easily write your own functions Intuitive to use Flexible Fast implementation of new methods Users have access to source code

Introduction Disadvantages of R Contributed packages should be used with caution Limited capacity for working with very large datasets. Data must fit into RAM

Demo to illustrate Basic variable assignment and some structures Reading in a text file Data manipulation: subsets, re-categorizing variables Calculate non-parametric survival estimates Examine the stucture of R objects Extract individual items from R objects

Installing R cran.r-project.org Windows intallation: cran.r-project.org/bin/windows/base/

Editing your R code You can use your favorite text editor/ide (integrated development environment) or use the one that is built-in. R s editor very basic R Studio More features like highlighting and matching brackets Easy to use Includes a built-in terminal Freely available at Textmate http://www.rstudio.com/ide/download/ Available only on Mac Even more features and funcitonality Kind of a hybrid between GUI and high-powered editors Freely available at http://macromates.com

Reading and writing files Functions read.table(), read.csv(), read.fwf() Do not have to specify details as in SAS Example: mydata <- read.table(file = "filename.csv", header = TRUE, sep = ",") Can read other types of files with foreign package

Installing and using packages All R functions and datasets are in packages. CRAN cran.r-project.org, bioconductor Install: install.packages( Hmisc ) Load: library(hmisc) Update: update.packages() See all currently loaded packages: SessionInfo()

Using the R documentation apropos() help()

Reproducible research Combines R code and L A TEXmarkup into one report document Enables raw output, tables, graphics, and report text to be dynamically updated

knitr Example report made in knitr Aim 1 analyses for microhematuria R03 Dan Barocas, PI Tatsuki Koyama and, Department of Biostatistics February 10, 2014 Contents 1 Written summary 1 2 Demographic and univariate association tables 3 3 Model results 11 4 Mechanism of gender disparity in cystoscopy use in microhematuria work up 13 5 Raw model output 15 5.1 Saw urologist........................................ 15 5.2 Procedure.......................................... 15 5.3 Imaging........................................... 16 5.4 Complete evaluation.................................... 17 5.5 Workup quality (ordinal)................................. 18 6 Supplementary material 20 6.1 Check of proportional odds assumption for workup intensity.............. 20 1 Written summary Research aim: Determine the association between race and receipt of a timely and complete evaluation of hematuria in a nation-wide 5% sample of Medicare patients. We hypothesize that African-Americans with hematuria receive sub-optimal workup of hematuria compared to Whites, and that controlling for socio-economic status will attenuate the racial differences. Data considerations: Data from areas other than the fifty US states or the District of Columbia were excluded. There were 21 such observations. Variable definitions: Patients were considered to have a cancer diagnosis if they had any of the following diagnoses: pca.cis, renal.ca.cis, bladder.ca.cis, other.ca.cis, prostate.benign, renal.benign, bladder.benign, other.benign. Race was categorized as black, white, Hispanic, Asian, or other. Patients whose race was listed as unknown in the data were treated as having missing information on race, rather than including them in the other category. Anticoagulant use indicates chronic use of anticoagulant medication. Income is the median household income in the patient s county. Region is a four-category variable indicating one whether the patient? provider? facility? was in the 1

knitr Example example.rnw example.tex \documentclass{article} \begin{document} <<echo=false>>= data <- rexp(100, 1/7) meandata <- mean(data) @ \noindent The mean of the data was \Sexpr{round(meandata, 2)}. <<echo=false>>= boxplot(data, boxwex = 0.5, border = "gray", las = 1, outline = FALSE, ylim = c(0, max(data))) stripchart(data, method = "jitter", pch = 19, vertical = TRUE, add = TRUE) @ \end{document} knit( example.rnw ) \documentclass{article} \usepackage{graphicx, color} \begin{document} \noindent The mean of the data was 7.12. \begin{knitrout} \definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\col \includegraphics[width=\maxwidth]{figure/unnamed-chunk \end{figure} \end{knitrout} \end{document}

Simple knitr Example 0 10 20 30 40 50 Figure 1: This is my figure caption The mean of the data was 7.12.

for learning R R resources Terri Scott s teaching materials http://biostat.mc. vanderbilt.edu/wiki/main/theresascott Robert Meunchen s R for SAS and SPSS Users http://r4stats.com/books/r4sas-spss/ Robert Meunchen s https://science.nature.nps.gov/im/datamgmt/ statistics/r/documents/r_for_sas_spss_users.pdf

for learning R More resources Nashville R Users Group http://www.meetup.com/nashville-r-users-group/ UCLA s website: http://www.ats.ucla.edu/stat/r/ R manuals http://cran.r-project.org/manuals.html http://cran.r-project.org/doc/manuals/r-patched/ R-intro.pdf Vanderbilt Department of Biostatistics wiki http://biostat.mc.vanderbilt.edu/wiki/main/rs

Contact information joann.alvarez@vanderbilt.edu http://biostat.mc.vanderbilt.edu/joannalvarez