1 Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Introduction to R and UNIX Working with microarray data in a multi-user environment Carsten Friis Media glna tnra GlnA TnrA C2 glnr C3 C5 C6 K GlnR C1 C4 C7
2 Introduction to UNIX Working in a multi-user environment
3 What do you need to know? This is not a course on computers But you will need some UNIX for the exercises, and for your final project You will also need to know some R to handle the exercises and for your final project And you will be expected to perform analysis in R during your evaluated projects
4 What is UNIX? UNIX is not just one operating system, but a collection of many different systems sharing a common interface It has evolved alongside the internet and is geared toward multi-user environments, and big multiprocessors like our servers At it s heart, UNIX is a text-based operating system that means a little typing is required UNIX Lives! (Particularly Linux and MAC OS X!)
5 UNIX comes with a help system It depends on the program in question: Either as a man-page manual page using the command man Example: man less Or through the -h option Example: acroread -h By the way, in the man-pages you scroll with k and j
6 Navigating Directories Key commands: ls #lists the files in the current directory cd dir #changes working directory to dir mkdir dir #makes a new directory called dir Nice tricks: The shorthand ll is short for ls l The asterisk * is a wildcard ls *.txt will list all files ending with.txt cd.. takes you back one directory plain cd (no arguments) takes you to your home directory
7 Basic File Handling Key commands: cp x y #creates a copy of file x named y mv x y #moves file x to file y. File x is erased rm x #removes (erases) file x. (Tip: use ls x first!) ln s x y #creates a symbolic link to x called y. Similar to shortcuts in Windows Nice tricks: A lone period. means current directory. This means you can copy a file into current dir like this cp elsewhere/file. The -r option to rm removes directories. e.g. rm r dir Many commands have options accessed using a - Read the man-pages to see all options
8 Handling text (i.e. data) files Key commands: wc #Size of file in lines, words and characters nedit #Text editor for manipulating text files less #Views text files cut #Extracts columns from tabular text files grep #Finds lines in files with specified keywords Use man or -h for more info on any of these...
9 UNIX: Moving files... In the ssh program under Windows, look for an icon resembling a yellow directory folder with blue dots on If you use putty, you need the separate psftp program, which is not that user-friendly
10 Can you see the papers between those spots?
11 Working with R (...and other hazardous activities)
12 Literature on R Documentation used in this course can be found on the course webpage, it includes: Beginner's Guide to UNIX These lecture notes (hopefully ) An Introduction to R Many good R manuals for further reading can be found on the web:
13 What is R? It began with S. S is a statistical tool developed back in the 70s R was introduced as a free implementation of S. The two are still quite similar R is freeware under the GNU license, and is developed by a large net of contributors
14 Why use R? (And not Excel?)
15 Paper in BMC Bioinformatics :80 Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics Barry R Zeeberg, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett and John N Weinstein Background: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. Results: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. Conclusions: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
16 LocusLink Screenshot (Zeeberg et al. 2004)
17 Why use R? (And not Excel?) R has specific functions for bioinformatics in general, and for microarrays in particular. R is available for (almost) all platforms e.g. Linux, MacOS, Win32/WinXP The R community is quite strong, and updates appear regularly Oh, and R happens to be freeware...
18 Starting with R Just type R on the command line Or R-2.6.1, to ensure you get a recent version How to get help: > help.start() #Opens browser > help() #For more on using help > help(..) #For help on.. > help.search(.. ) #To search for.. How to leave again: > q() #Image can be saved to.rdata
19 Basic R commands Most arithmetic operators work like you would expect in R: > #Prints 6 > 3 * 4 #Prints 12 Operators have precedence as known from basic algebra: > * 4 #Prints 9, while > (1 + 2) * 4 #Prints 12
20 Functions A function call in R looks like this: function_name(arguments) Examples: > cos(pi/3) #Prints 0.5 > exp(1) #Prints A function is identified in R by the parentheses That s why it s: help(), and not: help
21 Variables (Objects) in R To assign a value to a variable (object): > x <- 4 #Assigns 4 to x > x = 4 #Assigns 4 to x (new) > x #Prints 4 > y <- x + 2 #Assigns 6 to y Functions for managing variables: ls()or objects() lists all existing objects str(x) tells the structure (type) of object x rm(x) removes (deletes) the object x
22 Vectors in R A vector in R is like a sequence of elements of the same mode. > x <- 1:10 #Creates a vector > y <- c( a, b, c ) #So does this Handy functions for vectors: c() Concatenates arguments into a vector min() Returns the smallest value in vector max() Returns the largest value in vector mean() Returns the mean of the vector
23 R is prepared to handle objects with large amounts of data. All simple numerical objects in R function like a long string of numbers. In fact, even the simple: x <- 1, can be thought of like a vector with one element. The functions dim(x) and str(x) returns information on the dimensionality of x.
24 Important object types in R vector A series of numbers matrix Tables of numbers data.frame More powerful matrix (list of vectors) list Collections of other objects class Intelligent(?) lists, holds objects and functions
25 More on Vectors Elements in a vector can be accessed individually: > x #Prints first element > x[1:10] #Prints first 10 elements > x[c(1,3)] #Prints element 1 and 3 Most functions expect one vector as argument, rather than individual numbers > mean(1,2,3) #Replies 1 > mean(c(1,2,3)) #Replies 2
26 Conditional indexes and Booleans You can use conditionals as indexes! > x <- 1:10 #Data to play with > x > 5 #Prints Boolean vector > x[x > 5] #Elements greater than 5 You can use the which() function for even greater control, and the grep() function to search for complete or partial strings
27 The Recycling Rule The recycling rule is a key concept for vector algebra in R. When a vector is too short for a given operation, the elements are recycled and used again. Examples of vectors that are too short: > x <- c(1,2,3,4) > y <- c(1,2) #y is too short > x + y #Returns 2,4,4,6
28 Data Matrices Matrices are created with the matrix() function. > m <- matrix(1:12,nrow=3) This produces something like this: [,1] [,2] [,3] [,4] [1,] [2,] [3,]
29 Matrices also recycle The recycling rule still applies: > m <- matrix(c(2,5),nrow=3,ncol=3) Gives the following matrix: [,1] [,2] [,3] [1,] [2,] [3,] 2 5 2
30 Indexing Matrices For vectors we could specify one index vector like this: > x <- c(2,0,1,5) > x[c(1,3)] #Returns 2 and 1 For matrices we have to specify two vectors: > m <- matrix(1:3,nrow=3,ncol=3) > m[c(1,3),c(1,3)] #Ret. 2*2 matrix > m[1,] #First row as vector
31 The apply function The apply() function applies another function along a specified dimension. apply(m,1,sum) #Sum of rows apply(m,2,sum) #Sum of columns apply(m,1,mean) #Mean of rows Apply has cousins: Like lapply() which works for list objects.
32 Beyond two Dimensions You can actually assign to dim(): > x <- 1:12 > dim(x) #Returns NULL > dim(x) <- c(3,4) #3*4 Matrix > dim(x) #Returns 3 4 > dim(x) <- c(2,3,2) #x is now in 3d > dim(x) #Returns But functions like mean() still work: > mean(x) #Returns 6.5
33 List Objects A list is like a vector, but it can store anything, even other objects! Lists are created with the list() function: > l1 <- list(1:10, c( a, b )) > l1[] #Returns a b > l2 <- list(a=1:10, b=c( a, b )) > l2$b #Returns a b
34 Data.frame Objects Data frames work like a list of vectors of identical length. Data frames are created like this: f <- data.frame(cbind(x=1,y=1:5)) The demonstration dataset USArrests is a data.frame. > data(usarrests) #Activate dataset > str(usarrests) #View the structure
35 Data frames work like Matrices (But lists do NOT!) Standard algebra, add to each element: > USArrests +100 Indexing data frames: > USArrests[2,] #Stats for Alaska Average arrests across the States: > apply(usarrests,2,mean)
36 Graphics and Visualization Visualization is one of R s strong points. R has many functions for drawing graphs, including: hist(x) Draws a histogram of values in x plot(x,y) Draws a basic xy plot of x against y Adding stuff to plots points(x,y) Add point (x,y) to existing graph. lines(x,y) Connect points with line. text(x,y,str) Writes string at (x,y).
37 Graphical Devices in R A graphical device is what displays the graph. It can be a window, it can be the printer. Functions for plotting Devices : X11() This function allows you to change the size and composition of the plotting window. par(mfrow=c(x,y)) Splits a plotting device into x rows and y columns. dev.copy2eps(file=???.ps ) Use this function to copy the active device to a file.
38 Exercises in R To warm you up, open the Basic R Commands exercise on the course webpage When finished, feel free to play with some more demos type demo() to see what s available [Optional] Proceed with the extra exercise: These exercises are hard! (that s why they are optional)
Pearson Inform v4.0 Educators Guide Part Number 606 000 508 A Educators Guide v4.0 Pearson Inform First Edition (August 2005) Second Edition (September 2006) This edition applies to Release 4.0 of Inform
Hot Potatoes version 6 Half-Baked Software Inc., 1998-2009 p1 Table of Contents Contents 4 General introduction and help 4 What do these programs do? 4 Conditions for using Hot Potatoes 4 Notes for upgraders
Programming from the Ground Up Jonathan Bartlett Edited by Dominick Bruno, Jr. Programming from the Ground Up by Jonathan Bartlett Edited by Dominick Bruno, Jr. Copyright 2003 by Jonathan Bartlett Permission
Part 1: Jumping into C++... 2 Chapter 1: Introduction and Developer Environment Setup... 4 Chapter 2: The Basics of C++... 35 Chapter 3: User Interaction and Saving Information with Variables... 43 Chapter
User Guide Version 5.1 Copyright 2010 Pearson Education, Inc. or its affiliate(s). All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic
STUDENT S BOOK 5 th module Using Databases OpenOffice.org Base This work is licensed under a Creative Commons Attribution- ShareAlike 3.0 Unported License. http://creativecommons.org/license s/by-sa/3.0
Power Editor Guide February 29, 2012 Why should I use Power Editor?...2 Get started in 5 easy steps...3 Power Editor buttons and features.................................... 4 Navigate within Power Editor....6
Guide for Texas Instruments TI-83, TI-83 Plus, or TI-84 Plus Graphing Calculator This Guide is designed to offer step-by-step instruction for using your TI-83, TI-83 Plus, or TI-84 Plus graphing calculator
DOCUMENT OVERVIEW Learn specific ways to optimize your RightNow knowledge base to help your customers find the information they need. Review these best practices, including tips on configuration settings,
Quantity One User Guide for Version 4.2.1 Windows and Macintosh P/N 4000126-10 RevA Quantity One User Guide Bio-Rad Technical Service Department Phone: (800) 424-6723, option 2, option 3 (510) 741-6576
Aras Corporation 2005 Aras Corporation. All rights reserved Notice of Rights All rights reserved. Aras Corporation (Aras) owns this document. No part of this document may be reproduced or transmitted in
Chapter 2 Getting Data into R In the following chapter we address entering data into R and organising it as scalars (single values), vectors, matrices, data frames, or lists. We also demonstrate importing
me What s new in 3.2? 1 Introduction... 3 New features in 3.2... 3 New text editor... 3 Useful features in the editor... 4 Messages... 5 Actions... 6 Message filters... 7 Drafts folder... 7 Favourites...
Why Johnny Can t Encrypt: A Usability Evaluation of PGP 5.0 Alma Whitten School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 email@example.com J. D. Tygar 1 EECS and SIMS University
QUANTITATIVE ECONOMICS with Julia Thomas Sargent and John Stachurski April 28, 2015 2 CONTENTS 1 Programming in Julia 7 1.1 Setting up Your Julia Environment............................. 7 1.2 An Introductory
Pearson 2012 Inform How to open, edit and print reports; create assessments and upload data; and access student progress and SRBI/RTI reports How to use Inform This manual is written to give Inform users
HOW TO: A Guide to Using Qualtrics Research Suite 2 How this ebook can Benefit You Welcome to Qualtrics Research Suite! We are committed to your success, and want you to get the most out of your experience
How to Make Your Word 2010 Documents 508-Compliant Centers for Medicare & Medicaid Services Making 508 Easy Continuous Improvement Initiative July 2, 2014 This page intentionally left blank. How to Make
In Security and Usability: Designing Secure Systems that People Can Use, eds. L. Cranor and G. Simson. O'Reilly, 2005, pp. 679-702 CHAPTER THIRTY- FOUR Why Johnny Can t Encrypt A Usability Evaluation of
Equation Editor and MathType: Tips to Make Your Life Easier Presented by: Bob Mathews Director of Training Design Science, Inc. E-mail: firstname.lastname@example.org Welcomee to Equation Editor and MathType: Tips to
Getting Started with New Relic: A Newbie s Table of Contents INTRODUCTION: Hello There, Newbie CHAPTER 1: Application Monitoring Overview CHAPTER 2: Real User Monitoring (RUM) CHAPTER 3: Transaction Traces
1 Whitebox Geospatial Analysis Tools Tutorial Series Tutorial 1: Using Whitebox GAT 2 Tutorial version 1.0, March, 2010 Written by John Lindsay, Whitebox Geospatial Analysis Tools Project Lead Developer,
Ready Reference Guide Customizing an Output Style September 2008 Customizing an Output Style Table of Contents Overview Page 3 Getting Started Page 4 Modifying an Output Style Page 6 Sharing a Style With
How to Convert Outlook Email Folder Into a Single PDF Document An introduction to converting emails with AutoPortfolio plug-in for Adobe Acrobat Table of Contents What Software Do I Need?... 2 Converting
Camera Base Version 1.6.1 User Guide Mathias Tobler October 2014 Table of contents 1 Introduction... 3 2 License... 4 3 Overview... 5 3.1 Data Management... 5 3.2 Analysis and outputs... 5 4 Installation...
How To Use A Spreadsheet Excel for the Mac and PC-Windows by John D. Winter Most good spreadsheets have very similar capabilities, but the syntax of the commands differs slightly. I will use the keyboard
Squeak by Example Andrew P. Black Stéphane Ducasse Oscar Nierstrasz Damien Pollet with Damien Cassou and Marcus Denker Version of 2009-09-29 ii This book is available as a free download from SqueakByExample.org,