1 Microarray Data Analysis Workshop
MedVetNet Workshop, DTU 2008
Introduction to R and UNIX
Working with microarray data in a multi-user environment
Carsten Friis
2 Introduction to UNIX Working in a multi-user environment
3 What do you need to know?
This is not a course on computers.
But you will need some UNIX for the exercises and for your final project.
You will also need to know some R, for the same reasons.
And you will be expected to perform analysis in R during your evaluated projects.
4 What is UNIX?
UNIX is not just one operating system, but a collection of many different systems sharing a common interface.
It has evolved alongside the internet and is geared toward multi-user environments and big multiprocessor machines like our servers.
At its heart, UNIX is a text-based operating system, which means a little typing is required.
UNIX lives! (Particularly Linux and Mac OS X!)
5 UNIX comes with a help system
How you reach it depends on the program in question:
Either as a man-page (manual page), using the command man
    Example: man less
Or through the -h option
    Example: acroread -h
By the way, in the man-pages you scroll with k (up) and j (down).
6 Navigating Directories
Key commands:
    ls          #lists the files in the current directory
    cd dir      #changes working directory to dir
    mkdir dir   #makes a new directory called dir
Nice tricks:
    The shorthand ll is short for ls -l
    The asterisk * is a wildcard: ls *.txt will list all files ending with .txt
    cd .. takes you back (up) one directory
    Plain cd (no arguments) takes you to your home directory
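The navigation commands above can be tried out safely in a scratch directory. The session below is only a sketch: the directory and file names (workshop_demo, data, a.txt and so on) are made up for illustration, and mktemp, touch and pwd are extra commands not covered on the slide.

```shell
# Work inside a throwaway directory so no real files are touched.
# All names below are hypothetical examples.
scratch=$(mktemp -d)
cd "$scratch"

mkdir workshop_demo            # make a new directory
cd workshop_demo               # change working directory into it
mkdir data                     # make a sub-directory
touch a.txt b.txt notes.log    # create three empty files to look at

ls                             # everything in the current directory
ls *.txt                       # wildcard: only the files ending with .txt
ls -l                          # long listing (what ll usually stands for)

cd data                        # go down one level
cd ..                          # and back up one directory
pwd                            # print where we are now
```

Because everything happens under a temporary directory, the whole experiment can be thrown away afterwards without affecting your home directory.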
7 Basic File Handling
Key commands:
    cp x y      #creates a copy of file x named y
    mv x y      #moves file x to file y. File x is erased
    rm x        #removes (erases) file x. (Tip: use ls x first!)
    ln -s x y   #creates a symbolic link to x called y. Similar to shortcuts in Windows
Nice tricks:
    A lone period (.) means the current directory. This means you can copy a file into the current directory like this: cp elsewhere/file .
    The -r option to rm removes directories, e.g. rm -r dir
    Many commands have options accessed using a -
    Read the man-pages to see all options
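A short practice run of the file handling commands might look like the following. This is a sketch under assumptions: the names elsewhere, old_dir, copy.txt, renamed.txt and link.txt are invented for the example, and mktemp and echo are used only to set up something to work on.

```shell
# Set up a throwaway playground; every name here is hypothetical.
scratch=$(mktemp -d)
cd "$scratch"
mkdir elsewhere old_dir
echo "sample line" > elsewhere/file

cp elsewhere/file .            # copy the file into the current directory
cp file copy.txt               # make a second copy called copy.txt
mv copy.txt renamed.txt        # move (rename) it; copy.txt no longer exists
ln -s renamed.txt link.txt     # symbolic link pointing at renamed.txt
ls -l                          # the long listing shows where link.txt points

ls renamed.txt                 # check what will be removed first
rm renamed.txt                 # erase the file (link.txt now points at nothing)
rm link.txt                    # erase the dangling link as well
rm -r old_dir                  # -r is needed to remove a directory
```

Note that removing the target of a symbolic link does not remove the link itself, which is why the link is deleted separately here.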
8 Handling text (i.e. data) files
Key commands:
    wc      #Size of file in lines, words and characters
    nedit   #Text editor for manipulating text files
    less    #Views text files
    cut     #Extracts columns from tabular text files
    grep    #Finds lines in files with specified keywords
Use man or -h for more info on any of these...
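To see wc, cut and grep at work on tabular data, the sketch below builds a tiny tab-separated file and queries it. The file name expr.txt, the gene names and all the numbers are made up for illustration; they are not real measurements. The interactive tools (nedit, less) are left out because they need a terminal.

```shell
# Build a small tab-separated table in a throwaway directory.
# File name and values are hypothetical.
scratch=$(mktemp -d)
cd "$scratch"
printf 'gene\tctrl\ttreat\n'  > expr.txt
printf 'glnA\t10.2\t55.1\n'  >> expr.txt
printf 'tnrA\t8.7\t9.0\n'    >> expr.txt
printf 'glnR\t3.3\t1.2\n'    >> expr.txt

wc expr.txt                    # lines, words and characters in the file
wc -l expr.txt                 # only the number of lines
cut -f 1,3 expr.txt            # columns 1 and 3 (cut splits on tabs by default)
grep gln expr.txt              # lines containing the keyword gln
grep -c gln expr.txt           # just count the matching lines
```

grep is case-sensitive by default, so the header line (which contains "gene" but not "gln") is not matched.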
9 UNIX: Moving files...
In the ssh program under Windows, look for an icon resembling a yellow directory folder with blue dots on it.
If you use PuTTY, you need the separate psftp program, which is not that user-friendly.
10 Can you see the papers between those spots?
11 Working with R (...and other hazardous activities)
12 Literature on R
Documentation used in this course can be found on the course webpage. It includes:
    Beginner's Guide to UNIX
    These lecture notes (hopefully)
    An Introduction to R
Many good R manuals for further reading can be found on the web.
13 What is R?
It began with S. S is a statistical tool developed back in the 70s.
R was introduced as a free implementation of S. The two are still quite similar.
R is free software under the GNU General Public License, and is developed by a large network of contributors.
14 Why use R? (And not Excel?)
15 Paper in BMC Bioinformatics 2004, 5:80
Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
Barry R Zeeberg, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett and John N Weinstein
Background: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names.
Results: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.
Conclusions: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
16 LocusLink Screenshot (Zeeberg et al. 2004)
17 Why use R? (And not Excel?)
R has specific functions for bioinformatics in general, and for microarrays in particular.
R is available for (almost) all platforms, e.g. Linux, MacOS, Win32/WinXP.
The R community is quite strong, and updates appear regularly.
Oh, and R happens to be freeware...
18 Starting with R
Just type R on the command line
Or R-2.6.1, to ensure you get a recent version
How to get help:
    > help.start()        #Opens browser
    > help()              #For more on using help
    > help(..)            #For help on ..
    > help.search("..")   #To search for ..
How to leave again:
    > q()                 #Image can be saved to .RData
19 Basic R commands
Most arithmetic operators work like you would expect in R:
    > 2 + 4         #Prints 6
    > 3 * 4         #Prints 12
Operators have precedence as known from basic algebra:
    > 1 + 2 * 4     #Prints 9, while
    > (1 + 2) * 4   #Prints 12
20 Functions
A function call in R looks like this:
    function_name(arguments)
Examples:
    > cos(pi/3)   #Prints 0.5
    > exp(1)      #Prints 2.718282
A function is identified in R by the parentheses.
That's why it's help(), and not help
21 Variables (Objects) in R
To assign a value to a variable (object):
    > x <- 4       #Assigns 4 to x
    > x = 4        #Assigns 4 to x (new)
    > x            #Prints 4
    > y <- x + 2   #Assigns 6 to y
Functions for managing variables:
    ls() or objects()   lists all existing objects
    str(x)              tells the structure (type) of object x
    rm(x)               removes (deletes) the object x
22 Vectors in R
A vector in R is a sequence of elements of the same mode.
    > x <- 1:10               #Creates a vector
    > y <- c("a", "b", "c")   #So does this
Handy functions for vectors:
    c()      Concatenates arguments into a vector
    min()    Returns the smallest value in vector
    max()    Returns the largest value in vector
    mean()   Returns the mean of the vector
23 R is prepared to handle objects with large amounts of data.
All simple numerical objects in R function like a long string of numbers.
In fact, even the simple x <- 1 can be thought of as a vector with one element.
The functions dim(x) and str(x) return information on the dimensionality of x.
24 Important object types in R
    vector       A series of numbers
    matrix       Tables of numbers
    data.frame   More powerful matrix (list of vectors)
    list         Collections of other objects
    class        Intelligent(?) lists, holds objects and functions
25 More on Vectors
Elements in a vector can be accessed individually:
    > x[1]         #Prints first element
    > x[1:10]      #Prints first 10 elements
    > x[c(1,3)]    #Prints element 1 and 3
Most functions expect one vector as argument, rather than individual numbers:
    > mean(1,2,3)      #Replies 1 (only the first argument is averaged)
    > mean(c(1,2,3))   #Replies 2
26 Conditional indexes and Booleans
You can use conditionals as indexes!
    > x <- 1:10    #Data to play with
    > x > 5        #Prints Boolean vector
    > x[x > 5]     #Elements greater than 5
You can use the which() function for even greater control, and the grep() function to search for complete or partial strings.
27 The Recycling Rule
The recycling rule is a key concept for vector algebra in R.
When a vector is too short for a given operation, its elements are recycled and used again.
Example of a vector that is too short:
    > x <- c(1,2,3,4)
    > y <- c(1,2)    #y is too short
    > x + y          #Returns 2,4,4,6
28 Data Matrices
Matrices are created with the matrix() function.
    > m <- matrix(1:12,nrow=3)
This produces something like this:
         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12
29 Matrices also recycle
The recycling rule still applies:
    > m <- matrix(c(2,5),nrow=3,ncol=3)
Gives the following matrix (R also warns that the data length does not fit the matrix evenly):
         [,1] [,2] [,3]
    [1,]    2    5    2
    [2,]    5    2    5
    [3,]    2    5    2
30 Indexing Matrices
For vectors we could specify one index vector like this:
    > x <- c(2,0,1,5)
    > x[c(1,3)]              #Returns 2 and 1
For matrices we have to specify two vectors:
    > m <- matrix(1:3,nrow=3,ncol=3)
    > m[c(1,3),c(1,3)]       #Returns 2*2 matrix
    > m[1,]                  #First row as vector
31 The apply function
The apply() function applies another function along a specified dimension.
    apply(m,1,sum)    #Sum of rows
    apply(m,2,sum)    #Sum of columns
    apply(m,1,mean)   #Mean of rows
apply() has cousins, like lapply(), which works for list objects.
32 Beyond two Dimensions
You can actually assign to dim():
    > x <- 1:12
    > dim(x)               #Returns NULL
    > dim(x) <- c(3,4)     #3*4 matrix
    > dim(x)               #Returns 3 4
    > dim(x) <- c(2,3,2)   #x is now in 3d
    > dim(x)               #Returns 2 3 2
But functions like mean() still work:
    > mean(x)              #Returns 6.5
33 List Objects
A list is like a vector, but it can store anything, even other objects!
Lists are created with the list() function:
    > l1 <- list(1:10, c("a", "b"))
    > l1[[2]]                            #Returns "a" "b"
    > l2 <- list(a=1:10, b=c("a", "b"))
    > l2$b                               #Returns "a" "b"
34 Data.frame Objects
Data frames work like a list of vectors of identical length.
Data frames are created like this:
    f <- data.frame(cbind(x=1,y=1:5))
The demonstration dataset USArrests is a data.frame.
    > data(USArrests)   #Activate dataset
    > str(USArrests)    #View the structure
35 Data frames work like Matrices (But lists do NOT!)
Standard algebra, add to each element:
    > USArrests + 100
Indexing data frames:
    > USArrests[2,]             #Stats for Alaska
Average arrests across the States:
    > apply(USArrests,2,mean)
36 Graphics and Visualization
Visualization is one of R's strong points.
R has many functions for drawing graphs, including:
    hist(x)         Draws a histogram of values in x
    plot(x,y)       Draws a basic xy plot of x against y
Adding stuff to plots:
    points(x,y)     Add point (x,y) to existing graph.
    lines(x,y)      Connect points with line.
    text(x,y,str)   Writes string at (x,y).
37 Graphical Devices in R
A graphical device is what displays the graph. It can be a window, it can be the printer.
Functions for plotting devices:
    X11()                         This function allows you to change the size and composition of the plotting window.
    par(mfrow=c(x,y))             Splits a plotting device into x rows and y columns.
    dev.copy2eps(file="???.ps")   Use this function to copy the active device to a file.
38 Exercises in R
To warm you up, open the Basic R Commands exercise on the course webpage.
When finished, feel free to play with some more demos: type demo() to see what's available.
[Optional] Proceed with the extra exercise. These exercises are hard! (That's why they are optional.)