PROGRAMMING FOR BIOLOGISTS
BIOL 6297
Monday, Wednesday 10 am - 12 pm
Tomorrow is Ada Lovelace Day
  Ada Lovelace is widely regarded as the first person to write a computer program.
Today's Lecture
  Overview of the course
    Philosophy & Goals
    Getting Started
    Logging onto xanadu (Jerry Ebalunode)
    Course Organization
    Grading
  UNIX operating system
    History
    Getting Started in UNIX
    Connecting remotely
    Working from the command line
Contact Info
  Elizabeth Ostrowski
  SR2, Room 221E
  eaostrowski@uh.edu
  Office Hours: After class, Monday and Wednesday (or by appointment)
Course Goals
  Teach you a programming language
  Teach you how to carry out particular bioinformatics analyses
  You can teach yourself
  Everyone needs to learn something different (customize)
  Most intro programming is written for computer scientists or software engineers, not biologists.
  Data Scientist
    How to generate, analyze, and synthesize large data sets
  Introduce tools and techniques that are necessary for bioinformatics
    Use a computing cluster
    How to write scripts
    Universal features of languages, e.g., loops
    How to build computational pipelines
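As a small illustration of a loop, one of the universal language features listed above, here is a minimal shell sketch. The sample names and the `.aligned.txt` suffix are hypothetical and chosen only for illustration; the function name `list_outputs` is likewise an assumption, not part of the course material.

```shell
#!/usr/bin/env bash
# Print one output file name for each sample name given as an argument.
# The ".aligned.txt" suffix is a made-up naming convention for this sketch.
list_outputs() {
    for name in "$@"; do
        echo "${name}.aligned.txt"
    done
}

# Hypothetical sample names.
list_outputs sampleA sampleB sampleC
```

The same pattern (repeat one action for every item in a list) appears in nearly every language, which is why it is worth learning once in the shell.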
Example Bioinformatics Task
  Text file of sequencing reads -> Sequence alignments
An Example Pipeline
  [Flowchart]
  Stages: Raw sequencing reads -> Mapping -> List of Mismatches -> Filtering -> List of high confidence SNPs -> Sequences
          Genome Sequences -> Convert -> Gene Sequences -> Align -> Sequence Alignments -> Molecular Evolution Statistics -> RESULTS
  Tools named in the flowchart: UNIX tools, Shell script, Perl script, AWK, Ruby scripts, BioRuby, Ruby Analysis package, MAQ, Muscle, Macse, Revtrans.py, R
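The Filtering stage of the pipeline above could be sketched with AWK roughly as follows. Everything specific here is an assumption made for illustration: the four-column, tab-separated layout (position, reference base, observed base, quality), the example rows, the cutoff of 30, and the function name `filter_snps`. The real mismatch list may be laid out quite differently.

```shell
#!/usr/bin/env bash
# Hypothetical mismatch list: position, reference base, observed base, quality.
mismatches=$(mktemp)
printf '101\tA\tG\t45\n' > "$mismatches"
printf '250\tC\tT\t12\n' >> "$mismatches"
printf '377\tG\tA\t60\n' >> "$mismatches"

# Keep only rows whose quality (column 4) meets an assumed cutoff of 30.
filter_snps() {
    awk -F '\t' -v min_qual=30 '$4 >= min_qual' "$1"
}

filter_snps "$mismatches"
```

With these made-up rows, the row at position 250 is dropped because its quality of 12 is below the cutoff, and the other two rows are printed unchanged.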
What we will learn
  How to use the UNIX command line to efficiently submit, pipeline, and analyze large data sets
    Command line to organize and sort data
    Shell programming
    Text editors (emacs)
    Awk and Sed: extract and manipulate information from data sets
  Introduction to two programming languages
    An interpreted language (Python)
    A language for statistical analysis & data visualization (R)
  Ethics - Data Management and Reproducibility
  Good programming practices
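To give a flavor of using the command line to organize and sort data, here is a minimal sketch that counts how often each name occurs in a list. The gene names in the example file are placeholder data, and `count_names` is a hypothetical helper name used only in this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical input: one gene name per line, standing in for a real data column.
genes=$(mktemp)
printf 'rpoB\ngyrA\nrpoB\nrecA\nrpoB\ngyrA\n' > "$genes"

# Sort the names, count each distinct name, order by count (largest first),
# then print "name<TAB>count".
count_names() {
    sort "$1" | uniq -c | sort -k1,1nr | awk '{print $2 "\t" $1}'
}

count_names "$genes"
```

Each command does one small job, and the pipe (`|`) passes the output of one command to the next. `uniq -c` only counts adjacent duplicates, which is why the input is sorted first.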
Course Grading
  Attendance (10%)
  Class Exercises (30%)
    Submit answers by the end of class
  Quizzes (60%)
    Will drop the two worst grades
    No make-ups
The Hacker Mentality
  http://en.wikipedia.org/wiki/cheating
  Code is not like writing or works of art: you are encouraged to be resourceful and to re-purpose code from anywhere you can.
  Use Google or online user help forums for:
    Debugging
    Establishing proper syntax
  However, for any graded assignments in this class:
    You may discuss strategies in general terms ("pseudocode")
    Do not show each other actual code (written or typed)
Course Organization
  Part I => UNIX Operating System
  Part II => Programming Languages
    Python
  Part III => R
    Data visualization and statistics
Connecting to the UH cluster
  Connect via ssh ("secure shell")
    Installed by default on any UNIX-based machine (Mac/Linux)
    Mac: Terminal (Applications -> Utilities)
    Linux: Terminal
    Windows: PuTTY, a free ssh client
  Need:
    A user account
    IP address (or host name)
  Format:
    $ ssh username@ipaddress
  For example:
    $ ssh eaostrow@xanadu.tlc2.uh.edu
    $ ssh elizabeth@171.28.41.6
  For our class (substitute your user name here):
    $ ssh biol6297eo1@xanadu.tlc2.uh.edu
  Check out a node to use in interactive mode:
    $ qsub -I
    $ exit
  Do not run jobs (i.e., work) on the login nodes!
Let's Practice
  Practice Exercises: Learn Code the Hard Way (LCTHW) Command-line
  http://cli.learncodethehardway.org/book/
  Start at: Paths, Folders and Directories
Shell is the user interface
  Locally, access the shell by opening a Terminal
  Remotely, use ssh (secure shell)
    This will open a secure (encrypted) session
  From a UNIX machine, open a Terminal:
    $ ssh hpc13f52@xanadu.tlc2.uh.edu
    ($ = command prompt, hpc13f52 = user name, xanadu.tlc2.uh.edu = host name)
  From a Windows machine:
    Use PuTTY, or some other SSH software
Notes for Mac Users
  Mac OS X is a UNIX-based operating system, but many UNIX development tools are not installed by default
  Two methods to get these tools from Apple:
    Install Xcode, then Preferences -> Command-line tools
    Download from the Apple Developer website (must register with an Apple ID)
  Or install a virtual machine
  Consider a package manager (e.g., MacPorts, Fink, or Homebrew)
Getting Started
  Windows Users: Install PuTTY and/or WinSCP
    http://www.putty.org
    http://ged.msu.edu/angus/tutorials/using-putty-on-windows.html
    http://rcc.its.psu.edu/user_guides/remote_connectivity/putty/
  Linux Users: Open a Terminal
  Mac Users: Open a Terminal
    Developer Tools: http://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/