Useful Scripting for Biologists



Similar documents
A Comparison of Programming Languages for Graphical User Interface Programming

Introduction to bioknoppix: Linux for the life sciences

Bioinformatics using Python for Biologists

AUTOMATIC EVALUATION OF COMPUTER PROGRAMS USING MOODLE S VIRTUAL PROGRAMMING LAB (VPL) PLUG- IN

Windows Script Host Fundamentals

Chapter 13 Computer Programs and Programming Languages. Discovering Computers Your Interactive Guide to the Digital World


UNIVERSITY OF WATERLOO Software Engineering. Analysis of Different High-Level Interface Options for the Automation Messaging Tool

Bioinformatics Grid - Enabled Tools For Biologists.

Computational Mathematics with Python

Chapter 13: Program Development and Programming Languages

Chapter 13: Program Development and Programming Languages

Viewpoint. Choosing the right automation tool and framework is critical to project success. - Harsh Bajaj, Technical Test Lead ECSIVS, Infosys

The first time through running an Ad Hoc query or Stored Procedure, SQL Server will go through each of the following steps.

Total Quality Management (TQM) Quality, Success and Failure. Total Quality Management (TQM) vs. Process Reengineering (BPR)

Time Limit: X Flags: -std=gnu99 -w -O2 -fomitframe-pointer. Time Limit: X. Flags: -std=c++0x -w -O2 -fomit-frame-pointer - lm

Metacognition. Complete the Metacognitive Awareness Inventory for a quick assessment to:

Computational Mathematics with Python

Microsoft Windows PowerShell v2 For Administrators

ESPResSo Summer School 2012

Data Management, Analysis Tools, and Analysis Mechanics

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

Jonathan Worthington Scarborough Linux User Group

Computational Mathematics with Python

CS106A, Stanford Handout #38. Strings and Chars

n Introduction n Art of programming language design n Programming language spectrum n Why study programming languages? n Overview of compilation

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration

PYTHON Basics

Exercise 1: Python Language Basics

Introduction to Parallel Programming and MapReduce

A Python Tour: Just a Brief Introduction CS 303e: Elements of Computers and Programming

Introduction to Python

Building Software Systems. Multi-module C Programs. Example Multi-module C Program. Example Multi-module C Program

by Pearson Education, Inc. All Rights Reserved.

1/20/2016 INTRODUCTION

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

6.170 Tutorial 3 - Ruby Basics

Programming Exercises

PHP Tutorial From beginner to master

Game Programming & Game Design

CSCI 3136 Principles of Programming Languages

Smartphone Enterprise Application Integration

CS 1133, LAB 2: FUNCTIONS AND TESTING

Introduction to the Data Migration Framework (DMF) in Microsoft Dynamics WHITEPAPER

VHDL Test Bench Tutorial

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

Power Tools for Pivotal Tracker

MetaKit: Quick and Easy Storage for your Tcl Application 1

Debian Med. Integrated software environment for all medical purposes based on Debian GNU/Linux. Andreas Tille. OSWC, Malaga Debian.

Sorting. Lists have a sort method Strings are sorted alphabetically, except... Uppercase is sorted before lowercase (yes, strange)

Learn Perl by Example - Perl Handbook for Beginners - Basics of Perl Scripting Language

SEMINAR. Content Management System. Presented by: Radhika Khandelwal

Course Scheduling Support System

Programming Languages

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Thomas Jefferson High School for Science and Technology Program of Studies Foundations of Computer Science. Unit of Study / Textbook Correlation

GUI application set up using QT designer. Sana Siddique. Team 5

Microsoft Visual Basic Scripting Edition and Microsoft Windows Script Host Essentials

JAVASCRIPT AND COOKIES

Components of a Computing System. What is an Operating System? Resources. Abstract Resources. Goals of an OS. System Software

Miami University RedHawk Cluster Working with batch jobs on the Cluster

1Intro. Apache is an open source HTTP web server for Unix, Apache

Chapter 3.2 C++, Java, and Scripting Languages. The major programming languages used in game development.

Smart Shopping Cart. Group 5. March 11, Advisor: Professor Haibo He

Chapter 1. Dr. Chris Irwin Davis Phone: (972) Office: ECSS CS-4337 Organization of Programming Languages

Advantages of PML as an iseries Web Development Language

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

SOFTWARE TESTING TRAINING COURSES CONTENTS

VACA: A Tool for Qualitative Video Analysis

Euler s Method and Functions

Graphical Environment Tool for Development versus Non Graphical Development Tool

Advanced Tornado TWENTYONE Advanced Tornado Accessing MySQL from Python LAB

Introduction to Shell Scripting

Integrated Pictometry Analytics Quick Start Guide. July 2012

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

Introduction to Linux operating system. module Basic Bioinformatics PBF

Gender Differences in Computer Technology Achievement

CS177 MIDTERM 2 PRACTICE EXAM SOLUTION. Name: Student ID:

McGraw-Hill The McGraw-Hill Companies, Inc.,

Applications Development

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

HappyFace3 everything now in Python ;-)

Master Degree Project Ideas (Fall 2014) Proposed By Faculty Department of Information Systems College of Computer Sciences and Information Technology

Lecture 2, Introduction to Python. Python Programming Language

Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA

Pattern Insight Clone Detection

STUDY AND ANALYSIS OF AUTOMATION TESTING TECHNIQUES

DATA 301 Introduction to Data Analytics Microsoft Excel VBA. Dr. Ramon Lawrence University of British Columbia Okanagan

Programming in Access VBA

White Paper. Java versus Ruby Frameworks in Practice STATE OF THE ART SOFTWARE DEVELOPMENT 1

Transcription:

Useful Scripting for Biologists Brad Chapman 23 Jan 2003

Objectives Explain what scripting languages are Describe some of the things you can do with a scripting language Show some tools to use once you pick a language Show an example of how I ve used scripting languages Overall, convince you it s worthwhile to learn to program in a scripting language 1

Who am I? Biologist all of my background is in lab work Took two programming classes as an undergrad (C++) Needed programming for my work when I came to UGa Taught myself to program in the scripting language python 2

What is a scripting language? In Computer Science terms Dynamically typed (do not have to declare variables) Are not compiled In Practical Terms Faster to learn to program in Easier to debug and fix problems Programs run slower, however, because of the niceties 3

Examples of scripting languages Perl http://www.perl.com Python http://www.python.org Tcl http://tcl.sourceforge.net/ Ruby http://www.ruby-lang.org/en/ Web languages JavaScript, PHP Others Lisp, Scheme, Ocaml 4

Some things scripting languages make easier Dealing with text files Interacting with the web Making graphical user interfaces Creating dynamic web pages Tieing together multiple programs Parsing information from files or web pages Retrieving information from Databases Reorganizing data (ie. spreadsheet ready format) Generating figures into a Automating repetitive daily tasks Writing one-time programs 5

Other Practical Advantages of Scripting Languages All those mentioned are freely available and most have very non-restrictive licenses Lots of free documentation on the web for learning how to program in the language Python http://www.python.org/doc/current/tut/tut.html Ruby http://www.rubycentral.com/book/ Really relieves your frustration level in programming compared to compiled languages like C++ Programs can often be written quicker because you don t have to worry about compiling woes, variable declaration and so on 6

How do you choose a scripting language? Want to pick something easy to learn and get started with Make a list of some things you want to do with the language and look for languages that have ready made packages for what you d like to do Python Interact with Excel on a Windows computer: Python Windows Extensions Perl Create charts for display on the web: GDGraph Tcl write a graphical user interface : Tk Looking at the code in different languages to see what appeals to you Try them out 7

Perl code $nbottles = $ARGV[0]; $nbottles = 100 if $nbottles eq $nbottles < 0; foreach (reverse(1.. $nbottles)) { $s = ($_ == 1)? "" : "s"; $onelesss = ($_ == 2)? "" : "s"; print "\n$_ bottle$s of beer on the wall,\n"; print "$_ bottle$s of beer,\n"; print "Take one down, pass it around,\n"; print $_ - 1, " bottle$onelesss of beer on the wall\n"; } print "\n*burp*\n"; 8

Python code def bottle(n): try: return { 0: "no more bottles", 1: "1 bottle"} [n] + " of beer" except KeyError: return "%d bottles of beer" % n for i in range(99, 0, -1): b1, b0 = bottle(i), bottle(i-1) print "%(b1)s on the wall, %(b1)s,\n"\ "take one down, pass it around,\n"\ "%(b0)s on the wall." % locals() 9

Tcl code proc bottles {i} { return "$i bottle[expr $i!=1?"s":""] of beer" } proc line123 {i} { puts "[bottles $i] on the wall," puts "[bottles $i]," puts "take one down, pass it around," } proc line4 {i} { puts "[bottles $i] on the wall.\n" } for {set i 99} {$i>0} {} { line123 $i incr i -1 line4 $i } 10

Biological Tools for Scripting Languages Because the advantages of scripting languages help ease a lot of biological problems (ie. file parsing) there are good resources for code for scripting language Open-Bio set of projects BioPerl http://www.bioperl.org Biopython http://www.biopython.org BioRuby http://www.bioruby.org/ Projects contain code for representing sequences, parsing biological file formats, running bioinformatics programs and tons more 11

More Resources for Scripting Languages Most scripting languages come out of the open-source community Community built up around them that is encouraged to give away their code Can easily find code other people have written that may be useful to you: Perl Comprehensive Perl Archive Network http://www.cpan.org/ Python Vaults of Parnassus http://www.vex.net/parnassus/ Ruby Ruby Application Archive http://www.ruby-lang.org/raa/ In many cases you can find code to solve your problem or close to your problem without having to start from scratch 12

Solving problems The ultimate advantage of scripting languages is that they let you solve your problems faster Most of my problems in biology are not huge programs to write, but small tasks that need to get finished. Retrieve information from a web page Parse a ton of BLAST outputs Rewrite files into different phylogenetic formats Generate files for GenBank submission Retrieve information from ABI trace files 13

Example: batch primer design Simplified from a real example where we needed to generate primers for several hundred sequences Our goals: Start with a file full of sequences Design primers for each sequence with specific criterion (have to span a central region) Write the files to a format that can be loaded into Excel Accomplished in python with the help of the Biopython libraries 14

Our starting sequences Begin with files in Fasta format, a standard sequence format >CL031541.94_2268.amyltp CGCGGCCGCSGACCTYGCCGCAGCCTCCCCTTCMATCCTCCTCCCGCTCC TCCTACGCGACGCCGGTGACCGTGATGAGGCCGTCGCCGCYTCCGCGCTC CCTCAAGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN AAGTGCCTTGGCTTCACGCTCTACCATCCGGTCCTCGTCTCCACCGTCTC AGGTAGAATGGCTCCCCTGTATCCACAACTCGTCAGTGAAATGTGCAGTT Biopython has a module (Bio.Fasta) which can read these Fasta formatted files. 15

Our primer design program Primer3, from the Whitehead Institute http://www-genome.wi.mit. edu/genome_software/other/primer3.html EMBOSS, a free set of bioinformatics programs, has an interface (eprimer3) to make the program easier to use http://www.emboss. org Biopython has code to work with eprimer3 Primer3Commandline run the program Primer3Parser parse the output from the program 16

Program Step 1 read the Fasta File Load the biopython module that deals with Fasta from Bio import Fasta Open the Fasta file from python open_fasta = open(fasta_file, "r") Initialize an Iterator that lets us step through the file parser = Fasta.RecordParser() iterator = Fasta.Iterator(open_fasta, parser) 17

Program Step 2 Read through the file We want to go through the file one record at a time. Generate a loop and use the iterator to get the next record. We stop if we reach the end of the file (get a record that is None). while 1: cur_record = iterator.next() if not(cur_record): break 18

Program Step 3 Set up the primer3 program We want to create a run of the primer3 program with our parameters from Bio.Emboss.Applications import Primer3Commandline primer_cl = Primer3Commandline() primer_cl.set_parameter("-sequence", "in.pr3") primer_cl.set_parameter("-outfile", "out.pr3") primer_cl.set_parameter("-productsizerange", "350,10000") primer_cl.set_parameter("-target", "%s,%s" % (start, end)) start and end are the middle region we want to design primers around 19

Program Step 4 Run the primer3 program Biopython has a single way to run a program and get the output. Python interacts readily with the operating system, making it easy to run other programs from it. from Bio.Application import generic_run result, messages, errors = generic_run(primer_cl) 20

The Primer3 output file Primer3 generates an output file that looks like: # PRIMER3 RESULTS FOR CLB11512.1_789.buhui # Start Len Tm GC% Sequence 1 PRODUCT SIZE: 227 FORWARD PRIMER 728 20 59.91 50.00 TTCACCTACTGCAAACGCAC REVERSE PRIMER 935 20 59.57 50.00 TTGGTACGTTGTCCATCTCG A ton of these files are not easy to deal with. 21

Program Step 5 parse the output file We use the Biopython parser to parse these files. open_outfile = open("out.pr3", "r") from Bio.Emboss.Primer import Primer3Parser parser = Primer3Parser() primer_record = parser.parse(open_outfile) The result is that we get the information into a python ready format that we can readily output. 22

Program Step 6 Write the output We want to write the output to an Excel ready format We write the forward and reverse sequences along with the sequence name to a comma separated value file. open_excelfile = open(excel_file, "w") primer = primer_record.primers[0] open_excelfile.write("%s,%s,%s" % ( sequence_name, primer.forward_seq, primer.reverse_seq)) The result is a file full of primers that you can then deal with. 23

Conclusions from all of this Scripting languages can be used to automate repetitive tasks (designing 500 primers) Scripting languages readily interact with the things biologists need to do (parse files, run programs) Scripting languages allow you to tackle the problem in small simpler steps (parse a file, run a program, write a file) Scripting languages aren t too hard to learn (see that code was easy) You should run right out and learn a scripting language 24