Introduction to Programming with Python. A Useful Reference. http://www.pasteur.fr/formation/infobio/python/



Similar documents
GenBank, Entrez, & FASTA

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Chapter 2: Algorithm Discovery and Design. Invitation to Computer Science, C++ Version, Third Edition

agucacaaacgcu agugcuaguuua uaugcagucuua

Overview of Eukaryotic Gene Prediction

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

Translating to Java. Translation. Input. Many Level Translations. read, get, input, ask, request. Requirements Design Algorithm Java Machine Language

Biological Sequence Data Formats

Bioinformatics Resources at a Glance

Gene Finding CMSC 423

Lecture 2, Introduction to Python. Python Programming Language

Sorting. Lists have a sort method Strings are sorted alphabetically, except... Uppercase is sorted before lowercase (yes, strange)

( TUTORIAL. (July 2006)

Chapter 5. Selection 5-1

Introduction to Computer Science I Spring 2014 Mid-term exam Solutions

PL / SQL Basics. Chapter 3

Molecular Genetics. RNA, Transcription, & Protein Synthesis

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

Flowchart Techniques

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

While and Do-While Loops Summer 2010 Margaret Reid-Miller

Concluding lesson. Student manual. What kind of protein are you? (Basic)

A Tutorial in Genetic Sequence Classification Tools and Techniques

CSCE 110 Programming I Basics of Python: Variables, Expressions, and Input/Output

Python Loops and String Manipulation

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

River Dell Regional School District. Computer Programming with Python Curriculum

Introduction to Genome Annotation

CSC 221: Computer Programming I. Fall 2011

2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three

TECHNOLOGY Computer Programming II Grade: 9-12 Standard 2: Technology and Society Interaction

Part ONE. a. Assuming each of the four bases occurs with equal probability, how many bits of information does a nucleotide contain?

Gene Models & Bed format: What they represent.

Algorithm & Flowchart & Pseudo code. Staff Incharge: S.Sasirekha

Lab # 12: DNA and RNA

Selection: Boolean Expressions (2) Precedence chart:

Python Lists and Loops

ALGORITHMS AND FLOWCHARTS. By Miss Reham Tufail

LESSON 4. Using Bioinformatics to Analyze Protein Sequences. Introduction. Learning Objectives. Key Concepts

Lecture 2 Notes: Flow of Control

Chapter 9: Building Bigger Programs

6. Control Structures

QUIZ-II QUIZ-II. Chapter 5: Control Structures II (Repetition) Objectives. Objectives (cont d.) 20/11/2015. EEE 117 Computer Programming Fall

Python Programming: An Introduction to Computer Science

Debugging. Common Semantic Errors ESE112. Java Library. It is highly unlikely that you will write code that will work on the first go

PIC 10A. Lecture 7: Graphics II and intro to the if statement

Outline Basic concepts of Python language

Serial Cloner 1.2. User Manual Part I -Basic functions -

LOOPS CHAPTER CHAPTER GOALS

Chapter 2 Introduction to Java programming

Analyzing A DNA Sequence Chromatogram

Introduction to Python

The Little Man Computer

DNA is found in all organisms from the smallest bacteria to humans. DNA has the same composition and structure in all organisms!

Welcome to Introduction to programming in Python

Notes on Assembly Language

DNA Sequence formats

COMP 110 Prasun Dewan 1

Compiler Construction

MORPHEUS. Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.

CS177 MIDTERM 2 PRACTICE EXAM SOLUTION. Name: Student ID:

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

Writing Control Structures

Name Class Date. Figure Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Thomas Jefferson High School for Science and Technology Program of Studies Foundations of Computer Science. Unit of Study / Textbook Correlation

ECE 3401 Lecture 7. Concurrent Statements & Sequential Statements (Process)

ESCI 386 IDL Programming for Advanced Earth Science Applications Lesson 6 Program Control

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein

Conditional Statements Summer 2010 Margaret Reid-Miller

GCE. Computing. Mark Scheme for January Advanced Subsidiary GCE Unit F452: Programming Techniques and Logical Methods

Curriculum Map. Discipline: Computer Science Course: C++

1.3 Conditionals and Loops

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE

Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

JavaScript: Control Statements I

Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:

Some programming experience in a high-level structured programming language is recommended.

El Dorado Union High School District Educational Services

Moving from CS 61A Scheme to CS 61B Java

>

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Computer Programming I & II*

WESTMORELAND COUNTY PUBLIC SCHOOLS Integrated Instructional Pacing Guide and Checklist Computer Math

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

J a v a Quiz (Unit 3, Test 0 Practice)

Pseudo code Tutorial and Exercises Teacher s Version

Exercise 1: Python Language Basics

Topic 11 Scanner object, conditional execution

Analysis of Binary Search algorithm and Selection Sort algorithm

CS106A, Stanford Handout #38. Strings and Chars

PRACTICE TEST QUESTIONS

24 Uses of Turing Machines

In mathematics, it is often important to get a handle on the error term of an approximation. For instance, people will write

Transcription:

Introduction to Programming with Python A Useful Reference http://www.pasteur.fr/formation/infobio/python/ 1

What is Computer Programming? An algorithm is a series of steps for solving a problem A programming language is a way to express our algorithm to a computer Programming is the process of writing instructions (i.e., an algorithm) to a computer for the purpose of solving a problem We will be using the programming language Python A variable is a mnemonic name for something that may change value over time. kozak = ACCATGG name = Brian year = 2007 year = 2008 GC_content = 0.46 variable_name = value Variables (generic variable assignment) Variable - I don t think that word means what you think it means 2008 = year (wrong!) 0.46 = GC_content (wrong!) 2

Types Variables store values of some type. Types have operators associated with them. year = 2008 nextyear = year + 1 GC_content = 2.0 * 0.21 kozak = ACC + ACCATGG year = year + 1 kozak = kozak + TT + kozak variable_name = value (generic variable assignment) You can have the computer tell you the value of a variable print nextyear print The GC content is:, GC_content print year print kozak Strings Strings are a sequence of characters kozak = ACCATGG Strings are index-able kozak[0] refers to A, the first character in kozak kozak[4] refers to T, the fifth character in kozak Strings have lots of operations kozak.lower() returns accatgg kozak.count( A ) returns 2 kozak.replace( A, q ) returns qccqtgg len(kozak) returns 7 3

Nucleotide Content kozak = ACCATGG What percent of the sequence corresponds to adenine nucleotides? numberofadenines = kozak.count( A ) totalnucleotides = len(kozak) A_content = numberofadenines / totalnucleotides print A_content What went wrong? Reading a File Suppose the DNA sequence is stored in the file kozak.txt. We can read the sequence from the file... file = open( kozak.txt ) sequence = file.read() print sequence kozak.txt ACCATGG Generic code for reading a file variable_name_1 = open(string referring to file name) variable_name_2 = variable_name_1.read() 4

Putting it all Together # Read in file and store string in variable *sequence* file = open( kozak.txt ) sequence = file.read() kozak.txt ACCATGG # Calculate number of adenines in sequence numberofadenines = float(sequence.count( A )) totalnucleotides = float(len(sequence)) A_content = numberofadenines / totalnucleotides print A_content What about GC content? Slicing a String # Read in file and store string in variable *sequence* file = open( kozak.txt ) sequence = file.read() kozak.txt ACCATGG # Grab a piece of the sequence firstthreeletters = sequence[0:3] print firstthreeletters middlethreeletters = sequence[2:5] print middlethreeletters What about your gene? 5

Booleans kozak = ACCATGG Booleans are either True or False kozak == ACCATGG kozak == GCATCAG kozak == accatgg kozak.lower() == accatgg len(kozak) > 10 len(kozak) < 10 A in kozak U in kozak Decisions, Decisions, Decisions Normal execution flow: SEQUENTIAL Often you want to execute code (instructions) only in certain circumstances (i.e., conditionally) file = open( kozak.txt ) sequence = file.read() # Do we have a short sequence? if (len(sequence) < 50): print This is a short sequence. # Is this an RNA sequence? if (sequence.count( U ) > 0): print Sequence has RNA nucleotides # Check if sequence starts off looking like a gene if (sequence[0:3] == ATG ): print Sequence has start codon. length = len(sequence) finalcodon = sequence[length-3:length] print Final three NTs are: + finalcodon 6

Otherwise... Sometimes you want to decide between two alternatives file = open( kozak.txt ) sequence = file.read() # Do we have a short sequence? if (len(sequence) < 50): print This is a short sequence. else: print This is a long sequence. # Is this an RNA sequence? if (sequence.count( U ) > 0): print Sequence has RNA nucleotides else: numofthymines = sequence.count( T ) print Sequence has, numofthymines, thymines. Nesting Conditionals You can put any code in the body of a conditional statement, including other conditional statements # Check if sequence starts and ends looking like a gene if (sequence[0:3] == ATG ): print Sequence has start codon. length = len(sequence) finalcodon = sequence[length-3:length] if (finalcodon == TGA ): print Sequence has stop codon. if (finalcodon == TAG ): print Sequence has stop codon. if (finalcodon == TAA ): print Sequence has stop codon. # Is this an RNA sequence? if (sequence.count( U ) > 0): print Sequence has RNA nucleotides else: if (sequence.count( T ) > 0): print Sequence has DNA nucleotides. 7

Generic Conditionals # if-then if (boolean_expression): # Statements to execute if boolean_expression is true # if-then-else if (boolean_expression): # Statements to execute if boolean_expression is true else: # Statements to execute if boolean_expression is false # nested conditionals if (boolean_expression_1): if (boolean_expression_2): # Statements to execute if boolean_expression_2 is true else: # Statements to execute if boolean_expression_2 is false Reading in a FASTA File 8

Repetition is a Powerful Idea Suppose you want to repeat a series of instructions # Tell us how you feel about this class counter = 5 while (counter > 0): print I love Bioinformatics! counter = counter - 1 # Assuming we have a coding sequence, print out each codon startofcodon = 0 while (startofcodon < len(sequence)): codon = sequence[startofcodon:startofcodon+3] print codon startofcodon = startofcodon + 3 Loop (i.e., Repetition) Examples # Find the start of all possible ORFs in sequence startofcodon = 0 while (startofcodon < len(sequence)): codon = sequence[startofcodon:startofcodon+3] if (codon == ATG ): print Found start codon at, startofcodon startofcodon = startofcodon + 1 # Search for ambiguous nucleotides in sequence indexofcurrentnucleotide = 0 while (indexofcurrentnucleotide < len(sequence)): if (sequence[indexofcurrentnucleotide] not in ACGT ): print I don t recognize the character:, sequence[indexofcurrentnucleotide] indexofcurrentnucleotide = indexofcurrentnucleotide + 1 9

Generic Repetition # Loop while (boolean_expression): # Statements to execute as long as boolean_expression is true. # Statements should ensure that, eventually, boolean_expression # will be false. Otherwise, the loop will repeat indefinitely. Python Summary Types of variables: numbers, strings, Booleans Assigning values to variables Slicing and dicing with strings Reading in files; text and variable value output Conditionals (if-then, if-then-else) Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition 10