Module 10: Bioinformatics



1.) Goal: To understand the general approaches to basic in silico (computer) analysis of DNA and protein sequences. We will discuss the sequence formatting required before analysis, DNA restriction mapping, translation of DNA into protein-coding regions (finding open reading frames, ORFs), protein sequence analysis, sequence comparison, and database searching.

2.) Introduction
DNA sequencing has recently become very easy, fast, and cheap. The elucidation of the complete human genome sequence (a mere 3 x 10^9 base pairs) was only possible because of these technical advances. Protein sequencing is also possible, but it is technically much harder and slower. Because a DNA sequence predicts the encoded protein sequence through the rules of the genetic code, we can sequence a piece of DNA and deduce its encoded protein sequence instead of painstakingly purifying and then sequencing the corresponding protein. Given the large size of some of the genomes sequenced to date, powerful in silico approaches must go hand in hand with wet-lab procedures.

3.) Background information
- Sequence input files can be generated in Word or copied from other source documents, websites, etc.
- Formatting: for some applications, files first have to be converted into a specific format (FASTA being a popular choice). Non-standard characters must be removed from the files (example: DNA sequence files can only contain the characters G, A, T, and C).
- Many free sequence analysis programs are available online. You simply copy and paste your sequence(s) into a browser window and run the analysis, but remember that in some cases the correct file format must be used.
- Tip: when working with sequence files in Word, use a monospaced font such as Courier, in which every character takes up the same width, so that sequences line up properly.

4.) Steps in an in silico exercise
Note: different exercises may use a different set of steps.
- Open the sequence file supplied in Word
- Check for the absence of non-standard characters and convert the file into the appropriate format
- Copy the sequence
- Open a browser window with the particular application
- Paste the sequence into the window
- Choose the specific analysis parameters
- Run the analysis
- Copy the results and files out of the browser window and paste/save them back into the original Word file

5.) Materials supplied: general plasmid map (Appendix A), related DHFR protein sequences for sequence alignment (Appendix B), complete plasmid sequence with the Bacillus thermophilus DHFR (Appendix C)

6.) Boyer book chapter: #2
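The clean-up and formatting step described in the background information can also be done with a few lines of code instead of by hand. The following is a minimal Python sketch, not part of the original handout: the function names `clean_dna` and `to_fasta` and the record name `example_fragment` are illustrative assumptions. It keeps only the characters G, A, T, and C, so position numbers, spaces, line breaks, and ambiguity codes such as N are all dropped.

```python
import re


def clean_dna(raw: str) -> str:
    """Upper-case the input and remove everything except G, A, T and C."""
    return re.sub(r"[^GATC]", "", raw.upper())


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record: '>' header line, then fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Example input with position numbers, spaces and a stray page number,
# as they often appear after copying from a document.
raw = "1 tggcgaatgg gacgcgccct\n21 gtagcggcgc 4"
dna = clean_dna(raw)
print(len(dna), "bp")
print(to_fasta("example_fragment", dna), end="")
```

The length printed here is the number of bases left after cleaning, which is the value a format-conversion tool would report for the same input.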

7.) Basic examples of things that can be done with in silico analyses

With a DNA sequence:
- sequence generation: type or copy-paste in Word format
- sequence formatting, sequence length: http://www.ebi.ac.uk/tools/sfc/emboss_seqret/
- restriction digestion and mapping (linear vs circular maps): http://www.restrictionmapper.org/
- translation of open reading frames (ORFs): http://web.expasy.org/translate/

With a protein sequence:
- determine amino acid (AA) sequence length, AA composition, molecular weight, pI, molar extinction coefficient: http://www.ebi.ac.uk/tools/seqstats/emboss_pepstats/

DNA or protein sequence comparison (these programs can be used to compare complete protein sequences to establish evolutionary relationships or to find single point mutations):
- Pairwise DNA alignment: http://www.ebi.ac.uk/tools/psa/emboss_needle/nucleotide.html
- Pairwise protein alignment: http://www.ebi.ac.uk/tools/services/web/toolform.ebi?tool=emboss_needle&context=protein
- Multiple sequence alignment: http://www.ebi.ac.uk/tools/msa/clustalo/

Sequence databases:
- DNA and proteins @ PubMed: http://www.ncbi.nlm.nih.gov/pubmed
- Also: www.uniprot.org (well-curated protein database; can run BLAST searches and other alignments)

8.) Protocol: Do the following:
1. Convert the complete plasmid sequence in Appendix C to GCG and EMBL format and indicate the length of the plasmid DNA in bp. Include properly labeled copies of both in your report.
2. Perform restriction mapping for the complete plasmid sequence supplied in Appendix C, using the restriction enzymes NdeI and BamHI. Show the result table in your report.
3. In your report, show the translation of all 6 reading frames and indicate the frame with the DHFR ORF (open reading frame). The Bacillus thermophilus ORF starts with MISHI. Show the amino acid sequence of the complete B. thermophilus ORF in your report.
4. Protein analysis: Use the B. thermophilus DHFR protein sequence. In your report, include only the molecular weight, number of amino acids, pI, and molar extinction coefficient.
5. Sequence comparison: Use the B. thermophilus DHFR protein sequence from above and align it, one at a time, to the three DHFR protein sequences supplied. Show the sequence alignment and % identity for all three alignments (B. thermophilus with human; B. thermophilus with Bacillus amyloliquefaciens; B. thermophilus with Geobacillus thermodenitrificans) in your report.
6. Sequence comparison: Align all four DHFR protein sequences. Show the sequence alignment in your report.

Hand in the report by e-mail as a Word document, one per group.
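Step 2 of the protocol (restriction mapping with NdeI and BamHI) can be illustrated with a short Python sketch. It is not the method used by the web tool listed above; the function names `find_sites` and `circular_fragments` and the toy sequence are assumptions made for illustration. The recognition sequences CATATG (NdeI) and GGATCC (BamHI) are palindromic, so searching one strand is sufficient. Positions are reported as the 1-based first base of the recognition site, whereas a mapping tool may report the actual cut position, so the numbers can differ by a few bases.

```python
def find_sites(sequence: str, site: str, circular: bool = True) -> list[int]:
    """Return 1-based start positions of a recognition site in the sequence."""
    seq = sequence.upper()
    site = site.upper()
    # For a circular molecule, append the first len(site) - 1 bases so that
    # sites spanning the origin are found as well.
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1:
        positions.append(start + 1)
        start = search.find(site, start + 1)
    return positions


def circular_fragments(length: int, positions: list[int]) -> list[int]:
    """Fragment sizes after cutting a circular molecule at the given positions."""
    if not positions:
        return [length]  # no site: the circle stays uncut
    pts = sorted(positions)
    sizes = [b - a for a, b in zip(pts, pts[1:])]
    sizes.append(length - pts[-1] + pts[0])  # fragment that spans the origin
    return sizes


ENZYMES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Toy circular sequence (hypothetical), not the plasmid from Appendix C.
toy = "ttCATATGaaaaGGATCCttttttttt"
cuts = []
for enzyme, site in ENZYMES.items():
    found = find_sites(toy, site)
    print(enzyme, found)
    cuts.extend(found)
print("fragments:", circular_fragments(len(toy), cuts))
```

For a linear map, pass `circular=False` to `find_sites` and compute the fragment sizes from the two ends of the molecule instead of wrapping around the origin.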

9.) Materials

Appendix A: Plasmid Map

Appendix B: DHFR sequences to be used for sequence alignment

This is the sequence for human DHFR:
  1 mvgslnciva vsqnmgigkn gdlpwpplrn efryfqrmtt tssvegkqnl vimgkktwfs
 61 ipeknrplkg rinlvlsrel keppqgahfl srslddalkl teqpelankv dmvwivggss
121 vykeamnhpg hlklfvtrim qdfesdtffp eidlekykll peypgvlsdv qeekgikykf
181 evyeknd

This is the DHFR sequence from Bacillus amyloliquefaciens:
  1 misfifamde nrligkdndl pwhlpddlay fkkvttghti vmgrktfesi grplpnrrni
 61 vvtsrdeslf pgcitadsae evlklippde ecfviggaql ysalfpyadr lymtkihhvf
121 egdrffpefn eaeweltsrk qgvkdeknpy dyeylvyekk n

This is the DHFR sequence from Geobacillus thermodenitrificans:
  1 mnmtilkssv mtlirrlkrq wrckgektmi shivamdenr vigkdnqlpw hlpadlayfk
 61 rvtmghaivm grktfeaigr plpgrdnvvv trnpqfrpeg clvlhsleev kqwiaargee
121 vfiiggaelf katmpiadrl yvtnifasfp gdtfyppise kewkvvsytp gvkdeknpye
181 hafliyerk
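The idea behind the pairwise alignments of protocol step 5 can be sketched in Python as a simple global (Needleman-Wunsch) alignment followed by a percent-identity calculation. This is a simplified illustration under stated assumptions: it uses a plain match/mismatch score and a single linear gap penalty (the values 1, -1, and -2 are arbitrary choices), whereas alignment tools such as EMBOSS Needle use an amino acid substitution matrix and separate gap-opening and gap-extension penalties. Alignments and % identity from the web tool will therefore generally differ from what this sketch produces. Percent identity is computed here over the full alignment length, including gap columns.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        else:
            diag = None
        if diag is not None and score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# First 30 residues of two of the Appendix B sequences, for a quick demonstration.
amylo = "misfifamdenrligkdndlpwhlpddlay"
geo = "mnmtilkssvmtlirrlkrqwrckgektmi"
top, bottom = global_align(amylo, geo)
print(top)
print(bottom)
print(round(percent_identity(top, bottom), 1), "% identity")
```

Before aligning the full sequences, the position numbers and spaces in the Appendix B listings have to be removed so that only the amino acid letters remain.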

Appendix C: Complete Plasmid Sequence with the Bacillus thermophilus DHFR

The DHFR ORF is situated between pos. 5205 (NdeI) and pos. 5699 (BamHI).

tggcgaatgggacgcgccctgtagcggcgcattaagcgcggcgggtgtgg tggttacgcgcagcgtgaccgctacacttgccagcgccctagcgcccgct cctttcgctttcttcccttcctttctcgccacgttcgccggctttccccg tcaagctctaaatcgggggctccctttagggttccgatttagtgctttac ggcacctcgaccccaaaaaacttgattagggtgatggttcacgtagtggg ccatcgccctgatagacggtttttcgccctttgacgttggagtccacgtt ctttaatagtggactcttgttccaaactggaacaacactcaaccctatct cggtctattcttttgatttataagggattttgccgatttcggcctattgg ttaaaaaatgagctgatttaacaaaaatttaacgcgaattttaacaaaat attaacgtttacaatttcaggtggcacttttcggggaaatgtgcgcggaa cccctatttgtttatttttctaaatacattcaaatatgtatccgctcatg agacaataaccctgataaatgcttcaataatattgaaaaaggaagagtat gagtattcaacatttccgtgtcgcccttattcccttttttgcggcatttt gccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgct gaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacag cggtaagatccttgagagttttcgccccgaagaacgttttccaatgatga gcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgcc gggcaagagcaactcggtcgccgcatacactattctcagaatgacttggt tgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaa gagaattatgcagtgctgccataaccatgagtgataacactgcggccaac ttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgca caacatgggggatcatgtaactcgccttgatcgttgggaaccggagctga atgaagccataccaaacgacgagcgtgacaccacgatgcctgcagcaatg gcaacaacgttgcgcaaactattaactggcgaactacttactctagcttc ccggcaacaattaatagactggatggaggcggataaagttgcaggaccac ttctgcgctcggcccttccggctggctggtttattgctgataaatctgga gccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatgg taagccctcccgtatcgtagttatctacacgacggggagtcaggcaacta tggatgaacgaaatagacagatcgctgagataggtgcctcactgattaag cattggtaactgtcagaccaagtttactcatatatactttagattgattt aaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgata atctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtca gaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcg cgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggttt gtttgccggatcaagagctaccaactctttttccgaaggtaactggcttc agcagagcgcagataccaaatactgtccttctagtgtagccgtagttagg
ccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaa tcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccggg ttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaac ggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaac tgagatacctacagcgtgagctatgagaaagcgccacgcttcccgaaggg agaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcg cacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcg ggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggg gggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcct ggccttttgctggccttttgctcacatgttctttcctgcgttatcccctg attctgtggataaccgtattaccgcctttgagtgagctgataccgctcgc cgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaaga gcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacacc gcatatatggtgcactctcagtacaatctgctctgatgccgcatagttaa gccagtatacactccgctatcgctacgtgactgggtcatggctgcgcccc gacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccg gcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtca gaggttttcaccgtcatcaccgaaacgcgcgaggcagctgcggtaaagct catcagcgtggtcgtgaagcgattcacagatgtctgcctgttcatccgcg tccagctcgttgagtttctccagaagcgttaatgtctggcttctgataaa gcgggccatgttaagggcggttttttcctgtttggtcactgatgcctccg tgtaagggggatttctgttcatgggggtaatgataccgatgaaacgagag aggatgctcacgatacgggttactgatgatgaacatgcccggttactgga acgttgtgagggtaaacaactggcggtatggatgcggcgggaccagagaa aaatcactcagggtcaatgccagcgcttcgttaatacagatgtaggtgtt ccacagggtagccagcagcatcctgcgatgcagatccggaacataatggt gcagggcgctgacttccgcgtttccagactttacgaaacacggaaaccga agaccattcatgttgttgctcaggtcgcagacgttttgcagcagcagtcg cttcacgttcgctcgcgtatcggtgattcattctgctaaccagtaaggca accccgccagcctagccgggtcctcaacgacaggagcacgatcatgcgca cccgtggggccgccatgccggcgataatggcctgcttctcgccgaaacgt ttggtggcgggaccagtgacgaaggcttgagcgagggcgtgcaagattcc gaataccgcaagcgacaggccgatcatcgtcgcgctccagcgaaagcggt cctcgccgaaaatgacccagagcgctgccggcacctgtcctacgagttgc

atgataaagaagacagtcataagtgcggcgacgatagtcatgccccgcgc ccaccggaaggagctgactgggttgaaggctctcaagggcatcggtcgag atcccggtgcctaatgagtgagctaacttacattaattgcgttgcgctca ctgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgaat cggccaacgcgcggggagaggcggtttgcgtattgggcgccagggtggtt tttcttttcaccagtgagacgggcaacagctgattgcccttcaccgcctg gccctgagagagttgcagcaagcggtccacgctggtttgccccagcaggc gaaaatcctgtttgatggtggttaacggcgggatataacatgagctgtct tcggtatcgtcgtatcccactaccgagatatccgcaccaacgcgcagccc ggactcggtaatggcgcgcattgcgcccagcgccatctgatcgttggcaa ccagcatcgcagtgggaacgatgccctcattcagcatttgcatggtttgt tgaaaaccggacatggcactccagtcgccttcccgttccgctatcggctg aatttgattgcgagtgagatatttatgccagccagccagacgcagacgcg ccgagacagaacttaatgggcccgctaacagcgcgatttgctggtgaccc aatgcgaccagatgctccacgcccagtcgcgtaccgtcttcatgggagaa aataatactgttgatgggtgtctggtcagagacatcaagaaataacgccg gaacattagtgcaggcagcttccacagcaatggcatcctggtcatccagc ggatagttaatgatcagcccactgacgcgttgcgcgagaagattgtgcac cgccgctttacaggcttcgacgccgcttcgttctaccatcgacaccacca cgctggcacccagttgatcggcgcgagatttaatcgccgcgacaatttgc gacggcgcgtgcagggccagactggaggtggcaacgccaatcagcaacga ctgtttgcccgccagttgttgtgccacgcggttgggaatgtaattcagct ccgccatcgccgcttccactttttcccgcgttttcgcagaaacgtggctg gcctggttcaccacgcgggaaacggtctgataagagacaccggcatactc tgcgacatcgtataacgttactggtttcacattcaccaccctgaattgac tctcttccgggcgctatcatgccataccgcgaaaggttttgcgccattcg atggtgtccgggatctcgacgctctcccttatgcgactcctgcattagga agcagcccagtagtaggttgaggccgttgagcaccgccgccgcaaggaat ggtgcatgcaaggagatggcgcccaacagtcccccggccacggggcctgc caccatacccacgccgaaacaagcgctcatgagcccgaagtggcgagccc gatcttccccatcggtgatgtcggcgatataggcgccagcaaccgcacct gtggcgccggtgatgccggccacgatgcgtccggcgtagaggatcgagat ctcgatcccgcgaaattaatacgactcactataggggaattgtgagcgga taacaattcccctctagaaataattttgtttaactttaagaaggagatat acatatgatttcgcacattgtggcaatggatgaaaaccgggtgatcggca aagacaaccgcttgccttggcatttgccggccgatttggcgtattttaaa cgggtgacaatgggccatgccatcgtgatggggcgcaagacgtttgaagc gatcggccggccgcttcccggccgcgataacgtcgttgtcacgcgcaacc gctcgtttcgtccggaaggctgccttgtgcttcattcgctcgaggaagtc
aagcaatggatcgcatcgcgcgctgatgaagtgtttatcatcggcggggc cgaactgtttcgggcgacgatgccgattgtcgaccggctgtatgtgacaa aaatttttgcttccttccccggcgatacgttttatccgcccatttctgac gatgaatgggaaatcgtttcctatacgccaggagggaaagatgaaaagaa tccgtatgaacacgcctttatcatttatgagcggaaaaaggcgaaataat GGATCCgaattcgagctccgtcgacaagcttgcggccgcactcgagcacc accaccaccaccactgagatccggctgctaacaaagcccgaaaggaagct gagttggctgctgccaccgctgagcaataactagcataaccccttggggc ctctaaacgggtcttgaggggttttttgctgaaaggaggaactatatccg gat
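The six-frame translation and ORF identification of protocol step 3 can be sketched in Python as follows. This is an illustration only: the function names are assumptions, the codon table is the standard genetic code, and the demonstration uses just the 40 bases around the NdeI site copied from the sequence above, not the full plasmid. The sketch relies on the fact that the NdeI recognition sequence CATATG ends in the start codon ATG. If a sequence contained more than one NdeI site, the start position would have to be chosen explicitly.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate every complete codon from the given frame offset (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing characters other than A, C, G, T are shown as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> dict[str, str]:
    """Translate three forward and three reverse-complement reading frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


def translate_orf(dna: str, start: int) -> str:
    """Translate from 0-based index `start` up to the first stop codon or the end."""
    dna = dna.upper()
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# 40 bases around the NdeI site, copied from the plasmid sequence above.
fragment = "acatatgatttcgcacattgtggcaatggatgaaaaccgg"
for name, protein in six_frames(fragment).items():
    print(name, protein)

ndei = fragment.upper().find("CATATG")   # 0-based start of the NdeI site
print(translate_orf(fragment, ndei + 3))  # the ATG is the last three bases of CATATG
```

To apply this to the full plasmid, the Appendix C sequence first has to be stripped of spaces and any other non-sequence characters, and for a circular molecule an ORF that runs across the origin would need the sequence to be wrapped around.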