# Pairwise Sequence Alignment

## Transcription

1 Pairwise Sequence Alignment SS 2013

2 Outline
- Pairwise sequence alignment
  - global: Needleman-Wunsch / Gotoh algorithm
  - local: Smith-Waterman algorithm
- BLAST: heuristics

3 What is a Sequence Alignment? Quite simply, it is the comparison of two or more DNA or protein sequences with each other. The purpose of alignment is to highlight similarity between sequences. Alignment is the procedure of writing two (or more) sequences so that a maximum of identical or similar characters are placed in the same column, by inserting gaps (-).

4 Word Alignment Species 1: SOMEONE Species 2: AWESOME Species 1: SOMEONE Species 2: AWESOME - - -

5 Less trivial Species 1: ACGTTAGA Species 2: CGTTGAA Species 1: ACGTTAGA Species 2: CGTTGAA Species 1: ACGTTAGA Species 2: - CGTT- GAA

6 Less trivial Species 1: ACGTTAGA Species 2: CGTTGAA score: -15 (gaps = -1, match = 1) Species 1: ACGTTAGA Species 2: - CGTT - GAA score: 3

7 FASTA Format - Input
Standard input format for alignment programs:
`>Name1`
`ASEQUENCE1`
`>Name2 comments`
`SEQUCE2`
Strictly speaking, the input should not contain gaps.

8 FASTA Format - Output
Increasingly, multiple alignments are returned in a FASTA-like format:
`>Name1`
`ASEQUENCE1`
`>Name2 comments`
`-SEQU--CE2`
etc.
- The order of sequences in the output may differ from the input.
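The FASTA layout on the two slides above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_fasta` is made up for illustration, and it assumes that the first word after `>` is the sequence name and that everything else on the header line is a comment.

```python
def parse_fasta(text):
    """Return a dict mapping sequence name -> sequence (gaps kept as-is)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            # Assumption: the first word is the name, the rest is a comment.
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            # Sequence data may span several lines; concatenate them.
            records[name] += line
    return records


aligned = parse_fasta(">Name1\nASEQUENCE1\n>Name2 comments\n-SEQU--CE2\n")
# Removing the gap characters recovers the unaligned input sequence.
ungapped = {name: seq.replace("-", "") for name, seq in aligned.items()}
```

Stripping `-` from alignment output is a simple way to turn it back into valid input for another alignment program.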

9 Relatedness of residues in same column
Making these alignments is easy when we know where and which evolutionary events occurred. In practice we do not know, and must infer them.

10 Quiz Which alignment (X, Y or Z) shows only residues related by substitution events in the same column?

11 Types of alignment methods
We cannot enumerate all possible alignments. Approaches are:
- Dot matrix
- Dynamic programming
- Word-based or k-tuple methods (database searches)

12 Dot Matrix Given two

13 In a dot matrix we can identify:
- Existing alignable parts of sequences
- Possible indels
- Duplicated sequences and repeats
- Self-complementarity
- Gene-order differences among genomes

14 Dot plots

15 Reading a dot plot:
- a) A continuous main diagonal shows perfect similarity for symbols with the same indices.
- b) Parallels to the main diagonal indicate repeated regions in the same reading direction on different parts of the sequences. In this case a region D is found twice in the sequence (D1, D2).
- c) Lines perpendicular to the main diagonal indicate palindromic areas. In this case the sequence is completely palindromic in the displayed area.
- d) Partially palindromic sequence. (For DNA sequences this refers to a perfect match of the normal strand with its reverse complement, which is frequently found in transposable elements.)
- e) Bold blocks on the main diagonal indicate repetition of the same symbol in both sequences, e.g. (G)50, so-called microsatellite repeats.
- f) Parallel lines indicate tandem repeats of a larger motif in both sequences, e.g. (AGCTCTGAC)20, so-called minisatellite patterns. The distance between the diagonals equals the length of the motif.
- g) A discontinuous diagonal indicates that the sequences T1 and T2 share a common source. In text analysis this may point to plagiarism; in DNA analysis the sequences may be homologous because of a common ancestor. The number of interruptions increases with modifications to the text, or with the time of independent evolution and the mutation rate.
- h) Indel sequences: these can often be observed for many different types of domains that were lost or substituted during evolution.
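The patterns described above are easy to reproduce on toy sequences. The following is a minimal sketch of a text-based dot matrix; the function name `dot_matrix` and the `window` parameter are illustrative choices, not taken from the slides. A cell is marked when the two windows starting at that row and column are identical.

```python
def dot_matrix(a, b, window=1):
    """Return the dot matrix of sequences a and b as a list of strings.

    Row i, column j holds '*' when a[i:i+window] equals b[j:j+window],
    and '.' otherwise.
    """
    rows = []
    for i in range(len(a) - window + 1):
        row = ""
        for j in range(len(b) - window + 1):
            row += "*" if a[i:i + window] == b[j:j + window] else "."
        rows.append(row)
    return rows


# Identical sequences give a continuous main diagonal (case a).
for row in dot_matrix("ACGT", "ACGT"):
    print(row)
```

A larger `window` filters out chance single-symbol matches, at the cost of missing short similar regions.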

16 Aligning a pair of sequences
Scoring: gap = -15, match = +10, mismatch = 0
- Aim: get from one corner to the other
- Moves have a cost
- Choose the cheapest way
- Fill in the table
- Trace the route backwards to find the alignment

17 Aligning a pair of sequences (Dynamic Programming)
- Aim: get from one corner to the other
- Moves have a cost
- Choose the cheapest way
- Fill in the table
- Trace the route backwards to find the alignment

`A G G G A - - G C`
`A G G G T T T G C`

18 Needleman-Wunsch Algorithm
Initialize an N×M matrix for the sequences A and B of length N and M. Starting at the top left corner, set the intermediate scoring value =
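The slide text breaks off before the scoring rule, so the sketch below uses the textbook Needleman-Wunsch recurrence rather than the slide's own wording: each cell takes the best of a diagonal move (match or mismatch) and the two gap moves. The scores default to the values from slide 16 (gap = -15, match = +10, mismatch = 0). The function name and the extra row and column for the empty prefix are assumptions of this sketch.

```python
def needleman_wunsch(a, b, match=10, mismatch=0, gap=-15):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill in the table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a

    # Trace the route backwards from the bottom right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # three matches and one gap
```

When several routes have the same score, this traceback prefers the diagonal move; other tie-breaking rules give equally optimal alignments.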

19 Substitution matrices
- Some amino acids are more similar than others
- Adjust the cost according to some similarity matrix, e.g. BLOSUM62:
  - Leu -> Leu: 4
  - Leu -> Met: 2
  - Leu -> Pro: -3
  - etc.
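In code, a substitution matrix is just a symmetric lookup table that replaces the fixed match/mismatch scores. The sketch below holds only three BLOSUM62 entries for leucine (Leu-Pro is -3 in BLOSUM62); the names `BLOSUM62_SUBSET` and `substitution_score` are made up for illustration, and a real program would load the full matrix.

```python
# Three BLOSUM62 entries, keyed by one-letter amino acid codes
# (L = Leu, M = Met, P = Pro). Illustrative subset only.
BLOSUM62_SUBSET = {
    ("L", "L"): 4,
    ("L", "M"): 2,
    ("L", "P"): -3,
}


def substitution_score(x, y, table=BLOSUM62_SUBSET):
    """Look up the score for aligning residue x with residue y.

    The matrix is symmetric, so (x, y) and (y, x) score the same.
    Raises KeyError for pairs missing from the table.
    """
    if (x, y) in table:
        return table[(x, y)]
    return table[(y, x)]
```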

20 Gap penalties
- Gaps tend to occur together
- One penalty per gap position is unrealistic: a gap of length three should not cost three times as much as a gap of length one
- Use an affine gap cost: make extending an already existing gap cheaper
- Gap opening (G) / gap extension (E)
- Total cost for gap length x: G + x·E
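A short sketch of the formula G + x·E follows. The numeric penalties (G = -10, E = -1, and -11 per position for the linear scheme) are hypothetical values chosen only to show the effect; they do not come from the slides.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Total cost of one gap of the given length: G + x * E."""
    if length <= 0:
        return 0  # no gap, no cost
    return gap_open + length * gap_extend


def linear_gap_cost(length, gap):
    """Linear scheme for comparison: every gap position costs the same."""
    return length * gap


# Hypothetical penalties: a single-position gap costs -11 in both schemes.
print(affine_gap_cost(1, -10, -1), linear_gap_cost(1, -11))
# A gap of length three is much cheaper under the affine scheme.
print(affine_gap_cost(3, -10, -1), linear_gap_cost(3, -11))
```

Under the affine scheme one long gap costs less than several short gaps of the same total length, which matches the observation that gaps tend to occur together.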

21 Global vs Local Alignment Global: Find the best overall alignment between sequences. Local: Find short regions of highly conserved sequence.

22 Global vs Local Species 1: SOMEONE Species 2: AWESOME Species 1: SOMEONE Species 2: AWESOME Species 1: SOME Species 2: SOME

23 Smith-Waterman Algorithm
Instead of looking at each sequence in its entirety, this compares segments of all possible lengths (LOCAL alignments) and chooses whichever maximizes the similarity. For every cell the algorithm calculates ALL possible paths leading to it. These paths can be of any length and can contain insertions and deletions.
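A compact sketch of local alignment, applied to the SOMEONE/AWESOME example from slide 22, is given below. It is a simplification: it uses a linear gap penalty and the scores match = 1, mismatch = -1, gap = -1, where the mismatch value is an assumption (the slides only give gap and match values). The key differences from global alignment are that cell scores never drop below zero and the traceback starts at the highest-scoring cell.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_segment_of_a, aligned_segment_of_b).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a new local alignment
                          H[i - 1][j - 1] + s,    # extend diagonally
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("SOMEONE", "AWESOME"))
```

For these two words the best local alignment is the shared segment SOME, as on the Global vs Local slide.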

24 Calculating significance
We have calculated the optimal alignment, i.e. the alignment with the best score, whether the sequences are related or not. Call this the maximum segment pair (MSP). How many MSPs do we expect with at least the same score by chance?

25 Calculating significance
We make use of the extreme value distribution (EVD) to calculate the number of alignments between random sequences that we expect with our score or better. This is known as the e-value:

$$E(S) = K m n \, e^{-\lambda S}$$

- $K$ and $\lambda$ = scaling parameters calculated based on the search space ($K$) and the scoring scheme ($\lambda$)
- $m$, $n$ = size of the search space

The probability of finding at least one match with our score (the p-value) is

$$p = 1 - e^{-E(S)}$$

As both the e-value and the p-value decrease, the biological significance increases.
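The two quantities are simple to compute once the scaling parameters are known. The sketch below assumes the standard form of the e-value, E(S) = K·m·n·e^(-λS), with m and n taken as query length and database length; the values K = 0.1 and λ = 0.3 in the example are placeholders, not fitted parameters.

```python
import math


def e_value(score, m, n, K, lam):
    """Expected number of chance alignments scoring at least `score`."""
    return K * m * n * math.exp(-lam * score)


def p_value(e):
    """Probability of at least one chance alignment: 1 - exp(-E)."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e)


# Placeholder parameters, for illustration only.
e = e_value(score=50, m=100, n=1000, K=0.1, lam=0.3)
print(e, p_value(e))
```

For small values the p-value and the e-value are nearly identical. Doubling the database size n doubles the e-value, which is why the same score is less significant in a larger database.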

26 BLAST
Basic Local Alignment Search Tool: used to find local sequence alignments between protein or nucleotide sequences (Altschul et al., 1990, cited over 43,000 times).
- Heuristic, so it returns an approximate best match (Smith-Waterman guarantees the optimum)
- Calculates the high-scoring matches instead of the maximum-scoring matches (HSP instead of MSP)

27 BLAST
Split the query into overlapping words (default word length 28; we will look at 4):
GTTCACATCATCCTGC
GTTC TTCA TCAC CACA ACAT CATC ATCA ...

28 BLAST
For each word, find similar words (based on scoring matrices); you could call this the neighborhood.
GTTCACATCATCCTGC
- GTTC: CTTC, GTTC, GATC ...
- TTCA: TTCT, TTGA, TTGT ...
- TCAC: AGAC, CCAC, TCTG ...
- CACA: ...
- ACAT: ...
- CATC: ...
- ATCA: ...
- ...
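The word and neighborhood steps can be sketched in a few lines. This is a deliberately simplified illustration, not BLAST's actual procedure: the neighborhood here is every word that differs from the query word in at most one position, whereas BLAST scores candidate words with a scoring matrix and keeps those above a threshold. The function names are made up for the example.

```python
def words(query, k=4):
    """All overlapping words of length k in the query, in order."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def neighborhood(word, alphabet="ACGT"):
    """The word itself plus every word with exactly one substituted position.

    Simplifying assumption: similarity is 'at most one mismatch'
    instead of a score threshold from a substitution matrix.
    """
    result = {word}
    for i in range(len(word)):
        for c in alphabet:
            result.add(word[:i] + c + word[i + 1:])
    return sorted(result)


query = "GTTCACATCATCCTGC"
print(words(query)[:7])
print(neighborhood("GTTC"))
```

Each neighborhood word is then looked up in an index of the database, and exact hits serve as seeds that are extended into high-scoring pairs.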

29 BLAST
- Calculate E-values: the expectation that you would get that alignment by chance, given the database of sequences
- Return significant results
- We already talked about these e-values and p-values under Smith-Waterman significance

30 BLAST Types:
- Nucleotide vs nucleotide: blastn
- Protein vs protein: blastp
- Translated nucleotide vs protein: blastx
- Protein vs translated nucleotide: tblastn
- Translated nucleotide vs translated database: tblastx

31 DNA vs protein
Should you use blastn or blastp?
- There are four potential nucleotides (A, C, G, T) and therefore four potential states
- There are 20 standard amino acids and therefore 20 potential states
- blastp should be more sensitive than blastn: the larger state space lowers the chance of a random hit
- If there is the possibility of highly similar sequences, DNA works well
  - intergenic spacers
  - RNA genes

32 Things to consider
- Nothing is "90% homologous": sequences either share ancestry or they do not, though you may hold a degree of belief in homology
- Statistical significance depends on the size of the alignments and the database
  - the e-value increases as the database gets bigger: more chance for a random hit
  - the e-value decreases as alignments get longer: the longer the alignment, the more significant it is

33 Therefore
- Sequence similarity can suggest homology
- A significant alignment over the length of both sequences strongly suggests homology
- Homologous sequences do not always produce significant alignments!
- Regions with low complexity (that are not cleaned out by the initial steps in BLAST) can produce significant alignments with no homology

34 Rules
There are no hard and fast rules. Suggested rules of thumb:
- Nucleotides: sequence identity of more than 70% suggests homology; e-values of 10^-6 or less
- Proteins: 25% or more sequence identity; e-values of 10^-3 or less

Even so, you have to verify somehow, and if you work at high throughput, there will be errors.

35 Next
We will go over some examples in lab:
- Needleman-Wunsch
- BLAST
