BIOINFORMATICS: SEQUENCE ALIGNMENT

Carmen Nigro
Department of Computing Sciences
Villanova University
Villanova, Pennsylvania

ABSTRACT

As more DNA and protein sequence data is collected, sequence alignment programs are becoming increasingly important for analyzing it. These alignments can help us learn more about the functions of certain genes and proteins, and these observations could ultimately lead to the discovery of cures for certain genetic diseases or to better insight into the evolutionary process. There are many different types of alignment strategies, and choosing the appropriate strategy depends on the ultimate goal of the alignment. This paper outlines and compares the basic approaches and algorithms for sequence alignment.

KEY WORDS

sequence alignment, global alignment, local alignment, dynamic programming, progressive alignment, iterative alignment

1. INTRODUCTION

Sequence alignment is an important division of bioinformatics, which attempts to analyze and compare the sequences that make up DNA or proteins. Sequence alignment is a way of comparing two or more sequences by searching for a series of individual characters that appear in the same order in both sequences [1]. As improved methods were found for collecting biological data, such as nucleotide and amino acid sequences, a need arose for databases that allow easy storage, retrieval, and revision of the data. Today, bioinformatics scientists are interested in the analysis and interpretation of that data. Because these sequences are too long to be analyzed manually, efficient and accurate alignment programs are essential for comparing DNA or protein sequences.

The sequences being compared are usually represented by nitrogenous bases for DNA sequences or by amino acids for protein sequences. Four different nitrogenous bases make up DNA, while 20 different amino acids make up proteins.
Through sequence alignments, attempts can be made to identify homologous sequences, that is, sequences with a common evolutionary origin [1]. The discovery of homologous sequences may help to reconstruct the evolutionary process based on segments that have mutated and segments that have remained the same over time. Sequence alignment also has functional importance, as sequences that are alike may have the same role or code for the same entity. The drug industry has benefited from applying this notion when designing new drugs to treat certain diseases. Some diseases are caused by the lack of certain parts of a protein sequence. Sequence alignment can help to identify those regions, and the lack of parts of a sequence may be compensated for by injecting the missing sequence into the protein.

Sequence alignment has also been useful for analyzing protein structure. Protein molecules that are alike in sequence are also more likely to have similar structures, as many of the same bonds will form. In addition, similar protein sequences have been used to determine protein structure-function relationships [2].

2. GLOBAL AND LOCAL ALIGNMENTS

Global and local alignments are two different methods of aligning sequences. Deciding which method to choose depends on the purpose of the alignment. Global alignments attempt to compare every residue of every sequence and are best employed when the sequences are similar and of the same size, because sequences of different sizes will produce mismatches at the ends of an alignment. However, when attempting to align every element of dissimilar sequences, many gaps will be produced because of the many mismatches between the two sequences, as seen in Figure 1. When comparing two long sequences, these gaps can become difficult to analyze. Local alignments are best employed for dissimilar sequences that may have similar regions [3].
Local alignments are very useful for finding a particular pattern that exists in both sequences, as that pattern may also have a similar function. If both sequences are very similar, it should not make a difference which method is used, because the alignments should produce similar results. There is also no difference in time efficiency between the two methods. The most fundamental global and local alignment algorithms are both based on dynamic programming: the Needleman-Wunsch algorithm solves the global alignment problem, while the Smith-Waterman algorithm solves the local alignment problem.
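To make the distinction between the two methods concrete, the following is a minimal Python sketch that computes a global score and a local score for the same pair of sequences. The function name alignment_scores and the match, mismatch, and gap values are assumptions chosen for illustration; they are not taken from this paper or from any particular alignment program.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for sequences a and b.

    The scoring values are illustrative assumptions only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # glob[i][j]: best score aligning all of a[:i] with all of b[:j].
    # loc[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap  # a global alignment pays for leading gaps
    for j in range(1, cols):
        glob[0][j] = j * gap
    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # A local alignment may restart anywhere, so it never drops below 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local


if __name__ == "__main__":
    # The shared region GATTACA is flanked by unrelated residues.
    print(alignment_scores("AAAAGATTACA", "GATTACACCCC"))
```

With these assumed scores, the local score rewards only the shared region, while the global score is pulled down by the unrelated flanking residues, which is the behavior described above for dissimilar sequences with similar regions.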

Figure 1. Examples of local and global alignments

3. PAIRWISE AND MULTIPLE ALIGNMENTS

Pairwise alignments attempt to align two sequences at a time, while multiple alignments attempt to align three or more sequences at a time. Analyzing three or more sequences at a time can be useful for studying molecular evolution and analyzing sequence-structure relationships [1]. Also, a pattern common to a set of sequences may only become apparent through multiple sequence alignment [1].

While the dynamic programming techniques described above are reliable methods of alignment, they are not practical to implement for multiple alignments. By extending the dynamic programming algorithm to multiple alignments, an optimal alignment will be produced in time O(n^k) for k sequences [13]. The cost of multiple sequence alignment grows exponentially every time another sequence is added and becomes unreasonable for comparing more than three sequences at a time [1]. Due to the impracticality of using dynamic programming algorithms to solve the multiple alignment problem, many heuristic algorithms have been sought, which sacrifice accuracy for time efficiency. Heuristic approaches attempt to optimize pairwise alignments rather than searching for an overall optimal alignment [15]. Over 75 methods of solving the multiple alignment problem have been identified, and the problem continues to be central to computational molecular biology [15].

4. SCORING FUNCTIONS

Scoring schemes are important for sequence alignment programs, because they are a means of comparing different alignments. In alignment algorithms a scoring function must exist so that scores may be assigned to different alignments based on the number of gaps and the number of matches. Scores are assigned to each possible pair of elements based on their similar chemical properties and the evolutionary probability of the mutation.
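As a rough illustration of how a scoring function assigns a score to a given alignment, the sketch below scores two aligned rows column by column. The function name score_alignment, the use of '-' as the gap symbol, and the numeric values are assumptions made for this example; real programs typically use substitution matrices rather than a single match and mismatch value.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows of equal length, where '-' marks a gap.

    The default values are illustrative assumptions, not values from the paper.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # fixed penalty for each gap position
        elif x == y:
            total += match      # identical elements
        else:
            total += mismatch   # substituted elements
    return total


if __name__ == "__main__":
    print(score_alignment("GATTACA", "GA-TACA"))  # six matches, one gap
    print(score_alignment("GATTACA", "GACTACA"))  # six matches, one mismatch
```

Under these assumed values the second alignment scores higher than the first, which shows how the choice of scores decides which of two candidate alignments is preferred.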
Gap costs are also an important part of any sequence alignment program and have been studied extensively. Gap costs may take into account that a single mutational event may insert or delete multiple elements [2]. Gap costs must also account for aligning elements with nulls when sequences are of different lengths. Algorithms that have a fixed penalty for each gap are popular and are easily extendable to multiple alignments [2]. An example of a scoring scheme with fixed penalties can be seen in Figure 2.

Figure 2. A simple scoring function

One type of scoring for multiple alignments is the sum-of-pairs score, which increases with the number of sequences aligned correctly [3]. For multiple alignments, the sum of pairs is the total of the alignment costs for every pair of sequences in the alignment. A column score may also be implemented in a multiple sequence alignment program, which tests the capability of the program to align all of the sequences correctly. Scoring functions are crucial to any alignment program, because they directly affect the choice of the optimal alignment.

5. SEQUENCE ALIGNMENT ALGORITHMS

Significant research into algorithmic approaches to sequence alignment has been performed over the past 20 years [10]. The most popular sequence alignment algorithms in use today fall into the following major classifications.

5.1 Dynamic Programming

Dynamic programming involves breaking a larger problem down into smaller, more manageable pieces. The basic dynamic programming approach for sequence alignment finds an optimal path through a rectangular path graph. It accomplishes this by turning one sequence into another through a series of edits. Each edit to the sequence is associated with a particular cost, and the purpose is to find the edits that produce the lowest total cost [1]. This method drastically reduces the number of alignments to be considered while always producing an optimal alignment.
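The idea of turning one sequence into another at the lowest total edit cost can be sketched as follows. This is a minimal illustration under assumed unit costs, with the hypothetical names edit_cost, sub_cost, and indel_cost; it returns only the lowest cost, not the edits themselves.

```python
def edit_cost(source, target, sub_cost=1, indel_cost=1):
    """Lowest total cost of edits that turn source into target.

    Unit costs are an assumption made for this sketch.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # cost[i][j] is the cheapest way to turn source[:i] into target[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * indel_cost      # delete every element of source[:i]
    for j in range(1, cols):
        cost[0][j] = j * indel_cost      # insert every element of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (0 if same else sub_cost),  # keep or substitute
                cost[i - 1][j] + indel_cost,                     # delete
                cost[i][j - 1] + indel_cost,                     # insert
            )
    return cost[-1][-1]


if __name__ == "__main__":
    print(edit_cost("ACGT", "AGT"))
```

Each cell of the table is filled once from three neighbouring cells, so the table has (n + 1)(m + 1) entries for sequences of lengths n and m, rather than one entry per possible alignment.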
Both the Needleman-Wunsch and Smith-Waterman algorithms are based on the dynamic programming method and have a time efficiency of O(nm), n and m being the lengths of the two sequences. However, the basic dynamic programming algorithm runs in O(n^k) for multiple alignment, where k is the number of sequences.

The Needleman-Wunsch algorithm works by maximizing the number of matches and minimizing the number of gaps needed to align the two sequences. A scoring function must exist so that scores may be assigned to the alignments based on the number of matches and the number of gaps in the alignment. The alignment with the largest score will be the optimal alignment. It is implemented through the use of a scoring matrix in which the horizontal and vertical axes correspond to the two sequences. The algorithm compares every element of one sequence to every element of the other sequence and then traces back to find the optimal alignment. Execution of the Needleman-Wunsch algorithm can be seen in Figure 3.

Figure 3. Sample execution of the Needleman-Wunsch algorithm

The Smith-Waterman algorithm acts in a similar manner, but produces a local alignment by finding the region with the highest similarity. The Smith-Waterman algorithm may be obtained from the Needleman-Wunsch algorithm by adjusting the scoring function and changing the method of tracing back to find the highest-scoring matching subsequences.

5.2 Progressive Algorithms

Progressive alignment is the most widely used heuristic method to align a large number of sequences and operates in O(n^2 k^2) time [4,15]. Progressive methods, also known as hierarchical or tree methods, produce a multiple sequence alignment by first aligning the most similar sequences and then successively adding less related sequences to the alignment until the entire set of sequences has been aligned. A guide tree is produced that determines the order in which the sequences are added to the alignment. The most related sequences are aligned first [4]. The tree describing sequence relatedness is usually produced through pairwise comparisons that may include heuristic pairwise alignment methods. This technique is used in many multiple alignment programs such as MULTALIGN, ClustalW, and T-Coffee [4].

However, results of progressive alignments depend heavily on the choice of the most related sequences, which can sometimes be difficult to determine. Also, because the alignment is built up progressively, errors made at any stage in the alignment will be reflected in the final result. These methods generally perform poorly on distantly related sequences.
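The ordering step of a progressive method can be sketched roughly as follows. This is a simplified greedy stand-in for a guide tree, not the procedure used by ClustalW, MULTALIGN, or T-Coffee: the names similarity and progressive_order are hypothetical, and Python's difflib.SequenceMatcher ratio is used only as a convenient substitute for a pairwise alignment score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real tool would use alignment scores."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def progressive_order(seqs):
    """Return indices of seqs in the order a greedy progressive scheme adds them."""
    if len(seqs) < 2:
        return list(range(len(seqs)))
    # Pairwise comparison of every pair of sequences.
    sim = {(i, j): similarity(seqs[i], seqs[j])
           for i, j in combinations(range(len(seqs)), 2)}

    def pair_sim(i, j):
        return sim[(min(i, j), max(i, j))]

    # The most similar pair is aligned first.
    first, second = max(sim, key=sim.get)
    order = [first, second]
    remaining = set(range(len(seqs))) - set(order)
    # Then repeatedly add the sequence closest to anything already placed.
    while remaining:
        nxt = max(remaining, key=lambda k: max(pair_sim(k, p) for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    seqs = ["GGGGGGGGGG", "ACGTACGTAC", "ACGTTTTTTT", "ACGTACGTAA"]
    print(progressive_order(seqs))
```

Because the order is fixed from pairwise similarities alone, a poor early choice is never reconsidered, which mirrors the weakness of progressive methods noted above.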
Most progressive methods modify their scoring function by incorporating a weighting function which assigns scaling factors to individual sequences based on their distance from their neighbors in the guide tree. This is used to correct the order in which the sequences are added to the alignment.

5.3 Iterative Algorithms

Iterative methods have been produced to help solve the problems surrounding progressive algorithms [4]. Progressive alignments are largely dependent upon the initial alignment, because it is incorporated into the final result. In progressive methods, once a sequence has been aligned, its alignment is not revisited [4]. Iterative methods optimize an objective function based on an alignment scoring function by creating an initial global alignment and then realigning sequence subsets. The realigned subsets are then aligned to produce the next iteration's alignment. This approach has been implemented in programs such as MUSCLE and DIALIGN [4].

5.4 Summary

While dynamic programming algorithms produce the most accurate sequence alignments, they are not always practical to implement for multiple alignment, as their running time grows exponentially as more sequences are added to the alignment. Heuristic algorithms, such as progressive and iterative methods, generally sacrifice accuracy for the sake of time. These types of algorithms have been implemented in the most widely used alignment programs today. It is generally believed that iterative methods are more accurate than progressive methods, because they revisit past alignments each time a new sequence is added.

6. PROPOSAL

My proposal aims to identify the specific effects that iteration may have on a progressive alignment algorithm. By default, the ClustalW program performs a progressive alignment only; however, an option has been added which allows for iteration at each step of the progressive alignment.
My proposed work aims to compare the scores of multiple alignments when they are aligned with and without iteration. This work also aims to trace the magnitude of the effect that the number of sequences has on the scores of the iterative alignments versus the non-iterative alignments. This can be done by successively adding more sequences to each alignment. This proposal also aims to compare the progressive alignments produced by the program MULTALIGN with both the iterative and non-iterative alignments from the ClustalW tests. These two programs are easily comparable, as they both produce global alignments.

In order to compare alignments from different programs, objective criteria are needed to determine the quality of an alignment. The BAliBASE benchmark alignment database would serve as a valuable tool for comparing alignments from the two programs. The database contains 142 reference alignments, which could be used for this project [3].

This work will help to more clearly identify the effectiveness of iterative methods compared to progressive methods. It will also help to identify the most accurate kinds of sequence alignment programs available today, which will help to ensure that biologists are using the most accurate tools available for sequence alignment. Studying and comparing these algorithms could also help us gain better insight into other optimization problems. These algorithms may also be applicable to other fields within computer science, which makes the refinement of such algorithms even more significant.

While the studies conducted by Julie D. Thompson et al. and Iain M. Wallace et al. both conclude that iterative methods are more accurate than progressive methods, this study intends to take a closer look at the effect of the number of sequences on the overall alignment [4,5]. It is hypothesized that iteration will have an even greater effect on multiple alignments as more sequences are added to the alignment. This proposed work has been influenced by, and hopes to extend, the works of Thompson et al. and Wallace et al. in the field of iterative multiple sequence alignment.

Both the ClustalW and MULTALIGN programs are available to download for free. MULTALIGN runs on a Unix OS, while ClustalW runs on Windows and has been provided with a friendly user interface called ClustalX, where it may be specified whether or not to perform iteration for a specific alignment. One of the main reasons for the success of the ClustalW program is its ease of use [6]. The BAliBASE database is also available for download. All of the components needed for this project are readily available and easily accessible from any internet connection.
The work surrounding the project will include becoming acquainted with both the ClustalW and MULTALIGN programs and their underlying algorithms, performing alignment tests for both programs, and comparing the results using the BAliBASE database. Tests will also be run on the ClustalW program with and without iteration on a series of different multiple alignments containing different numbers of sequences.

My experiences as both a computer science student and a biology student will be useful for this project. The Analysis of Algorithms class will be particularly useful in analyzing the efficiency of the algorithms, while my biology class will help me to understand the needs of the biologist when analyzing alignment programs. My experiences in both fields will help me to become familiar with the terminology of a field of study that merges the two.

This project is expected to last about two months, and a tentative timetable for the project can be seen in Table 1.

Week   Task
1      Become acquainted with ClustalW
2-4    Run ClustalW tests
5-6    Become acquainted with MULTALIGN
7-8    Run MULTALIGN tests and compare results

Table 1. A tentative schedule for the project

Less time has been set aside to become acquainted with the ClustalW program, because of its ease of use. It is believed that the tests described will help us gain a better understanding of the effects of iteration on progressive multiple alignment algorithms.

7. CONCLUSION

Multiple sequence alignment is the backbone of comparative and evolutionary genomics, as it allows a number of sequences to be matched against one another at the same time [13]. Although dynamic programming algorithms are the most accurate known algorithms for sequence alignment, they are inefficient for multiple sequence alignment. Currently, heuristic algorithms are implemented in the most popular sequence alignment programs because of their efficiency.
However, these algorithms sacrifice accuracy for time. Additional research must be carried out to refine these algorithms in order to increase their accuracy while also maintaining their efficiency.

REFERENCES

[1] D.G. Brown, A survey of seeding for sequence alignment, University of Waterloo, Waterloo, Ontario, Canada,
[2] D.J. Lipman, S.F. Altschul, and J.D. Kececioglu, A Tool for Multiple Sequence Alignment, Proc. Natl. Acad. Sci. USA, Vol. 86, pp , June
[3] H. Rangwala and G. Karypis, Incremental window-based protein sequence alignment algorithms, Oxford Journals: Bioinformatics, Vol. 23, pp. e17-e23,
[4] I. M. Wallace, O. Orla, and D. G. Higgins, Evaluation of Iterative Alignment Algorithms for Multiple Alignment, Oxford Journals: Bioinformatics, Vol. 21, pp ,
[5] J.D. Thompson, F. Plewniak, and O. Poch, A comprehensive comparison of multiple sequence alignment programs, Oxford Journals: Nucleic Acids Research, Vol. 27, pp ,
[6] J.D. Thompson, T.J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins, The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Oxford Journals: Nucleic Acids Research, Vol. 25, pp ,
[7] J. Hérisson, G. Payen, and R. Gherbi, A 3D pattern matching algorithm for DNA sequences, Oxford Journals: Bioinformatics, Vol. 23, pp ,
[8] J. M. Sauder, J. W. Arthur, and R. L. Dunbrack, Jr., Large-Scale Comparison of Protein Sequence Alignment Algorithms With Structure Alignments, Proteins: Structure, Function, and Genetics, Vol. 40, pp. 6-22,
[9] L. A. Newberg, Memory efficient dynamic programming backtrace and pairwise local sequence alignment, Oxford Journals: Bioinformatics, Vol. 24, pp ,
[10] L. Delcher, A. Phillippy, J. Carlton, and S. L. Salzberg, Fast algorithms for large-scale genome alignment and comparison, Oxford Journals: Nucleic Acids Research, Vol. 30, pp ,
[11] M.S. Waterman, Efficient Sequence Alignment Algorithms, J. theor. Biol., Vol. 108, pp ,
[12] R. Chenna, H. Sugawara, T. Koike, R. Lopez, T.J. Gibson, D.G. Higgins, and J.D. Thompson, Multiple sequence alignment with the Clustal series of programs, Oxford Journals: Nucleic Acids Research, Vol. 31, pp ,
[13] S. Kumar, A. Filipski, Multiple Sequence Alignment: In pursuit of homologous DNA positions, Cold Spring Harbor Laboratory Press: Genome Research, Vol. 17, pp ,
[14] T. W. Lam, W. K. Sung, S. L. Tam, C. K. Wong, and S. M. Yiu, Compressed indexing and local alignment of DNA, Oxford Journals: Bioinformatics, Vol. 24, pp ,
[15] Y. Bilu, P. K. Agarwal, R. Kolodny, Faster Algorithms for Optimal Multiple Sequence Alignment Based on Pairwise Comparisons, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 3, pp , 2006.


Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

HIV NOMOGRAM USING BIG DATA ANALYTICS

HIV NOMOGRAM USING BIG DATA ANALYTICS HIV NOMOGRAM USING BIG DATA ANALYTICS S.Avudaiselvi and P.Tamizhchelvi Student Of Ayya Nadar Janaki Ammal College (Sivakasi) Head Of The Department Of Computer Science, Ayya Nadar Janaki Ammal College

More information

Analyzing A DNA Sequence Chromatogram

Analyzing A DNA Sequence Chromatogram LESSON 9 HANDOUT Analyzing A DNA Sequence Chromatogram Student Researcher Background: DNA Analysis and FinchTV DNA sequence data can be used to answer many types of questions. Because DNA sequences differ

More information

OD-seq: outlier detection in multiple sequence alignments

OD-seq: outlier detection in multiple sequence alignments Jehl et al. BMC Bioinformatics (2015) 16:269 DOI 10.1186/s12859-015-0702-1 RESEARCH ARTICLE Open Access OD-seq: outlier detection in multiple sequence alignments Peter Jehl, Fabian Sievers * and Desmond

More information

Development and Implementation of Novel Data Compression Technique for Accelerate DNA Sequence Alignment Based on Smith Waterman Algorithm

Development and Implementation of Novel Data Compression Technique for Accelerate DNA Sequence Alignment Based on Smith Waterman Algorithm L JUNID SM et al: DEVELOPMEN ND IMPLEMENION OF NOVEL D OMSSION... Development and Implementation of Novel Data ompression echnique for ccelerate DN Sequence lignment Based on Smith Waterman lgorithm l

More information

Name: Date: Period: DNA Unit: DNA Webquest

Name: Date: Period: DNA Unit: DNA Webquest Name: Date: Period: DNA Unit: DNA Webquest Part 1 History, DNA Structure, DNA Replication DNA History http://www.dnaftb.org/dnaftb/1/concept/index.html Read the text and answer the following questions.

More information

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,

More information

DNA Printer - A Brief Course in sequence Analysis

DNA Printer - A Brief Course in sequence Analysis Last modified August 19, 2015 Brian Golding, Dick Morton and Wilfried Haerty Department of Biology McMaster University Hamilton, Ontario L8S 4K1 ii These notes are in Adobe Acrobat format (they are available

More information

Molecular Databases and Tools

Molecular Databases and Tools NWeHealth, The University of Manchester Molecular Databases and Tools Afternoon Session: NCBI/EBI resources, pairwise alignment, BLAST, multiple sequence alignment and primer finding. Dr. Georgina Moulton

More information

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach

More information

Lecture 19: Proteins, Primary Struture

Lecture 19: Proteins, Primary Struture CPS260/BGT204.1 Algorithms in Computational Biology November 04, 2003 Lecture 19: Proteins, Primary Struture Lecturer: Pankaj K. Agarwal Scribe: Qiuhua Liu 19.1 The Building Blocks of Protein [1] Proteins

More information

Module 10: Bioinformatics

Module 10: Bioinformatics Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior

More information

Using MATLAB: Bioinformatics Toolbox for Life Sciences

Using MATLAB: Bioinformatics Toolbox for Life Sciences Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY

More information

Today you will extract DNA from some of your cells and learn more about DNA. Extracting DNA from Your Cells

Today you will extract DNA from some of your cells and learn more about DNA. Extracting DNA from Your Cells DNA Based on and adapted from the Genetic Science Learning Center s How to Extract DNA from Any Living Thing (http://learn.genetics.utah.edu/units/activities/extraction/) and BioRad s Genes in a bottle

More information

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!! DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other

More information

Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques International Journal of Electronics and Computer Science Engineering 2000 Available Online at www.ijecse.org ISSN- 2277-1956 Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization

More information

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006 Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

More information

Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,

More information

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching

Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching COSC 348: Computing for Bioinformatics Definitions A pattern (keyword) is an ordered sequence of symbols. Lecture 4: Exact string searching algorithms Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/

More information

Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations

Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations AlCoB 2014 First International Conference on Algorithms for Computational Biology Thiago da Silva Arruda Institute

More information

Novel Mining of Cancer via Mutation in Tumor Protein P53 using Quick Propagation Network

Novel Mining of Cancer via Mutation in Tumor Protein P53 using Quick Propagation Network Novel Mining of Cancer via Mutation in Tumor Protein P53 using Quick Propagation Network Ayad. Ghany Ismaeel, and Raghad. Zuhair Yousif Abstract There is multiple databases contain datasets of TP53 gene

More information

Biological Sequence Data Formats

Biological Sequence Data Formats Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA

More information

A CONTENT STANDARD IS NOT MET UNLESS APPLICABLE CHARACTERISTICS OF SCIENCE ARE ALSO ADDRESSED AT THE SAME TIME.

A CONTENT STANDARD IS NOT MET UNLESS APPLICABLE CHARACTERISTICS OF SCIENCE ARE ALSO ADDRESSED AT THE SAME TIME. Biology Curriculum The Georgia Performance Standards are designed to provide students with the knowledge and skills for proficiency in science. The Project 2061 s Benchmarks for Science Literacy is used

More information

Chapter 6 DNA Replication

Chapter 6 DNA Replication Chapter 6 DNA Replication Each strand of the DNA double helix contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand. Each strand can therefore

More information

BIOINFORMATICS TUTORIAL

BIOINFORMATICS TUTORIAL Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.

More information

A COMPARISON OF COMPUTATION TECHNIQUES FOR DNA SEQUENCE COMPARISON

A COMPARISON OF COMPUTATION TECHNIQUES FOR DNA SEQUENCE COMPARISON International Journal of Research in Computer Science eissn 2249-8265 Volume 2 Issue 3 (2012) pp. 1-6 White Globe Publications A COMPARISON OF COMPUTATION TECHNIQUES FOR DNA SEQUENCE COMPARISON Harshita

More information

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) 820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor

More information

AC 2007-305: INTEGRATION OF BIOINFORMATICS IN SCIENCE CURRICULUM AT FORT VALLEY STATE UNIVERSITY

AC 2007-305: INTEGRATION OF BIOINFORMATICS IN SCIENCE CURRICULUM AT FORT VALLEY STATE UNIVERSITY AC 2007-305: INTEGRATION OF BIOINFORMATICS IN SCIENCE CURRICULUM AT FORT VALLEY STATE UNIVERSITY Ramana Gosukonda, Fort Valley State University Assistant Professor computer science Masoud Naghedolfeizi,

More information

Structure and Function of DNA

Structure and Function of DNA Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four

More information

Optimal Contact Map Alignment of Protein-Protein Interfaces Vinay Pulim, 1 Bonnie Berger, 1,2 * Jadwiga Bienkowska, 1,3,* 1

Optimal Contact Map Alignment of Protein-Protein Interfaces Vinay Pulim, 1 Bonnie Berger, 1,2 * Jadwiga Bienkowska, 1,3,* 1 Bioinformatics Advance Access published August, 008 Original Paper Optimal Contact Map Alignment of Protein-Protein Interfaces Vinay Pulim, Bonnie Berger,, * Jadwiga Bienkowska,,3,* Computer Science and

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Hierarchical Bayesian Modeling of the HIV Response to Therapy

Hierarchical Bayesian Modeling of the HIV Response to Therapy Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and

More information

Integrating Bioinformatics, Medical Sciences and Drug Discovery

Integrating Bioinformatics, Medical Sciences and Drug Discovery Integrating Bioinformatics, Medical Sciences and Drug Discovery M. Madan Babu Centre for Biotechnology, Anna University, Chennai - 600025 phone: 44-4332179 :: email: madanm1@rediffmail.com Bioinformatics

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Bioinformatics: course introduction

Bioinformatics: course introduction Bioinformatics: course introduction Filip Železný Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab http://ida.felk.cvut.cz

More information

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need

More information

FART Neural Network based Probabilistic Motif Discovery in Unaligned Biological Sequences

FART Neural Network based Probabilistic Motif Discovery in Unaligned Biological Sequences FART Neural Network based Probabilistic Motif Discovery in Unaligned Biological Sequences M. Hemalatha, P. Ranjit Jeba Thangaiah and K. Vivekanandan, Member IEEE Abstract Finding Motif in bio-sequences

More information

Biological Databases and Protein Sequence Analysis

Biological Databases and Protein Sequence Analysis Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to

More information

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of

More information

Analytical Study of Hexapod mirnas using Phylogenetic Methods

Analytical Study of Hexapod mirnas using Phylogenetic Methods Analytical Study of Hexapod mirnas using Phylogenetic Methods A.K. Mishra and H.Chandrasekharan Unit of Simulation & Informatics, Indian Agricultural Research Institute, New Delhi, India akmishra@iari.res.in,

More information

Usability in bioinformatics mobile applications

Usability in bioinformatics mobile applications Usability in bioinformatics mobile applications what we are working on Noura Chelbah, Sergio Díaz, Óscar Torreño, and myself Juan Falgueras App name Performs Advantajes Dissatvantajes Link The problem

More information

Chemical Basis of Life Module A Anchor 2

Chemical Basis of Life Module A Anchor 2 Chemical Basis of Life Module A Anchor 2 Key Concepts: - Water is a polar molecule. Therefore, it is able to form multiple hydrogen bonds, which account for many of its special properties. - Water s polarity

More information

Section 3 Comparative Genomics and Phylogenetics

Section 3 Comparative Genomics and Phylogenetics Section 3 Section 3 Comparative enomics and Phylogenetics At the end of this section you should be able to: Describe what is meant by DNA sequencing. Explain what is meant by Bioinformatics and Comparative

More information

The Steps. 1. Transcription. 2. Transferal. 3. Translation

The Steps. 1. Transcription. 2. Transferal. 3. Translation Protein Synthesis Protein synthesis is simply the "making of proteins." Although the term itself is easy to understand, the multiple steps that a cell in a plant or animal must go through are not. In order

More information

Principles of Evolution - Origin of Species

Principles of Evolution - Origin of Species Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X

More information

200630 - FBIO - Fundations of Bioinformatics

200630 - FBIO - Fundations of Bioinformatics Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND

More information

Ms. Campbell Protein Synthesis Practice Questions Regents L.E.

Ms. Campbell Protein Synthesis Practice Questions Regents L.E. Name Student # Ms. Campbell Protein Synthesis Practice Questions Regents L.E. 1. A sequence of three nitrogenous bases in a messenger-rna molecule is known as a 1) codon 2) gene 3) polypeptide 4) nucleotide

More information

Application of Graph-based Data Mining to Metabolic Pathways

Application of Graph-based Data Mining to Metabolic Pathways Application of Graph-based Data Mining to Metabolic Pathways Chang Hun You, Lawrence B. Holder, Diane J. Cook School of Electrical Engineering and Computer Science Washington State University Pullman,

More information

EMBL-EBI Web Services

EMBL-EBI Web Services EMBL-EBI Web Services Rodrigo Lopez Head of the External Services Team SME Workshop Piemonte 2011 EBI is an Outstation of the European Molecular Biology Laboratory. Summary Introduction The JDispatcher

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

MCAS Biology. Review Packet

MCAS Biology. Review Packet MCAS Biology Review Packet 1 Name Class Date 1. Define organic. THE CHEMISTRY OF LIFE 2. All living things are made up of 6 essential elements: SPONCH. Name the six elements of life. S N P C O H 3. Elements

More information

Cluster detection algorithm in neural networks

Cluster detection algorithm in neural networks Cluster detection algorithm in neural networks David Meunier and Hélène Paugam-Moisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F-69675 BRON - France E-mail: {dmeunier,hpaugam}@isc.cnrs.fr

More information

GenBank, Entrez, & FASTA

GenBank, Entrez, & FASTA GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,

More information

Distributed Bioinformatics Computing System for DNA Sequence Analysis

Distributed Bioinformatics Computing System for DNA Sequence Analysis Global Journal of Computer Science and Technology: A Hardware & Computation Volume 14 Issue 1 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.

More information