Sequence analysis and comparison
- Doreen Sherilyn Richard
Marjolein Thunnissen, Lund, September 2012

The aim of sequence identification:
- Is there any known protein sequence that is homologous to mine?
- Are there other species that have a similar gene (ORF)?
- Has anybody already studied this protein or a similar one?
- What is the biochemical function, and what physicochemical characteristics should be expected?

Search and analysis strategy:
- Sequence search based on homology (similarity).
- Pattern searches: search for occurrences of a predefined pattern (may be a short sequence motif).
- Annotation searches: search by keywords, authors, additional features.
- Search for a 3D structure of a homologous protein.

Amino acid sequences contain:
- Information regarding the protein's function (catalytic activity, specific recognition sites, etc.).
- The protein's evolutionary origin.
- Information regarding the type of its 3D structure (folding type).
Extracting this information is the task of sequence analysis.
Other goals of sequence analysis:
- Assembly of sequence fragments into complete units (proteins, genes, chromosomes).
- Finding open reading frames (ORFs) in cDNAs or genomic DNA, and using codon usage tables.
- Management of sequence information.
- Prediction of the biochemical and physicochemical characteristics of a protein (molecular weight, isoelectric point (pI), extinction coefficient).

Even more goals of sequence analysis:
Finding and using consensus sequences. Examples:
- promoters
- transcription initiation sites
- transcription termination sites
- polyadenylation sites
- ribosome binding sites
- protein features
- post-translational modifications: formation of disulfide bonds, glycosylation, cleavage of signal sequences, etc.

Analysing relationships between proteins, some general rules:
- Proteins with the same function taken from closely related organisms have highly similar amino acid sequences.
- The greater the differences observed between related proteins, the longer the time since the organisms diverged (genetic divergence). The opposite is genetic convergence.

Types of sequence comparison and alignment:
- compare sequence to database; goal: find related sequences (SIMILARITY)
- compare sequence to sequence; goal: find matching domains (ALIGNMENT)
- compare database to database; goal: estimate genetic distance (EVOLUTION)
- determine consensus sequences
Comparisons can be pairwise or multiple.
Sequence alignment:
- Sequence alignment makes it possible to align and compare a sequence to a family of related sequences, to reveal conserved regions of functional importance.
- An accurate alignment can be useful for obtaining an idea of the 3D structure of a protein.
- Since there are many ways of aligning two sequences (an alignment produced by a program is one of several possible), we need criteria to judge the quality of an alignment.

Modifications of a protein sequence to be considered:
- Replacement of one amino acid by another:
    aabb
    acbb
- Insertions and deletions of single amino acids and larger blocks:
    ccc-dee
    c-cddee
- Large rearrangements of the gene:
    aaaaaabbbbbb
    bbbbbbaaaaaa

Alignment accuracy (Mind the Gap!):
The best alignment is the one that has the maximum number of identical residues aligned against each other, expressed as % similarity.

Example:
    Sequence 1:        CPKICIGGWFAAY
    Sequence 2:        CSGICKKAWFV-Y
    Alignment pattern: C--IC---WF--Y
Similarity = 6/13 = 46%
Score (S) = matches - mismatches = 6 - 7 = -1

The same two sequences aligned without and with gaps:
    GATC      GAT-C
    GTGC      G-TGC

Generally:
    S = Σ gains (identities, replacements) - Σ penalties
    Penalties = number of gaps × gap creation penalty
The values of identities and replacements are elements of the replacement matrix.

Rules of thumb:
- As many residues as possible should be aligned.
- A gap should be added only if it significantly increases the number of matches.
- The size of the gap and its position are important.
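The simple score above can be checked with a few lines of Python. This is only a sketch of the identity-based scheme from the example: the function name `alignment_stats` is made up for illustration, and a gap is counted as a mismatch, which is what the 6 - 7 arithmetic in the example implies.

```python
def alignment_stats(row1: str, row2: str) -> tuple[int, int, float]:
    """Return (matches, score, percent identical) for two aligned rows.

    Assumption: rows are already aligned, '-' marks a gap, and a gap
    counts as a mismatch (score = matches - mismatches).
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    # A column is a match only if both residues are equal and not gaps.
    matches = sum(a == b and a != "-" for a, b in zip(row1, row2))
    mismatches = len(row1) - matches
    score = matches - mismatches
    percent = 100.0 * matches / len(row1)
    return matches, score, percent


matches, score, percent = alignment_stats("CPKICIGGWFAAY", "CSGICKKAWFV-Y")
print(matches, score, round(percent))  # 6 -1 46

# The gapped version of the short example aligns one more identical residue.
print(alignment_stats("GATC", "GTGC")[0])    # 2
print(alignment_stats("GAT-C", "G-TGC")[0])  # 3
```

With a real replacement matrix and a gap creation penalty, the gapped alignment is preferred only if the extra matches outweigh the penalties, which is the point of the rules of thumb.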
Substitution scoring schemes:
Needed to assign a score to each of the possible substitutions of one amino acid by another: in total 210 possible pairs (190 pairs of different a.a. + 20 pairs of identical a.a.), presented in the form of a 20 × 20 matrix.

Possible scoring schemes include:
- Identity scoring: 0 if the a.a. are different and 1 if they are the same.
- Observed substitutions: assigns weights based on the analysis of substitution frequencies derived from manual alignments.
- Chemical similarity score: gives higher weight to the alignment of a.a. with similar chemical properties (V/L, K/R).

Amino acid substitution matrices: the PAM family of matrices (Dayhoff matrix):
- Take an aligned set of closely related proteins (1300 sequences in 72 families in the original work).
- For each position in the set, find the most common amino acid observed.
- Calculate the frequency with which each other amino acid is observed at that position.
- Combine frequencies from all positions to give a table of frequencies for each amino acid changing to each other amino acid.
- Take the logarithm and normalize for the frequency of each amino acid.

Properties of the PAM matrix:
- Each element M(i,j) gives the probability that the a.a. in column j is mutated to the a.a. in row i after a particular evolutionary time, measured as the percentage of accepted mutations per 10^8 years (PAM).
- 1 PAM corresponds to an average change in 1% of all a.a. positions.
- After 100 PAM of evolution not every residue will have changed: some will have mutated several times, perhaps returning to the original state, while others will not have changed at all.
- At 250 PAM, 80% of all a.a. will have changed, although to various degrees: 48% of Trp, 41% of Cys and 20% of His would be unchanged, but only 7% of Ser will remain.

[Table: PAM 250 matrix (Science, June 5), values rounded to the nearest integer. Rows and columns are in the order A R N D C Q E G H I L K M F P S T W Y V. The matrix values are not preserved in this transcription.]
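The pair count and the simplest of the schemes above, identity scoring, can be written down directly. The sketch below is illustrative only: the names `AMINO_ACIDS` and `identity_matrix` are made up here, and the residue order simply follows the header of the PAM 250 table.

```python
from itertools import combinations_with_replacement

# One-letter codes in the order used by the PAM 250 table header.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Unordered pairs, including a residue paired with itself.
pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))
different = [p for p in pairs if p[0] != p[1]]
identical = [p for p in pairs if p[0] == p[1]]
print(len(pairs), len(different), len(identical))  # 210 190 20

# Identity scoring as a 20 x 20 lookup table: 1 if the same, 0 if different.
identity_matrix = {(a, b): int(a == b) for a in AMINO_ACIDS for b in AMINO_ACIDS}
print(identity_matrix[("V", "V")], identity_matrix[("V", "L")])  # 1 0
```

A PAM or BLOSUM matrix would fill the same kind of table with log-odds values instead of 0 and 1, so that a conservative pair such as V/L scores higher than an unrelated pair.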
Other types of matrices:
- PET91: a version of PAM using a set of 2621 families of sequences.
- BLOSUM (blocks substitution matrix): amino acid substitution tables that score amino acid pairs based on the frequency of amino acid substitutions in aligned sequence motifs (blocks). Based on local alignments of 2000 blocks from 500 families.
- Different BLOSUM types: 30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90. BLOSUM62, the most popular, is based on blocks with at least 62% identity.
- High BLOSUM numbers are for closely related sequences, low BLOSUM numbers for distant sequences.

Differences from PAM:
- Evolutionarily divergent proteins are used.
- Blocks are used instead of global alignments.

Which matrix to use? Relationships between matrices:
- PAM-1 / BLOSUM-90: small evolutionary distance; high identity within short sequences.
- PAM-250 / BLOSUM-20: large evolutionary distance; low identity within long sequences.

Biological criteria can be used in alignment:
- Frequent and infrequent residues
- Structurally or functionally important amino acids
- A match to highly conserved residues
- Repetitive sequences

Methods for sequence comparisons:
Sliding window method
Central to many of the algorithms used in sequence analysis. The basic idea is to define a "window" of a certain number of residues (nucleotides or amino acids) and to calculate some value for the residues in that fragment. Once the calculation is completed, the program shifts by one residue and analyzes the next window of residues. This process repeats until the end of the sequence is reached.
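A minimal sketch of the sliding window idea follows. The helper names and the choice of value to compute (the fraction of hydrophobic residues) are assumptions made for this illustration, as is the particular set of residues treated as hydrophobic; any per-window function can be plugged in.

```python
from typing import Callable


def sliding_window(sequence: str, window: int,
                   func: Callable[[str], float]) -> list[float]:
    """Apply func to every window of the given length, shifting by one residue."""
    return [func(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


# Assumption for illustration only: a rough set of hydrophobic residues.
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_fraction(fragment: str) -> float:
    """Fraction of residues in the fragment that are in HYDROPHOBIC."""
    return sum(residue in HYDROPHOBIC for residue in fragment) / len(fragment)


values = sliding_window("CPKICIGGWFAAY", 5, hydrophobic_fraction)
print(len(values))  # 9 windows for a 13-residue sequence
print(values[0])    # first window is CPKIC: 3 of 5 residues are in the set
```

The result has one value per window position, so it can be plotted along the sequence; longer windows give smoother profiles at the cost of resolution.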
Sliding window in sequence analysis:
Given two sequences A and B, all possible overlapping segments of a particular length (window length) from A are compared to all segments of B. For each pair of segments the amino acid pair scores are accumulated over the length of the segment. For example, the comparison of the two segments
    ALGAWDE
    ALATWDE
gives a score of 1 + 1 + 0 + 0 + 1 + 1 + 1 = 5 (with identity scoring).

The dot matrix method for sequence comparison:
- The two axes each represent one of the two sequences: sequence A along the top from left to right and sequence B along the left from top to bottom.
- The matrix is filled in by taking a window of sequence A and scanning along sequence B. Whenever a match occurs, a dot is placed in the matrix.
- After reaching the end of sequence B, a new query sequence is generated from sequence A by sliding the window to the next position in sequence A.

[Figure: Example of a dot matrix comparison of two protein sequences]
[Figure: Dot matrix comparison of genomic DNA and cDNA sequences]

- When two sequences share similarity over their entire length, a diagonal line will extend from one corner of the dot plot to the diagonally opposite corner.
- If two sequences only share patches of similarity, this will be revealed by diagonal stretches.
- Jumps correspond to positions where one sequence has more (or fewer) letters than the other (insertions and deletions).
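The dot matrix procedure can be sketched as below. This is an illustration under stated assumptions rather than the method of any particular program: segments are scored by counting identities, a dot is placed when the window score reaches a threshold, and the names `segment_score`, `dot_matrix` and `render` are invented for the example.

```python
def segment_score(seg_a: str, seg_b: str) -> int:
    """Identity scoring: one point per position where the residues agree."""
    return sum(a == b for a, b in zip(seg_a, seg_b))


def dot_matrix(seq_a: str, seq_b: str, window: int = 1,
               threshold: int = 1) -> list[list[bool]]:
    """Rows follow sequence B (top to bottom), columns follow sequence A."""
    n_cols = len(seq_a) - window + 1
    n_rows = len(seq_b) - window + 1
    matrix = [[False] * n_cols for _ in range(n_rows)]
    for i in range(n_cols):        # take a window of sequence A ...
        for j in range(n_rows):    # ... and scan it along sequence B
            score = segment_score(seq_a[i:i + window], seq_b[j:j + window])
            if score >= threshold:
                matrix[j][i] = True
    return matrix


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text, '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if dot else "." for dot in row)
                     for row in matrix)


print(segment_score("ALGAWDE", "ALATWDE"))  # 5
print(render(dot_matrix("GATCGATC", "GATC")))
```

Raising the window length and threshold removes isolated chance matches, so that only the diagonal stretches of real similarity remain visible.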
Alignment using dynamic programming:
Given two sequences A and B, at each aligned position there are 3 possibilities:
- w(Ai, Bj): substitution of Ai by Bj
- w(Ai, D): deletion of Ai
- w(D, Bj): deletion of Bj
w, the weight, is derived from the chosen scoring scheme (e.g. a PAM matrix). Gaps (D) are given a negative weight, called the gap penalty, since insertions and deletions are less common than substitutions.

Graphical representation of dynamic programming:
Try to find the path that gives the maximal score. There are three moves allowed: matching residues (diagonal move), deleting a residue from one sequence (horizontal move), or deleting a residue from the other (vertical move).
    RNI-LVSDAKNVGI
    RDISLV---KNAGI

Types of alignment:
- Global alignment: align two sequences from beginning to end, insisting that all sequence positions must match. Used in the alignment of sequences known to be related.
- Local alignment: find the best region of similarity between two sequences without insisting that the entire sequences match (the result will be several alignments with close or different scores). Used in database searching and in the alignment of distantly related sequences with several regions of homology.
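The three moves can be turned into a small global alignment routine in the style of the Needleman-Wunsch algorithm. The sketch below makes simplifying assumptions that are not in the slides: the weight w is +1 for a match and -1 for a mismatch instead of a PAM or BLOSUM value, and every gap position costs a fixed -2. The function name `global_align` is made up for the example.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + w,  # diagonal: Ai with Bj
                              score[i - 1][j] + gap,    # Ai against a gap
                              score[i][j - 1] + gap)    # Bj against a gap
    # Trace back from the bottom-right corner to recover one optimal path.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + w:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = global_align("RNILVSDAKNVGI", "RDISLVKNAGI")
print(best)
print(top)
print(bottom)
```

The alignment printed for the two example sequences depends on the chosen weights and need not match the one shown above. For local alignment the same table is used, but cell values are not allowed to drop below zero and the traceback starts from the highest-scoring cell.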
Functional information from multiple sequence alignment:
- A multiple sequence alignment allows us to extract information that is difficult to extract from a single sequence or from an alignment of only two sequences.
- When making a multiple sequence alignment, try to include both sequences that are highly conserved and some that are more distantly related.
- If possible, use programs for automatic analysis of multiple sequence alignments (e.g. AMAS at Amas/amas.html).

Structural information from multiple sequence alignment:
Example: alignment of ferrochelatase
- Positions of insertions and deletions suggest regions of surface loops in the 3D structure.
- Conserved Gly and Pro suggest a β-turn.
- Hydrophobic residues conserved at i, i+2, i+4 etc., separated by hydrophilic residues, suggest a surface β-strand.
- A short run of hydrophobic residues (4 aa) may suggest a buried β-strand; longer stretches (20 aa) may suggest a membrane-spanning helix.
- Pairs of conserved hydrophobic aa separated by pairs of hydrophilic residues suggest an α-helix with one face packed against the protein core.
Alignment accuracy:
- The accuracy of a multiple sequence alignment is generally higher than that of a pairwise alignment.
- Overall alignment accuracy: it is possible to compare the score to the distribution of scores for alignments of random sequences of the same length and composition. The result may be expressed in standard deviation units above the mean.
- The alignment of some regions is more reliable than that of others. The most reliable regions are those for which the alignment does not change when small changes are made to the gap penalty and matrix parameters. The least reliable are regions of insertions and deletions, often loop regions.
- Percentage identity: unrelated sequences chosen at random are expected to be identical in about 5% of their residues. To be certain of homology, more than 20% identity is required.
- Percentage identity depends on the length of the alignment: an alignment of 200 residues with 30% identity is more significant than an alignment of 50 residues with 30% identity.

What are you trying to find out?
- Are you trying to locate similar domains or motifs? --> Local alignment is probably best.
- Are you trying to determine whether the sequences come from the same family? --> Use one of the BLOSUM matrices.
- Are you trying to determine how closely related the sequences are evolutionarily? --> Use one of the PAM matrices.
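The comparison against random sequences of the same length and composition can be sketched by shuffling one of the sequences and rescoring. Everything below is an assumption made for illustration: the scoring is a simple global alignment score with +1 for a match, -1 for a mismatch and -2 per gap position, the number of shuffles and the seed are arbitrary, and the names `global_score` and `z_score` are invented.

```python
import random
import statistics


def global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Score of the best global alignment, keeping one table row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + w, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Observed score in standard deviation units above the shuffled mean."""
    rng = random.Random(seed)
    observed = global_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same length and composition as b
        background.append(global_score(a, "".join(letters)))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; cannot compute a z-score")
    return (observed - mean) / sd


print(z_score("CPKICIGGWFAAY", "CSGICKKAWFVY"))
```

A large positive value means the real alignment scores well above what shuffled sequences achieve; for short sequences such as this one, the estimate is noisy and should be read with caution.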
More informationCurrent Motif Discovery Tools and their Limitations
Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.
More informationSeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
More informationID of alternative translational initiation events. Description of gene function Reference of NCBI database access and relative literatures
Data resource: In this database, 650 alternatively translated variants assigned to a total of 300 genes are contained. These database records of alternative translational initiation have been collected
More information2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three
Chem 121 Chapter 22. Nucleic Acids 1. Any given nucleotide in a nucleic acid contains A) two bases and a sugar. B) one sugar, two bases and one phosphate. C) two sugars and one phosphate. D) one sugar,
More informationFrom DNA to Protein
Nucleus Control center of the cell contains the genetic library encoded in the sequences of nucleotides in molecules of DNA code for the amino acid sequences of all proteins determines which specific proteins
More informationJune 09, 2009 Random Mutagenesis
Why Mutagenesis? Analysis of protein function June 09, 2009 Random Mutagenesis Analysis of protein structure Protein engineering Analysis of structure-function relationship Analysis of the catalytic center
More informationRNA Structure and folding
RNA Structure and folding Overview: The main functional biomolecules in cells are polymers DNA, RNA and proteins For RNA and Proteins, the specific sequence of the polymer dictates its final structure
More informationAP BIOLOGY 2008 SCORING GUIDELINES
AP BIOLOGY 2008 SCORING GUIDELINES Question 1 1. The physical structure of a protein often reflects and affects its function. (a) Describe THREE types of chemical bonds/interactions found in proteins.
More informationBasic Concepts of DNA, Proteins, Genes and Genomes
Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate
More informationThe world of non-coding RNA. Espen Enerly
The world of non-coding RNA Espen Enerly ncrna in general Different groups Small RNAs Outline mirnas and sirnas Speculations Common for all ncrna Per def.: never translated Not spurious transcripts Always/often
More informationDNA Sequencing Overview
DNA Sequencing Overview DNA sequencing involves the determination of the sequence of nucleotides in a sample of DNA. It is presently conducted using a modified PCR reaction where both normal and labeled
More informationBIOINFORMATICS TUTORIAL
Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.
More informationBiological Sciences Initiative. Human Genome
Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.
More informationFlexible Information Visualization of Multivariate Data from Biological Sequence Similarity Searches
Flexible Information Visualization of Multivariate Data from Biological Sequence Similarity Searches Ed Huai-hsin Chi y, John Riedl y, Elizabeth Shoop y, John V. Carlis y, Ernest Retzel z, Phillip Barry
More informationRNA and Protein Synthesis
Name lass Date RN and Protein Synthesis Information and Heredity Q: How does information fl ow from DN to RN to direct the synthesis of proteins? 13.1 What is RN? WHT I KNOW SMPLE NSWER: RN is a nucleic
More informationLecture 3: Mutations
Lecture 3: Mutations Recall that the flow of information within a cell involves the transcription of DNA to mrna and the translation of mrna to protein. Recall also, that the flow of information between
More informationAP BIOLOGY 2010 SCORING GUIDELINES (Form B)
AP BIOLOGY 2010 SCORING GUIDELINES (Form B) Question 2 Certain human genetic conditions, such as sickle cell anemia, result from single base-pair mutations in DNA. (a) Explain how a single base-pair mutation
More information2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.
1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence
More informationExpression and Purification of Recombinant Protein in bacteria and Yeast. Presented By: Puspa pandey, Mohit sachdeva & Ming yu
Expression and Purification of Recombinant Protein in bacteria and Yeast Presented By: Puspa pandey, Mohit sachdeva & Ming yu DNA Vectors Molecular carriers which carry fragments of DNA into host cell.
More informationGenetics Lecture Notes 7.03 2005. Lectures 1 2
Genetics Lecture Notes 7.03 2005 Lectures 1 2 Lecture 1 We will begin this course with the question: What is a gene? This question will take us four lectures to answer because there are actually several
More informationGenetics Module B, Anchor 3
Genetics Module B, Anchor 3 Key Concepts: - An individual s characteristics are determines by factors that are passed from one parental generation to the next. - During gamete formation, the alleles for
More information10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method
578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after
More information