Efficient codon optimization with motif engineering

Size: px
Start display at page:

Download "Efficient codon optimization with motif engineering"

Transcription

1 Anne Condon and Chris Thachuk University of British Columbia IWOCA 2011 June 20, 2011

2 Why synthesize proteins? Image credit: Protein Data Bank

3 Central dogma Image credit: Daniel Horspool

4 How to synthesize DNA/RNA

5

6

7

8

9

10

11 Genetic code is redundant

12 Genetic code is redundant

13 Genetic code is a many-to-one mapping

14 Genetic code is a many-to-one mapping Definition λ(i) denotes the i th amino acid λ j (i) denotes its j th codon λ(i) denotes its codon count

15 Genetic code is a many-to-one mapping Definition λ(i) denotes the i th amino acid λ j (i) denotes its j th codon λ(i) denotes its codon count Example λ(2) = Leucine λ(2) = 6 λ 1 (2) = CUA

16 Codon frequency

17 Codon frequency Definition Let ρ j (i) denotes the relative frequency of j th codon of i th amino acid.

18 Codon frequency Definition Let ρ j (i) denotes the relative frequency of j th codon of i th amino acid. Example λ(3) = Tyrosine ρ 1 (3) = ρ 2 (3) = = 0.9 = 0.1

19 Codon frequency Definition Let ρ j (i) denotes the relative frequency of j th codon of i th amino acid. Example λ(3) = Tyrosine ρ 1 (3) = ρ 2 (3) = = 0.9 = 0.1 λ 1 (3) is a most frequent codon

20 Codon fitness Definition Let τ j (i) denote the codon fitness of j th codon of i th amino acid. A codon s fitness is its frequency relative to the most frequent codon of the same amino acid.

21 Codon fitness Definition Let τ j (i) denote the codon fitness of j th codon of i th amino acid. A codon s fitness is its frequency relative to the most frequent codon of the same amino acid. Example λ(3) = Tyrosine τ 2 (3) = ρ 2(3) ρ 1 (3) =

22 The codon adaption index Given an amino acid sequence A = α 1, α 2,..., α A, and a corresponding codon design S = c 1, c 2,..., c A : Example A CAI(S, A) = τ ci (α i ) i=1 1 A A Tyrosine Tyrosine Tyrosine Tyrosine S UAC UAU UAU UAC CAI (S, A) = ( )

23 Optimizing CAI The CAI codon optimization problem Instance: Amino acid sequence A = α 1, α 2,..., α A. Problem: Find a codon design S corresponding to A such that: CAI (S, A) = max{cai (S, A) S S(A)} where S(A) is the set of all codon designs for A.

24 Optimizing CAI The CAI codon optimization problem Instance: Amino acid sequence A = α 1, α 2,..., α A. Problem: Find a codon design S corresponding to A such that: CAI (S, A) = max{cai (S, A) S S(A)} where S(A) is the set of all codon designs for A. Note: trivially solvable in linear time

25 Forbidden motifs and desirable motifs The catch Certain motifs (substrings) should not appear in the codon design.

26 Forbidden motifs and desirable motifs The catch Certain motifs (substrings) should not appear in the codon design. restriction enzyme motifs (e.g., MlyI restriction enzyme motif: GAGTC)

27 Forbidden motifs and desirable motifs The catch Certain motifs (substrings) should not appear in the codon design. restriction enzyme motifs (e.g., MlyI restriction enzyme motif: GAGTC) polyhomomeric regions (e.g., CCCCCC)

28 Forbidden motifs and desirable motifs The catch Certain motifs (substrings) should not appear in the codon design. restriction enzyme motifs (e.g., MlyI restriction enzyme motif: GAGTC) polyhomomeric regions (e.g., CCCCCC) Certain motifs (substrings) are desirable in the codon design. Immuno stimulatory motifs

29 Forbidden motifs and desirable motifs Example F = {CCUU, AGC, UGGC}

30 Forbidden motifs and desirable motifs Example F = {CCUU, AGC, UGGC}

31 Forbidden motifs and desirable motifs Example F = {CCUU, AGC, UGGC}

32 Forbidden motifs and desirable motifs Example F = {CCUU, AGC, UGGC}

33 Forbidden motifs and desirable motifs Example F = {CCUU, AGC, UGGC}

34 Forbidden motifs and desirable motifs Example Detecting motifs Equivalent to the dictionary matching problem. Classic dictionary (Aho & Corasick 1975) Succinct dictionary (Belazzougui 2010, Thachuk 2011) F = {CCUU, AGC, UGGC} Text of length h scanned for patterns in O(h) time.

35 Motifs are of constant length Observation If the largest forbidden or desired motif is of length g, then any forbidden or desired motif can span at most k + 1 consecutive codons, where k = g/3.

36 Optimizing CAI and considering motifs The CAI codon optimization problem with motif engineering Instance: Amino acid sequence A = α 1, α 2,..., α A, a set of forbidden motifs F, and a set of desired motifs D.

37 Optimizing CAI and considering motifs The CAI codon optimization problem with motif engineering Instance: Amino acid sequence A = α 1, α 2,..., α A, a set of forbidden motifs F, and a set of desired motifs D. Problem: Find a codon design S corresponding to A such that: S is valid, with respect to F and D, CAI (S, A) = max{cai (S, A) S S(A)}

38 Optimizing CAI and considering motifs The CAI codon optimization problem with motif engineering Instance: Amino acid sequence A = α 1, α 2,..., α A, a set of forbidden motifs F, and a set of desired motifs D. Problem: Find a codon design S corresponding to A such that: S is valid, with respect to F and D, CAI (S, A) = max{cai (S, A) S S(A)} where S(A) is the set of all valid codon designs for A.

39 The algorithm Overview and Notation We will develop a dynamic programming algorithm. (We ignore desirable motifs.) Notation λ j (i) j th codon of i th amino acid τ j (i) j th codon s fitness w.r.t i th amino acid

40 The algorithm Overview and Notation We will develop a dynamic programming algorithm. (We ignore desirable motifs.) Notation λ j (i) j th codon of i th amino acid τ j (i) j th codon s fitness w.r.t i th amino acid M F (λ ci (α j )... λ ci+q (α j+q )) count of forbidden motifs M F (λ c i (α j )... λ ci+q (α j+q )) count of forbidden motifs ending in last codon

41 The algorithm A tale of two matrices We make use of two k-dimensional DP matrices. The Forbidden motif DP matrix F i c i k+1,...,c i 1,c i Denotes the minimum possible number of forbidden motifs in a DNA sequence which codes for an amino acid sequence A = α 1, α 2,..., α i, given that the last k codons (of i total codons) have indices denoted as c i k+1,..., c i 1, c i.

42 The algorithm A tale of two matrices We make use of two k-dimensional DP matrices. The Forbidden motif DP matrix F i c i k+1,...,c i 1,c i Denotes the minimum possible number of forbidden motifs in a DNA sequence which codes for an amino acid sequence A = α 1, α 2,..., α i, given that the last k codons (of i total codons) have indices denoted as c i k+1,..., c i 1, c i. The CAI value DP matrix P i c i k+1,...,c i 1,c i Denotes the maximum possible CAI score among all valid sequences.

43 The algorithm The base case Simply evaluate all assignments to first k codons. Fc k ( 1,c 2,...,c k 1,c k = M F λc1 (α 1 )λ c2 (α 2 )... λ ck 1 (α k 1 )λ ck (α k ) ) k Pc k 1,c 2,...,c k 1,c k = (τ ci (α i )) i=1

44 The algorithm The base case Simply evaluate all assignments to first k codons. Fc k ( 1,c 2,...,c k 1,c k = M F λc1 (α 1 )λ c2 (α 2 )... λ ck 1 (α k 1 )λ ck (α k ) ) k Pc k 1,c 2,...,c k 1,c k = (τ ci (α i )) i=1 Complexity analysis at most 6 codons for each of the k positions O(6 k ) assignments evaluating one assignment for motifs O(k) time O(6 k k) time and O(6 k ) words of space

45 The algorithm The recursive case case (i > k) For a fixed assignment to last k codons: Fc i i k+1,...,c i 1,c i = min 1 ci k λ(α i k ) { Fc i 1 i k,...,c i 2,c i 1 + M F ( λci k (α i k )... λ ci 1 (α i 1 )λ ci (α i ) )} Example

46 The algorithm The recursive case case (i > k) For a fixed assignment to last k codons: Fc i i k+1,...,c i 1,c i = min 1 ci k λ(α i k ) { Fc i 1 i k,...,c i 2,c i 1 + M F ( λci k (α i k )... λ ci 1 (α i 1 )λ ci (α i ) )} Example

47 The algorithm The recursive case case (i > k) For a fixed assignment to last k codons: Fc i i k+1,...,c i 1,c i = min 1 ci k λ(α i k ) { Fc i 1 i k,...,c i 2,c i 1 + M F ( λci k (α i k )... λ ci 1 (α i 1 )λ ci (α i ) )} Example

48 The algorithm The recursive case case (i > k) For a fixed assignment to last k codons: Fc i i k+1,...,c i 1,c i = min 1 ci k λ(α i k ) { Fc i 1 i k,...,c i 2,c i 1 + M F ( λci k (α i k )... λ ci 1 (α i 1 )λ ci (α i ) )} Example

49 The algorithm The recursive case case (i > k) For a fixed assignment to last k codons: Pc i i k+1,...,c i 1,c i = max 1 ci k λ(α i k ) Fc i 1, if i k (,...,c i 2,c i 1 + M F λci k (α i k )... λ ci (α i ) ) Fc i i k+1,...,c i 1,c i τ ci (α i ) Pc i 1 i k,...,c i 2,c i 1, otherwise Example

50 The algorithm The recursive case case (i > k) P i k = F i k = max 1 c i λ(α i ) 1 c i 1 λ(α i 1 ). 1 c i k+1 λ(α i k+1 ) min 1 c i λ(α i ) 1 c i 1 λ(α i 1 ). 1 c i k+1 λ(α i k+1 ) { {F i ci k+1,...,ci 1,ci } Pc i i k+1,...,c i 1,c i, if Fc i i k+1,...,c i 1,c i = F k i, otherwise }

51 The algorithm The recursive case case (i > k) P i k = F i k = max 1 c i λ(α i ) 1 c i 1 λ(α i 1 ). 1 c i k+1 λ(α i k+1 ) min 1 c i λ(α i ) 1 c i 1 λ(α i 1 ). 1 c i k+1 λ(α i k+1 ) { {F i ci k+1,...,ci 1,ci } Pc i i k+1,...,c i 1,c i, if Fc i i k+1,...,c i 1,c i = F k i, otherwise } Complexity analysis O(6 k ) codon assignments evaluated at n positions O(k) time to evaluate assignment O(6 k kn) = O(n) time and space overall previous state-of-the-art θ(n 2 ) time/space (Satya et al., 2003)

52 Experimental setup Data set 3,157 sequences from GENCODE subset of ENCODE dataset comprises 1% of human genome representative of genome in various characteristics range in length from 75 to 8186 bases mean length of 173 bases (267 bases standard deviation)

53 Experimental setup Data set 3,157 sequences from GENCODE subset of ENCODE dataset comprises 1% of human genome representative of genome in various characteristics range in length from 75 to 8186 bases mean length of 173 bases (267 bases standard deviation) using codon frequencies of Escherichia coli

54 Experimental setup Data set 3,157 sequences from GENCODE subset of ENCODE dataset comprises 1% of human genome representative of genome in various characteristics range in length from 75 to 8186 bases mean length of 173 bases (267 bases standard deviation) using codon frequencies of Escherichia coli F = 10 D = 33 k = 3 (motifs of length 9 or less)

55 Experimental setup Data set 3,157 sequences from GENCODE subset of ENCODE dataset comprises 1% of human genome representative of genome in various characteristics range in length from 75 to 8186 bases mean length of 173 bases (267 bases standard deviation) using codon frequencies of Escherichia coli F = 10 D = 33 k = 3 (motifs of length 9 or less) Implementation & Hardware Implemented in C++ Pentium IV 2.4 GhZ 1 GB RAM

56 motif sets CAI value # forbidden # desired none (wild-types) 0.65 (0.06) 9.24 (16.24) 0.49 (1.06)

57 motif sets CAI value # forbidden # desired none (wild-types) 0.65 (0.06) 9.24 (16.24) 0.49 (1.06) forbidden 0.92 (0.04) 0.14 (0.45) 0 (0.00)

58 motif sets CAI value # forbidden # desired none (wild-types) 0.65 (0.06) 9.24 (16.24) 0.49 (1.06) forbidden 0.92 (0.04) 0.14 (0.45) 0 (0.00) forbidden & desired 0.83 (0.05) 0.14 (0.45) (14.84)

59

60

61 runtime [CPU seconds] all motifs only forbidden motifs sequence length

62 Conclusions & Future Work Conclusions CAI of gene can be optimized effectively motifs can be removed/added effectively O(n) time/space algorithm for constant length motifs algorithm is fast in practice

63 Conclusions & Future Work Conclusions CAI of gene can be optimized effectively motifs can be removed/added effectively O(n) time/space algorithm for constant length motifs algorithm is fast in practice Open Problem Design codon sequence free of secondary structure

agucacaaacgcu agugcuaguuua uaugcagucuua

agucacaaacgcu agugcuaguuua uaugcagucuua RNA Secondary Structure Prediction: The Co-transcriptional effect on RNA folding agucacaaacgcu agugcuaguuua uaugcagucuua By Conrad Godfrey Abstract RNA secondary structure prediction is an area of bioinformatics

More information

Gene Models & Bed format: What they represent.

Gene Models & Bed format: What they represent. GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,

More information

Gene Finding CMSC 423

Gene Finding CMSC 423 Gene Finding CMSC 423 Finding Signals in DNA We just have a long string of A, C, G, Ts. How can we find the signals encoded in it? Suppose you encountered a language you didn t know. How would you decipher

More information

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams. Module 3 Questions Section 1. Essay and Short Answers. Use diagrams wherever possible 1. With the use of a diagram, provide an overview of the general regulation strategies available to a bacterial cell.

More information

Specific problems. The genetic code. The genetic code. Adaptor molecules match amino acids to mrna codons

Specific problems. The genetic code. The genetic code. Adaptor molecules match amino acids to mrna codons Tutorial II Gene expression: mrna translation and protein synthesis Piergiorgio Percipalle, PhD Program Control of gene transcription and RNA processing mrna translation and protein synthesis KAROLINSKA

More information

RNA & Protein Synthesis

RNA & Protein Synthesis RNA & Protein Synthesis Genes send messages to cellular machinery RNA Plays a major role in process Process has three phases (Genetic) Transcription (Genetic) Translation Protein Synthesis RNA Synthesis

More information

A Web Based Software for Synonymous Codon Usage Indices

A Web Based Software for Synonymous Codon Usage Indices International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 3 (2013), pp. 147-152 International Research Publications House http://www. irphouse.com /ijict.htm A Web

More information

Molecular Genetics. RNA, Transcription, & Protein Synthesis

Molecular Genetics. RNA, Transcription, & Protein Synthesis Molecular Genetics RNA, Transcription, & Protein Synthesis Section 1 RNA AND TRANSCRIPTION Objectives Describe the primary functions of RNA Identify how RNA differs from DNA Describe the structure and

More information

13.2 Ribosomes & Protein Synthesis

13.2 Ribosomes & Protein Synthesis 13.2 Ribosomes & Protein Synthesis Introduction: *A specific sequence of bases in DNA carries the directions for forming a polypeptide, a chain of amino acids (there are 20 different types of amino acid).

More information

PRACTICE TEST QUESTIONS

PRACTICE TEST QUESTIONS PART A: MULTIPLE CHOICE QUESTIONS PRACTICE TEST QUESTIONS DNA & PROTEIN SYNTHESIS B 1. One of the functions of DNA is to A. secrete vacuoles. B. make copies of itself. C. join amino acids to each other.

More information

Gene and Chromosome Mutation Worksheet (reference pgs. 239-240 in Modern Biology textbook)

Gene and Chromosome Mutation Worksheet (reference pgs. 239-240 in Modern Biology textbook) Name Date Per Look at the diagrams, then answer the questions. Gene Mutations affect a single gene by changing its base sequence, resulting in an incorrect, or nonfunctional, protein being made. (a) A

More information

AP BIOLOGY 2010 SCORING GUIDELINES (Form B)

AP BIOLOGY 2010 SCORING GUIDELINES (Form B) AP BIOLOGY 2010 SCORING GUIDELINES (Form B) Question 2 Certain human genetic conditions, such as sickle cell anemia, result from single base-pair mutations in DNA. (a) Explain how a single base-pair mutation

More information

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d. 13 Multiple Choice RNA and Protein Synthesis Chapter Test A Write the letter that best answers the question or completes the statement on the line provided. 1. Which of the following are found in both

More information

FINDING RELATION BETWEEN AGING AND

FINDING RELATION BETWEEN AGING AND FINDING RELATION BETWEEN AGING AND TELOMERE BY APRIORI AND DECISION TREE Jieun Sung 1, Youngshin Joo, and Taeseon Yoon 1 Department of National Science, Hankuk Academy of Foreign Studies, Yong-In, Republic

More information

The Steps. 1. Transcription. 2. Transferal. 3. Translation

The Steps. 1. Transcription. 2. Transferal. 3. Translation Protein Synthesis Protein synthesis is simply the "making of proteins." Although the term itself is easy to understand, the multiple steps that a cell in a plant or animal must go through are not. In order

More information

RNA Structure and folding

RNA Structure and folding RNA Structure and folding Overview: The main functional biomolecules in cells are polymers DNA, RNA and proteins For RNA and Proteins, the specific sequence of the polymer dictates its final structure

More information

LabGenius. Technical design notes. The world s most advanced synthetic DNA libraries. hi@labgeni.us V1.5 NOV 15

LabGenius. Technical design notes. The world s most advanced synthetic DNA libraries. hi@labgeni.us V1.5 NOV 15 LabGenius The world s most advanced synthetic DNA libraries Technical design notes hi@labgeni.us V1.5 NOV 15 Introduction OUR APPROACH LabGenius is a gene synthesis company focussed on the design and manufacture

More information

Scaling from 1 PC to a super computer using Mascot

Scaling from 1 PC to a super computer using Mascot Scaling from 1 PC to a super computer using Mascot 1 Mascot - options for high throughput When is more processing power required? Options: Multiple Mascot installations Using a large Unix server Clusters

More information

Basic Concepts of DNA, Proteins, Genes and Genomes

Basic Concepts of DNA, Proteins, Genes and Genomes Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate

More information

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

More information

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized: Module 3F Protein Synthesis So far in this unit, we have examined: How genes are transmitted from one generation to the next Where genes are located What genes are made of How genes are replicated How

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein Assignment 3 Michele Owens Vocabulary Gene: A sequence of DNA that instructs a cell to produce a particular protein Promoter a control sequence near the start of a gene Coding sequence the sequence of

More information

Overview of Eukaryotic Gene Prediction

Overview of Eukaryotic Gene Prediction Overview of Eukaryotic Gene Prediction CBB 231 / COMPSCI 261 W.H. Majoros What is DNA? Nucleus Chromosome Telomere Centromere Cell Telomere base pairs histones DNA (double helix) DNA is a Double Helix

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Ms. Campbell Protein Synthesis Practice Questions Regents L.E.

Ms. Campbell Protein Synthesis Practice Questions Regents L.E. Name Student # Ms. Campbell Protein Synthesis Practice Questions Regents L.E. 1. A sequence of three nitrogenous bases in a messenger-rna molecule is known as a 1) codon 2) gene 3) polypeptide 4) nucleotide

More information

Translation Study Guide

Translation Study Guide Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to

More information

Molecular Facts and Figures

Molecular Facts and Figures Nucleic Acids Molecular Facts and Figures DNA/RNA bases: DNA and RNA are composed of four bases each. In DNA the four are Adenine (A), Thymidine (T), Cytosine (C), and Guanine (G). In RNA the four are

More information

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes

More information

AP BIOLOGY 2009 SCORING GUIDELINES

AP BIOLOGY 2009 SCORING GUIDELINES AP BIOLOGY 2009 SCORING GUIDELINES Question 4 The flow of genetic information from DNA to protein in eukaryotic cells is called the central dogma of biology. (a) Explain the role of each of the following

More information

Protein Synthesis How Genes Become Constituent Molecules

Protein Synthesis How Genes Become Constituent Molecules Protein Synthesis Protein Synthesis How Genes Become Constituent Molecules Mendel and The Idea of Gene What is a Chromosome? A chromosome is a molecule of DNA 50% 50% 1. True 2. False True False Protein

More information

Name: Date: Period: DNA Unit: DNA Webquest

Name: Date: Period: DNA Unit: DNA Webquest Name: Date: Period: DNA Unit: DNA Webquest Part 1 History, DNA Structure, DNA Replication DNA History http://www.dnaftb.org/dnaftb/1/concept/index.html Read the text and answer the following questions.

More information

Thymine = orange Adenine = dark green Guanine = purple Cytosine = yellow Uracil = brown

Thymine = orange Adenine = dark green Guanine = purple Cytosine = yellow Uracil = brown 1 DNA Coloring - Transcription & Translation Transcription RNA, Ribonucleic Acid is very similar to DNA. RNA normally exists as a single strand (and not the double stranded double helix of DNA). It contains

More information

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Single nucleotide polymorphisms or SNPs (pronounced "snips") are DNA sequence variations that occur

More information

2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three

2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three Chem 121 Chapter 22. Nucleic Acids 1. Any given nucleotide in a nucleic acid contains A) two bases and a sugar. B) one sugar, two bases and one phosphate. C) two sugars and one phosphate. D) one sugar,

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Guide for Bioinformatics Project Module 3

Guide for Bioinformatics Project Module 3 Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

More information

Extensions of the Partial Least Squares approach for the analysis of biomolecular interactions. Nina Kirschbaum

Extensions of the Partial Least Squares approach for the analysis of biomolecular interactions. Nina Kirschbaum Dissertation am Fachbereich Statistik der Universität Dortmund Extensions of the Partial Least Squares approach for the analysis of biomolecular interactions Nina Kirschbaum Erstgutachter: Prof. Dr. W.

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

Integrated Protein Services

Integrated Protein Services Integrated Protein Services Custom protein expression & purification Version DC04-0012 Expression strategy The first step in the recombinant protein generation process is to design an appropriate expression

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

Transcription: RNA Synthesis, Processing & Modification

Transcription: RNA Synthesis, Processing & Modification Transcription: RNA Synthesis, Processing & Modification 1 Central dogma DNA RNA Protein Reverse transcription 2 Transcription The process of making RNA from DNA Produces all type of RNA mrna, trna, rrna,

More information

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each

More information

Current Motif Discovery Tools and their Limitations

Current Motif Discovery Tools and their Limitations Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.

More information

Module 10: Bioinformatics

Module 10: Bioinformatics Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior

More information

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006 Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

More information

T cell Epitope Prediction

T cell Epitope Prediction Institute for Immunology and Informatics T cell Epitope Prediction EpiMatrix Eric Gustafson January 6, 2011 Overview Gathering raw data Popular sources Data Management Conservation Analysis Multiple Alignments

More information

Lecture 18: Applications of Dynamic Programming Steven Skiena. Department of Computer Science State University of New York Stony Brook, NY 11794 4400

Lecture 18: Applications of Dynamic Programming Steven Skiena. Department of Computer Science State University of New York Stony Brook, NY 11794 4400 Lecture 18: Applications of Dynamic Programming Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Problem of the Day

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

Vector NTI Advance 11 Quick Start Guide

Vector NTI Advance 11 Quick Start Guide Vector NTI Advance 11 Quick Start Guide Catalog no. 12605050, 12605099, 12605103 Version 11.0 December 15, 2008 12605022 Published by: Invitrogen Corporation 5791 Van Allen Way Carlsbad, CA 92008 U.S.A.

More information

Regents Biology REGENTS REVIEW: PROTEIN SYNTHESIS

Regents Biology REGENTS REVIEW: PROTEIN SYNTHESIS Period Date REGENTS REVIEW: PROTEIN SYNTHESIS 1. The diagram at the right represents a portion of a type of organic molecule present in the cells of organisms. What will most likely happen if there is

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE QUALITY OF BIOTECHNOLOGICAL PRODUCTS: ANALYSIS

More information

Recombinant DNA Unit Exam

Recombinant DNA Unit Exam Recombinant DNA Unit Exam Question 1 Restriction enzymes are extensively used in molecular biology. Below are the recognition sites of two of these enzymes, BamHI and BclI. a) BamHI, cleaves after the

More information

BCOR101 Midterm II Wednesday, October 26, 2005

BCOR101 Midterm II Wednesday, October 26, 2005 BCOR101 Midterm II Wednesday, October 26, 2005 Name Key Please show all of your work. 1. A donor strain is trp+, pro+, met+ and a recipient strain is trp-, pro-, met-. The donor strain is infected with

More information

Computer based Scheduling Tool for Multi-product Scheduling Problems

Computer based Scheduling Tool for Multi-product Scheduling Problems Computer based Scheduling Tool for Multi-product Scheduling Problems Computer based Scheduling Tool for Multi-product Scheduling Problems Adirake Chainual, Tawatchai Lutuksin and Pupong Pongcharoen Department

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

(http://genomes.urv.es/caical) TUTORIAL. (July 2006)

(http://genomes.urv.es/caical) TUTORIAL. (July 2006) (http://genomes.urv.es/caical) TUTORIAL (July 2006) CAIcal manual 2 Table of contents Introduction... 3 Required inputs... 5 SECTION A Calculation of parameters... 8 SECTION B CAI calculation for FASTA

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE AP Biology Date SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE LEARNING OBJECTIVES Students will gain an appreciation of the physical effects of sickle cell anemia, its prevalence in the population,

More information

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

BLAST. Anders Gorm Pedersen & Rasmus Wernersson BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise

More information

2006 7.012 Problem Set 3 KEY

2006 7.012 Problem Set 3 KEY 2006 7.012 Problem Set 3 KEY Due before 5 PM on FRIDAY, October 13, 2006. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Which reaction is catalyzed by each

More information

A Mathematical Programming Solution to the Mars Express Memory Dumping Problem

A Mathematical Programming Solution to the Mars Express Memory Dumping Problem A Mathematical Programming Solution to the Mars Express Memory Dumping Problem Giovanni Righini and Emanuele Tresoldi Dipartimento di Tecnologie dell Informazione Università degli Studi di Milano Via Bramante

More information

Amino Acids and Their Properties

Amino Acids and Their Properties Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

More information

Dynamics of Biological Systems

Dynamics of Biological Systems Dynamics of Biological Systems Part I - Biological background and mathematical modelling Paolo Milazzo (Università di Pisa) Dynamics of biological systems 1 / 53 Introduction The recent developments in

More information

Analysis of Algorithms I: Binary Search Trees

Analysis of Algorithms I: Binary Search Trees Analysis of Algorithms I: Binary Search Trees Xi Chen Columbia University Hash table: A data structure that maintains a subset of keys from a universe set U = {0, 1,..., p 1} and supports all three dictionary

More information

Basic attributes of genetic processes (replication, transcription, translation)

Basic attributes of genetic processes (replication, transcription, translation) 411-3 2008 Lecture notes I. First general topic in the course will be mutation (in broadest sense, any change to an organismʼs genetic material). Intimately intertwined with this is the process of DNA

More information

SR2000 FREQUENCY MONITOR

SR2000 FREQUENCY MONITOR SR2000 FREQUENCY MONITOR THE FFT SEARCH FUNCTION IN DETAILS FFT Search is a signal search using FFT (Fast Fourier Transform) technology. The FFT search function first appeared with the SR2000 Frequency

More information

FREE computing using Amazon EC2

FREE computing using Amazon EC2 FREE computing using Amazon EC2 Seong-Hwan Jun 1 1 Department of Statistics Univ of British Columbia Nov 1st, 2012 / Student seminar Outline Basics of servers Amazon EC2 Setup R on an EC2 instance Stat

More information

MUTATION, DNA REPAIR AND CANCER

MUTATION, DNA REPAIR AND CANCER MUTATION, DNA REPAIR AND CANCER 1 Mutation A heritable change in the genetic material Essential to the continuity of life Source of variation for natural selection New mutations are more likely to be harmful

More information

160 CHAPTER 4. VECTOR SPACES

160 CHAPTER 4. VECTOR SPACES 160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results

More information

Data Integration via Constrained Clustering: An Application to Enzyme Clustering

Data Integration via Constrained Clustering: An Application to Enzyme Clustering Data Integration via Constrained Clustering: An Application to Enzyme Clustering Elisa Boari de Lima Raquel Cardoso de Melo Minardi Wagner Meira Jr. Mohammed Javeed Zaki Abstract When multiple data sources

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

European Medicines Agency

European Medicines Agency European Medicines Agency July 1996 CPMP/ICH/139/95 ICH Topic Q 5 B Quality of Biotechnological Products: Analysis of the Expression Construct in Cell Lines Used for Production of r-dna Derived Protein

More information

Biological Databases and Protein Sequence Analysis

Biological Databases and Protein Sequence Analysis Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Lecture 2, Introduction to Python. Python Programming Language

Lecture 2, Introduction to Python. Python Programming Language BINF 3360, Introduction to Computational Biology Lecture 2, Introduction to Python Young-Rae Cho Associate Professor Department of Computer Science Baylor University Python Programming Language Script

More information

Chapter 4. SDP - difficulties. 4.1 Curse of dimensionality

Chapter 4. SDP - difficulties. 4.1 Curse of dimensionality Chapter 4 SDP - difficulties So far we have discussed simple problems. Not necessarily in a conceptual framework, but surely in a computational. It makes little sense to discuss SDP, or DP for that matter,

More information

Hands on Simulation of Mutation

Hands on Simulation of Mutation Hands on Simulation of Mutation Charlotte K. Omoto P.O. Box 644236 Washington State University Pullman, WA 99164-4236 omoto@wsu.edu ABSTRACT This exercise is a hands-on simulation of mutations and their

More information

The EcoCyc Curation Process

The EcoCyc Curation Process The EcoCyc Curation Process Ingrid M. Keseler SRI International 1 HOW OFTEN IS THE GOLDEN GATE BRIDGE PAINTED? Many misconceptions exist about how often the Bridge is painted. Some say once every seven

More information

HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE

HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE Subodha Kumar University of Washington subodha@u.washington.edu Varghese S. Jacob University of Texas at Dallas vjacob@utdallas.edu

More information

Figure 1: RotemNet Main Screen

Figure 1: RotemNet Main Screen 1 REMOTE CONTROLLER ACCESS This paper summarizes the installation and configuration procedures needed to enable accessing your Communicator and controllers via the Internet. The information contained in

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

Concluding lesson. Student manual. What kind of protein are you? (Basic)

Concluding lesson. Student manual. What kind of protein are you? (Basic) Concluding lesson Student manual What kind of protein are you? (Basic) Part 1 The hereditary material of an organism is stored in a coded way on the DNA. This code consists of four different nucleotides:

More information

MS SQL Installation Guide

MS SQL Installation Guide MS SQL Installation Guide Microsoft SQL Database For Debtors Manager Table of contents 1. Overview 2. Minimum server installation requirements for MS SQL 3. Installing MS SQL on your server 4. Installing

More information

Protein Prospector and Ways of Calculating Expectation Values

Protein Prospector and Ways of Calculating Expectation Values Protein Prospector and Ways of Calculating Expectation Values 1/16 Aenoch J. Lynn; Robert J. Chalkley; Peter R. Baker; Mark R. Segal; and Alma L. Burlingame University of California, San Francisco, San

More information

Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations

Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations AlCoB 2014 First International Conference on Algorithms for Computational Biology Thiago da Silva Arruda Institute

More information

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

More information

Integrated Protein Services

Integrated Protein Services Integrated Protein Services Custom protein expression & purification Last date of revision June 2015 Version DC04-0013 www.iba-lifesciences.com Expression strategy The first step in the recombinant protein

More information

LECTURE 6 Gene Mutation (Chapter 16.1-16.2)

LECTURE 6 Gene Mutation (Chapter 16.1-16.2) LECTURE 6 Gene Mutation (Chapter 16.1-16.2) 1 Mutation: A permanent change in the genetic material that can be passed from parent to offspring. Mutant (genotype): An organism whose DNA differs from the

More information

Molecular Computing. david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013

Molecular Computing. david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013 Molecular Computing david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013 What Was The World s First Computer? The World s First Computer? ENIAC - 1946 Antikythera Mechanism - 80 BP Babbage Analytical

More information

Step by Step Guide to Importing Genetic Data into JMP Genomics

Step by Step Guide to Importing Genetic Data into JMP Genomics Step by Step Guide to Importing Genetic Data into JMP Genomics Page 1 Introduction Data for genetic analyses can exist in a variety of formats. Before this data can be analyzed it must imported into one

More information

Interactive Level-Set Deformation On the GPU

Interactive Level-Set Deformation On the GPU Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation

More information

Time series experiments

Time series experiments Time series experiments Time series experiments Why is this a separate lecture: The price of microarrays are decreasing more time series experiments are coming Often a more complex experimental design

More information

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray

More information

Chapter 3. if 2 a i then location: = i. Page 40

Chapter 3. if 2 a i then location: = i. Page 40 Chapter 3 1. Describe an algorithm that takes a list of n integers a 1,a 2,,a n and finds the number of integers each greater than five in the list. Ans: procedure greaterthanfive(a 1,,a n : integers)

More information