Biotechnol. Prog. 2007, 23, Alignment-Based Approach for Durable Data Storage into Living Organisms
|
|
|
- Alexia Francis
- 10 years ago
- Views:
Transcription
1 Biotechnol. Prog. 2007, 23, NOTES Alignment-Based Approach for Durable Data Storage into Living Organisms Nozomu Yachie,, Kazuhide Sekiyama,, Junichi Sugahara,, Yoshiaki Ohashi,*,, and Masaru Tomita,,, Institute for Advanced Biosciences, Keio University, Nipponkoku, Daihoji, Tsuruoka, Yamagata, Japan; Bioinformatics Program, Graduate School of Media and Governance, Keio University, 5322 Endo, Fujisawa, Kanagawa, Japan; Department of Environmental Information, Keio University, 5322 Endo, Fujisawa, Kanagawa, Japan; and Human Metabolome Technologies, Inc., Mizukami, Kakuganji, Tsuruoka, Yamagata, Japan The practical realization of DNA data storage is a major scientific goal. Here we introduce a simple, flexible, and robust data storage and retrieval method based on sequence alignment of the genomic DNA of living organisms. Duplicated data encoded by different oligonucleotide sequences was inserted redundantly into multiple loci of the Bacillus subtilis genome. Multiple alignment of the bit data sequences decoded by B. subtilis genome sequences enabled the retrieval of stable and compact data without the need for template DNA, parity checks, or error-correcting algorithms. Combined with the computational simulation of data retrieval from mutated message DNA, a practical use of this alignment-based method is discussed. Introduction DNA has often been proposed to be a reproducible, heritable media in which a substantial amount of information can be included in its nucleotide sequence, while paper, magnetic media, and silicon chips are used for data storage today. Since all of these media are easily destroyed by personal and natural accidents, they are in need of endless attention to maintain their contents. Development of a DNA memory technology utilizing living organisms is of much greater potential than any of the existing counterparts to render a service of inheriting data. The robustness of DNA data, however, ensures the maintenance of archived information over extensive periods of time (hundreds to thousands of years) (1-4). Message DNA has been used in steganography (5), as a means of short trademarks/signatures (6) and long-term storage (3, 7). For economical and secure data encoding, sophisticated designs of DNA message regions and template primer sequences based on the polymerase chain reaction (PCR) method have been reported for data retrieval (4, 8). Various codes have been developed for encryption in DNA sequences, including the economical Huffman code based on Huffman s algorithm that is used for short-term data storage (4). Similar to the parity-bit operation within barcodes of goods, the comma code uses a single base to punctuate the message and create an automatic DNA reading frame, while the redundant alternate code comprises an alternating sequence of purines and pyrimidines; these codes utilize DNA durability to archive long-term data (4). Retrieval of these forms of data requires PCR-based * To whom correspondence should be addressed. Tel: Fax: [email protected]. Institute for Advanced Biosciences, Keio University. Graduate School of Media and Governance, Keio University. Department of Environmental Information, Keio University. Human Metabolome Technologies, Inc. amplification, using template primer sequences, and DNA sequencing. In order to avoid misreading caused by wobble DNA annealing, a comma-free DNA design with attention to template DNA was proposed (8). However, these codes are not fully essential for the encoded DNA region to offer some safekeeping against mutation, which might make alternations to the sense of encoded data (4). Degradation of data could occur during DNA preparation in the laboratory, during storage, or at data retrieval (4). Additionally, genetic evolution and adaptation of the host organism could lead to the accumulation of mutations such as point mutations, insertions, and deletions in chromosomal DNA, resulting in destruction of data inheritance. Magnetic disk drives store data securely by duplicating redundant data or parity codes into different sectors; data are then retrieved by the assembly of partial sectors. Here, we propose a method to copy and paste data within the genomic sequence of a living organism, Bacillus subtilis, thus acquiring versatile data storage and the robustness of data inheritance. Two or more different features of oligonucleotide DNA molecules compressed from the same data with an economical code were duplicated into multiple genomic loci for data storage. The encoded data is then retrievable by complete genome sequencing and searching for duplicated coding regions using multiple alignments of all the possible decoded bit-data sequences of the genomic DNA. Data durability was further discussed and estimated by computational simulation. We propose that this alignment-based method will be most beneficial in utilizing DNA as trademarks/signatures of living modified organisms (LMOs) and as valuable heritable media. Materials and Methods Conversion of Messages into Genetic Sequences. We encoded the message E)mc ! into the B. subtilis genome. The message was initially converted into binary /bp060261y CCC: $ American Chemical Society and American Institute of Chemical Engineers Published on Web 01/25/2007
2 502 Biotechnol. Prog., 2007, Vol. 23, No. 2 Figure 1. Encryption of a message in the B. subtilis genome. (A) The message E)mc !, including the shift and space keys, was converted into hexadecimal code (separated by %) according to the Keyboard Scan Code (Set2). The binary code was then produced by four-bit representation of each hexadecimal number. (B) The encryption key used to encode the message in DNA. Each pattern of four-bit binary code is represented by dinucleotides. (C) DNA cassettes C1, C2, C3, and C4 were translated by one-bit by one-bit frame shifting. Two sets of dinucleotides at the 5 and 3 regions encode different binary codes (see Table 1 for details). (D) Cassettes were subcloned into phash202 or phash203 plasmids and inserted into the B. subtilis genome. sequence according to the make signals of the Keyboard Scan Code Set2 (9) that represent all keyboard inputs (Figure 1A). The hexadecimal codes generated from keyboard inputs were shifted into bit data, and the encryption keys translated each four-bit binary code into dinucleotides (Figure 1B). Because every encryption key has the reading frame size of four-bit, it is possible to translate a bit data sequence into four multiple oligonucleotide sequences in the four different reading frames (Figure 1C). In this study, four of all of the possible different oligonucleotide cassettes, C1, C2, C3 and C4, were designed for the data storage of the message E)mc ! (Table 1). Introduction of Data into Plasmid Vectors. Introduction of the message sequences, C1 to C4, was carried out as summarized in Figure 1D. The chemically synthesized oligonucleotides C1 and C2 were annealed and subcloned into the T-extended SmaI and PVuII sites, respectively, of phash202 (pspiber01); C3 and C4 were cloned into the SmaI and PVuII sites, respectively, of phash203 (pspiber02) (10). Escherichia coli strain DH5R was used as the cloning host. The sequence of each DNA cassette was confirmed using the DNA sequencer ABI3100 (Applied Bioscience, CA). Transformation of B. subtilis. B. subtilis competent cells were prepared and transformed by the two-step culture method (11). In order to introduce the cassettes, B. subtilis BEST2136 was used as a recipient (Table 2). This strain is resistant to erythromycin (Em), blasticidin S (BS), and tetracycline (Tc). The GB01 cells were generated by a mixture of 0.1 µg ml -1 of BEST2136 genomic DNA and 1 µg ml -1 of the pspiber01 plasmid. The chloramphenicol (Cm)-resistant transformants were selected on LB medium plates containing 5 µg ml -1 of Cm. The colonies that appeared after incubation at 37 C overnight displayed an Em-sensitive and BS/Cm/Tc-resistant phenotype, indicating that the cassette C1 and C2 had been introduced within the metb::pbrem locus in the genome. Then, the GB02 cells were generated by a mixture of 0.1 µg ml -1 of GB01 genomic DNA and 1 µg ml -1 of the pspiber02 plasmid. The Em-resistant transformants were selected on LB medium plates containing 5 µg ml -1 of Em. After overnight incubation at 37 C, the cells were BS-sensitive and Cm/Em/Tc-resistant, indicating that the cassettes C3 and C4 had been introduced within the prob::pbrbs locus in the genome. The four sequences of cassettes within the genomic DNA of the data-
3 Biotechnol. Prog., 2007, Vol. 23, No Table 1. Oligonucleotide Sequences Encoding E)mc ! cassette oligonucleotide sequence a encoded bit data C1 C2 C3 C4 AA,CA,GA,GA,AC,CA,GA,AC,GT,TA,GG,GA, CA,CC,CC,CA,GT,GA,CG,CA,GC,AC,GC,AC, CC,GA,GT,CA,GA,CA,GC,TT (64 nt) GT,GA,AC,AC,AG,GA,AC,CG,AT,TC,AC,AC, GA,GG,GG,TA,AT,CC,GA,GA,AT,AG,AT,AG, GG,CC,AT,GA,AC,GA,AT (62 nt) AA,AC,AG,CG,AA,AC,CG,TA,AG,GT,AG,AG, CC,CC,AC,TC,AG,GG,AC,CC,CG,CA,CG,CA, AC,TG,AG,AC,AG,CC,AG (62 nt) AG,CG,CA,GA,AA,CG,GA,TC,CA,CT,CA,AA, GG,GG,AG,TT,CA,AC,AG,TG,GA,TA,GA,GA, CG,TC,AA,CG,AA,TG,TC (62 nt) 0000,0001,0010,0010,0100,0001,0010,0100,1110,0011,1010,0010,0001, 0101,0101,0001,1110,0010,1001,0001,0110,0100,0110,0100,0101,0010, 1110,0001,0010,0001,0110,1111 (128 bit) 1110,0010,0100,0100,1000,0010,0100,1001,1100,0111,0100,0100,0010, 1010,1010,0011,1100,0101,0010,0010,1100,1000,1100,1000,1010,0101, 1100,0010,0100,0010,1100 (124 bit) 0000,0100,1000,1001,0000,0100,1001,0011,1000,1110,1000,1000,0101, 0101,0100,0111,1000,1010,0100,0101,1001,0001,1001,0001,0100,1011, 1000,0100,1000,0101,1000 (124 bit) 1000,1001,0001,0010,0000,1001,0010,0111,0001,1101,0001,0000,1010, 1010,1000,1111,0001,0100,1000,1011,0010,0011,0010,0010,1001,0111, 0000,1001,0000,1011,0111 (124 bit) a Nucleotide sequences are given in the 5 f 3 direction. Dinucleotides separated by commas indicate encryption keys transformed by 4-bit binary codes. The 5 and 3 outer regions of bit data sequences (underlined) were designed differently to identify initiations and terminations of encoded data by sequence alignment. Table 2. Bacterial Strains Used strain relevant genotype description a BEST2136 metb::pbrem prob::pbrbs recipient strain, Em/BS/Tc-resistant GB01 metb::cat::(c1, C2) prob::pbrbs Em-sensitive, Cm/BS/Tc-resistant, with cassette C1 and C2 GB02 metb::cat::(c1, C2) prob::erm::(c3, C4) BS-sensitive, Em/Cm/Tc-resistant, with cassette C1, C2, C3, and C4 a BS, blasticidin S; Cm, chloramphenicol; Em, erythromycin; Tc, tetracycline. encoded strain GB02 were confirmed by DNA sequencing (ABI3100, Applied Bioscience, CA). Estimation of Data Durability by Computational Simulation. In order to estimate the durability of the data obtained with our method, we computationally simulated data retrievals and calculated the data recovery rates from mutated and/or deleted cassette sequences. We prepared 10,000 sets including four pairs of different nucleotide cassettes and their corresponding bit data sequences harboring the same data region, varying in length from 100 to 10,000 bits. The nucleotide sequences were randomly mutated and deleted of their partial fragments in silico, and their corresponding bit data sequences were changed. Then, in each set, the mutation/deletion rate of the total of the four cassette sequences was calculated, and the data recovery rate was estimated; we calculated the percent of positions where less than half of the bit characters were changed in the data region. Conducting the same procedure, the recovery rates of the data encoded by two and three cassettes were calculated. Results and Discussion The concept presented here is the use of sequence alignment to retrieve encoded data that are duplicated in separate genomic loci. For data storage, two or more sequences with different oligonucleotide features are compressed and copied from the same bit data through multiple data-to-nucleotide translations and then inserted into the genomic DNA. Data retrieval depends on the multiple alignment of all possible decoded bit data sequences from the genomic DNA. According to alignmentbased data retrieval from the complete genome with multiple inserted sequences, data durability, error-correcting, and parity check operations are all supplemented without any other qualified DNA code or template DNA design. Data Retrieval by Complete Genome Sequencing. Using this alignment-based approach, we demonstrated data storage in B. subtilis genomic DNA. For the secure retrieval of encoded data, our stratagem does not require additional material such as template DNA, only the sequencing of the complete genome. Data retrieval conducted by PCR-based amplification using the template DNAs designed for both the 5 - and 3 -regions of the single data region is vulnerable to the breakage of either of the DNA annealing sites, which are essential even to read the partial region of encoded data. Currently, a single DNA sequencer requires a 24-h day to sequence the equivalent of a bacterial genome, up to about two million bases (12). However, a growing demand for greater speeds and lower costs is pushing the development of new sequence technologies, and it is likely that new machines will be capable of reading one million bases per second in the relatively near future (12). This alignment-aided data storage and retrieval hence will be better suited to practical fast-lane genome sequencing techniques. Preparation of Duplicated Data Cassettes. In order to recover data by sequence alignment, the length of data sequence cannot be shorter than a certain length, as it can otherwise occur by chance in the host genome. For instance, when a message data is by two keyboard inputs, our encryption key method creates an eight-nucleotide sequence within each cassette, which may have several possible alignment patterns at the genomic level. In such a case, to retrieve data securely we therefore need to add more redundant descriptions or contrive more complex codes that make longer data regions in each cassette. Although many bacteria species presumably contain a number of naturally repeating sequences, synthetic sequences of more than 20 nucleotides is a sufficiently specific feature (p < ) in the 4.2 Mbp B. subtilis genomic DNA. In order to prevent damage to genomic DNA during evolution, such as the deletion of junk DNA sequences especially caused by homologous recombination (13, 14), our methodology introduced different nucleotide sequences encoding the same data by multiple data compression paths. Construction of a large piece of synthetic DNA is, however, still likely to be technically demanding and expensive. Although each nucleotide feature of the duplicated data sequence differed from each other and redundant descriptions were required to encode data, the direct data-to-nucleotide transitions maximized the information amount in each nucleotide cassette without any qualified DNA code, and the one-bit by one-bit frame shifting of the four-bit data windows generated the lossless data copies. Therefore, the cost of synthetic DNAs in this methodology is on a par with those of previous studies (4). The encryption keys of four-bit data to
4 504 Biotechnol. Prog., 2007, Vol. 23, No. 2 Figure 2. DNA sequencing of the four cassettes inserted into the B. subtilis genome. The genome sequences around the cassettes C1, C2, C3, and C4 (boxed) are shown in A, B, C, and D, respectively. According to the encryption keys, all possible dinucleotides were decompressed to the four-bit binary codes, which are displayed above each nucleotide sequence. dinucleotides used in this study can replicate a maximum of four cassettes, but the number of cassettes is not limited by other encryption keys. For example, the encryption keys of eight-bit data to tetranucleotides have the potential to replicate data in eight cassettes. Complementation of Durability by Alignment-Based Data Assembly. For data retrieval, genome sequences with multiple inserted nucleotide cassettes are decompressed to all possible patterns of bit data sequences. In this study, the encryption keys represent the one-to-one correspondence of dinucleotides and four-bit binary codes, and two of the bit data sequences are generated by a nucleotide sequence using two-bp sliding windows with one-bp displacement; the total of four possible bit data sequences are decompressed by the double-stranded genome sequence. Without any template DNA and parity check code within the bounds of data sequences, the encoded data and its sectional data breaks are recognizable by multiple alignments, searching conserved regions against all the decompressed bit data sequences. Point mismatch and gap space indicate mutations and partial deletions or insertions of genome sequence, respectively. The alignment result provides a complemented complete data region and is achieved by the merger of multiple decompressed patterns from the nucleotide fragments of the inserted cassettes. It complements frameshifts and adjusts encryption key reading frame misalignments. As such, it is similar to TBLASTX, an alignment program utilizing genetic codon rules that outputs possible amino acid sequences that are computationally translated from subjected nucleic acid sequences (15). Although the multiplication of cassettes in this technique does lead to redundant volumes, data are retrievable independently of any other error-correcting algorithms, resulting in the advantage of data stability. Previous studies have revealed that intricately intertwined parity effects within single data-coding regions cost a certain volume of data sequence (4). The data recovery rate is fragile and proportional to data breakage, particularly that which occurs through long-range DNA deletion. For the approximation of data stability, we computationally simulated data retrievals from cassettes that were randomly mutated and deleted (Figure 3). In this simulation result, three or more cassettes were indicated to ensure a comparatively high recovery rate. Over 99% of the encoded data was recovered from four cassettes when their mutation/deletion rates were less than 15%. Moreover, the positions of the data breakages were easily identified by the alignment results when these data could not be rescued. The encoding of multiple data into separate sections, much like the redundant arrays of inexpensive disk (RAID) (16) technology frequently used in magnetic disk data storage, undoubtedly contributes to the stability and durability of the data. In addition, the absence of boundary characteristics from the template DNAs seen in previous studies enables the boundaries of the encoded sequences within genomic DNA to also be identified by alignment.
5 Biotechnol. Prog., 2007, Vol. 23, No Institute for Advanced Biosciences, Keio University, for their critical discussions, Dr. Kazuharu Arakawa for helpful suggestions on the preparation of the manuscript, and Professor Mitsuhiro Itaya for providing B. subtilis strain BEST2136. References and Notes Figure 3. Expected data recovery rate. The solid lines represent the average data recovery rate per head of mutation/deletion. Dotted lines and gray-filled areas indicate two-sided 95% confidence intervals. Data Storage into the B. subtilis Genomic DNA. Data storage into B. subtilis cells was performed by a two-step culture method. B. subtilis BEST2136 was transformed using pspib- ER01, generating GB01 (Table 2). Sequentially, GB01 was transformed using pspiber02, generating GB02 (Table 2). This strain harbors C1 and C2 at the metb locus and C3 and C4 at the prob locus. Insertion of the four cassettes was confirmed by DNA sequencing (Figure 2). GB02 is available upon request. Practical Use of Long-Term and Large-Volume Data Storage. We have successfully demonstrated the data storage of the message E)mc ! into the B. subtilis strain BEST2136. One of the advantages of inserting data into organisms is the realization of data inheritance. Although these media vessels carry a significant risk of transmuting data in their evolutionary processes, our simple alignment-based approach effectuates a higher durability of data inheritance. While the size of the nucleotide sequence that can artificially be inserted into the organism s chromosomal DNA is limited (3), we suggest that B. subtilis is an acceptable species for large volume data storage. Previous megacloning techniques have cloned the entire 3.5 Mb genome of Synechocystis PCC6803 into the 4.2 Mb genome of B. subtilis 168, resulting in a 7.7 Mb composite genome (17). Hence, adopting and developing further codes and experimental methods or inserting plural fragments into the partial volumes of multiple-species metagenomes will enable the storage of huge volumes of data in heritable media. We suggest that this simple, flexible, and robust method offers a practical solution to data storage and retrieval challenges in combination with other, previously published techniques. Acknowledgment This work was inspired by discussions with Dr. Masanori Arita. The authors are grateful to the members of MGSP at the (1) Editorial, A Y3K bug. Nat. Biotechnol. 2000, 18 (1), 1. (2) Bancroft, C.; Bowler, T.; Bloom, B.; Clelland, C. T. Long-term storage of information in DNA. Science 2001, 293, (3) Cox, J. P. Long-term data storage in DNA. Trends Biotechnol. 2001, 19, (4) Smith, G. C.; Fiddes, C. C.; Hawkins, J. P.; Cox, J. P. Some possible codes for encrypting data in DNA. Biotechnol. Lett. 2003, 25, (5) Clelland, C. T.; Risca, V.; Bancroft, C. Hiding messages in DNA microdots. Nature 1999, 399, (6) Arita, M.; Ohashi, Y. Secret signatures inside genomic DNA. Biotechnol. Prog. 2004, 20, (7) Wong, P. C.; Wong, K.; Foote, H. Organic data memory using the DNA approach. Commun. ACM 2003, 46, (8) Arita, M. Comma-free design for DNA words. Commun. ACM 2004, 47, (9) Keyboard scan codes: Set 2. (accessed 7/23/06). (10) Ohashi, Y.; Ohshima, H.; Tsuge, K.; Itaya, M. Far different levels of gene expression provided by an oriented cloning system in Bacillus subtilis and Escherichia coli. FEMS Microbiol. Lett. 2003, 221, (11) Dubnau, E.; Davidoff-Abelson, R. Fate of transforming DNA following uptake by competent Bacillus subtilis. I. Formation and properties of the donor-recipient complex. J. Mol. Biol. 1971, 56, (12) Bonetta, L. Genome sequencing in the fast lane. Nat. Methods 2006, 3, (13) Kowalczykowski, S. C.; Dixon, D. A.; Eggleston, A. K.; Lauder, S. D.; Rehrauer, W. M. Biochemistry of homologous recombination in Escherichia coli. Microbiol. ReV. 1994, 58, (14) Kuzminov, A. Recombinational repair of DNA damage in Escherichia coli and bacteriophage lambda. Microbiol. Mol. Biol. ReV. 1999, 63, (15) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, (16) Patterson, D.; Gibson, G.; Katz, R. A case for redundant arrays of inexpensive disks (RAID). Proc ACM SIGMOD Conf. 1988, 1, (17) Itaya, M.; Tsuge, K.; Koizumi, M.; Fujita, K., Combining two genomes in one cell: Stable cloning of the Synechocystis PCC6803 genome in the Bacillus subtilis 168 genome. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, Received August 28, Accepted December 18, BP060261Y
HIGH DENSITY DATA STORAGE IN DNA USING AN EFFICIENT MESSAGE ENCODING SCHEME Rahul Vishwakarma 1 and Newsha Amiri 2
HIGH DENSITY DATA STORAGE IN DNA USING AN EFFICIENT MESSAGE ENCODING SCHEME Rahul Vishwakarma 1 and Newsha Amiri 2 1 Tata Consultancy Services, India [email protected] 2 Bangalore University, India ABSTRACT
Hiding Color Images in DNA Sequences
Hiding Color Images in DNA Sequences Marc B. Beck*, Roman V. Yampolskiy Cybersecurity Lab, Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville,
On Covert Data Communication Channels Employing DNA Steganography with Application in Massive Data Storage
ARAB ACADEMY FOR SCIENCE, TECHNOLOGY AND MARITIME TRANSPORT COLLEGE OF ENGINEERING AND TECHNOLOGY COMPUTER ENGINEERING DEPARTMENT On Covert Data Communication Channels Employing DNA Steganography with
2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three
Chem 121 Chapter 22. Nucleic Acids 1. Any given nucleotide in a nucleic acid contains A) two bases and a sugar. B) one sugar, two bases and one phosphate. C) two sugars and one phosphate. D) one sugar,
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE QUALITY OF BIOTECHNOLOGICAL PRODUCTS: ANALYSIS
Chapter 6 DNA Replication
Chapter 6 DNA Replication Each strand of the DNA double helix contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand. Each strand can therefore
European Medicines Agency
European Medicines Agency July 1996 CPMP/ICH/139/95 ICH Topic Q 5 B Quality of Biotechnological Products: Analysis of the Expression Construct in Cell Lines Used for Production of r-dna Derived Protein
Recombinant DNA Unit Exam
Recombinant DNA Unit Exam Question 1 Restriction enzymes are extensively used in molecular biology. Below are the recognition sites of two of these enzymes, BamHI and BclI. a) BamHI, cleaves after the
restriction enzymes 350 Home R. Ward: Spring 2001
restriction enzymes 350 Home Restriction Enzymes (endonucleases): molecular scissors that cut DNA Properties of widely used Type II restriction enzymes: recognize a single sequence of bases in dsdna, usually
DNA Mapping/Alignment. Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky
DNA Mapping/Alignment Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky Overview Summary Research Paper 1 Research Paper 2 Research Paper 3 Current Progress Software Designs to
Recombinant DNA & Genetic Engineering. Tools for Genetic Manipulation
Recombinant DNA & Genetic Engineering g Genetic Manipulation: Tools Kathleen Hill Associate Professor Department of Biology The University of Western Ontario Tools for Genetic Manipulation DNA, RNA, cdna
Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in
DNA, RNA, Protein Synthesis Keystone 1. During the process shown above, the two strands of one DNA molecule are unwound. Then, DNA polymerases add complementary nucleotides to each strand which results
Milestones of bacterial genetic research:
Milestones of bacterial genetic research: 1944 Avery's pneumococcal transformation experiment shows that DNA is the hereditary material 1946 Lederberg & Tatum describes bacterial conjugation using biochemical
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
How is genome sequencing done?
How is genome sequencing done? Using 454 Sequencing on the Genome Sequencer FLX System, DNA from a genome is converted into sequence data through four primary steps: Step One DNA sample preparation; Step
DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!
DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Structure and Function of DNA
Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four
CCR Biology - Chapter 9 Practice Test - Summer 2012
Name: Class: Date: CCR Biology - Chapter 9 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Genetic engineering is possible
Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis
Genetic Analysis Phenotype analysis: biological-biochemical analysis Behaviour under specific environmental conditions Behaviour of specific genetic configurations Behaviour of progeny in crosses - Genotype
Forensic DNA Testing Terminology
Forensic DNA Testing Terminology ABI 310 Genetic Analyzer a capillary electrophoresis instrument used by forensic DNA laboratories to separate short tandem repeat (STR) loci on the basis of their size.
1 Mutation and Genetic Change
CHAPTER 14 1 Mutation and Genetic Change SECTION Genes in Action KEY IDEAS As you read this section, keep these questions in mind: What is the origin of genetic differences among organisms? What kinds
Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism )
Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins
International Language Character Code
, pp.161-166 http://dx.doi.org/10.14257/astl.2015.81.33 International Language Character Code with DNA Molecules Wei Wang, Zhengxu Zhao, Qian Xu School of Information Science and Technology, Shijiazhuang
LECTURE 6 Gene Mutation (Chapter 16.1-16.2)
LECTURE 6 Gene Mutation (Chapter 16.1-16.2) 1 Mutation: A permanent change in the genetic material that can be passed from parent to offspring. Mutant (genotype): An organism whose DNA differs from the
2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.
1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence
Genetics Module B, Anchor 3
Genetics Module B, Anchor 3 Key Concepts: - An individual s characteristics are determines by factors that are passed from one parental generation to the next. - During gamete formation, the alleles for
RNA and Protein Synthesis
Name lass Date RN and Protein Synthesis Information and Heredity Q: How does information fl ow from DN to RN to direct the synthesis of proteins? 13.1 What is RN? WHT I KNOW SMPLE NSWER: RN is a nucleic
4. DNA replication Pages: 979-984 Difficulty: 2 Ans: C Which one of the following statements about enzymes that interact with DNA is true?
Chapter 25 DNA Metabolism Multiple Choice Questions 1. DNA replication Page: 977 Difficulty: 2 Ans: C The Meselson-Stahl experiment established that: A) DNA polymerase has a crucial role in DNA synthesis.
HCS604.03 Exercise 1 Dr. Jones Spring 2005. Recombinant DNA (Molecular Cloning) exercise:
HCS604.03 Exercise 1 Dr. Jones Spring 2005 Recombinant DNA (Molecular Cloning) exercise: The purpose of this exercise is to learn techniques used to create recombinant DNA or clone genes. You will clone
Preventing Data Loss by Storing Information in Bacterial DNA
Preventing Data Loss by Storing Information in Bacterial DNA Mohan S Department of Information Vinodh S Department of Electrical and Electronic Engineering Jeevan F R Department of Information ABSTRACT
Recombinant DNA and Biotechnology
Recombinant DNA and Biotechnology Chapter 18 Lecture Objectives What Is Recombinant DNA? How Are New Genes Inserted into Cells? What Sources of DNA Are Used in Cloning? What Other Tools Are Used to Study
Design of conditional gene targeting vectors - a recombineering approach
Recombineering protocol #4 Design of conditional gene targeting vectors - a recombineering approach Søren Warming, Ph.D. The purpose of this protocol is to help you in the gene targeting vector design
The Chinese University of Hong Kong igem 2010 Bacterial based storage and encryp2on device
The Chinese University of Hong Kong igem 2010 Bacterial based storage and encryp2on device Bacterial based informaaon storage device Bancro4 s group (2001) Mount Sinai School of Meducube Yachie s group
Pairwise Sequence Alignment
Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
Genetic Technology. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Name: Class: Date: Genetic Technology Multiple Choice Identify the choice that best completes the statement or answers the question. 1. An application of using DNA technology to help environmental scientists
CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA
CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA INTRODUCTION DNA : DNA is deoxyribose nucleic acid. It is made up of a base consisting of sugar, phosphate and one nitrogen base.the
The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:
Module 3F Protein Synthesis So far in this unit, we have examined: How genes are transmitted from one generation to the next Where genes are located What genes are made of How genes are replicated How
Description: Molecular Biology Services and DNA Sequencing
Description: Molecular Biology s and DNA Sequencing DNA Sequencing s Single Pass Sequencing Sequence data only, for plasmids or PCR products Plasmid DNA or PCR products Plasmid DNA: 20 100 ng/μl PCR Product:
Bob Jesberg. Boston, MA April 3, 2014
DNA, Replication and Transcription Bob Jesberg NSTA Conference Boston, MA April 3, 2014 1 Workshop Agenda Looking at DNA and Forensics The DNA, Replication i and Transcription i Set DNA Ladder The Double
1. Molecular computation uses molecules to represent information and molecular processes to implement information processing.
Chapter IV Molecular Computation These lecture notes are exclusively for the use of students in Prof. MacLennan s Unconventional Computation course. c 2013, B. J. MacLennan, EECS, University of Tennessee,
DNA Scissors: Introduction to Restriction Enzymes
DNA Scissors: Introduction to Restriction Enzymes Objectives At the end of this activity, students should be able to 1. Describe a typical restriction site as a 4- or 6-base- pair palindrome; 2. Describe
Genetics Test Biology I
Genetics Test Biology I Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Avery s experiments showed that bacteria are transformed by a. RNA. c. proteins.
MUTATION, DNA REPAIR AND CANCER
MUTATION, DNA REPAIR AND CANCER 1 Mutation A heritable change in the genetic material Essential to the continuity of life Source of variation for natural selection New mutations are more likely to be harmful
Becker Muscular Dystrophy
Muscular Dystrophy A Case Study of Positional Cloning Described by Benjamin Duchenne (1868) X-linked recessive disease causing severe muscular degeneration. 100 % penetrance X d Y affected male Frequency
Gene Cloning. Reference. T.A. Brown, Gene Cloning, Chapman and Hall. S.B. Primrose, Molecular Biotechnology, Blackwell
Gene Cloning 2004 Seungwook Kim Chem. & Bio. Eng. Reference T.A. Brown, Gene Cloning, Chapman and Hall S.B. Primrose, Molecular Biotechnology, Blackwell Why Gene Cloning is Important? A century ago, Gregor
BioBoot Camp Genetics
BioBoot Camp Genetics BIO.B.1.2.1 Describe how the process of DNA replication results in the transmission and/or conservation of genetic information DNA Replication is the process of DNA being copied before
Bioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
Mitochondrial DNA Analysis
Mitochondrial DNA Analysis Lineage Markers Lineage markers are passed down from generation to generation without changing Except for rare mutation events They can help determine the lineage (family tree)
SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
Innovations in Molecular Epidemiology
Innovations in Molecular Epidemiology Molecular Epidemiology Measure current rates of active transmission Determine whether recurrent tuberculosis is attributable to exogenous reinfection Determine whether
BCOR101 Midterm II Wednesday, October 26, 2005
BCOR101 Midterm II Wednesday, October 26, 2005 Name Key Please show all of your work. 1. A donor strain is trp+, pro+, met+ and a recipient strain is trp-, pro-, met-. The donor strain is infected with
DNA Sequence Analysis
DNA Sequence Analysis Two general kinds of analysis Screen for one of a set of known sequences Determine the sequence even if it is novel Screening for a known sequence usually involves an oligonucleotide
Algorithms in Computational Biology (236522) spring 2007 Lecture #1
Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office
pmod2-puro A plasmid containing a synthetic Puromycin resistance gene Catalog # pmod2-puro For research use only Version # 11H29-MM
pmod2-puro A plasmid containing a synthetic Puromycin resistance gene Catalog # pmod2-puro For research use only Version # 11H29-MM PrOduct information content: - 20 mg of lyophilized pmod2-puro plasmid
Improved methods for site-directed mutagenesis using Gibson Assembly TM Master Mix
CLONING & MAPPING DNA CLONING DNA AMPLIFICATION & PCR EPIGENETICS RNA ANALYSIS Improved methods for site-directed mutagenesis using Gibson Assembly TM Master Mix LIBRARY PREP FOR NET GEN SEQUENCING PROTEIN
Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College
Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College Primary Source for figures and content: Eastern Campus Tortora, G.J. Microbiology
July 7th 2009 DNA sequencing
July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer
Basic Concepts Recombinant DNA Use with Chapter 13, Section 13.2
Name Date lass Master 19 Basic oncepts Recombinant DN Use with hapter, Section.2 Formation of Recombinant DN ut leavage Splicing opyright lencoe/mcraw-hill, a division of he Mcraw-Hill ompanies, Inc. Bacterial
Expression and Purification of Recombinant Protein in bacteria and Yeast. Presented By: Puspa pandey, Mohit sachdeva & Ming yu
Expression and Purification of Recombinant Protein in bacteria and Yeast Presented By: Puspa pandey, Mohit sachdeva & Ming yu DNA Vectors Molecular carriers which carry fragments of DNA into host cell.
Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne
Sanger Sequencing and Quality Assurance Zbigniew Rudzki Department of Pathology University of Melbourne Sanger DNA sequencing The era of DNA sequencing essentially started with the publication of the enzymatic
Gene Mapping Techniques
Gene Mapping Techniques OBJECTIVES By the end of this session the student should be able to: Define genetic linkage and recombinant frequency State how genetic distance may be estimated State how restriction
Translation Study Guide
Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to
DNA Insertions and Deletions in the Human Genome. Philipp W. Messer
DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.
Effects of Antibiotics on Bacterial Growth and Protein Synthesis: Student Laboratory Manual
Effects of Antibiotics on Bacterial Growth and Protein Synthesis: Student Laboratory Manual I. Purpose...1 II. Introduction...1 III. Inhibition of Bacterial Growth Protocol...2 IV. Inhibition of in vitro
Secret Communication through Web Pages Using Special Space Codes in HTML Files
International Journal of Applied Science and Engineering 2008. 6, 2: 141-149 Secret Communication through Web Pages Using Special Space Codes in HTML Files I-Shi Lee a, c and Wen-Hsiang Tsai a, b, * a
A and B are not absolutely linked. They could be far enough apart on the chromosome that they assort independently.
Name Section 7.014 Problem Set 5 Please print out this problem set and record your answers on the printed copy. Answers to this problem set are to be turned in to the box outside 68-120 by 5:00pm on Friday
A Comparison of Genotype Representations to Acquire Stock Trading Strategy Using Genetic Algorithms
2009 International Conference on Adaptive and Intelligent Systems A Comparison of Genotype Representations to Acquire Stock Trading Strategy Using Genetic Algorithms Kazuhiro Matsui Dept. of Computer Science
Introduction to next-generation sequencing data
Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS
BacReady TM Multiplex PCR System
BacReady TM Multiplex PCR System Technical Manual No. 0191 Version 10112010 I Description.. 1 II Applications 2 III Key Features.. 2 IV Shipping and Storage. 2 V Simplified Procedures. 2 VI Detailed Experimental
Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)
Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,
How many of you have checked out the web site on protein-dna interactions?
How many of you have checked out the web site on protein-dna interactions? Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail. Find and be ready to discuss
Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company
Genetic engineering: humans Gene replacement therapy or gene therapy Many technical and ethical issues implications for gene pool for germ-line gene therapy what traits constitute disease rather than just
FACULTY OF MEDICAL SCIENCE
Doctor of Philosophy Program in Microbiology FACULTY OF MEDICAL SCIENCE Naresuan University 171 Doctor of Philosophy Program in Microbiology The time is critical now for graduate education and research
mirnaselect pep-mir Cloning and Expression Vector
Product Data Sheet mirnaselect pep-mir Cloning and Expression Vector CATALOG NUMBER: MIR-EXP-C STORAGE: -80ºC QUANTITY: 2 vectors; each contains 100 µl of bacterial glycerol stock Components 1. mirnaselect
Focusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
Lecture 3: Mutations
Lecture 3: Mutations Recall that the flow of information within a cell involves the transcription of DNA to mrna and the translation of mrna to protein. Recall also, that the flow of information between
Biotechnology: DNA Technology & Genomics
Chapter 20. Biotechnology: DNA Technology & Genomics 2003-2004 The BIG Questions How can we use our knowledge of DNA to: diagnose disease or defect? cure disease or defect? change/improve organisms? What
Why Gene Cloning and DNA Analysis are Important
Chapter 1 Why Gene Cloning and DNA Analysis are Important 3 What is per'i, 6 Why gene cloning and per are so chain reaction, 4 important, 8 What is gene.5 How to find your way through this book, 12 In
Transmission of genetic variation: conjugation. Transmission of genetic variation: conjugation
Transmission of genetic variation: conjugation Transmission of genetic variation: conjugation Bacterial Conjugation is genetic recombination in which there is a transfer of DNA from a living donor bacterium
STOP. Before using this product, please read the Limited Use License statement below:
STOP Before using this product, please read the Limited Use License statement below: Important Limited Use License information for pdrive5lucia-rgfap The purchase of the pdrive5lucia-rgfap vector conveys
Gene mutation and molecular medicine Chapter 15
Gene mutation and molecular medicine Chapter 15 Lecture Objectives What Are Mutations? How Are DNA Molecules and Mutations Analyzed? How Do Defective Proteins Lead to Diseases? What DNA Changes Lead to
On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques
On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques MAGDY SAEB 1, EMAN EL-ABD 2, MOHAMED E. EL-ZANATY 1 1. School of Engineering, Computer Department,
Bio 102 Practice Problems Genetic Code and Mutation
Bio 102 Practice Problems Genetic Code and Mutation Multiple choice: Unless otherwise directed, circle the one best answer: 1. Beadle and Tatum mutagenized Neurospora to find strains that required arginine
From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains
Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes
Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University
Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity
FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing
J. Microbiol. Biotechnol. (2006), 16(5), 683 688 FASIM: Fragments Assembly Simulation using Biased-Sampling Model and Assembly Simulation for Microbial Genome Shotgun Sequencing HUR, CHEOL-GOO 1,3, SUNNY
DNA, RNA, Protein synthesis, and Mutations. Chapters 12-13.3
DNA, RNA, Protein synthesis, and Mutations Chapters 12-13.3 1A)Identify the components of DNA and explain its role in heredity. DNA s Role in heredity: Contains the genetic information of a cell that can
Module 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
1.5 page 3 DNA Replication S. Preston 1
AS Unit 1: Basic Biochemistry and Cell Organisation Name: Date: Topic 1.5 Nucleic Acids and their functions Page 3 l. DNA Replication 1. Go through PowerPoint 2. Read notes p2 and then watch the animation
Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology
Lecture 13: DNA Technology DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology DNA Sequencing determine order of nucleotides in a strand of DNA > bases = A,
Rapid Acquisition of Unknown DNA Sequence Adjacent to a Known Segment by Multiplex Restriction Site PCR
Rapid Acquisition of Unknown DNA Sequence Adjacent to a Known Segment by Multiplex Restriction Site PCR BioTechniques 25:415-419 (September 1998) ABSTRACT The determination of unknown DNA sequences around
Comparative genomic hybridization Because arrays are more than just a tool for expression analysis
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from
PIONEER RESEARCH & DEVELOPMENT GROUP
SURVEY ON RAID Aishwarya Airen 1, Aarsh Pandit 2, Anshul Sogani 3 1,2,3 A.I.T.R, Indore. Abstract RAID stands for Redundant Array of Independent Disk that is a concept which provides an efficient way for
PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: [email protected]
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
IIID 14. Biotechnology in Fish Disease Diagnostics: Application of the Polymerase Chain Reaction (PCR)
IIID 14. Biotechnology in Fish Disease Diagnostics: Application of the Polymerase Chain Reaction (PCR) Background Infectious diseases caused by pathogenic organisms such as bacteria, viruses, protozoa,
