On Covert Data Communication Channels Employing DNA Steganography with Application in Massive Data Storage




ARAB ACADEMY FOR SCIENCE, TECHNOLOGY AND MARITIME TRANSPORT
COLLEGE OF ENGINEERING AND TECHNOLOGY
COMPUTER ENGINEERING DEPARTMENT

On Covert Data Communication Channels Employing DNA Steganography with Application in Massive Data Storage

Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering.

Submitted by: Mohamed El-Sayed El-Zanaty
Supervised by: Prof. Dr. Magdy Saeb, Dr. Eman El-Abd

Abstract

DNA is the carrier of the code of life; Watson and Crick identified its form as a double helix of deoxyribonucleic acid. The DNA molecule consists of two sugar-phosphate backbones linked by pairs of the nucleotide bases adenine (A), guanine (G), cytosine (C) and thymine (T). The DNA within each cell carries the tens of thousands of genes that control cellular activities. Most modern steganographic techniques are based on digital media, and these have limitations: specialized filters can be applied at internet firewalls to detect packets that carry hidden information. The major drawback of these techniques is that the secret message is not securely concealed. In recent years researchers have turned to DNA as a data hiding medium. In this work we propose two new methods of embedding messages into DNA strands, both employing restriction enzymes that cut the DNA at specific positions. The first method uses recombinant DNA technology; the second uses DNA mutagenesis to insert the message into the DNA. The receiver retrieves the hidden message using the key. In addition, we discuss the message space of the two methods, their vulnerabilities, and the proposed modifications. Moreover, we discuss the usefulness of DNA in massive data storage and in solving NP-complete problems. Finally, we construct a C# program that simulates the two methods: it hides a message in a DNA file and sends this file to the receiver. At the receiver side, the program extracts the message from the received DNA file.
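As a rough illustration of the idea summarized in the abstract, the sketch below hides a short text between two restriction sites inside a carrier strand and recovers it again. It is not the thesis's C# program. The two-bits-per-base mapping, the function names, the carrier string and the choice of EcoRI and BamHI as the key enzymes are all assumptions made for this example; only the recognition sequences GAATTC (EcoRI) and GGATCC (BamHI) are standard facts. The sketch also assumes the carrier itself contains neither site.

```python
# Hypothetical sketch: names and encoding are illustrative, not taken from the thesis.
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

# Recognition sequences of two well-known restriction enzymes (the assumed "key").
ECORI_SITE = "GAATTC"
BAMHI_SITE = "GGATCC"


def text_to_dna(message: str) -> str:
    """Encode UTF-8 bytes as bases, two bits per base, high bits first."""
    bases = []
    for byte in message.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(strand: str) -> str:
    """Invert text_to_dna: every four bases form one byte."""
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


def hide(carrier: str, message: str, position: int) -> str:
    """Insert the encoded message, flanked by the two sites, into the carrier."""
    payload = ECORI_SITE + text_to_dna(message) + BAMHI_SITE
    return carrier[:position] + payload + carrier[position:]


def reveal(strand: str) -> str:
    """Recover the text between the first EcoRI site and the last BamHI site."""
    start = strand.find(ECORI_SITE)
    end = strand.rfind(BAMHI_SITE)
    if start == -1 or end < start + len(ECORI_SITE):
        raise ValueError("key sites not found")
    return dna_to_text(strand[start + len(ECORI_SITE):end])


carrier = "ATGCCGTAAGCTTACGATCGTAGC"  # made-up carrier without either site
stego = hide(carrier, "hi", 10)
print(reveal(stego))
```

Searching for the last BamHI site rather than the first is a deliberate choice in this sketch: the encoded payload may itself happen to contain GGATCC, and the carrier is assumed not to.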

ACKNOWLEDGMENT

I wish to express my gratitude and indebtedness to Professor Magdy Saeb and Dr. Eman El-Abd for their distinguished supervision, support, unlimited help, and encouragement during the making of this thesis. I would also like to express my gratitude to the faculty and staff of the Computer Engineering Department at the Arab Academy for Science, Technology and Maritime Transport for their valuable advice. I dedicate the success of this thesis to my family, work colleagues and friends.

Acronyms

DNA: Deoxyribonucleic Acid
PCR: Polymerase Chain Reaction
YAC: Yeast Artificial Chromosome
Indels: DNA Insertion/Deletion Mutations
Kbp: Kilo base pair
Mbp: Mega base pair

List of Figures

Figure 1: DNA shape
Figure 2: Process of steganography
Figure 3: Least significant bit process
Figure 4: Embed a message with a stego-key
Figure 5: DNA steganography using PCR
Figure 6: The Hamiltonian path
Figure 7: The solution of the Hamiltonian path
Figure 8: The plasmid vector
Figure 9: The Lambda phage vector
Figure 10: (a) The Cosmid vector and COS site. (b) The Cosmid vector after being cleaved with a restriction enzyme
Figure 11: The YAC vector
Figure 12: Procedure of hiding a message into DNA using the Recombinant technique
Figure 13: Procedure of hiding a message into a Plasmid vector
Figure 14: Procedure of hiding a message into DNA using Mutagenesis
Figure 15: The enzymes and their target substrings
Figure 16: The word and its DNA mapping
Figure 17: The enzyme and its number of occurrences in the DNA
Figure 18: The message, header and the two selected enzymes
Figure 19: Get the message back
Figure 20: The program overall process
Figure 21: DNA Mutagenesis Step 1
Figure 22: DNA Mutagenesis Step 2
Figure 23: DNA Mutagenesis Step 3
Figure 24: DNA Mutagenesis Step 4
Figure 25: DNA Mutagenesis Step 5
Figure 26: The biological overall process
Figure 27: The original DNA message with restriction enzymes
Figure 28: The restriction map of vector pTZ57R/T

List of Tables

Table 1: Cities and paths encoded with DNA
Table 2: Some of the restriction enzymes and their target substrings
Table 3: The 1024 most commonly used words and their DNA representation
Table 4: Possible vectors and the maximum length each can carry
Table 5: Troubleshooting in Recombinant DNA

PUBLICATIONS

Paper: MAGDY SAEB, EMAN EL-ABD, MOHAMED E. EL-ZANATY, On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques, WSEAS (World Scientific and Engineering Academy and Society) Proceedings, Computer Engineering and Applications (CEA 07), Gold Coast, Australia, 17-19 January 2007.

Journal: MAGDY SAEB, EMAN EL-ABD, MOHAMED E. EL-ZANATY, DNA Steganography Using DNA Recombinant and DNA Mutagenesis Techniques, WSEAS (World Scientific and Engineering Academy and Society) Transactions on Computer Research, Issue 1, Volume 2, January 2007, ISSN 1991-8755.

Table of Contents

List of Figures
List of Tables
Acronyms
1 - Introduction
  1.1 - Basics of DNA
  1.2 - Applicability of Bioinformatics and Data Security using DNA
  1.3 - Classical Methods of Steganography
    1.3.1 - Fingerprinting and Watermarking
    1.3.2 - Least Significant Bit Insertion
    1.3.3 - Public Key Steganography
    1.3.4 - Frequency Domain Encoding
    1.3.5 - DNA Steganography using Polymerase Chain Reaction
  1.4 - Summary
2 - DNA Applications
  2.1 - Introduction
  2.2 - Using DNA as Massive Data Storage
    2.2.1 - Encoding Data
    2.2.2 - Indexing Data
    2.2.3 - Retrieving Data
  2.3 - Using DNA in Solving NP-Complete Problems
    2.3.1 - Hamiltonian Path Problem
    2.3.2 - Solving the Hamiltonian Path Problem using DNA
      2.3.2.1 - Generation-&-Test Algorithm
      2.3.2.2 - The DNA Experiment
  2.4 - Summary
3 - Methodology
  3.1 - Introduction
  3.2 - Restriction Enzymes
  3.3 - Assumptions
  3.4 - Recombinant DNA
    3.4.1 - Vectors
      3.4.1.1 - Plasmids
      3.4.1.2 - Lambda Phage
      3.4.1.3 - Cosmid
      3.4.1.4 - YAC
    3.4.2 - Sender Point of View
      3.4.2.1 - Prepare the Message
      3.4.2.2 - Cut the DNA Vector with Restriction Enzymes
      3.4.2.3 - Embed the Message using Recombinant DNA
    3.4.3 - Receiver Point of View
  3.5 - DNA Mutagenesis
    3.5.1 - Mutation Process
    3.5.2 - Kinds of Mutations
      3.5.2.1 - Point Mutation
      3.5.2.2 - Frame-Shift Mutation
      3.5.2.3 - Deletion
      3.5.2.4 - Insertion
      3.5.2.5 - Inversion
    3.5.3 - Mutation using PCR
      3.5.3.1 - DNA Insertion/Deletion Mutations ("indels")
    3.5.4 - Sender Point of View
      3.5.4.1 - Prepare the Message
      3.5.4.2 - Scan Specific DNA Sequence for Specific Two Restriction Sites
      3.5.4.3 - Use DNA Mutagenesis Technique to Modify Bases
    3.5.5 - Receiver Point of View
  3.6 - Probability of Finding the Target Substring of Restriction Enzyme
  3.7 - Maximum Message Space
    3.7.1 - Recombinant DNA
    3.7.2 - DNA Mutagenesis
  3.8 - Summary
4 - Vulnerability and Modifications
  4.1 - Introduction
  4.2 - Vulnerability
  4.3 - Attacks
    4.3.1 - Steganalysis
    4.3.2 - Brute Force Attack
  4.4 - Modifications
  4.5 - Summary
5 - Implementation
  5.1 - Introduction
  5.2 - Initializations
    5.2.1 - Restriction Enzymes Table
    5.2.2 - Mapping Table
  5.3 - Procedure of Hiding Message into DNA File
    5.3.1 - Get all the enzymes and its number of occurrences
    5.3.2 - Construct the new DNA file with hidden message
  5.4 - Procedure of Retrieving the Message From DNA File
  5.5 - Algorithm of Hiding and Retrieving a Message From DNA
  5.6 - Summary
6 - Lab Experiments
  6.1 - Introduction
  6.2 - Recombinant DNA
  6.3 - DNA Mutagenesis
  6.4 - Summary
7 - Summary, Conclusion and Future Work
  7.1 - Summary and Conclusion
  7.2 - Future Work
References
Appendix A: Recombinant DNA Procedure
Appendix B: Source Code of The Program