ARAB ACADEMY FOR SCIENCE, TECHNOLOGY AND MARITIME TRANSPORT
COLLEGE OF ENGINEERING AND TECHNOLOGY
COMPUTER ENGINEERING DEPARTMENT

On Covert Data Communication Channels Employing DNA Steganography with Application in Massive Data Storage

Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering.

Submitted by: Mohamed El-Sayed El-Zanaty
Supervised by: Prof. Dr. Magdy Saeb, Dr. Eman El-Abd
Abstract
Deoxyribonucleic acid (DNA) is the carrier of the code of life; Watson and Crick identified its structure as a double helix. The DNA molecule consists of two sugar-phosphate side chains linked by the nucleotide bases Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). The DNA within each cell carries the fifty thousand genes that control cellular activities. Most modern steganographic techniques are based on digital media and share a major drawback: the secret message is not securely concealed, since specialized filters can be applied at internet firewalls to detect packets that carry hidden information. In recent years, researchers have turned to DNA as a data hiding medium. In this work we propose two new methods of embedding messages into DNA strands, both employing restriction enzymes that cut the DNA at specific positions. The first method uses Recombinant DNA technology; the second uses DNA Mutagenesis to insert the message into the DNA. The receiver is able to retrieve the hidden message using the key. In addition, we discuss the message space, the vulnerability of the two methods, and the proposed modifications. Moreover, we discuss the usefulness of DNA for massive data storage and for solving NP-complete problems. Finally, we construct a C# program that simulates the two methods: it hides a message in a DNA file and sends this file to the receiver. At the receiver side, the program extracts the message from the received DNA file.
ACKNOWLEDGMENT
I wish to express my gratitude and indebtedness to Professor Magdy Saeb and Dr. Eman El-Abd for their distinguished supervision, support, unlimited help, and encouragement during the making of this thesis. I would also like to thank the faculty and staff of the Computer Engineering Department at the Arab Academy for Science, Technology and Maritime Transport for their valuable advice. I dedicate my success in this thesis to my family, work colleagues and friends.
Acronyms
DNA : Deoxyribonucleic Acid
PCR : Polymerase Chain Reaction
YAC : Yeast Artificial Chromosome
Indels : DNA Insertion / Deletion Mutations
Kbp : Kilo base pair
Mbp : Mega base pair
List of Figures
Figure 1: DNA shape
Figure 2: Process of steganography
Figure 3: Least significant bit process
Figure 4: Embed a message with a stego-key
Figure 5: DNA Steganography using PCR
Figure 6: The Hamiltonian path
Figure 7: The solution of the Hamiltonian path
Figure 8: The plasmid vector
Figure 9: The Lambda phage vector
Figure 10: (a) The Cosmid vector and COS site. (b) The Cosmid vector after being cleaved with restriction enzyme
Figure 11: The YAC vector
Figure 12: Procedure of hiding message into DNA using Recombinant technique
Figure 13: Procedure of hiding message into Plasmid vector
Figure 14: Procedure of hiding message into DNA using Mutagenesis
Figure 15: The enzymes and their target substrings
Figure 16: The word and its DNA mapping
Figure 17: The enzyme and its occurrence number in the DNA
Figure 18: The message, header and the selected two enzymes
Figure 19: Get the message back
Figure 20: The program overall process
Figure 21: DNA Mutagenesis Step 1
Figure 22: DNA Mutagenesis Step 2
Figure 23: DNA Mutagenesis Step 3
Figure 24: DNA Mutagenesis Step 4
Figure 25: DNA Mutagenesis Step 5
Figure 26: The biological overall process
Figure 27: The original DNA message with restriction enzymes
Figure 28: The restriction map of vector pTZ57R/T
List of Tables
Table 1: Cities and paths encoded with DNA
Table 2: Some of the restriction enzymes and their target substrings
Table 3: The 1024 most commonly used words and their DNA representation
Table 4: Possible vectors and the maximum length each can carry
Table 5: Troubleshooting in Recombinant DNA
PUBLICATIONS
Paper: Magdy Saeb, Eman El-Abd, Mohamed E. El-Zanaty, "On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques", WSEAS (World Scientific and Engineering Academy and Society) Proceedings, Computer Engineering and Applications (CEA 07), Gold Coast, Australia, 17-19 January 2007.
Journal: Magdy Saeb, Eman El-Abd, Mohamed E. El-Zanaty, "DNA Steganography Using DNA Recombinant and DNA Mutagenesis Techniques", WSEAS (World Scientific and Engineering Academy and Society) Transactions on Computer Research, Issue 1, Volume 2, January 2007, ISSN 1991-8755.
Table of Contents
List of Figures
List of Tables
Acronyms
1 - Introduction
  1.1- Basics of DNA
  1.2- Applicability of Bioinformatics and Data Security using DNA
  1.3- Classical Methods of Steganography
    1.3.1- Fingerprinting and Watermarking
    1.3.2- Least Significant Bit Insertion
    1.3.3- Public Key Steganography
    1.3.4- Frequency Domain Encoding
    1.3.5- DNA Steganography using Polymerase Chain Reaction
  1.4- Summary
2 - DNA Applications
  2.1- Introduction
  2.2- Using DNA as Massive Data Storage
    2.2.1- Encoding Data
    2.2.2- Indexing Data
    2.2.3- Retrieving Data
  2.3- Using DNA in Solving NP-Complete Problems
    2.3.1- Hamiltonian Path Problem
    2.3.2- Solving the Hamiltonian Path Problem using DNA
      2.3.2.1- Generation-&-Test Algorithm
      2.3.2.2- The DNA Experiment
  2.4- Summary
3 - Methodology
  3.1- Introduction
  3.2- Restriction Enzymes
  3.3- Assumptions
  3.4- Recombinant DNA
    3.4.1- Vectors
      3.4.1.1- Plasmids
      3.4.1.2- Lambda Phage
      3.4.1.3- Cosmid
      3.4.1.4- YAC
    3.4.2- Sender Point of View
      3.4.2.1- Prepare the Message
      3.4.2.2- Cut the DNA Vector with Restriction Enzymes
      3.4.2.3- Embed the Message using Recombinant DNA
    3.4.3- Receiver Point of View
  3.5- DNA Mutagenesis
    3.5.1- Mutation Process
    3.5.2- Kinds of Mutations
      3.5.2.1- Point Mutation
      3.5.2.2- Frame-Shift Mutation
      3.5.2.3- Deletion
      3.5.2.4- Insertion
      3.5.2.5- Inversion
    3.5.3- Mutation using PCR
      3.5.3.1- DNA Insertion/Deletion Mutations ("indels")
    3.5.4- Sender Point of View
      3.5.4.1- Prepare the Message
      3.5.4.2- Scan Specific DNA Sequence for Specific Two Restriction Sites
      3.5.4.3- Use DNA Mutagenesis Technique to Modify Bases
    3.5.5- Receiver Point of View
  3.6- Probability of Finding the Target Substring of Restriction Enzyme
  3.7- Maximum Message Space
    3.7.1- Recombinant DNA
    3.7.2- DNA Mutagenesis
  3.8- Summary
4 - Vulnerability and Modifications
  4.1- Introduction
  4.2- Vulnerability
  4.3- Attacks
    4.3.1- Steganalysis
    4.3.2- Brute Force Attack
  4.4- Modifications
  4.5- Summary
5 - Implementation
  5.1- Introduction
  5.2- Initializations
    5.2.1- Restriction Enzymes Table
    5.2.2- Mapping Table
  5.3- Procedure of Hiding Message into DNA File
    5.3.1- Get all the enzymes and its number of occurrences
    5.3.2- Construct the new DNA file with hidden message
  5.4- Procedure of Retrieving the Message From DNA File
  5.5- Algorithm of Hiding and Retrieving a Message From DNA
  5.6- Summary
6 - Lab Experiments
  6.1- Introduction
  6.2- Recombinant DNA
  6.3- DNA Mutagenesis
  6.4- Summary
7 - Summary, Conclusion and Future Work
  7.1- Summary and Conclusion
  7.2- Future Work
References
Appendix A: Recombinant DNA Procedure
Appendix B: Source Code of The Program