International Language Character Code




pp. 161-166, http://dx.doi.org/10.14257/astl.2015.81.33

International Language Character Code with DNA Molecules

Wei Wang, Zhengxu Zhao, Qian Xu
School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei, 050043, China
{wangwei, zhaozx, xuqian}@stdu.edu.cn

Abstract. In 1994, Dr. Adleman solved a small instance of the Hamiltonian path problem using DNA as the computing mechanism. This showed in principle that DNA computing can be applied to computationally hard problems. Over the past 20 years, as biological molecular computing has developed rapidly, scientists have proposed a series of theoretical models and carried out successful biochemical experiments. DNA computing has become an important research direction in both computer science and molecular biology. This research presents a novel approach in which characters are encoded by permutations and combinations of the four nitrogenous bases found in DNA molecules (adenine, guanine, cytosine and thymine). The character encoding is designed to support multiple languages and to give each character a unique identifier.

Keywords: DNA Storage, Character Encoding, DNA Computing

1 Introduction

With the rapid development of science and the information industry, and especially of multimedia technology, cloud computing and computer networks, computer storage equipment is expected not only to offer larger capacity, higher data transfer rates and more reliable storage, but also to store data more economically and securely and to scale better in time and space. The inherent limitations of current computer storage systems have become apparent, their further development is losing momentum, and storage has become one of the bottlenecks holding back computing. Neither hard disk (HDD) nor optical storage technology appears able to meet future storage demand.
Semiconductor, magnetic disk and CD-ROM storage densities are expected to reach their physical limits [1], so there is an urgent need for a new generation of alternative storage technology. Meanwhile, biological molecular computing, first verified experimentally by Adleman [2], has developed rapidly. Over nearly two decades a wide variety of theoretical models and experimental methods have emerged, such as the Adleman model, the splicing system model, the insertion-deletion system model and the DNA-EC model [3]. DNA storage is an important branch of biological molecular computing because it offers high storage density, low hardware cost, parallelizable access, good scalability and integration, and long-term storage. In the foreseeable future, DNA storage systems are likely to replace traditional storage systems [4] [5].

The DNA molecule is a powerful and effective natural information storage medium, and it has been widely used since 1985, when DNA molecules were first synthesized. DNA storage systems and traditional storage systems have obvious similarities: both are sequential storage devices, both use special symbols to mark the beginning and end of a single information segment, and both use error-correcting codes to ensure data integrity. DNA molecules can therefore serve as an information storage medium.

DNA storage technology uses the DNA molecule as its storage medium. The four nitrogenous bases contained in DNA (adenine, guanine, cytosine and thymine) can be used to encode information. With existing biochemical methods it is easy to clone DNA molecules and to modify the encoded bases within them; these operations are analogous to the read and write operations of a traditional storage system. Because of advantages such as stable and reliable operation, absence of wear, huge capacity, long life, high quality, low cost per bit and parallelizable access, DNA storage is regarded as a high-density, large-capacity storage technology.

Although DNA has been proposed as a data storage medium, how the information to be stored in DNA molecules should be encoded has not yet been settled. Since character encoding is one of the most important foundations of any computer system, this paper presents an exploratory study that uses permutations and combinations of the four nitrogenous bases to encode character information. The research addresses two major problems: selecting the storage medium and defining the coding rules.

ISSN: 2287-1233 ASTL Copyright 2015 SERSC
2 Storage Medium

DNA used as an information storage medium can take many forms. It can be single-stranded or double-stranded, and it can be a linear (long) chain or a circular strand; some circular molecules with special biological significance are called plasmids [6]. Each form has its own advantages and disadvantages as a storage medium, so these factors must be weighed when choosing a medium in order to exploit the strengths of DNA storage and keep operations simple. The DNA storage system proposed here uses circular single-stranded DNA molecules as the storage medium.

Single-stranded and double-stranded DNA each have advantages and disadvantages. Double-stranded DNA is more stable than single-stranded DNA, which is one of the main reasons most living organisms use it as their genetic material; however, data stored in double-stranded DNA are difficult to read, because the two paired strands must be separated into single strands before reading and cloning. Single-stranded DNA can be read directly using Watson-Crick complementarity, but it is less stable: it breaks more easily than double-stranded DNA and readily folds back on itself to form complementary hairpin structures. We choose single-stranded DNA because it is easier to read and clone than double-stranded DNA, and hairpin formation can be avoided through careful sequence design.

Comparing linear and circular DNA: a single endonuclease cut splits a linear chain into two independent segments, whereas a circular strand cut once remains in one piece and, under certain conditions, can be closed back into a circle. Moreover, a linear chain is easily degraded from its ends by certain exonucleases, while a circular strand, having no free ends, is less susceptible to such degradation.

3 Coding Rules

The DNA molecule is composed of four nitrogenous bases, so permutations and combinations of these four bases can be used to encode the information to be stored in a DNA storage system. The coding rules are as follows.

3.1 Unique Code

To be compatible with the languages of different countries in a multi-language environment, each character must be assigned a unique code. Characters are encoded abstractly as combinations of adenine, guanine, cytosine and thymine (A, G, C and T for short), while visual presentation, such as font size, shape, typeface and style, is left to application software such as a web browser or word processor.

3.2 Permutation and Combination of Nitrogenous Bases

Each code is a permutation and combination of the four nitrogenous bases. To cover the characters of all countries and languages as fully as possible, the encoding uses the Unicode code space from 0 to 0x10FFFF, a total of 1,114,112 code points. Since each base takes one of four values, n bases can distinguish 4^n codes; because 4^10 = 1,048,576 < 1,114,112 <= 4^11 = 4,194,304, 11 nitrogenous bases are needed to give every code point a unique code.
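The figure of 11 bases follows from each base carrying two bits of information. A minimal Python sketch of that arithmetic is shown below; the helper name `bases_needed` is ours, not the paper's.

```python
def bases_needed(code_points: int) -> int:
    """Smallest n such that 4**n >= code_points (each base carries 2 bits)."""
    n = 0
    capacity = 1  # 4**0 distinct codes with zero bases
    while capacity < code_points:
        capacity *= 4
        n += 1
    return n


UNICODE_CODE_POINTS = 0x10FFFF + 1  # code points 0 .. 0x10FFFF

print(UNICODE_CODE_POINTS)                 # 1114112
print(bases_needed(UNICODE_CODE_POINTS))   # 11
print(bases_needed(256))                   # 4, enough for the ISO 8859-1 range
```

The same function shows that the 256 ISO 8859-1 positions discussed in Section 3.3 fit in at most four bases.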
To save storage space, redundant bases at the high-order end are omitted: the leading A's, which encode high-order '00' bits, are dropped when reading the code from high order to low order. Adenine (A) represents '00', guanine (G) '01', cytosine (C) '10' and thymine (T) '11'. Table 1 shows the mapping between Unicode code points and nitrogenous bases.
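The mapping and the removal of high-order A's can be sketched in Python as follows. This is an illustration under our own reading of the rule, not the authors' software: the code point is written as 11 two-bit pairs, high-order pair first, and leading A's are then dropped, keeping a single A for code point 0 as Table 1 does. The function name `encode_code_point` and its `simplify` flag are hypothetical.

```python
# Two-bit value -> base, as stated in the text: A=00, G=01, C=10, T=11
BIT_PAIR_TO_BASE = {0b00: "A", 0b01: "G", 0b10: "C", 0b11: "T"}

FULL_LENGTH = 11  # bases per code point, enough for 0 .. 0x10FFFF


def encode_code_point(cp: int, simplify: bool = True) -> str:
    """Map a Unicode code point to bases, optionally dropping leading A's."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    bases = []
    # Walk the 22-bit value two bits at a time, high-order pair first.
    for shift in range(2 * (FULL_LENGTH - 1), -1, -2):
        bases.append(BIT_PAIR_TO_BASE[(cp >> shift) & 0b11])
    seq = "".join(bases)
    if simplify:
        # Drop high-order '00' pairs (A's), but keep one base for cp == 0.
        seq = seq.lstrip("A") or "A"
    return seq


print(encode_code_point(0x41))                  # GAAG
print(encode_code_point(0x41, simplify=False))  # AAAAAAAGAAG
print(encode_code_point(0xAF))                  # CCTT
```

With `simplify=False` every code point yields exactly 11 bases; the simplified form trades that fixed width for shorter sequences.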

Table 1. The mapping table of nitrogenous bases

Unicode     Binary                        Sequence
0           0                             A
0x1         1                             G
0x2         10                            C
0x3         11                            T
0xA         1010                          CC
0xAF        1010 1111                     CCTT
0x10FFFF    1 0000 1111 1111 1111 1111    GAATTTTTTTT

3.3 Latin Letters

Computer systems support the basic Latin letters. ISO 8859-1 defines 256 commonly used characters, including digits and uppercase and lowercase Latin letters. The first 256 positions of the character encoding are therefore reserved for the characters of ISO 8859-1 (these coincide with the first 256 Unicode code points, U+0000 to U+00FF), to improve encoding efficiency and compatibility.

3.4 Multi-Languages Environment

To improve efficiency and compatibility across languages, the character encoding provides an independent zone for each language, using the Unicode planes as a reference.

5 Algorithm

The algorithm converts text into nitrogenous bases as follows. First, the text file to be transformed is loaded into memory. The Unicode code point of each character is then obtained in order of appearance, transcoded into nitrogenous bases according to the coding rules, and written out to the stored DNA sequence. For example, the character "A" has Unicode code point 0x41 (binary 01000001); its full 11-base code is AAAAAAAGAAG, and its simplified code is GAAG. In the encryption round, the DNA sequence then undergoes add round key, sub bytes, shift rows and mix columns operations, and the resulting ciphertext is stored.

1: Initialization
2: Import the plaintext file
3: for each character do
4:    Get the Unicode code point C_unicode of the character
5:    Transcode C_unicode to C_DNA
6:    Output C_DNA to the stored DNA sequence
7: end for
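Steps 1-7 of the algorithm can be sketched in Python as below. This is a hedged illustration, not the authors' application software: it assumes the input file is UTF-8, emits the simplified (leading-A-stripped) code for each character, and omits the encryption round entirely. The names `transcode`, `encode_text` and `encode_file` are hypothetical.

```python
from pathlib import Path

# A=00, G=01, C=10, T=11, as in the coding rules
PAIR_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}


def transcode(cp: int) -> str:
    """Step 5: transcode one code point to its simplified base sequence."""
    bits = format(cp, "022b")  # 22 bits = 11 bases, covers 0 .. 0x10FFFF
    full = "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, 22, 2))
    return full.lstrip("A") or "A"  # drop high-order A's, keep at least one


def encode_text(text: str) -> list[str]:
    """Steps 3-7: one DNA code per character, in text order."""
    return [transcode(ord(ch)) for ch in text]


def encode_file(path: str) -> list[str]:
    """Steps 1-2: import the plaintext file (assumed UTF-8), then encode it."""
    return encode_text(Path(path).read_text(encoding="utf-8"))


print(encode_text("A"))           # ['GAAG']
print(encode_text(chr(0x4E2D)))   # ['GATCACTG'] for the Chinese character U+4E2D
```

Because Python strings iterate by code point, `ord(ch)` yields the Unicode code point directly, including characters outside the Basic Multilingual Plane, so no surrogate handling is needed.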

6 Verification of Algorithm

A text file containing Latin letters, Chinese characters, Japanese characters, digits and symbols was imported. The application software (an example is shown in Fig. 1) first obtains the Unicode code point of each character in binary and then transcodes it into nitrogenous bases according to the coding rules. Applying the inverse operation, the software recovers the original text from the DNA sequence.

Fig. 1. Example of the character encoding

7 Conclusions

This paper proposes a character encoding for DNA storage systems. The encoding converts characters into sequences of nitrogenous bases, enabling both the encoding and the decoding of character information. It is compatible with multi-language environments, and every character is assigned a unique code.

Acknowledgment. We are grateful to Dr. Yang Guo for supporting this study, and to the Laboratory of Complex Network and Visualization for making the publication of this article possible.
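The verification relies on an inverse operation, but the paper does not say how consecutive simplified codes are delimited, and on their own they cannot be split unambiguously: G (code point 0x1) followed by A (0x0) reads as GA, which is also the code for 0x4. The Python sketch below therefore assumes the fixed-length 11-base form for every character, so decoding reduces to cutting the sequence into 11-base chunks. This is our assumption, not the authors' stated method, and the names `encode_fixed` and `decode_fixed` are hypothetical.

```python
PAIR_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}
WIDTH = 11  # bases per character in the fixed-length (unsimplified) form


def encode_fixed(text: str) -> str:
    """Encode every character as exactly 11 bases, high-order pair first."""
    out = []
    for ch in text:
        bits = format(ord(ch), "022b")
        out.extend(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, 22, 2))
    return "".join(out)


def decode_fixed(seq: str) -> str:
    """Inverse operation: split into 11-base chunks and rebuild each code point."""
    if len(seq) % WIDTH:
        raise ValueError("sequence length is not a multiple of 11 bases")
    chars = []
    for start in range(0, len(seq), WIDTH):
        bits = "".join(BASE_TO_PAIR[b] for b in seq[start:start + WIDTH])
        cp = int(bits, 2)
        if cp > 0x10FFFF:  # 11 bases can express values beyond Unicode
            raise ValueError(f"decoded value {cp:#x} is not a code point")
        chars.append(chr(cp))
    return "".join(chars)


sample = "Aa1" + chr(0x4E2D) + chr(0x3042) + "!"  # Latin, digit, Chinese, Japanese, symbol
assert decode_fixed(encode_fixed(sample)) == sample
```

The fixed width costs up to ten extra bases per character compared with the simplified codes; an explicit delimiter or length prefix would be an alternative way to keep the shorter form decodable.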

References
1. Wei Dan, "Review of magnetic information storage technology," Physics, vol. 33(9), 2004, pp. 646-651.
2. Adleman L.M., "Molecular Computation of Solutions to Combinatorial Problems," Science, vol. 266(5187), 1994, pp. 1021-1024.
3. Zingel T., "Formal models of DNA computing: a survey," Proc. Estonian Acad. Sci. Phys. Math., vol. 49(2), 2000, pp. 90-99.
4. Dietrich A. and Been W., "Memory and DNA," J. Theor. Biol., vol. 208, 2001, pp. 145-149.
5. Garzon M.H., Neel A., Chen H., "Efficiency and Reliability of DNA Based Memories," in Proc. GECCO, 2003, pp. 379-389.
6. Robert F. W., Molecular Biology, 2nd ed., Beijing: Science Press, 2003, pp. 642-682.