Requirement Specification Document of NSC Open Source Project: A Molecular Docking Simulation System based on Hadoop Platform

101-2221-E-320-007
Department of Medical Informatics
National Science Council, Taiwan
2010/11/23

Contents
1 Introduction
1.1 System
1.1.1 Purpose
1.1.2 Identification
1.1.3 Overview
1.1.4 Controlling Documents
1.2 Document
1.2.1 Purpose
1.2.2 Acceptance Criteria
1.2.3 Notation Description
1.2.4 Priority Definition
2 Molecular Docking Simulation System based on Hadoop Platform
2.1 System Description
2.2 Interface Requirements
2.2.1 Internal Interface Requirements
2.2.2 External Interface Requirements
2.2.3 User Interfaces Requirements
2.3 Function Requirements
2.4 Performance Requirements
2.5 Test Requirements
2.5.1 System Test Requirement
2.5.2 Acceptance Criteria
2.6 Other Requirements
2.6.1 Reliability Requirement
2.6.2 Delivery Requirement
2.6.3 Installation Requirement
2.6.4 Environment Requirement
2.7 Operational Concept
2.7.1 Scenario 1
2.8 Design and Implementation Constraints
2.9 Technological Limitations
2.10 End User Issues
2.11 Risk Management

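1 Introduction
1.1 System
1.1.1 Purpose
This system is a molecular docking simulation system built on the Hadoop platform. It uses Hadoop MapReduce and HDFS (Hadoop Distributed File System) to run the genetic algorithm (GA) [2] used by Autodock [1] on Hadoop MapReduce.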

1.1.2 Identification
The system is named A Molecular Docking Simulation System based on Hadoop Platform (MDSH).

1.1.3 Overview
The system is built on the Hadoop platform and uses Hadoop MapReduce and HDFS.

1.1.4 Controlling Documents
The controlling document for MDSH is Capability Maturity Model Integration v1.2 (CMMI v1.2).

1.2 Document
1.2.1 Purpose
This document specifies the requirements of MDSH.

1.2.2 Acceptance Criteria
- Clearly and properly stated
- Completely
- Consistently
- Uniquely identified
- Appropriately implemented
- Verifiably

1.2.3 Notation Description
Notation     Description
MDSH 1.0.0   The MDSH system will be labeled with the number 1.0.0
MDSH-F-xx    MDSH Functional Requirements
MDSH-N-xx    MDSH Non-Functional Requirements

1.2.4 Priority Definition
No  Name         Description
1   Critical
2   Important

3   Desirable
4   Unnecessary
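The notation and priority tables above can be read as a small lookup scheme. The following is a minimal sketch in Python, chosen only for brevity, and is not part of the MDSH deliverables. The function name `describe_requirement` is hypothetical. The two-or-three-digit number format is an assumption: the notation shows "xx", while the requirement tables use three digits such as MDSH-N-001.

```python
import re

# Priority levels as listed in section 1.2.4.
PRIORITY_NAMES = {1: "Critical", 2: "Important", 3: "Desirable", 4: "Unnecessary"}

# "F" marks a functional requirement, "N" a non-functional one (section 1.2.3).
KIND_NAMES = {"F": "Functional", "N": "Non-Functional"}

# Assumption: the numeric part has two or three digits.
_ID_PATTERN = re.compile(r"^MDSH-([FN])-(\d{2,3})$")


def describe_requirement(req_id, priority):
    """Hypothetical helper: decode a requirement identifier and its priority."""
    match = _ID_PATTERN.match(req_id)
    if match is None:
        raise ValueError("not an MDSH requirement identifier: %r" % (req_id,))
    if priority not in PRIORITY_NAMES:
        raise ValueError("priority must be 1 to 4, got %r" % (priority,))
    return {
        "id": req_id,
        "kind": KIND_NAMES[match.group(1)],
        "number": int(match.group(2)),
        "priority": PRIORITY_NAMES[priority],
    }
```

For example, `describe_requirement("MDSH-N-013", 2)` classifies the identifier as a non-functional requirement with priority "Important".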

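2 Molecular Docking Simulation System based on Hadoop Platform (MDSH 1.0.0)
2.1 System Description
Molecular docking simulates how a small molecule (ligand) binds to a target macromolecule (receptor). Fischer E. [3] described this recognition in terms of complementarity and pre-organization. In 1958, Koshland [4] proposed the induced fit theory.

[Figures 1 and 2; image source: http://oregonstate.edu/instruction/bb350/ahernmaterials/a06/06p11.jpg]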

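[Figure 3]

DOCK, developed by the Kuntz group at UCSF, uses an anchor-and-grow algorithm (versions 4.0 to 6.5), whereas Autodock uses a genetic algorithm (GA). This system runs the GA on Hadoop.

Hadoop consists of Hadoop MapReduce and HDFS. HDFS is made up of a master node (NameNode) and storage nodes (DataNode). MapReduce is made up of a JobTracker and TaskTrackers: the JobTracker assigns tasks to the TaskTrackers, and each TaskTracker runs its tasks and reports its status back to the JobTracker. A MapReduce program consists of a map function and a reduce function, both of which operate on key/value pairs.

Hadoop MapReduce Hadoop
1. Autodock pdbqt
2.
(1) pdbqt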

(2) GA
(3)
(4) (X Y Z)
(5) (2) (4)
3. GA Map Reduce MapReduce Hadoop 2009 [5] MapReducing SGAs (MapReducing Compact Genetic Algorithms) GA Hadoop MapReduce GA Hadoop 2009 [6] [5] GA Hadoop [5] HDFS I/O [6] map GA Map GA Map Map GA 2 4 Reduce Reduce Map map key

[5][6] GA
4.

2.2 Interface Requirements
2.2.1 Internal Interface Requirements
ID          Priority  Description
MDSH-N-001  1         HDFS
MDSH-N-002  1         Hadoop MapReduce
MDSH-N-003  1         Hadoop MapReduce
MDSH-N-004  1         Autodock pdbqt

2.2.2 External Interface Requirements
ID          Priority  Description
MDSH-N-005  1
MDSH-N-006  1
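Section 2.1 describes running the GA through Hadoop MapReduce map and reduce functions that exchange key/value pairs. The following is a minimal sketch of one GA generation written in that style. It uses plain Python with no Hadoop dependency, chosen only for brevity. The division of work is an assumption for illustration and is not confirmed as the MDSH design: map scores each individual, reduce selects and mutates. The names `toy_energy`, `map_individual`, `reduce_group` and `run_generation` are hypothetical, and the energy function is a toy stand-in for a docking score.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for a docking score: squared distance between a
# candidate ligand position (x, y, z) and a fixed target point. Lower is better.
TARGET = (1.0, -2.0, 0.5)


def toy_energy(position):
    return sum((p - t) ** 2 for p, t in zip(position, TARGET))


def map_individual(individual_id, position, num_groups=2):
    """Map step: score one individual and emit a (key, value) pair."""
    key = individual_id % num_groups
    return key, (toy_energy(position), position)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(values, rng):
    """Reduce step: keep the better half, refill by mutating survivors."""
    ranked = sorted(values)  # tuples sort by energy first
    survivors = [pos for _, pos in ranked[: max(1, len(ranked) // 2)]]
    children = []
    while len(survivors) + len(children) < len(values):
        parent = rng.choice(survivors)
        children.append(tuple(c + rng.gauss(0.0, 0.1) for c in parent))
    return survivors + children


def run_generation(population, rng, num_groups=2):
    """One generation: map every individual, group by key, reduce each group."""
    pairs = [map_individual(i, pos, num_groups) for i, pos in enumerate(population)]
    next_population = []
    for _, values in sorted(shuffle(pairs).items()):
        next_population.extend(reduce_group(values, rng))
    return next_population


def best_energy(population):
    return min(toy_energy(pos) for pos in population)
```

Because each reduce call keeps its best individuals unchanged, the best energy in the population never gets worse from one generation to the next. On a real cluster the map and reduce functions would be Hadoop tasks and the grouping would be done by the framework rather than by `shuffle`.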

2.2.3 User Interfaces Requirements
ID          Priority  Description
MDSH-N-007  1

2.3 Function Requirements
ID          Priority  Description
MDSH-F-008  1
MDSH-F-009  1
MDSH-F-010  1         Map Reduce Function
MDSH-F-011  1         GA

2.4 Performance Requirements
ID          Priority  Description
MDSH-N-012  2
MDSH-N-013  2         5

2.5 Test Requirements
2.5.1 System Test Requirement
ID          Priority  Description
MDSH-N-014  1
MDSH-N-015  1
MDSH-N-016  1
MDSH-N-017  1

2.5.2 Acceptance Criteria
ID          Priority  Description
MDSH-N-018  1
MDSH-N-019  1

2.6 Other Requirements
2.6.1 Reliability Requirement
ID          Priority  Description
MDSH-N-020  1
MDSH-N-021  2

2.6.2 Delivery Requirement
ID          Priority  Description
MDSH-N-022  1         Hadoop
MDSH-N-023  1
MDSH-N-024  1         2013/06/13

2.6.3 Installation Requirement
ID          Priority  Description
MDSH-N-025  1         Linux, Hadoop
MDSH-N-026  1         Java 7

2.6.4 Environment Requirement
ID          Priority  Description
MDSH-N-027  1         ASUS RS-100, Ubuntu 11.10
MDSH-N-028  1         Cisco Gigabit

2.7 Operational Concept
2.7.1 Scenario 1
1. Autodock pdbqt
2.
3. Hadoop
4.
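Step 1 of the scenario above refers to Autodock pdbqt files. The following is a minimal sketch of reading atom records from pdbqt text, in Python chosen only for brevity. It assumes the PDB-style fixed columns used by ATOM and HETATM records: coordinates in columns 31 to 54, partial charge in columns 71 to 76, and AutoDock atom type in columns 78 to 79. The function names `parse_pdbqt_atoms` and `geometric_center` are hypothetical and not part of Autodock or MDSH.

```python
def parse_pdbqt_atoms(text):
    """Return one dict per ATOM/HETATM record; other records are skipped."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # e.g. REMARK, ROOT, BRANCH, TORSDOF
        atoms.append({
            "name": line[12:16].strip(),   # columns 13-16
            "x": float(line[30:38]),       # columns 31-38
            "y": float(line[38:46]),       # columns 39-46
            "z": float(line[46:54]),       # columns 47-54
            "charge": float(line[70:76]),  # columns 71-76 (assumed layout)
            "type": line[77:79].strip(),   # columns 78-79 (assumed layout)
        })
    return atoms


def geometric_center(atoms):
    """Average (x, y, z) of the parsed atoms, e.g. a ligand's starting position."""
    if not atoms:
        raise ValueError("no atoms parsed")
    count = len(atoms)
    return tuple(sum(atom[axis] for atom in atoms) / count for axis in ("x", "y", "z"))
```

A caller would read the file contents with `open(path).read()` and pass the text to `parse_pdbqt_atoms`. Slicing by column rather than splitting on whitespace matters because adjacent fixed-width fields can touch when values are wide.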

2.8 Design and Implementation Constraints
ID          Priority  Description
MDSH-N-029  1         Java 7
MDSH-N-030  1         Client/Server

2.9 Technological Limitations
ID          Priority  Description
MDSH-N-031  1         docking

2.10 End User Issues
ID          Priority  Description
MDSH-N-032  1
MDSH-N-033  1

2.11 Risk Management
ID          Priority  Description
MDSH-N-034  1         Subversion
MDSH-N-035  1

References
1. Autodock. http://autodock.scripps.edu/
2. J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Ann Arbor: University of Michigan Press, 1975.
3. E. Fischer, "Einfluss der Configuration auf die Wirkung der Enzyme," Berichte der deutschen chemischen Gesellschaft, vol. 27, pp. 2985-2993, 1894.
4. D. E. Koshland, "Application of a Theory of Enzyme Specificity to Protein Synthesis," Proceedings of the National Academy of Sciences of the United States of America, vol. 44, pp. 98-104, Feb. 1958.
5. D. Keco and A. Subasi, "Parallelization of genetic algorithms using Hadoop Map/Reduce," Southeast Europe Journal of Soft Computing, 2012.
6. A. Verma, X. Llorà, D. E. Goldberg, and R. H. Campbell, "Scaling Genetic Algorithms Using MapReduce," in Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications, 2009.