Requirement Specification Document of NSC Open Source Project: A Molecular Docking Simulation System Based on Hadoop Platform



NSC Project No. 101-2221-E-320-007
Department of Medical Informatics
National Science Council, Taiwan
2010/11/23

Contents
1 Introduction
  1.1 System
    1.1.1 Purpose
    1.1.2 Identification
    1.1.3 Overview
    1.1.4 Controlling Documents
  1.2 Document
    1.2.1 Purpose
    1.2.2 Acceptance Criteria
    1.2.3 Notation Description
    1.2.4 Priority Definition
2 Molecular Docking Simulation System based on Hadoop (MDSH 1.0.0)
  2.1 System Description
  2.2 Interface Requirements
    2.2.1 Internal Interface Requirements
    2.2.2 External Interface Requirements
    2.2.3 User Interface Requirements
  2.3 Function Requirements
  2.4 Performance Requirements
  2.5 Test Requirements
    2.5.1 System Test Requirement
    2.5.2 Acceptance Criteria
  2.6 Other Requirements
    2.6.1 Reliability Requirement
    2.6.2 Delivery Requirement
    2.6.3 Installation Requirement
    2.6.4 Environment Requirement
  2.7 Operational Concept
    2.7.1 Scenario 1
  2.8 Design and Implementation Constraints
  2.9 Technological Limitations
  2.10 End User Issues
  2.11 Risk Management

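1 Introduction
1.1 System
1.1.1 Purpose
This project builds a molecular docking simulation system on the Hadoop platform. The system uses Hadoop MapReduce and HDFS (Hadoop Distributed File System) to parallelize the genetic algorithm (GA) [2] used by Autodock [1].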

1.1.2 Identification
The system is named A Molecular Docking Simulation System based on Hadoop Platform (MDSH).
1.1.3 Overview
MDSH runs on Hadoop, using Hadoop MapReduce for computation and HDFS for storage.
1.1.4 Controlling Documents
The controlling document for MDSH is Capability Maturity Model Integration v1.2 (CMMI v1.2).

1.2 Document
1.2.1 Purpose
This document specifies the requirements of MDSH.
1.2.2 Acceptance Criteria
Each requirement in this document shall be:
- clearly and properly stated;
- complete;
- consistent;
- uniquely identified;
- appropriately implementable;
- verifiable.
1.2.3 Notation Description
Notation     Description
MDSH 1.0.0   The MDSH system will be labeled with the number 1.0.0.
MDSH-F-xx    MDSH functional requirements
MDSH-N-xx    MDSH non-functional requirements
1.2.4 Priority Definition
No  Name         Description
1   Critical
2   Important
3   Desirable
4   Unnecessary
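The notation in Section 1.2.3 and the priorities in Section 1.2.4 are regular enough to check mechanically. The sketch below assumes that the "xx" part of an ID is a run of decimal digits (as in MDSH-N-001). The Requirement type and the parse_requirement function are hypothetical and do not appear in the specification.

```python
import re
from dataclasses import dataclass

# Priority levels from Section 1.2.4 (number -> name).
PRIORITIES = {1: "Critical", 2: "Important", 3: "Desirable", 4: "Unnecessary"}

# MDSH-F-xx = functional, MDSH-N-xx = non-functional (Section 1.2.3).
# Assumption: "xx" is one or more decimal digits, e.g. MDSH-N-001.
REQ_ID = re.compile(r"^MDSH-(F|N)-(\d+)$")


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one tracked requirement."""
    req_id: str
    kind: str      # "functional" or "non-functional"
    number: int
    priority: str


def parse_requirement(req_id: str, priority: int) -> Requirement:
    """Validate an ID and a priority number against the notation above."""
    match = REQ_ID.match(req_id)
    if match is None:
        raise ValueError(f"not an MDSH requirement ID: {req_id!r}")
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    kind = "functional" if match.group(1) == "F" else "non-functional"
    return Requirement(req_id, kind, int(match.group(2)), PRIORITIES[priority])


if __name__ == "__main__":
    r = parse_requirement("MDSH-F-010", 1)
    print(r.kind, r.number, r.priority)  # functional 10 Critical
```

A malformed ID such as MDSH-X-01, or a priority outside 1 to 4, raises ValueError rather than being silently accepted.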

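2 Molecular Docking Simulation System based on Hadoop (MDSH 1.0.0)
2.1 System Description
Molecular docking predicts how a small molecule (the ligand) binds to a target macromolecule (the receptor). Fischer [3] proposed that ligand and receptor fit together through complementarity and pre-organization, as in the lock-and-key model. In 1958, Koshland [4] proposed the induced-fit model, in which the receptor changes shape as the ligand binds.
[Figures 1 and 2; image source: http://oregonstate.edu/instruction/bb350/ahernmaterials/a06/06p11.jpg]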

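DOCK, developed by Kuntz's group at UCSF, places flexible ligands with an anchor-and-grow strategy in versions 4.0 through 6.5. Autodock instead searches for binding poses with a genetic algorithm (GA).
Hadoop has two main components, HDFS and MapReduce. In HDFS, a NameNode manages the file-system namespace and metadata, and DataNodes store the data blocks. In MapReduce, a JobTracker splits each job into tasks and assigns them to TaskTrackers. Each TaskTracker runs its tasks and reports progress to the JobTracker, and the JobTracker reassigns the tasks of a TaskTracker that fails. A MapReduce program consists of a map function and a reduce function, both of which consume and emit key/value pairs.
The processing flow of the docking system on Hadoop MapReduce is:
1. Prepare the input molecules in Autodock's pdbqt format.
2. Run the docking:
(1) read the pdbqt files;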

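(2) apply the GA;
(3) […];
(4) […] (X, Y, Z) coordinates;
(5) repeat steps (2) to (4).
3. Parallelize the GA with Map and Reduce functions on Hadoop MapReduce. Two earlier studies run GAs on Hadoop: Keco and Subasi [5], and Verma et al. [6], who in 2009 mapped SGAs (simple genetic algorithms) and compact genetic algorithms onto MapReduce. The two designs differ in how much HDFS I/O they incur and in how the GA work is divided between Map and Reduce tasks. In [6], the Map phase evaluates the fitness of each individual and the Reduce phase performs selection and crossover, with the map output key deciding which Reduce task receives each individual.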

The system's GA design draws on [5] and [6].
4. […]
2.2 Interface Requirements
2.2.1 Internal Interface Requirements
ID          Priority  Description
MDSH-N-001  1         HDFS
MDSH-N-002  1         Hadoop MapReduce
MDSH-N-003  1         Hadoop MapReduce
MDSH-N-004  1         Autodock pdbqt
2.2.2 External Interface Requirements
ID          Priority  Description
MDSH-N-005  1
MDSH-N-006  1
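Section 2.1 describes a MapReduce program as a map function and a reduce function over key/value pairs, and MDSH-N-002 and MDSH-N-003 name Hadoop MapReduce as an internal interface. The single-process sketch below imitates one such job: map every record, group values by key (the shuffle), then reduce each key in sorted order, as Hadoop does. The record format "ligand_id,run_id,binding_energy" and the job itself, which keeps the best energy per ligand, are hypothetical illustrations and not part of the MDSH specification. A real job would subclass Hadoop's Java Mapper and Reducer classes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, float]


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterable[KV]],
                  reducer: Callable[[str, List[float]], KV]) -> List[KV]:
    """Single-process stand-in for one MapReduce job."""
    groups: Dict[str, List[float]] = defaultdict(list)
    # Map phase: each input record yields zero or more key/value pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect every value that shares a key.
            groups[key].append(value)
    # Reduce phase: keys are handed to the reducer in sorted order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


def docking_mapper(line: str) -> Iterable[KV]:
    # Hypothetical record format: "ligand_id,run_id,binding_energy".
    ligand_id, _run_id, energy = line.split(",")
    yield ligand_id, float(energy)


def best_pose_reducer(ligand_id: str, energies: List[float]) -> KV:
    # A lower (more negative) docking energy means stronger predicted binding.
    return ligand_id, min(energies)


if __name__ == "__main__":
    results = ["lig2,1,-6.1", "lig1,1,-7.4", "lig1,2,-8.0", "lig2,2,-5.9"]
    print(run_mapreduce(results, docking_mapper, best_pose_reducer))
    # [('lig1', -8.0), ('lig2', -6.1)]
```

Because the reducer sees all values for one key at once, per-ligand aggregation needs no coordination between map tasks. On a cluster, the JobTracker distributes the map and reduce tasks to the TaskTrackers.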

2.2.3 User Interface Requirements
ID          Priority  Description
MDSH-N-007  1
2.3 Function Requirements
ID          Priority  Description
MDSH-F-008  1
MDSH-F-009  1
MDSH-F-010  1         Map Reduce Function
MDSH-F-011  1         GA
2.4 Performance Requirements
ID          Priority  Description
MDSH-N-012  2
MDSH-N-013  2         5
2.5 Test Requirements
2.5.1 System Test Requirement
ID          Priority  Description
MDSH-N-014  1
MDSH-N-015  1
MDSH-N-016  1
MDSH-N-017  1
2.5.2 Acceptance Criteria
ID          Priority  Description
MDSH-N-018  1
MDSH-N-019  1
2.6 Other Requirements
2.6.1 Reliability Requirement
ID          Priority  Description
MDSH-N-020  1
MDSH-N-021  2
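Requirements MDSH-F-010 and MDSH-F-011 pair a Map Reduce function with the GA, and step 3 of the flow in Section 2.1 follows earlier Hadoop GA work. The sketch below runs one GA generation as one map/shuffle/reduce round. It loosely follows the design described for Verma et al. [6]: map evaluates fitness, random keys scatter individuals across reducers, and each reducer applies tournament selection and uniform crossover, with no mutation. Several parts are assumptions made only for illustration and are not MDSH's design: the pose encoding (translation only), the SITE coordinates, the fitness function, and running everything in one process.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: a pose is only the ligand's (x, y, z) translation;
# Autodock's real search also covers orientation and torsions.
Pose = Tuple[float, float, float]
Scored = Tuple[float, Pose]

SITE: Pose = (1.0, 2.0, 3.0)  # hypothetical binding-site centre


def fitness(pose: Pose) -> float:
    """Toy fitness (higher is better): negative squared distance to SITE."""
    return -sum((p - s) ** 2 for p, s in zip(pose, SITE))


def map_phase(population: List[Pose], num_reducers: int,
              rng: random.Random) -> Dict[int, List[Scored]]:
    """Map: score each individual and emit it under a random reducer key."""
    partitions: Dict[int, List[Scored]] = defaultdict(list)
    for pose in population:
        key = rng.randrange(num_reducers)  # random key = random shuffle
        partitions[key].append((fitness(pose), pose))
    return partitions


def reduce_phase(scored: List[Scored], rng: random.Random,
                 tournament_size: int = 2) -> List[Pose]:
    """Reduce: tournament selection plus uniform crossover in one partition."""
    def select() -> Pose:
        k = min(tournament_size, len(scored))
        return max(rng.sample(scored, k), key=lambda s: s[0])[1]

    children: List[Pose] = []
    for _ in scored:  # keep the partition size unchanged
        a, b = select(), select()
        # Uniform crossover: each coordinate comes from either parent.
        children.append(tuple(x if rng.random() < 0.5 else y
                              for x, y in zip(a, b)))
    return children


def generation(population: List[Pose], num_reducers: int,
               rng: random.Random) -> List[Pose]:
    """One GA generation expressed as one map/shuffle/reduce round."""
    partitions = map_phase(population, num_reducers, rng)
    next_pop: List[Pose] = []
    for key in sorted(partitions):
        next_pop.extend(reduce_phase(partitions[key], rng))
    return next_pop


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [tuple(rng.uniform(-10, 10) for _ in range(3)) for _ in range(40)]
    for _ in range(30):
        pop = generation(pop, num_reducers=4, rng=rng)
    best = max(pop, key=fitness)
    print(best, fitness(best))
```

Without mutation, children can only recombine coordinate values already present in the population. This matches a selectorecombinative GA, but in practice it limits how closely the search can approach SITE.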

2.6.2 Delivery Requirement
ID          Priority  Description
MDSH-N-022  1         Hadoop
MDSH-N-023  1
MDSH-N-024  1         2013/06/13
2.6.3 Installation Requirement
ID          Priority  Description
MDSH-N-025  1         Linux, Hadoop
MDSH-N-026  1         Java 7
2.6.4 Environment Requirement
ID          Priority  Description
MDSH-N-027  1         ASUS RS-100, Ubuntu 11.10
MDSH-N-028  1         Cisco Gigabit
2.7 Operational Concept
2.7.1 Scenario 1
1. Prepare the Autodock pdbqt input files.
2. […]
3. Run on Hadoop.
4. […]
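Step 1 of the scenario and requirement MDSH-N-004 both depend on Autodock's pdbqt format. Reading atom coordinates is a small, self-contained part of that interface. The sketch below assumes ATOM/HETATM records keep the fixed PDB columns for the atom name and coordinates, and it ignores the partial-charge and atom-type fields that PDBQT appends. The centroid and atom_line helpers are hypothetical; atom_line exists only to build a correctly aligned demonstration record.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def read_pdbqt_coords(text: str) -> List[Tuple[str, Coord]]:
    """Return (atom name, (x, y, z)) for every ATOM/HETATM record.

    Assumes the fixed PDB column layout for these fields: atom name in
    columns 13-16, and x, y, z in columns 31-38, 39-46 and 47-54.
    Other records (REMARK, ROOT, BRANCH, TORSDOF and so on) are skipped.
    """
    atoms: List[Tuple[str, Coord]] = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            atoms.append((name, (x, y, z)))
    return atoms


def centroid(coords: List[Coord]) -> Coord:
    """Geometric centre of a set of coordinates (hypothetical helper)."""
    n = len(coords)
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n,
            sum(c[2] for c in coords) / n)


def atom_line(serial: int, name: str, x: float, y: float, z: float) -> str:
    """Build a minimal ATOM record in PDB columns, for demonstration only."""
    return (f"ATOM  {serial:5d} {name:<4} LIG A{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


if __name__ == "__main__":
    sample = "\n".join(["REMARK  hypothetical ligand",
                        atom_line(1, "C1", 1.0, 2.0, 3.0),
                        atom_line(2, "O1", 3.0, 4.0, 5.0),
                        "TORSDOF 0"])
    atoms = read_pdbqt_coords(sample)
    print(atoms)  # [('C1', (1.0, 2.0, 3.0)), ('O1', (3.0, 4.0, 5.0))]
    print(centroid([c for _, c in atoms]))  # (2.0, 3.0, 4.0)
```

Slicing by column rather than splitting on whitespace matters here: in PDB-style records, adjacent numeric fields can run together with no separating space.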

2.8 Design and Implementation Constraints
ID          Priority  Description
MDSH-N-029  1         Java 7
MDSH-N-030  1         Client/server
2.9 Technological Limitations
ID          Priority  Description
MDSH-N-031  1         Docking
2.10 End User Issues
ID          Priority  Description
MDSH-N-032  1
MDSH-N-033  1
2.11 Risk Management
ID          Priority  Description
MDSH-N-034  1         Subversion
MDSH-N-035  1

References
1. Autodock, http://autodock.scripps.edu/
2. J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Ann Arbor: University of Michigan Press, 1975.
3. E. Fischer, "Einfluss der Configuration auf die Wirkung der Enzyme," Berichte der deutschen chemischen Gesellschaft, vol. 27, pp. 2985-2993, 1894.
4. D. E. Koshland, "Application of a Theory of Enzyme Specificity to Protein Synthesis," Proceedings of the National Academy of Sciences of the United States of America, vol. 44, pp. 98-104, Feb. 1958.
5. D. Keco and A. Subasi, "Parallelization of genetic algorithms using Hadoop Map/Reduce," Southeast Europe Journal of Soft Computing, 2012.
6. A. Verma, X. Llorà, D. E. Goldberg, and R. H. Campbell, "Scaling Genetic Algorithms Using MapReduce," in Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications, 2009.