
Development of Bio-Cloud Service for Genomic Analysis Based on Virtual Infrastructure

Jung-Ho Um (1), Sang Bae Park (2), Hoon Choi (3)*, Hanmin Jung (4)
(1) First Author, Korea Institute of Science and Technology Information, jhum@kisti.re.kr
(2) Korea Institute of Science and Technology Information, plucky@kisti.re.kr
(3) *Corresponding Author, Korea Institute of Science and Technology Information, choid@kisti.re.kr
(4) Korea Institute of Science and Technology Information, jhm@kisti.re.kr

Abstract
Genomic data analysis is becoming increasingly important for realizing personalized treatment of human cancers. Next-generation sequencing (NGS) is a cost-effective way to obtain the data sets needed for cancer data analysis. Because NGS produces very large numbers of short reads, analyzing these data sets requires substantial computing resources. Cloud computing is a promising solution to this problem because it can allocate computing resources elastically as requirements change. In this paper, we propose a bio-cloud service for large-scale NGS data analysis based on a virtualized computing infrastructure. It was developed in collaboration between KISTI and KOBIC to improve the productivity of KOBIC's genomic re-sequencing studies.

Keywords: Next-Generation Sequencing, Cloud Computing, Genomic Analysis, Virtual Cluster Management, NGS Analysis Pipeline

1. Introduction
Genome sequence data analysis is an essential part of research on personalized medical treatment of human cancers [1]. The next-generation sequencing (NGS) technique is now prevalent in genome analysis because it lets biologists generate genome data at lower cost and in less time. Hence, most bioinformatics research groups use NGS to analyze genome sequences [2, 3].
The Korean Bioinformation Center (KOBIC), the primary institute for bioinformatics research on human cancers, built a sequence data analysis system consisting of 100 physical computing servers and one storage server. However, because applications use the physical computing resources directly, the system has several problems. First, as the workload increases, latency increases and job execution is severely delayed. Second, it is very difficult to use computing resources efficiently, because even a single bioinformatics application consists of both sequential and parallel processing components, so servers reserved for the parallel parts can sit idle while the sequential parts run. As a result, bioinformatics applications tend to under-utilize physical computing resources. Finally, the existing system only stores the output data of executed applications on a storage server, leaving data management to each user who runs the applications. Because the system organizes data from an application's viewpoint rather than a user's, users must spend additional effort managing their own data. These problems cause serious inefficiency in large-scale data analyses such as RNA and DNA re-sequencing.

To improve the productivity of such analyses, we developed the Bio-Cloud system, which is based on virtualized computing resources and tailored to KOBIC's requirements. In this paper, we describe how Bio-Cloud elastically allocates virtualized computing resources to bioinformatics applications on demand. We use Xen and OpenNebula to build the virtual infrastructure management system for Bio-Cloud.

2. Proposed Bio-Cloud System
We designed the Bio-Cloud system by analyzing the problems above and developing a solution to each. To reduce job execution time, we built a virtual infrastructure management system that dynamically allocates a virtual cluster to a user's job on demand.
To improve the utilization of computing resources, a virtual cluster is allocated on the basis of CPU usage frequency, memory usage patterns, and the characteristics of the application, including its sequential and parallel processing components. Data management functions are provided to support NGS analysis pipelines and to make it easy for cloud users to handle their data.

Advances in Information Sciences and Service Sciences (AISS), Volume 4, Number 13, July 2012, doi: /AISS.vol4.issue

Figure 1. Bio-Cloud system architecture

Figure 1 shows the overall architecture of Bio-Cloud. The system consists of a virtual infrastructure, a resource management server, and a web interface that receives and responds to KOBIC's requests. The users of the system and their roles are as follows.
- End User: End users execute and monitor their tasks and upload and download their data through an App Server provided by the Bio App Manager.
- Bio App Manager (SaaS Provider): The Bio App Manager (KOBIC personnel in our project) is primarily responsible for submitting virtual cluster requests to the Resource Manager through a web interface. The capacity of the virtual cluster provided by the Resource Manager depends on the end users' requests and the status of the virtual infrastructure. The Bio App Manager also maintains the NGS application pipelines and installs bioinformatics applications, manually or automatically, on the virtual clusters provided by the Resource Manager.
- Resource Manager: The Resource Manager manages and monitors the physical and virtual infrastructure of the Bio-Cloud system.

We designed a virtual infrastructure management system to provide virtual clusters (or virtual machines) to bioinformatics applications. Figure 2 shows the components of the virtual infrastructure management system.
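The paper does not give the exact sizing rule the Virtual Cluster Management module uses, so the following is only a minimal sketch under an assumed packing heuristic. An application is described by how many of its tasks can run in parallel and by the vCPUs and memory each task needs (static figures standing in for the CPU and memory usage patterns mentioned above). Tasks are packed onto as few VMs as possible under a per-VM vCPU limit, and the request is refused when the virtual infrastructure lacks free capacity. The names `AppProfile`, `ClusterSpec`, and `size_cluster`, and all parameters, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppProfile:
    """Hypothetical resource profile of one bioinformatics application."""
    name: str
    parallel_tasks: int     # tasks that can run concurrently (1 = purely sequential)
    cpu_per_task: int       # vCPUs one task can keep busy
    mem_per_task_gb: float  # peak memory of one task


@dataclass
class ClusterSpec:
    num_vms: int
    vcpus_per_vm: int
    mem_per_vm_gb: float


def size_cluster(app: AppProfile, max_vcpus_per_vm: int,
                 free_vcpus: int, free_mem_gb: float) -> Optional[ClusterSpec]:
    """Pack tasks onto as few VMs as possible (assumed heuristic).

    Returns None when the virtual infrastructure lacks free capacity.
    """
    if app.parallel_tasks < 1 or not 1 <= app.cpu_per_task <= max_vcpus_per_vm:
        raise ValueError("invalid profile or a task does not fit on one VM")
    tasks_per_vm = min(app.parallel_tasks, max_vcpus_per_vm // app.cpu_per_task)
    num_vms = math.ceil(app.parallel_tasks / tasks_per_vm)
    spec = ClusterSpec(num_vms=num_vms,
                       vcpus_per_vm=tasks_per_vm * app.cpu_per_task,
                       mem_per_vm_gb=tasks_per_vm * app.mem_per_task_gb)
    # Admission check against the current state of the virtual infrastructure.
    if (spec.num_vms * spec.vcpus_per_vm > free_vcpus
            or spec.num_vms * spec.mem_per_vm_gb > free_mem_gb):
        return None
    return spec
```

For example, a step with 20 parallel tasks needing 2 vCPUs and 4 GB each, under an 8-vCPU limit, yields five VMs with 8 vCPUs and 16 GB each, while a purely sequential step gets a single small VM. Returning `None` models the Resource Manager declining a request when the infrastructure is too busy.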

Figure 2. Virtual infrastructure management system
- Virtual Infrastructure: The Virtual Infrastructure is a pool of virtual clusters, virtual machines, and storage servers. Virtual clusters and virtual machines of various capacities (e.g., the number of vCPUs, the size of virtual memory, and the number of virtual machines in a virtual cluster) are created at the SaaS Provider's request. The virtual resources in the Virtual Infrastructure are managed by the Computing Resource Management Server and the Data Resource Management Server.
- Computing Resource Management Server: The Computing Resource Management Server manages virtual computing resources such as virtual clusters and virtual machines. To allocate a virtual cluster to a job and execute the job on it, the server provides a Virtual Cluster Management module with functions for allocating, releasing, and monitoring virtual clusters. The module calculates the number of virtual CPUs, the memory size, and the number of virtual machines for a virtual cluster, creates the virtual cluster or machine from the Virtual Infrastructure based on that calculation, and manages the execution of jobs on the virtual cluster.
- Data Resource Management Server: The Data Resource Management Server manages the input, output, and temporary data required to execute a job on a virtual cluster. It provides end users with functions for handling their data according to NGS analysis pipelines.

Figure 3 shows an example NGS analysis pipeline frequently used in DNA and RNA re-sequencing. The pipeline consists of BWA [4], Bowtie [5], SSAHA [6], SAMtools [7], and GATK [8]. The data produced at each step of the pipeline is listed in Table 1.

Figure 3. An example NGS analysis pipeline

Table 1. Data generated by the example NGS analysis pipeline
Input files: ref.fa, gen.fq, snp.rod
Output files: gen.sam, gen.addrg.sam, gen.bam, gen.sorted.bam, gen.snpcalls.vcf
Index files: ref.fa.amb, ref.fa.ann, ref.fa.bwt, ref.fa.pac, ref.fa.rbwt, ref.fa.rpac, ref.fa.rsa, ref.fa.sa, gen.sai, ref.fa.fai, gen.snpcalls.vcf.idx
Total: about 2 TB

The Data Resource Management Server must store and maintain the entire 2 TB data set produced by this pipeline so that NGS researchers can reuse it. To do so, it annotates the data products with the metadata shown in Table 2. It also creates a folder for each step of a pipeline and stores that step's data in the folder. As a result, the data product of a pipeline is a hierarchy of folders, each holding the data of one step together with its metadata.

Table 2. Metadata table (field name: description)
UserID: unique user identifier
Storage Server Address: address of the storage server where the data is stored
Path: folder path where the user's data is stored
Filename: file name of the data
Datasize: size of the data
Creation date: creation date of the data
Modified date: last modification date of the data
Application type: type of the application that produced the data (output data only)
pipelineid: identifier of the genome sequencing pipeline

The Data Resource Management Server provides Project folders, File folders, and Script folders to simplify the management of data products. These folders store application output data, end users' input data, and user-defined scripts, respectively. Each user owns and manages these folders on a storage server such as NFS [9]. A data update consists of three steps, as shown in Figure 4:
1. Metadata related to the update request is stored in the metadata table.
2. The user submits the update request to the storage server.
3. The updated data is transferred to the storage server.

Figure 4. Data management flow

The storage server contains a Data Management module and a Data Transfer module to handle users' requests. The Data Management module maintains input and output data and supports creating, deleting, renaming, and copying folders and data. It also maintains a metadata table for efficient data management. The Data Transfer module uploads and downloads data between clients and the storage server.

3. Implementation
The Bio-Cloud design described in the previous section is implemented with Xen and OpenNebula, which are well suited to building a virtual infrastructure management system for bioinformatics applications. To minimize users' adaptation effort, the user environment of Bio-Cloud is kept similar to KOBIC's current system.
That is, we use the same system components, such as the job scheduler, the operating system of the virtual machines, and the web server.
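The folder-per-step hierarchy, the metadata fields of Table 2, and the three-step update flow of Figure 4 described in Section 2 can be illustrated with a small sketch. It assumes a local directory in place of the NFS storage server and an in-memory dictionary in place of the metadata table; `DataRecord`, `StorageServer`, `update`, the `<user>/<pipeline>/<step>` folder layout, and the address "storage01" are hypothetical choices for illustration, not the Bio-Cloud implementation.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class DataRecord:
    """One row of the metadata table; fields mirror Table 2."""
    user_id: str
    storage_server: str
    path: str
    filename: str
    datasize: int
    created: datetime
    modified: datetime
    application_type: Optional[str]  # only set for output data
    pipeline_id: str


class StorageServer:
    """Hypothetical stand-in: a local root folder plus an in-memory metadata table."""

    def __init__(self, root: str, address: str):
        self.root = root
        self.address = address
        self.metadata: Dict[Tuple[str, str], DataRecord] = {}

    def step_folder(self, user_id: str, pipeline_id: str, step: str) -> str:
        # One folder per pipeline step: <root>/<user>/<pipeline>/<step>
        return os.path.join(self.root, user_id, pipeline_id, step)

    def update(self, user_id: str, pipeline_id: str, step: str, filename: str,
               data: bytes, application_type: Optional[str] = None) -> DataRecord:
        folder = self.step_folder(user_id, pipeline_id, step)
        key = (folder, filename)
        now = datetime.now()
        previous = self.metadata.get(key)
        # Step 1: store the metadata for this update request.
        record = DataRecord(user_id, self.address, folder, filename, len(data),
                            previous.created if previous else now, now,
                            application_type, pipeline_id)
        self.metadata[key] = record
        # Step 2: the request reaches the storage server (here, the target folder).
        os.makedirs(folder, exist_ok=True)
        # Step 3: transfer the updated data into the step folder.
        with open(os.path.join(folder, filename), "wb") as f:
            f.write(data)
        return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        server = StorageServer(root, "storage01")
        rec = server.update("user1", "pipe42", "bwa", "gen.sai", b"ACGT", "BWA")
        print(rec.path, rec.datasize)
```

Keying the table by folder and file name means a repeated update overwrites the row but keeps the original creation date, which matches the separate "Creation date" and "Modified date" fields.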

- Virtual Cluster Management module: We extended OpenNebula [10] to develop this module. OpenNebula provides the OpenNebula Cloud API (OCA) for managing virtual infrastructure. OCA supports creating, deleting, and monitoring virtual machines, virtual networks, hosts, physical clusters, users, and images, but it does not provide functions for managing virtual clusters. We used OCA to add virtual cluster management functions to OpenNebula: creation, deletion, and monitoring of a virtual cluster (Table 3). In addition, KOBIC uses Sun Grid Engine (SGE) [11] to execute bioinformatics applications because SGE is a highly stable open-source program with consistent updates. We therefore also adopted SGE as the job scheduler in Bio-Cloud and execute KOBIC's jobs on virtual clusters.

Table 3. Virtual Cluster Management APIs
Allocate: allocation of a virtual cluster
  Parameters: Client client (connection to a client); String description (virtual machine template used to allocate the virtual cluster); int numvm (number of virtual machines in the virtual cluster)
  Returns: OneResponse (communication acknowledgement)
finalizecluster: deletion of a virtual cluster
  Returns: OneResponse (communication acknowledgement)
savecluster: saving a virtual machine image in a virtual cluster
  Parameters: int vmid[] (virtual machine identifiers); int diskid (disk identifier); int imageid (image identifier)
  Returns: OneResponse (communication acknowledgement)
getclusterinfo: monitoring a virtual cluster
  Returns: VCInfo (virtual cluster information)

- Data Transfer module: KOBIC's current system uses innods [12] to transfer data. However, innods requires a license fee, which is a major obstacle to deploying Bio-Cloud. We therefore implemented the Data Transfer module with the open-source Rapident [13]. The module supports uploading and downloading data and folders (Table 4).
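The virtual cluster operations of Table 3 are built on OCA, whose calls are not reproduced here. The following minimal sketch does not call OpenNebula at all; it only models the bookkeeping behind `Allocate`, `finalizecluster`, and `getclusterinfo`, assuming a fixed pool of VM slots. The Python classes `OneResponse` and `VCInfo` are simplified stand-ins for the Java types named in Table 3, the `client` parameter and `savecluster` are omitted, and passing a cluster identifier to `finalizecluster` and `getclusterinfo` is an assumption.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OneResponse:
    """Simplified stand-in for an acknowledgement: error flag plus message."""
    error: bool
    message: str


@dataclass
class VCInfo:
    """Simplified virtual cluster information returned by getclusterinfo."""
    cluster_id: int
    template: str
    vm_ids: List[int]


class VirtualClusterManager:
    """Hypothetical in-memory model of the Table 3 operations (no OpenNebula calls)."""

    def __init__(self, capacity_vms: int):
        self.capacity_vms = capacity_vms          # VM slots in the virtual infrastructure
        self._next_vm_id = itertools.count(100)   # arbitrary starting VM id
        self._next_cluster_id = itertools.count(1)
        self.clusters: Dict[int, VCInfo] = {}

    def _vms_in_use(self) -> int:
        return sum(len(c.vm_ids) for c in self.clusters.values())

    def allocate(self, description: str, numvm: int) -> OneResponse:
        """Create numvm VMs from the template; the message carries the cluster id."""
        if numvm < 1:
            return OneResponse(True, "numvm must be at least 1")
        if self._vms_in_use() + numvm > self.capacity_vms:
            return OneResponse(True, "insufficient capacity")
        cid = next(self._next_cluster_id)
        vms = [next(self._next_vm_id) for _ in range(numvm)]
        self.clusters[cid] = VCInfo(cid, description, vms)
        return OneResponse(False, str(cid))

    def getclusterinfo(self, cluster_id: int) -> VCInfo:
        return self.clusters[cluster_id]

    def finalizecluster(self, cluster_id: int) -> OneResponse:
        """Release every VM of the cluster."""
        if self.clusters.pop(cluster_id, None) is None:
            return OneResponse(True, "no such cluster")
        return OneResponse(False, str(cluster_id))
```

Returning an error response instead of raising mirrors the acknowledgement-style return values listed in Table 3; releasing a cluster immediately frees its slots for the next request.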

Table 4. APIs for data transfer
Upload: upload data
  Parameters: String srcpath (path of the user's source data); String destpath (path on the server where the data is stored)
  Returns: boolean (success or failure of the upload, true/false)
Download: download data
  Parameters: String srcpath (path of the stored data on the server); String destpath (path to download the data to)
  Returns: boolean (success or failure of the download, true/false)

- Data Management module: KOBIC users want to manage data for their own purposes, so this module provides data management functions from the user's viewpoint: creating, deleting, renaming, and copying folders and files (Table 5).

Table 5. Data Management APIs
create: creation of a folder or file
  Parameters: String path (path including the name of the folder or file to create)
  Returns: boolean (success or failure of the creation, true/false)
delete: deletion of a folder or file
  Parameters: String path (path including the name of the folder or file to delete)
  Returns: boolean (success or failure of the deletion, true/false)
rename: renaming of a folder or file
  Parameters: String srcpath (source path including the name of the folder or file to rename); String destpath (destination path)
  Returns: boolean (success or failure of the rename, true/false)
copy: copying of a folder or file
  Parameters: String srcpath (source path including the name of the folder or file to copy); String destpath (destination path)
  Returns: boolean (success or failure of the copy, true/false)
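Tables 4 and 5 describe the data APIs only by their parameters and boolean results. The sketch below is one plausible local-filesystem reading of them, assuming a per-user root folder and using `shutil` copies where the real module uses Rapident for network transfers. The class name `DataManager`, the `folder` flag on `create`, and the check that rejects paths outside the user's root are hypothetical additions.

```python
import shutil
import tempfile
from pathlib import Path


class DataManager:
    """Hypothetical local sketch of Tables 4 and 5; each operation returns True or False."""

    def __init__(self, user_root: str):
        self.root = Path(user_root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, path: str) -> Path:
        p = (self.root / path).resolve()
        # Reject paths such as "../other_user" that escape this user's folder.
        if p != self.root and self.root not in p.parents:
            raise ValueError(f"path outside user folder: {path}")
        return p

    # Table 5: data management
    def create(self, path: str, folder: bool = False) -> bool:
        p = self._resolve(path)
        if p.exists():
            return False
        try:
            if folder:
                p.mkdir(parents=True)
            else:
                p.parent.mkdir(parents=True, exist_ok=True)
                p.touch()
            return True
        except OSError:
            return False

    def delete(self, path: str) -> bool:
        p = self._resolve(path)
        try:
            if p.is_dir():
                shutil.rmtree(p)
            else:
                p.unlink()
            return True
        except OSError:
            return False

    def rename(self, srcpath: str, destpath: str) -> bool:
        try:
            self._resolve(srcpath).rename(self._resolve(destpath))
            return True
        except OSError:
            return False

    def copy(self, srcpath: str, destpath: str) -> bool:
        src, dst = self._resolve(srcpath), self._resolve(destpath)
        try:
            if src.is_dir():
                shutil.copytree(src, dst)
            else:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
            return True
        except OSError:
            return False

    # Table 4: data transfer (a local copy stands in for the network transfer)
    def upload(self, srcpath: str, destpath: str) -> bool:
        try:
            dst = self._resolve(destpath)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(srcpath, dst)
            return True
        except OSError:
            return False

    def download(self, srcpath: str, destpath: str) -> bool:
        try:
            shutil.copy2(self._resolve(srcpath), destpath)
            return True
        except OSError:
            return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dm = DataManager(str(Path(tmp) / "user1"))
        print(dm.create("project", folder=True))  # True
        print(dm.create("project", folder=True))  # False: already exists
```

Converting filesystem errors into `False` keeps the boolean contract of the tables, while an out-of-root path is treated as a programming error and raises.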

4. Software as a Service on Bio-Cloud: NEUMA Web Portal
The NEUMA [14] web portal service was developed on top of the Bio-Cloud system described in Sections 2 and 3. NEUMA is a fast RNA re-sequencing application for quantifying gene expression. The NEUMA analysis pipeline comprises the NGS applications shown in Table 6.

Table 6. NEUMA pipeline (step: function)
1: build a Bowtie index of the reference sequence (Bowtie [5])
2: compute mapping statistics for each RNA-Seq data set
3: find the length distribution
4: build the indexed table for the transcriptome model
5: build the suffix array table for the transcriptome model
6: print the suffix array table for the transcriptome model
7: build the gu and iu tables for the transcriptome model
8: quantify genes and isoforms
9: merge the results

Bio-Cloud provides the NEUMA web portal with the following features. First, it allocates virtual clusters to NEUMA application services. Each virtual cluster is configured with the vCPUs, memory, and storage volume specified in the Bio App Manager's request (Figure 5).

Figure 5. Request of virtual clusters

Second, Bio-Cloud summarizes users' virtual cluster requests and monitors the current status of the virtual clusters allocated to users (Figures 6 and 7).
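The request summary of Figure 6 amounts to totaling the resources requested by each user. The sketch below assumes each request records a VM count, per-VM vCPUs and memory, and a single storage volume for the whole cluster, following the request contents described above; `ClusterRequest` and `summarize` are hypothetical names, and the portal's actual summary fields are not given in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ClusterRequest:
    """A virtual cluster request as entered in the portal (hypothetical fields)."""
    user: str
    num_vms: int
    vcpus_per_vm: int
    mem_per_vm_gb: int
    storage_gb: int  # storage volume for the whole cluster


def summarize(requests: Iterable[ClusterRequest]) -> Dict[str, Dict[str, int]]:
    """Total the requested resources per user."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"clusters": 0, "vms": 0, "vcpus": 0, "mem_gb": 0, "storage_gb": 0})
    for r in requests:
        t = totals[r.user]
        t["clusters"] += 1
        t["vms"] += r.num_vms
        t["vcpus"] += r.num_vms * r.vcpus_per_vm    # per-VM figures scale with VM count
        t["mem_gb"] += r.num_vms * r.mem_per_vm_gb
        t["storage_gb"] += r.storage_gb             # one volume per cluster
    return dict(totals)
```

For instance, two requests of 2 VMs x 4 vCPUs and 1 VM x 8 vCPUs from the same user sum to 3 VMs and 16 vCPUs.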

Figure 6. Summary of requested virtual clusters
Figure 7. Monitoring allocated virtual clusters

Third, Bio-Cloud executes NEUMA application jobs through Sun Grid Engine (SGE) and monitors the status of the jobs running on virtual clusters (Figure 8).

Figure 8. Monitoring user's job status
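The paper does not say how Bio-Cloud submits NEUMA jobs to SGE. Because NEUMA's steps (Table 6) run in order, one common way to express such a pipeline in SGE is to submit each step as a separate job that waits for the previous one with `qsub -hold_jid`. The sketch below only builds the `qsub` command lines; it assumes a parallel environment named `smp` and a requestable `h_vmem` limit are configured on the virtual cluster, and the script names and `neuma_stepN` job names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def qsub_command(job_name: str, script: str, slots: int = 1,
                 pe_name: str = "smp", h_vmem: str = "4G",
                 hold_jid: Optional[str] = None) -> List[str]:
    """Build (but do not run) a qsub command line for one pipeline step."""
    cmd = ["qsub", "-N", job_name, "-cwd", "-l", f"h_vmem={h_vmem}"]
    if slots > 1:
        # Request several slots through a parallel environment (name assumed).
        cmd += ["-pe", pe_name, str(slots)]
    if hold_jid:
        # Do not start until the named job has finished.
        cmd += ["-hold_jid", hold_jid]
    cmd.append(script)
    return cmd


def chain_steps(steps: Sequence[Tuple[str, int]]) -> List[List[str]]:
    """Submit steps in order, each holding on the previous step's job name."""
    commands: List[List[str]] = []
    previous: Optional[str] = None
    for i, (script, slots) in enumerate(steps, start=1):
        name = f"neuma_step{i}"
        commands.append(qsub_command(name, script, slots, hold_jid=previous))
        previous = name
    return commands
```

Each command could then be passed to `subprocess.run` on the virtual cluster's submit host; sequential steps request one slot, while parallel steps request several through the parallel environment.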

Finally, Bio-Cloud provides data management features for creating, deleting, and renaming files and folders, as well as for editing, uploading, and downloading data files (Figure 9).

Figure 9. Data management interface

5. Related Work
Cloud computing services for bioinformatics have been studied by many research and industry organizations. CloudBurst [15] is a read-mapping application for DNA and RNA re-sequencing that runs on Amazon EC2; it drastically reduces re-sequencing execution time by using the MapReduce programming framework and runtime system. CloudBLAST [16] also parallelized BLAST [17] with MapReduce on the public Amazon EC2 cloud. OBIWEE [18] proposed an open-source computing environment on a private cloud to support workflows among bioinformatics applications. Kim et al. [19] presented a private cloud computing environment for parallel processing of bioinformatics applications consisting of multiple tasks. These studies focus on efficient processing of bioinformatics applications in public and private cloud environments, but none of them supports the analysis pipelines essential to genomic data analysis. The Bio-Cloud system proposed in this paper addresses analysis pipelines and data management for NGS analysis on a private cloud.

6. Conclusions
In this paper, we have proposed and developed the Bio-Cloud service for genomic data analysis, which dynamically allocates a virtual cluster according to a user's job. The advantages of Bio-Cloud can be summarized as follows. First, it reduces users' waiting time and job execution time by creating a virtual cluster suited to each job. Second, it allocates a virtual cluster to a job based on the characteristics of the bioinformatics application (sequential or parallel processing) and its vCPU and memory requirements.
Finally, because each user's data is stored and maintained separately according to that user's own analysis viewpoint, the effort users spend managing data is drastically reduced. Bio-Cloud is used to deploy NEUMA, a web tool for KOBIC's RNA re-sequencing [20]. Bio-Cloud and the NEUMA web portal service are particularly useful for small labs that do not operate computing systems powerful enough for their research. We are currently developing an adaptive scheduler for the Virtual Cluster Management module in Bio-Cloud. The scheduler will dynamically allocate a virtual cluster to a cluster of physical hosts based on the application's characteristics to improve performance and scalability. We are also developing a data provenance management module for bioinformatics data management.

7. References
[1] A. H. Chen and M. C. Lee, "Novel Approaches for the Prediction of Cancer Classification," In Proceedings of IJACT, Vol. 3, No. 3.
[2] H. Liu, Z. Lu, L. Guo, Q. Wu, Q. Ge, and J. Lu, "Next generation sequencing, an effective method for genomic profiling of circulating miRNA," In Proceedings of JCIT, Vol. 6, No. 12.
[3] M. Xiong, Z. Zhao, J. Arnold, and F. Yu, "Next-Generation Sequencing," Journal of Biomedicine and Biotechnology.
[4] BWA.
[5] Bowtie.
[6] SSAHA.
[7] SAMtools.
[8] GATK.
[9] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, "Design and Implementation of the Sun Network Filesystem," In Proceedings of USENIX.
[10] OpenNebula.
[11] Sun Grid Engine.
[12] innods.
[13] Rapident.
[14] NEUMA.
[15] M. Schatz, "CloudBurst: Highly Sensitive Short Read Mapping with MapReduce," Bioinformatics.
[16] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications," In Proceedings of the 4th IEEE International Conference on eScience.
[17] BLAST.
[18] F. Moreews, J. Piat, and O. Sallou, "OBIWEE: an open source bioinformatics cloud environment," In Proceedings of the 12th Annual Bioinformatics Open Source Conference.
[19] T. K. Kim, B. K. Hou, and W. S. Cho, "Private Cloud Computing Techniques for Inter-processing Bioinformatics Tools," In Proceedings of Convergence and Hybrid Information Technology.
[20] S. Lee, C. H. Seo, B. Lim, J. O. Yang, J. Oh, M. Kim, S. Lee, B. Lee, C. Kang, and S. Lee, "Accurate quantification of transcriptome from RNA-Seq data by effective length normalization," Nucleic Acids Research.


Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 , pp. 331-342 http://dx.doi.org/10.14257/ijfgcn.2015.8.2.27 Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 Changming Li, Jie Shen and

More information

Introduction to Cloud computing. Viet Tran

Introduction to Cloud computing. Viet Tran Introduction to Cloud computing Viet Tran Type of Cloud computing Infrastructure as a Service IaaS: offer full virtual machines via hardware virtualization tech. Amazon EC2, AbiCloud, ElasticHosts, Platform

More information

OpenNebula Open Souce Solution for DC Virtualization

OpenNebula Open Souce Solution for DC Virtualization 13 th LSM 2012 7 th -12 th July, Geneva OpenNebula Open Souce Solution for DC Virtualization Constantino Vázquez Blanco OpenNebula.org What is OpenNebula? Multi-tenancy, Elasticity and Automatic Provision

More information

Cloud Computing Simulation Using CloudSim

Cloud Computing Simulation Using CloudSim Cloud Computing Simulation Using CloudSim Ranjan Kumar #1, G.Sahoo *2 # Assistant Professor, Computer Science & Engineering, Ranchi University, India Professor & Head, Information Technology, Birla Institute

More information

Planning, Provisioning and Deploying Enterprise Clouds with Oracle Enterprise Manager 12c Kevin Patterson, Principal Sales Consultant, Enterprise

Planning, Provisioning and Deploying Enterprise Clouds with Oracle Enterprise Manager 12c Kevin Patterson, Principal Sales Consultant, Enterprise Planning, Provisioning and Deploying Enterprise Clouds with Oracle Enterprise Manager 12c Kevin Patterson, Principal Sales Consultant, Enterprise Manager Oracle NIST Definition of Cloud Computing Cloud

More information

Li Sheng. lsheng1@uci.edu. Nowadays, with the booming development of network-based computing, more and more

Li Sheng. lsheng1@uci.edu. Nowadays, with the booming development of network-based computing, more and more 36326584 Li Sheng Virtual Machine Technology for Cloud Computing Li Sheng lsheng1@uci.edu Abstract: Nowadays, with the booming development of network-based computing, more and more Internet service vendors

More information

CloudFTP: A free Storage Cloud

CloudFTP: A free Storage Cloud CloudFTP: A free Storage Cloud ABSTRACT: The cloud computing is growing rapidly for it offers on-demand computing power and capacity. The power of cloud enables dynamic scalability of applications facing

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,

More information

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

More information

Analysis of ChIP-seq data in Galaxy

Analysis of ChIP-seq data in Galaxy Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers

More information

OpenNebula Open Souce Solution for DC Virtualization

OpenNebula Open Souce Solution for DC Virtualization OSDC 2012 25 th April, Nürnberg OpenNebula Open Souce Solution for DC Virtualization Constantino Vázquez Blanco OpenNebula.org What is OpenNebula? Multi-tenancy, Elasticity and Automatic Provision on Virtualized

More information

HADOOP IN THE LIFE SCIENCES:

HADOOP IN THE LIFE SCIENCES: White Paper HADOOP IN THE LIFE SCIENCES: An Introduction Abstract This introductory white paper reviews the Apache Hadoop TM technology, its components MapReduce and Hadoop Distributed File System (HDFS)

More information

How To Understand Cloud Computing

How To Understand Cloud Computing Cloud Computing: a Perspective Study Lizhe WANG, Gregor von LASZEWSKI, Younge ANDREW, Xi HE Service Oriented Cyberinfrastruture Lab, Rochester Inst. of Tech. Abstract The Cloud computing emerges as a new

More information

A Proposed Framework for Ranking and Reservation of Cloud Services Based on Quality of Service

A Proposed Framework for Ranking and Reservation of Cloud Services Based on Quality of Service II,III A Proposed Framework for Ranking and Reservation of Cloud Services Based on Quality of Service I Samir.m.zaid, II Hazem.m.elbakry, III Islam.m.abdelhady I Dept. of Geology, Faculty of Sciences,

More information

Cloud FTP: A Case Study of Migrating Traditional Applications to the Cloud

Cloud FTP: A Case Study of Migrating Traditional Applications to the Cloud Cloud FTP: A Case Study of Migrating Traditional Applications to the Cloud Pooja H 1, S G Maknur 2 1 M.Tech Student, Dept. of Computer Science and Engineering, STJIT, Ranebennur (India) 2 Head of Department,

More information

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases NASA Ames NASA Advanced Supercomputing (NAS) Division California, May 24th, 2012 Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases Ignacio M. Llorente Project Director OpenNebula Project.

More information

Research Article Hadoop-Based Distributed Sensor Node Management System

Research Article Hadoop-Based Distributed Sensor Node Management System Distributed Networks, Article ID 61868, 7 pages http://dx.doi.org/1.1155/214/61868 Research Article Hadoop-Based Distributed Node Management System In-Yong Jung, Ki-Hyun Kim, Byong-John Han, and Chang-Sung

More information

CHAPTER 8 CLOUD COMPUTING

CHAPTER 8 CLOUD COMPUTING CHAPTER 8 CLOUD COMPUTING SE 458 SERVICE ORIENTED ARCHITECTURE Assist. Prof. Dr. Volkan TUNALI Faculty of Engineering and Natural Sciences / Maltepe University Topics 2 Cloud Computing Essential Characteristics

More information

Cloud Storage Solution for WSN Based on Internet Innovation Union

Cloud Storage Solution for WSN Based on Internet Innovation Union Cloud Storage Solution for WSN Based on Internet Innovation Union Tongrang Fan 1, Xuan Zhang 1, Feng Gao 1 1 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang,

More information

Cloud Computing Architecture: A Survey

Cloud Computing Architecture: A Survey Cloud Computing Architecture: A Survey Abstract Now a day s Cloud computing is a complex and very rapidly evolving and emerging area that affects IT infrastructure, network services, data management and

More information

Hadoop s Rise in Life Sciences

Hadoop s Rise in Life Sciences Exploring EMC Isilon scale-out storage solutions Hadoop s Rise in Life Sciences By John Russell, Contributing Editor, Bio IT World Produced by Cambridge Healthtech Media Group By now the Big Data challenge

More information

Evaluation Methodology of Converged Cloud Environments

Evaluation Methodology of Converged Cloud Environments Krzysztof Zieliński Marcin Jarząb Sławomir Zieliński Karol Grzegorczyk Maciej Malawski Mariusz Zyśk Evaluation Methodology of Converged Cloud Environments Cloud Computing Cloud Computing enables convenient,

More information

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input

More information

Scientific and Technical Applications as a Service in the Cloud

Scientific and Technical Applications as a Service in the Cloud Scientific and Technical Applications as a Service in the Cloud University of Bern, 28.11.2011 adapted version Wibke Sudholt CloudBroker GmbH Technoparkstrasse 1, CH-8005 Zurich, Switzerland Phone: +41

More information

Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows

Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows PRBB / Ferran Mateo Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows Summary of testing by the Centre for Genomic Regulation (CRG) utilizing new virtualization

More information

THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD

THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD By Yohan Wadia ucalyptus is a Linux-based opensource software architecture that implements efficiencyenhancing private and hybrid clouds within an enterprise s

More information

Research on Digital Agricultural Information Resources Sharing Plan Based on Cloud Computing *

Research on Digital Agricultural Information Resources Sharing Plan Based on Cloud Computing * Research on Digital Agricultural Information Resources Sharing Plan Based on Cloud Computing * Guifen Chen 1,**, Xu Wang 2, Hang Chen 1, Chunan Li 1, Guangwei Zeng 1, Yan Wang 1, and Peixun Liu 1 1 College

More information

Scalable Services for Digital Preservation

Scalable Services for Digital Preservation Scalable Services for Digital Preservation A Perspective on Cloud Computing Rainer Schmidt, Christian Sadilek, and Ross King Digital Preservation (DP) Providing long-term access to growing collections

More information

OpenNebula An Innovative Open Source Toolkit for Building Cloud Solutions

OpenNebula An Innovative Open Source Toolkit for Building Cloud Solutions Cloud Computing and its Applications 20th October 2009 OpenNebula An Innovative Open Source Toolkit for Building Cloud Solutions Distributed Systems Architecture Research Group Universidad Complutense

More information

Improving Current Hadoop MapReduce Workflow and Performance

Improving Current Hadoop MapReduce Workflow and Performance Improving Current Hadoop MapReduce Workflow and Performance Hamoud Alshammari Department of Computer Science, CT, USA Jeongkyu Lee Department of Computer Science, CT, USA Hassan Bajwa Department of Electrical

More information

Cloud Models and Platforms

Cloud Models and Platforms Cloud Models and Platforms Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF A Working Definition of Cloud Computing Cloud computing is a model

More information

for my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste

for my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste Which infrastructure Which infrastructure for my computation? Stefano Cozzini Democrito and SISSA/eLAB - Trieste Agenda Introduction:! E-infrastructure and computing infrastructures! What is available

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Cloud Computing Utility and Applications

Cloud Computing Utility and Applications Cloud Computing Utility and Applications Pradeep Kumar Tiwari 1, Rajesh Kumar Shrivastava 2, Satish Pandey 3, Pradeep Kumar Tripathi 4 Abstract Cloud Architecture provides services on demand basis via

More information

Managing and Conducting Biomedical Research on the Cloud Prasad Patil

Managing and Conducting Biomedical Research on the Cloud Prasad Patil Managing and Conducting Biomedical Research on the Cloud Prasad Patil Laboratory for Personalized Medicine Center for Biomedical Informatics Harvard Medical School SaaS & PaaS gmail google docs app engine

More information

CHALLENGES IN NEXT-GENERATION SEQUENCING

CHALLENGES IN NEXT-GENERATION SEQUENCING CHALLENGES IN NEXT-GENERATION SEQUENCING BASIC TENETS OF DATA AND HPC Gray s Laws of data engineering 1 : Scientific computing is very dataintensive, with no real limits. The solution is scale-out architecture

More information

Allocation of Datacenter Resources Based on Demands Using Virtualization Technology in Cloud

Allocation of Datacenter Resources Based on Demands Using Virtualization Technology in Cloud Allocation of Datacenter Resources Based on Demands Using Virtualization Technology in Cloud G.Rajesh L.Bobbian Naik K.Mounika Dr. K.Venkatesh Sharma Associate Professor, Abstract: Introduction: Cloud

More information

High Throughput Sequencing Data Analysis using Cloud Computing

High Throughput Sequencing Data Analysis using Cloud Computing High Throughput Sequencing Data Analysis using Cloud Computing Stéphane Le Crom (stephane.le_crom@upmc.fr) LBD - Université Pierre et Marie Curie (UPMC) Institut de Biologie de l École normale supérieure

More information

Description of Application

Description of Application Description of Application Operating Organization: Coeur d Alene Tribe, Plummer, Idaho Community of Interest: U.S. Indian tribes and their governments; rural governments OS and software requirements: Microsoft

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS Survey of Optimization of Scheduling in Cloud Computing Environment Er.Mandeep kaur 1, Er.Rajinder kaur 2, Er.Sughandha Sharma 3 Research Scholar 1 & 2 Department of Computer

More information

Energetic Resource Allocation Framework Using Virtualization in Cloud

Energetic Resource Allocation Framework Using Virtualization in Cloud Energetic Resource Allocation Framework Using Virtualization in Ms.K.Guna *1, Ms.P.Saranya M.E *2 1 (II M.E(CSE)) Student Department of Computer Science and Engineering, 2 Assistant Professor Department

More information

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?

More information

Hadoop. Bioinformatics Big Data

Hadoop. Bioinformatics Big Data Hadoop Bioinformatics Big Data Paolo D Onorio De Meo Mattia D Antonio p.donoriodemeo@cineca.it m.dantonio@cineca.it Big Data Too much information! Big Data Explosive data growth proliferation of data capture

More information

Dutch HPC Cloud: flexible HPC for high productivity in science & business

Dutch HPC Cloud: flexible HPC for high productivity in science & business Dutch HPC Cloud: flexible HPC for high productivity in science & business Dr. Axel Berg SARA national HPC & e-science Support Center, Amsterdam, NL April 17, 2012 4 th PRACE Executive Industrial Seminar,

More information

OGF25/EGEE User Forum Catania, Italy 2 March 2009

OGF25/EGEE User Forum Catania, Italy 2 March 2009 OGF25/EGEE User Forum Catania, Italy 2 March 2009 Constantino Vázquez Blanco Javier Fontán Muiños Raúl Sampedro Distributed Systems Architecture Research Group Universidad Complutense de Madrid 1/31 Outline

More information

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline

More information

On a Hadoop-based Analytics Service System

On a Hadoop-based Analytics Service System Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology

More information

Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS

Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS Transitioning from today s highly virtualized data center environments to a true cloud environment requires solutions that let companies

More information

GATECloud.net: Cloud Infrastructure for Large-Scale, Open-Source Text Processing

GATECloud.net: Cloud Infrastructure for Large-Scale, Open-Source Text Processing : Cloud Infrastructure for Large-Scale, Open-Source Text Processing Valentin Tablan Ian Roberts Hamish Cunningham Kalina Bontcheva University of Sheffield 28 September 2011 Tablan, Roberts, Cunningham,

More information

Proposal and Design for DTV Broadcasting Service Applying Cloud

Proposal and Design for DTV Broadcasting Service Applying Cloud Proposal and Design for DTV Broadcasting Service Applying Cloud Computing Testbed 1 Jong Won Yang, 2 Sung Jun Kim, 3 Mi-Hye Kim 1, First Author KISTI, jwyang@kisti.re.kr 2, Corresponding KISTI, sjkim@kisti.re.kr

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A Identifier: Date: Activity: Authors: Status: Link: Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A J O I N T A C T I O N ( S A 1, J R A 3 ) F I

More information

Data Semantics Aware Cloud for High Performance Analytics

Data Semantics Aware Cloud for High Performance Analytics Data Semantics Aware Cloud for High Performance Analytics Microsoft Future Cloud Workshop 2011 June 2nd 2011, Prof. Jun Wang, Computer Architecture and Storage System Laboratory (CASS) Acknowledgement

More information

Data management challenges in todays Healthcare and Life Sciences ecosystems

Data management challenges in todays Healthcare and Life Sciences ecosystems Data management challenges in todays Healthcare and Life Sciences ecosystems Jose L. Alvarez Principal Engineer, WW Director Life Sciences jose.alvarez@seagate.com Evolution of Data Sets in Healthcare

More information

Design and Building of IaaS Clouds

Design and Building of IaaS Clouds 21th May 2010 CloudViews 2010 Porto, Portugal Next Generation Data Center Summit Design and Building of IaaS Clouds Distributed Systems Architecture Research Group Universidad Complutense de Madrid This

More information

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,

More information

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform A B M Moniruzzaman 1, Kawser Wazed Nafi 2, Prof. Syed Akhter Hossain 1 and Prof. M. M. A. Hashem 1 Department

More information

Enabling Technologies for Cloud Computing

Enabling Technologies for Cloud Computing 3th June 2010 1 st European Summit on the Future Internet Luxembourg Next Generation Data Center Summit Enabling Technologies for Cloud Computing Distributed Systems Architecture Research Group Universidad

More information