One target application: BLAST using large databases over the grid
|
|
- Holly Freeman
- 8 years ago
- Views:
Transcription
1 DIET_BLAST: Architecture logicielle et petits problèmes de recherche Frédéric Desprez LIP ENS Lyon/INRIA Grenoble Rhône- Alpes EPI GRAAL/Avalon 14/06/11! 1! Agenda Introduction One target application: BLAST using large databases over the grid One problem: join scheduling and replication Thèse d Antoine Vernois (avec C. Blanchet et P. Vicat-Blanc Primet) Thèse de Gaël Le Mahec (avec V. Breton) One middleware : DIET/DAGDA Conclusion and future work Décrypthon 2!
2 Introduction Several applications ready (and not only number crunching ones!) Huge data sets start to be available around the world Data management is one of the most important issue of today s applications Replication has to be used to improve platform throughput Services for resource management and data management/replication available in most grid middleware but usually they work separately Our approach Put the data management into the resource management sphere Perform the request scheduling with the data replication based on the information provided by the application Application to a large scale bioinformatics application 3! One target application: BLAST over the grid Basic Local Alignment Search Tool (BLAST) To find homologies between nucleotids or amino acids sequences. Objectives To find clues about the function of a protein/gene comparing a newly discovered sequence to well-known ones. Biological databases BLAST uses biological databases containing large sets of annotated sequences as entry parameter. Usage Generally, biologists perform BLAST searches on large sets of sequences. A T C A A G T C A C C A - G T C 4!
3 One target application: BLAST over the grid Each sequence of the entry set can be treated independently A simple parallelization: to perform the searches for each sequence on different nodes But efficient A sequence weights at most some kilobytes and a search on it takes at most some minutes Requirements to gridify BLAST applications A way to submit and distribute the requests A way to replicate databases! Then a software to perform! the sequences sets division before the computation! the results merge on exit 5! One target application: BLAST over the grid Grid-BLAST execution The user submits a large set of sequences and a database or a database identifier Entry set is divided. The database is replicated on the platform using a data manager The resource broker returns a set of computing resources Each sequence is treated on a different server Results merged into a single result file 6!
4 Parallelization and distribution of bioinformatics requests over the Grid Context Large sets of requests are submitted by many users of the grid Large databases are used as parameters of the requests The computing nodes of the grid have different computing and storage capacities Several different BLAST applications depending upon users needs The jobs have to be scheduled on-line (the scheduling is made job by job) We want to optimize the resources usage: No useless replication Platform throughput optimization Where to replicate the databases? How to distribute the requests? 7! Related work: Job scheduling & data replication In «Decoupling Computation and Data Scheduling in Distributed Data- Intensive Applications», Foster et al. analyzed the performance of various combinations of job scheduling and data replication algorithms. They evaluated four job scheduling strategies: JobRandom: The job is sent to a random site. JobLeastLoaded: The job is sent to the node with the least number of waiting jobs. JobDataPresent: The job is sent on the least loaded site on which the data is present. JobLocal: The job is run locally. And three replication strategies: DataDoNothing: The data are not replicated. Data may be downloaded from a remote site in a cache using LRU replacement algorithm. DataRandom: The most «popular» data are replicated on a random site. DataLeastLoaded: The most «popular» data are replicated on the least loaded site in the neighborhood of the node. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications, Ranganathan, K., Foster, I., HPDC '02, Washington, DC, USA, IEEE !
5 Related work: Job scheduling & data replication For the different combinations they measured: Best average response time when: The job is scheduled where the data is With data replication (even with random replication) Best average data transferred when: The job is scheduled where the data is With data replication Best average idle time: The job is scheduled where the data is With data replication! Scheduling should take into account the data distribution.! It is not always necessary to couple data movement & computation. 9! Related work: Integration of Scheduling & Replication In «Integration of Scheduling and Replication in Data Grids», Chakrabati et al. present the Integrated Replication and Scheduling Strategy (IRS) which couple scheduling and replication strategies. Improve the performance by working alternatively on the data mapping and the task mapping. Algorithm principle The job scheduling can use two approaches: One based on the data availability: The job is sent on the node which have the most of the needed data. One based on scheduling cost (time to transfer the data, job waiting time and job execution time) : The job is sent on the node which have the smallest cost. After each job submission, a data usage matrix is updated. The replication algorithm have two phases: Estimation of the maximum number of replications for a data. The data replication itself using the data usage matrix Integration of Scheduling and Replication in Data Grids, Chakrabarti A., Dheepak, R.A., Sengupta, S., HiPC 2004, Lecture Notes in Computer Science, !
6 Parallelization and distribution of bioinformatics requests over the Grid To obtain dynamic information about the nodes status is a difficult task involving complex grid services The information obtained can be several minutes old For long execution time jobs: not really a problem For jobs that take some minutes: the information are outdated In this context without more information, there are few things to do Using the classical Minimum Completion Time (MCT) scheduling strategy is a good way to schedule the jobs but With few more information, we can do better The analysis of the execution traces of bioinformatics clusters showed that the way the biologists are submitting jobs is homogeneous if the observed time interval is long enough. We will use this information to optimize the Grid resources usage. 11! Parallelization and distribution of bioinformatics requests over the Grid Problem modeling Scheduling and Replication Algorithm (SRA) Relies on the integer approximation of a linear program Gives the data distribution and the jobs scheduling Takes into account the specificities of the biologists submissions A. Vernois 12!
7 Scheduling and Replication Algorithm SRA gives good results when the requests frequencies are constant The data distribution and the scheduling are efficient It does not need dynamic information about the Grid A. Vernois 13! Scheduling and Replication Algorithm Simulation experiments Grid 5000 platform 2540 heterogeneous CPUs on 9 sites 5 databases of different sizes 5 algorithms of different complexities 1 request per 0,3 s. The data are randomly distributed before the start For small sets of requests SRA does not have enough time to replicate the data For large sets of requests The replications and jobs scheduling of SRA avoid the computing nodes to saturate A. Vernois 14!
8 Using MCT A. Vernois 15! Using SRA A. Vernois 16!
9 SRA with frequency variation detection On the grid, the number and variety of users can make the time needed to have constant frequencies very long. Some «events» can temporary modify the frequencies Data challenges Important conference submission deadline Holidays and week-ends By detecting such a frequency variation, we can correct the data distribution and jobs scheduling 17! SRA with frequency variation detection Algorithm principle SRA gives an initial data distribution The Resource Broker records the submissions and update the frequencies If a frequency varies beyond a! threshold New data distribution computation using SRA Start of the transfers asynchronously if possible. Otherwise, the job scheduling will cause the data transfers Task scheduling using the job distribution given by SRA 18!
10 SRA with frequency variation detection The frequencies vary - We do not proceed to data redistribution The scheduling is no more efficient for large sets of jobs. The average execution time of the jobs increases with the number of jobs submitted. 19! SRA with frequency variation detection The frequencies vary - We detect the variations and proceed to the data redistribution The scheduling efficiency is saved. The average execution time of the jobs does not increase too much. 20!
11 Implementation within a middleware - DIET GridRPC API Network Enabled Servers paradigm Some features Distributed (hierarchical) scheduling Plugin schedulers Data management Workflow management Sequential and parallel task (through batch schedulers) Clusters, Grids, Clouds Open source 21! Data/replica management Two needs Keep the data in place to reduce the overhead of communications between clients and servers Replicate data whenever possible Three approaches for DIET DTM (LIFC, Besançon) Hierarchy similar to the DIET s one Distributed data manager Redistribution between servers JuxMem (Paris, Rennes) P2P data cache DAGDA (IN2P3, Clermont-Ferrand and now Univ. of Picardie) Replication Joining task scheduling and data management Work done within the GridRPC Working Group (OGF) Relations with workflow management 22!
12 DAGDA Data Arrangement for Grid and Distributed Applications A data manager for the DIET middleware providing Explicit data replication: using the API Implicit data replication: data items are replicated on the selected servers Direct data get/put through the API Automatic data management: using a selected data replacement algorithm when necessary LRU: The Least Recently Used data is deleted LFU: The Least Frequently Used data is deleted FIFO: The «oldest» data is deleted Transfer optimization by selecting the best source Using statistics on previous transfers Storage resources usage management The space reserved for the data can be configured by the user Data status backup/restoration Allowing to stop and restart DIET, saving the data status on each node 23! DAGDA Transfer model Uses the pull model. The data are sent independently of the service call. The data can be sent in several parts. 1: The client send a request for a service 2: DIET selects some SeDs according using a scheduling heuristic 3: The client sends its request to the SeD 4: The SeD downloads the data from the client and/or from other DIET servers 5: The SeD performs the call. 6: The persistent data are updated 24!
13 DAGDA DAGDA architecture Each data is associated to one unique identifier DAGDA control the disk and memory space limits. If necessary, it uses a data replacement algorithm The CORBA interface is used to communicate between the DAGDA nodes Users can access data and perform replications using the API 25! Putting everything together 26!
14 Putting everything together Experiments over Grid 5000 using DIET and DAGDA 300 nodes (4 sites), sequences, 4 BLAST algorithms, 2 databases 2 different sequences splits Small grain: 1 sequence after an other Coarse grain: splitting as a function of the number of servers available 4 scheduling algorithms random, MCT, round-robin, dynamic SRA 27! Putting everything together Maximum request partition Partition in n sub-requests. (n the number of available nodes) For MCT, Random & Round-Robin: The better requests partitioning is using the number of available nodes (coarse grain). For Dynamic-SRA: One sequence per request (small grain) 28!
15 Putting everything together Comparison of MCT and dynamic SRA over 1000 nodes (8 sites) 4 databases between 175 Mo and 6.5 Go 4 algorithms sequences Frequencies variation 29! Putting everything together Using MCT: Using Dynamic-SRA: Requests 1 to 20000: BLASTP Vs DB 1 30% BLASTN Vs DB 2 30% BLASTX Vs DB 3 20% TBLASTX Vs DB 4 20% Requests to 40000: BLASTP Vs DB 1 10% BLASTP Vs DB 3 30% BLASTN Vs DB 2 10% BLASTN Vs DB 4 10% BLASTX Vs DB 1 10% BLASTX Vs DB 3 10% TBLASTX Vs DB 2 20% 30!
16 Conclusion and future work about DIET_BLAST Data management is one of the most important issue of large scale applications over the grid It has to be linked with request scheduling to get the best performance Static algorithms perform well even on a dynamic platform Future work Provide a set of replication algorithms tuned for specific application classes within DAGDA Increase the exchange between resource and data management systems Use other performance metrics (fairness) Manage replication for workflows scheduling 31! Une collaboration autour de trois partenaires fondateurs AFM - coordination de l appel à projets auprès de la communauté scientifique - financement des projets de recherche stratégie scientifique de l'afm : guérir les maladies neuromusculaires et les maladies rares pour la plupart d'origine génétique IBM - expertise du Grid Computing + Sciences de la Vie - dotation de 6 universités de supercalculateurs (programme Shared University Research) - accès WCG CNRS - pilotage scientifique du programme - expertise scientifique et technologique du portage des applications - Gestion des travaux et des résultats du WCG (HCMD1 et 2 d A. Carbone) 32!
17 Avec la participation d'autres partenaires Installation de supercalculateurs (à base de power G5) puissance de 500 Gflops / 473 Gflops déjà présents dans les universités Réseau National de Télécommunications pour l Enseignement et la Recherche (RENATER) connecte l ensemble des ressources ENS Lyon équipe GRAAL/Avalon Pilotage des ressources de la grille en optimisant : planification et exécution - utilisation du logiciel DIET développé par l ENS (succède à United Devices) - suivi des programmes informatiques de chaque projet 33! Deployment example with Universities Sed = Server Daemon, installed on any server running Loadleveler. Note that we can define rescue SeD. MA = master agent, coordinates Jobs. We can define rescue or multiple Master Agent. WN = worker node DIET Décrypthon Web Interface Orsay Decrypthon2 Orsay Decrypthon1 Master Agent LILLE Project Users SeD LoadLeveler SeD LoadLeveler BORDEAUX JUSSIEU ORSAY SeD LoadLeveler SeD LoadLeveler IBM WII CRIHAN DB2 Lyon Data manager! Interface! BD AFM Cliniques 34!
18 Philosophy of the Décrypthon grid Transparency An interface as close as the one users are used to, to submit, to monitor and to download the results of jobs. Reactivity new algorithms are ported as quickly as possible on the grid to be close to the practical use case. 35! Data management Décrypthon Grid - Grid Resources Dedicated to Neuromuscular Disorders, Bard, N., Bolze, R., Caron, E., Desprez, F., Heymann, M., Friedrich, A., Moulinier, L., Nguyen, N.-H., Poch, O. and Toursel, T., 8th HealthGrid conference, Paris, France, June, Credits: H. N Guyen, O. Poch, IGBMC 36!
19 Data management, cont 37! SM2PH a pilot project and a success story Infrastructure: - BIRD database (sequences, genomical data, alignments, 3D structures or models, pathology descriptions, analysis results ) - Genotype/Phenotype description (UMD, LOVD) - WEB server, interactive analysis interface Software and algorithms: scoring function for prediction of mutation consequences in structural and variability contexts, correlation between mutation and phenotype severity, Credits: H. N Guyen, O. Poch, IGBMC 38!
20 SM2PH-db! SM2PH-db ( " entry: proteins involved in human monogenic diseases " for each entry: wide range of information involved in the genotype phenotype relationship " Evolutionary view : Multiple Alignment of Complete Sequences Eukaryote Filter Structural sampling " Structural view : 3D models: 1596 wild type proteins mutant proteins " Informational view : structural and functional annotations (UniProt, Pfam, Prosite, Interpro, GO) mutation and phenotypic data ( missense mutations) Automatic update every 2 months, current version : 9 39! Complex interconnected programs On a single data, up to 25 programs are applied, in cascade or in parallel. Gain de 15 à 1 jours sur la plate-forme du Décrypthon. Credits: H. N Guyen, O. Poch, IGBMC 40!
21 What s next? New platform based on IBM BlueGene/P Mixed Grid/Cloud approach using SysFera-DS LoadLeveler on existing platforms IBM Cloud software on new platforms Extension to public Clouds Seamless access to resources Tight integration of recent developments from IGBMC around data management Work around the user interfaces 41! SysFera SysFera-DS : une pile logicielle complète pour le HPC... et un accès simple et transparent aux infrastructures de Cloud Inside the Cloud + DIET platform is virtualized inside the cloud. (as Xen image for example) + Very flexible and scalable as DIET nodes can be launched + Dynamic adaptation % charge Cloud manager + EC2 interface + EC2 is treated as a new Batch System + Automatic deployment of VMs with associated services 42!
22 Des questions? 43!
Grid Computing Perspectives for IBM
Grid Computing Perspectives for IBM Atelier Internet et Grilles de Calcul en Afrique Jean-Pierre Prost IBM France jpprost@fr.ibm.com Agenda Grid Computing Initiatives within IBM World Community Grid Decrypthon
More informationSeed4C: A High-security project for Cloud Infrastructure
Seed4C: A High-security project for Cloud Infrastructure J. Rouzaud-Cornabas (LIP/CC-IN2P3 CNRS) & E. Caron (LIP ENS-Lyon) November 30, 2012 J. Rouzaud-Cornabas (LIP/CC-IN2P3 CNRS) & E. Seed4C: Caron (LIP
More informationA Peer-to-peer Extension of Network-Enabled Server Systems
A Peer-to-peer Extension of Network-Enabled Server Systems Eddy Caron 1, Frédéric Desprez 1, Cédric Tedeschi 1 Franck Petit 2 1 - GRAAL Project / LIP laboratory 2 - LaRIA laboratory E-Science 2005 - December
More informationDIET A Scalable Platform for Clusters, Grids and Clouds
DIET A Scalable Platform for Clusters, Grids and Clouds Eddy Caron, Frédéric Desprez INRIA LIP ENS Lyon Avalon Research Team Benjamin Depardon SysFera Joint work with A. Muresan, J. Rouzaud-Cornabas (LIP
More informationBSC vision on Big Data and extreme scale computing
BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,
More informationRETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
More informationChapter 4 Cloud Computing Applications and Paradigms. Cloud Computing: Theory and Practice. 1
Chapter 4 Cloud Computing Applications and Paradigms Chapter 4 1 Contents Challenges for cloud computing. Existing cloud applications and new opportunities. Architectural styles for cloud applications.
More informationBioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
More informationDeployment of BioXSDenabled services on a Cloud. christophe.blanchet@ibcp.fr
Deployment of BioXSDenabled services on a Cloud Outline IBCP, provider of BioXSD-enabled services Cloud Computing RENABI GRISBI, French infrastructure Bioinformatics Integrated s gbio-pbil.ibcp.fr/ws GBIO
More informationManjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload
More informationSGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
More informationBig Data and Cloud Computing for GHRSST
Big Data and Cloud Computing for GHRSST Jean-Francois Piollé (jfpiolle@ifremer.fr) Frédéric Paul, Olivier Archer CERSAT / Institut Français de Recherche pour l Exploitation de la Mer Facing data deluge
More informationProvisioning and Resource Management at Large Scale (Kadeploy and OAR)
Provisioning and Resource Management at Large Scale (Kadeploy and OAR) Olivier Richard Laboratoire d Informatique de Grenoble (LIG) Projet INRIA Mescal 31 octobre 2007 Olivier Richard ( Laboratoire d Informatique
More informationEfficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,
More informationScheduling Policies for Enterprise Wide Peer-to-Peer Computing
Scheduling Policies for Enterprise Wide Peer-to-Peer Computing Adnan Fida Department of Computer Sciences University of Saskatchewan 57 Campus Drive Saskatoon, SK, Canada + 306 966-4886 adf428@mail.usask.ca
More informationThe Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and
More informationIaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures
IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Introduction
More informationCentralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
More informationScientific and Technical Applications as a Service in the Cloud
Scientific and Technical Applications as a Service in the Cloud University of Bern, 28.11.2011 adapted version Wibke Sudholt CloudBroker GmbH Technoparkstrasse 1, CH-8005 Zurich, Switzerland Phone: +41
More informationAmazon EC2 Product Details Page 1 of 5
Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of
More informationEnabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed
Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed Sébastien Badia, Alexandra Carpen-Amarie, Adrien Lèbre, Lucas Nussbaum Grid 5000 S. Badia, A. Carpen-Amarie, A. Lèbre, L. Nussbaum
More informationRunning a Workflow on a PowerCenter Grid
Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
More informationAlternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix
Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?
More informationSimilarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
More informationCloud Ready for Bioinformatics?
IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001) Cloud Ready for Bioinformatics?
More informationA Survey Study on Monitoring Service for Grid
A Survey Study on Monitoring Service for Grid Erkang You erkyou@indiana.edu ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide
More informationCS 6343: CLOUD COMPUTING Term Project
CS 6343: CLOUD COMPUTING Term Project Group A1 Project: IaaS cloud middleware Create a cloud environment with a number of servers, allowing users to submit their jobs, scale their jobs Make simple resource
More informationManaging Data Persistence in Network Enabled Servers
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Managing Data Persistence in Network Enabled Servers Eddy Caron Bruno DelFabbro Frédéric Desprez Emmanuel Jeannot Jean-Marc Nicod N 5725
More informationGrid Computing vs Cloud
Chapter 3 Grid Computing vs Cloud Computing 3.1 Grid Computing Grid computing [8, 23, 25] is based on the philosophy of sharing information and power, which gives us access to another type of heterogeneous
More informationHow To Manage Cloud Service Provisioning And Maintenance
Managing Cloud Service Provisioning and SLA Enforcement via Holistic Monitoring Techniques Vincent C. Emeakaroha Matrikelnr: 0027525 vincent@infosys.tuwien.ac.at Supervisor: Univ.-Prof. Dr. Schahram Dustdar
More informationPairwise Sequence Alignment
Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
More informationLoad Balancing in the Cloud Computing Using Virtual Machine Migration: A Review
Load Balancing in the Cloud Computing Using Virtual Machine Migration: A Review 1 Rukman Palta, 2 Rubal Jeet 1,2 Indo Global College Of Engineering, Abhipur, Punjab Technical University, jalandhar,india
More informationManaging Data Persistence in Network Enabled Servers 1
Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Managing Data Persistence in Network Enabled Servers 1 Eddy Caron,
More informationManjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.
More informationGrid Scheduling Architectures with Globus GridWay and Sun Grid Engine
Grid Scheduling Architectures with and Sun Grid Engine Sun Grid Engine Workshop 2007 Regensburg, Germany September 11, 2007 Ignacio Martin Llorente Javier Fontán Muiños Distributed Systems Architecture
More informationTask Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
More informationA Framework for Performance Analysis and Tuning in Hadoop Based Clusters
A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish
More informationCloud/SaaS enablement of existing applications
Cloud/SaaS enablement of existing applications GigaSpaces: Nati Shalom, CTO & Founder About GigaSpaces Technologies Enabling applications to run a distributed cluster as if it was a single machine 75+
More informationMitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform
Mitglied der Helmholtz-Gemeinschaft System monitoring with LLview and the Parallel Tools Platform November 25, 2014 Carsten Karbach Content 1 LLview 2 Parallel Tools Platform (PTP) 3 Latest features 4
More informationExperimenting OAR in a virtual cluster environment for batch schedulers comparative evaluation
Experimenting OAR in a virtual cluster environment for batch schedulers comparative evaluation Joseph Emeras (Joseph.Emeras@imag.fr) Yiannis Georgiou (Yiannis.Georgiou@imag.fr) 1 Context OAR [3] is the
More informationScalable stochastic tracing of distributed data management events
Scalable stochastic tracing of distributed data management events Mario Lassnig mario.lassnig@cern.ch ATLAS Data Processing CERN Physics Department Distributed and Parallel Systems University of Innsbruck
More informationMining Association Rules on Grid Platforms
UNIVERSITY OF TUNIS EL MANAR FACULTY OF SCIENCES OF TUNISIA Mining Association Rules on Grid Platforms Raja Tlili raja_tlili@yahoo.fr Yahya Slimani yahya.slimani@fst.rnu.tn CoreGrid 11 Plan Introduction
More informationBrief description of the paper/report. Identification
Brief description of the paper/report Argument Original reference A holonic framework for coordination and optimization in oil and gas production complexes E. Chacon, I. Besembel, Univ. Los Andes, Mérida,
More informationChapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationSeed4C: A Cloud Security Infrastructure validated on Grid 5000
Seed4C: A Cloud Security Infrastructure validated on Grid 5000 E. Caron 1, A. Lefray 1, B. Marquet 2, and J. Rouzaud-Cornabas 1 1 Université de Lyon. LIP Laboratory. UMR CNRS - ENS Lyon - INRIA - UCBL
More informationArchitecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist
Architecting ColdFusion For Scalability And High Availability Ryan Stewart Platform Evangelist Introduction Architecture & Clustering Options Design an architecture and develop applications that scale
More informationA Distributed Render Farm System for Animation Production
A Distributed Render Farm System for Animation Production Jiali Yao, Zhigeng Pan *, Hongxin Zhang State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058, China {yaojiali, zgpan, zhx}@cad.zju.edu.cn
More informationScaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
More informationCode and Process Migration! Motivation!
Code and Process Migration! Motivation How does migration occur? Resource migration Agent-based system Details of process migration Lecture 6, page 1 Motivation! Key reasons: performance and flexibility
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationDeliverable D2.2. Resource Management Systems for Distributed High Performance Computing
Resource Management Systems for Distributed High Performance Computing VERSION Version 1.1 DATE February 4, 2011 EDITORIAL MANAGER Yann Radenac (Yann.Radenac@inria.fr) AUTHORS STAFF Yann Radenac IRISA/INRIA,
More informationChapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju
Chapter 7: Distributed Systems: Warehouse-Scale Computing Fall 2011 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note:
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationVolunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France
Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy Derrick Kondo INRIA, France Outline Cloud Grid Volunteer Computing Cloud Background Vision Hide complexity of hardware
More informationA Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters
A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters Abhijit A. Rajguru, S.S. Apte Abstract - A distributed system can be viewed as a collection
More informationData Grids. Lidan Wang April 5, 2007
Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural
More informationSystem Models for Distributed and Cloud Computing
System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems
More informationVirtual Machine Based Resource Allocation For Cloud Computing Environment
Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE
More informationUN EXEMPLE D INFÉRENCES BAYÉSIENNES
Journées SUCCES France Grilles Institut de Physique du Globe de Paris 5 et 6 novembre 2015 UN EXEMPLE D INFÉRENCES BAYÉSIENNES EN GÉNÉTIQUE DES POPULATIONS, LE CAS DU CRAPAUD CALAMITE (EPIDALEA CALAMITA)
More informationBig Data Management in the Clouds and HPC Systems
Big Data Management in the Clouds and HPC Systems Hemera Final Evaluation Paris 17 th December 2014 Shadi Ibrahim Shadi.ibrahim@inria.fr Era of Big Data! Source: CNRS Magazine 2013 2 Era of Big Data! Source:
More informationDDS-Enabled Cloud Management Support for Fast Task Offloading
DDS-Enabled Cloud Management Support for Fast Task Offloading IEEE ISCC 2012, Cappadocia Turkey Antonio Corradi 1 Luca Foschini 1 Javier Povedano-Molina 2 Juan M. Lopez-Soler 2 1 Dipartimento di Elettronica,
More informationClient/Server Computing Distributed Processing, Client/Server, and Clusters
Client/Server Computing Distributed Processing, Client/Server, and Clusters Chapter 13 Client machines are generally single-user PCs or workstations that provide a highly userfriendly interface to the
More informationThis presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.
This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1. WD31_VirtualApplicationSharedServices.ppt Page 1 of 29 This presentation covers the shared
More informationTier Architectures. Kathleen Durant CS 3200
Tier Architectures Kathleen Durant CS 3200 1 Supporting Architectures for DBMS Over the years there have been many different hardware configurations to support database systems Some are outdated others
More informationFinal Project Proposal. CSCI.6500 Distributed Computing over the Internet
Final Project Proposal CSCI.6500 Distributed Computing over the Internet Qingling Wang 660795696 1. Purpose Implement an application layer on Hybrid Grid Cloud Infrastructure to automatically or at least
More informationThe EMSX Platform. A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks. A White Paper.
The EMSX Platform A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks A White Paper November 2002 Abstract: The EMSX Platform is a set of components that together provide
More informationRessources management and runtime environments in the exascale computing era
Ressources management and runtime environments in the exascale computing era Guillaume Huard MOAIS and MESCAL INRIA Projects CNRS LIG Laboratory Grenoble University, France Guillaume Huard MOAIS and MESCAL
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationGroup Based Load Balancing Algorithm in Cloud Computing Virtualization
Group Based Load Balancing Algorithm in Cloud Computing Virtualization Rishi Bhardwaj, 2 Sangeeta Mittal, Student, 2 Assistant Professor, Department of Computer Science, Jaypee Institute of Information
More informationDynamic resource management for energy saving in the cloud computing environment
Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan
More informationEnergy Constrained Resource Scheduling for Cloud Environment
Energy Constrained Resource Scheduling for Cloud Environment 1 R.Selvi, 2 S.Russia, 3 V.K.Anitha 1 2 nd Year M.E.(Software Engineering), 2 Assistant Professor Department of IT KSR Institute for Engineering
More informationProject Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008
Project Convergence: Integrating Data Grids and Compute Grids Eugene Steinberg, CTO May, 2008 Data-Driven Scalability Challenges in HPC Data is far away Latency of remote connection Latency of data movement
More informationDynamic Load Balancing in a Network of Workstations
Dynamic Load Balancing in a Network of Workstations 95.515F Research Report By: Shahzad Malik (219762) November 29, 2000 Table of Contents 1 Introduction 3 2 Load Balancing 4 2.1 Static Load Balancing
More informationMapCenter: An Open Grid Status Visualization Tool
MapCenter: An Open Grid Status Visualization Tool Franck Bonnassieux Robert Harakaly Pascale Primet UREC CNRS UREC CNRS RESO INRIA ENS Lyon, France ENS Lyon, France ENS Lyon, France franck.bonnassieux@ens-lyon.fr
More informationClouds vs Grids KHALID ELGAZZAR GOODWIN 531 ELGAZZAR@CS.QUEENSU.CA
Clouds vs Grids KHALID ELGAZZAR GOODWIN 531 ELGAZZAR@CS.QUEENSU.CA [REF] I Foster, Y Zhao, I Raicu, S Lu, Cloud computing and grid computing 360-degree compared Grid Computing Environments Workshop, 2008.
More informationCORBA and object oriented middleware. Introduction
CORBA and object oriented middleware Introduction General info Web page http://www.dis.uniroma1.it/~beraldi/elective Exam Project (application), plus oral discussion 3 credits Roadmap Distributed applications
More informationHigh Performance Computing OpenStack Options. September 22, 2015
High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,
More informationIn-Memory BigData. Summer 2012, Technology Overview
In-Memory BigData Summer 2012, Technology Overview Company Vision In-Memory Data Processing Leader: > 5 years in production > 100s of customers > Starts every 10 secs worldwide > Over 10,000,000 starts
More informationHDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava
HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues Dharmit Patel Faraj Khasib Shiva Srivastava Outline What is Distributed Queue Service? Major Queue Service
More informationFP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data
FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez To cite this version: Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez. FP-Hadoop:
More informationHigh Performance Computing. Course Notes 2007-2008. HPC Fundamentals
High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationSCALABILITY AND AVAILABILITY
SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase
More informationReverse Auction-based Resource Allocation Policy for Service Broker in Hybrid Cloud Environment
Reverse Auction-based Resource Allocation Policy for Service Broker in Hybrid Cloud Environment Sunghwan Moon, Jaekwon Kim, Taeyoung Kim, Jongsik Lee Department of Computer and Information Engineering,
More informationApplication of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD bzibitsker@beznext.com July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
More informationHadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
More informationPerformance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.
Is your database application experiencing poor response time, scalability problems, and too many deadlocks or poor application performance? One or a combination of zparms, database design and application
More informationPayment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load
Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Pooja.B. Jewargi Prof. Jyoti.Patil Department of computer science and engineering,
More informationDutch HPC Cloud: flexible HPC for high productivity in science & business
Dutch HPC Cloud: flexible HPC for high productivity in science & business Dr. Axel Berg SARA national HPC & e-science Support Center, Amsterdam, NL April 17, 2012 4 th PRACE Executive Industrial Seminar,
More informationPetascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing
Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
More informationA science-gateway workload archive application to the self-healing of workflow incidents
A science-gateway workload archive application to the self-healing of workflow incidents Rafael FERREIRA DA SILVA, Tristan GLATARD University of Lyon, CNRS, INSERM, CREATIS Villeurbanne, France Frédéric
More informationLoad balancing in SOAJA (Service Oriented Java Adaptive Applications)
Load balancing in SOAJA (Service Oriented Java Adaptive Applications) Richard Olejnik Université des Sciences et Technologies de Lille Laboratoire d Informatique Fondamentale de Lille (LIFL UMR CNRS 8022)
More informationCloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research
Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Trends: Data on an Exponential Scale Scientific data doubles every year Combination of inexpensive sensors + exponentially
More informationBioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
More informationOperating System Support for Multiprocessor Systems-on-Chip
Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.
More informationBuilding Platform as a Service for Scientific Applications
Building Platform as a Service for Scientific Applications Moustafa AbdelBaky moustafa@cac.rutgers.edu Rutgers Discovery Informa=cs Ins=tute (RDI 2 ) The NSF Cloud and Autonomic Compu=ng Center Department
More informationGrid Scheduling Dictionary of Terms and Keywords
Grid Scheduling Dictionary Working Group M. Roehrig, Sandia National Laboratories W. Ziegler, Fraunhofer-Institute for Algorithms and Scientific Computing Document: Category: Informational June 2002 Status
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More informationEfficient Load Balancing using VM Migration by QEMU-KVM
International Journal of Computer Science and Telecommunications [Volume 5, Issue 8, August 2014] 49 ISSN 2047-3338 Efficient Load Balancing using VM Migration by QEMU-KVM Sharang Telkikar 1, Shreyas Talele
More informationDynamic Virtual Machine Scheduling for Resource Sharing In the Cloud Environment
Dynamic Virtual Machine Scheduling for Resource Sharing In the Cloud Environment Karthika.M 1 P.G Student, Department of Computer Science and Engineering, V.S.B Engineering College, Tamil Nadu 1 ABSTRACT:
More informationVirtual machine interface. Operating system. Physical machine interface
Software Concepts User applications Operating system Hardware Virtual machine interface Physical machine interface Operating system: Interface between users and hardware Implements a virtual machine that
More information