Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow
1 Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow
Lessons Learned and Future Directions
Gabriel Antoniu, Inria
Joint work with Radu Tudoran, Benoit Da Mota, Alexandru Costan, Elena Apostol, Bertrand Thirion (co-PI for A-Brain), Ji Liu, Luis Pineda, Esther Pacitti, Patrick Valduriez (co-PI for Z-CloudFlow) and the Microsoft Azure team from MSR ATL Europe
EIT Digital Future Cloud Symposium, Rennes, October 2015
2 Inria Teams Involved in Cloud-Related Projects of the MSR-Inria Joint Centre
Inria research centres: Lille Nord Europe, Paris-Rocquencourt, Rennes Bretagne Atlantique, Nancy Grand Est, Saclay Île-de-France, Grenoble Rhône-Alpes, Bordeaux Sud-Ouest, Sophia Antipolis Méditerranée.
Teams involved:
- KERDATA: data storage and processing
- PARIETAL: neuroimaging
- ZENITH: scientific data management
3 KerData's Focus
How can data be stored and shared efficiently at large scale for next-generation, data-intensive applications?
Scientific challenges:
- Massive data
- Geographically distributed
- Fine-grain access (MB) for reading and writing
- High concurrency, without locking
Major goal: high throughput under heavy concurrency.
Our contribution:
- Design and implementation of distributed algorithms
- Validation with real apps on real platforms with real users
4 Motivating Application: A-Brain
Goal: detect risk factors for brain diseases by finding associations p(brain image, genetic data), with ~10^6 imaging variables and ~10^6 genetic variables.
- Imaging data: anatomical MRI, functional MRI, diffusion MRI
- Genetic data: DNA array (SNP/CNV), gene expression data, others
- More than 2,000 subjects
IEEE Cluster 15, Chicago, USA, 10 September
5 Approach: A-Brain as Map-Reduce Processing
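As a rough illustration of how an association search can be cast as MapReduce, here is a minimal sketch. It assumes each (imaging variable, genetic variable) pair is scored with a plain Pearson correlation, that genetic variables are split into blocks (one map task per block), and that the reduce step keeps only the strongest association. These are simplifying assumptions, not the statistical method used in A-Brain; `association_mapreduce`, `map_block`, `reduce_best` and the variable names `voxel_*`/`snp_*` are hypothetical.

```python
"""Hypothetical sketch of an imaging-genetics association search as MapReduce.

Assumptions (not the A-Brain code): Pearson correlation per pair, one map task
per block of genetic variables, and a reduce that keeps the strongest result.
"""
from functools import reduce
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def map_block(imaging, genetic_block):
    """Map task: score all pairs in one block and return the strongest (img, gen, r)."""
    best = None
    for img_name, img_values in imaging.items():
        for gen_name, gen_values in genetic_block.items():
            r = pearson(img_values, gen_values)
            if best is None or abs(r) > abs(best[2]):
                best = (img_name, gen_name, r)
    return best


def reduce_best(a, b):
    """Reduce step: keep the stronger of two partial results (associative)."""
    return a if abs(a[2]) >= abs(b[2]) else b


def association_mapreduce(imaging, genetic, block_size):
    """Split genetic variables into blocks, map each block, reduce the results."""
    names = sorted(genetic)
    blocks = [
        {name: genetic[name] for name in names[i:i + block_size]}
        for i in range(0, len(names), block_size)
    ]
    partials = [map_block(imaging, block) for block in blocks]  # independent tasks
    return reduce(reduce_best, partials)


if __name__ == "__main__":
    imaging = {"voxel_1": [1, 2, 3, 4], "voxel_2": [4, 1, 3, 2]}
    genetic = {"snp_a": [0, 0, 1, 1], "snp_b": [2, 4, 6, 8], "snp_c": [1, 0, 1, 0]}
    print(association_mapreduce(imaging, genetic, block_size=2))
    # ('voxel_1', 'snp_b', 1.0)
```

Because the map tasks share nothing, they can run on as many cloud nodes as there are blocks; only the small per-block results flow into the reduce.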
6 Challenges: Overview
Overall theme: High-Performance Big Data Management Across Cloud Data Centers, enabling large-scale scientific processing and scientific discovery.
- Scaling the processing: multi-site MapReduce
- Data management across sites
- High-performance streaming
- Optimizing inter-site transfers
- Cloud-provided transfer service
- Streaming across cloud sites
- Configurable cost-performance tradeoffs
8 Data Management on Public Clouds
Cloud compute nodes access their data through a cloud-provided storage service: the computation-to-data latency is high!
9 TomusBlobs: Leverage Virtual Disks
Collocating computation and data in PaaS clouds:
- Federate the virtual disks of compute nodes
- Self-configuration, automatic deployment and scaling of the data management system
- Apply to MapReduce and workflow processing
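The idea of federating the compute nodes' local virtual disks into one store can be sketched as follows. This is a minimal sketch assuming blobs are split into fixed-size chunks placed round-robin across nodes, with an in-memory dict standing in for each node's local disk; it is not the TomusBlobs implementation, and `FederatedStore`, `add_node` and the node names are hypothetical.

```python
"""Hypothetical sketch: one blob store federated over compute nodes' local disks.

Assumptions (not the TomusBlobs code): fixed-size chunks, round-robin
placement, and an in-memory dict standing in for each node's local disk.
"""


class FederatedStore:
    def __init__(self, node_ids, chunk_size):
        self.chunk_size = chunk_size
        self.nodes = list(node_ids)
        self.disks = {node: {} for node in self.nodes}  # node -> {chunk key: bytes}
        self.layout = {}  # blob id -> ordered list of (node, chunk key)

    def add_node(self, node_id):
        """Scale out: a new node contributes its disk to subsequent writes."""
        self.nodes.append(node_id)
        self.disks[node_id] = {}

    def write(self, blob_id, data):
        placement = []
        for index, start in enumerate(range(0, len(data), self.chunk_size)):
            node = self.nodes[index % len(self.nodes)]  # round-robin placement
            key = (blob_id, index)
            self.disks[node][key] = data[start:start + self.chunk_size]
            placement.append((node, key))
        self.layout[blob_id] = placement

    def read(self, blob_id):
        """Reassemble a blob from the chunks recorded in its layout."""
        return b"".join(self.disks[node][key] for node, key in self.layout[blob_id])


if __name__ == "__main__":
    store = FederatedStore(["node0", "node1", "node2"], chunk_size=3)
    store.write("input.bin", b"abcdefghij")
    print(store.read("input.bin"))                      # b'abcdefghij'
    print({n: len(d) for n, d in store.disks.items()})  # {'node0': 2, 'node1': 1, 'node2': 1}
```

Spreading chunks over all nodes keeps data next to the computation and lets reads and writes use every node's disk bandwidth in parallel, instead of going through the remote storage service.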
10 Leveraging TomusBlobs for MapReduce Processing
[Diagram: client, map workers, Azure Queues, reduce workers]
New MapReduce prototype (there was no Hadoop on Azure at that point). It relies on versioning to support high throughput under heavy concurrency, leveraging BlobSeer (KerData, Inria, Rennes).
11 Background: BlobSeer, a Software Platform for Scalable, Distributed BLOB Management
Started in 2008; 6 PhD theses (Gilles Kahn/SPECIF PhD Thesis Award in 2011).
Main goal: efficient data access under heavy concurrency.
Three key ideas:
- Decentralized metadata management
- Lock-free concurrent writes (enabled by versioning): a write creates a new version of the data
- Data and metadata patching rather than updating
A back-end for higher-level data management systems:
- Short term: highly scalable distributed file systems
- Middle term: storage for cloud services
Our approach:
- Design and implementation of distributed algorithms
- Experiments on the Grid'5000 grid/cloud testbed
- Validation with real apps on real platforms: Nimbus, Azure, OpenNebula clouds
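The versioning idea (a write creates a new version, unchanged data is shared, nothing is overwritten in place) can be sketched as follows. This minimal sketch assumes chunk-aligned writes, in-memory chunks, and a small lock standing in for a version manager that serializes only the assignment of version numbers; chunk data is written without locking. It is not the BlobSeer code, and `VersionedBlob` is a hypothetical name.

```python
"""Hypothetical sketch of versioning-based writes, in the spirit of BlobSeer.

Assumptions (not the BlobSeer code): chunk-aligned writes, in-memory chunks,
and a lock standing in for a version manager that serializes only version
assignment. Chunk data itself is stored without any lock.
"""
import threading
import uuid


class VersionedBlob:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = {}      # chunk id -> bytes; a chunk is never overwritten
        self.versions = [{}]  # version -> {chunk index: chunk id}; v0 is empty
        self._version_manager = threading.Lock()

    def write(self, offset, data):
        """Create a new version that patches `data` at `offset`; return its number."""
        if offset % self.chunk_size or len(data) % self.chunk_size:
            raise ValueError("sketch only supports chunk-aligned writes")
        patch = {}
        for i in range(0, len(data), self.chunk_size):
            chunk_id = uuid.uuid4().hex  # fresh id: no conflict with other writers
            self.chunks[chunk_id] = data[i:i + self.chunk_size]
            patch[(offset + i) // self.chunk_size] = chunk_id
        with self._version_manager:  # only version assignment is serialized
            snapshot = dict(self.versions[-1])  # share unchanged chunks
            snapshot.update(patch)              # patch instead of updating in place
            self.versions.append(snapshot)
            return len(self.versions) - 1

    def read(self, version):
        """Read any published version; older versions remain readable."""
        mapping = self.versions[version]
        return b"".join(self.chunks[mapping[i]] for i in sorted(mapping))


if __name__ == "__main__":
    blob = VersionedBlob(chunk_size=2)
    v1 = blob.write(0, b"aabb")
    v2 = blob.write(2, b"CC")
    print(blob.read(v1), blob.read(v2))  # b'aabb' b'aaCC'
```

Since readers always address an immutable version, they never block writers, and concurrent writers only contend on the short version-assignment step.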
12 Initial A-Brain Experimentation
Scenario: 100-node deployment on Azure; comparison with a MapReduce built on Azure Blobs.
Result: TomusBlobs is clearly faster than Azure storage.
13 Beyond MapReduce: MapIterativeReduce
- Unique result with parallel reduction
- No central control entity
- No synchronization barrier
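The three properties above can be illustrated with a thread-based sketch. It assumes a shared in-memory pool standing in for the cloud queue, and that each partial result records how many map outputs it covers, so any reducer can recognize the unique final result. Reducers start before the maps finish (no barrier) and repeatedly combine any two available partial results (no central controller). This is not the original implementation; `PartialResults`, `reducer` and `map_iterative_reduce` are hypothetical names, and `combine` must be associative and commutative because the merge order is arbitrary.

```python
"""Hypothetical sketch of the MapIterativeReduce idea with threads.

Assumptions (not the original implementation): an in-memory pool stands in
for the cloud queue; each partial result carries the number of map outputs it
covers. `combine` must be associative and commutative.
"""
import threading
from collections import Counter


class PartialResults:
    def __init__(self):
        self._items = []  # (number of map outputs covered, value)
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def take(self, total):
        """Return ("pair", a, b), or ("final", item) once one result covers all."""
        with self._cond:
            while True:
                if len(self._items) == 1 and self._items[0][0] == total:
                    return ("final", self._items[0])  # left in place for other reducers
                if len(self._items) >= 2:
                    return ("pair", self._items.pop(), self._items.pop())
                self._cond.wait()


def reducer(pool, total, combine):
    while True:
        job = pool.take(total)
        if job[0] == "final":
            return
        _, (n1, r1), (n2, r2) = job
        pool.put((n1 + n2, combine(r1, r2)))


def map_iterative_reduce(inputs, map_fn, combine, n_reducers=3):
    if not inputs:
        raise ValueError("need at least one input")
    pool, total = PartialResults(), len(inputs)
    workers = [threading.Thread(target=reducer, args=(pool, total, combine))
               for _ in range(n_reducers)]
    for w in workers:
        w.start()  # reducers are already waiting for map output
    for item in inputs:
        pool.put((1, map_fn(item)))  # each map output covers one input
    for w in workers:
        w.join()
    return pool.take(total)[1][1]


if __name__ == "__main__":
    docs = ["cloud data cloud", "data site", "cloud"]
    counts = map_iterative_reduce(docs, lambda d: Counter(d.split()), lambda a, b: a + b)
    print(counts.most_common(1))  # [('cloud', 3)]
```

The usage example is a tiny word-frequency count, in the spirit of the Most Frequent Words benchmark on the next slide.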
14 The Global Gain
- The Most Frequent Words benchmark: data sets from 3.2 GB to 32 GB
- A-Brain initial experimentation: data sets from 5 GB to 50 GB
Experimental setup: 200-node deployment on Azure.
Map-IterativeReduce cuts the execution time in half.
16 Single-Site Computation on the Cloud
Estimated timespan on a single-core machine: 5.3 years.
Parallelized and executed on the Azure cloud across 350 cores using TomusBlobs.
Achievements:
- Reduced execution time to 5.6 days
- Demonstrated that this technique is sensitive to outliers and cannot produce results
Options:
- Get more data: 1 billion euros needed
- More robust analysis: computation timespan increases to 86 years (on a single core)
Way forward: geographically distributed processing.
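A quick back-of-the-envelope check of the figures above, assuming 365-day years and treating the 5.6-day run as the only measurement: the parallel efficiency on 350 cores was close to ideal, and (assuming the same efficiency) the robust 86-year analysis would still need roughly three months on those 350 cores.

```latex
T_1 = 5.3\ \text{years} \times 365\ \tfrac{\text{days}}{\text{year}} \approx 1934.5\ \text{days},
\qquad T_{350} = 5.6\ \text{days}

S = \frac{T_1}{T_{350}} \approx \frac{1934.5}{5.6} \approx 345,
\qquad E = \frac{S}{350} \approx 0.99

T_{350}^{\text{robust}} \approx \frac{86 \times 365}{350 \times 0.99} \approx 91\ \text{days}
```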
17 Going Geo-Distributed
[Map: Azure data centers]
- Hierarchical multi-site MapReduce: Map-IterativeReduce and Global Reduce
- Data management: TomusBlobs (intra-site), cloud storage (inter-site)
- Iterative-Reduce technique for minimizing transfers of partial results
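Why a hierarchical reduce minimizes inter-site transfers can be shown with a small sketch. It assumes each site first reduces its own map outputs to one partial result, and that only these per-site partials cross data-center boundaries for the global reduce. A plain sequential `reduce` stands in for the intra-site Map-IterativeReduce; the function names and site names are hypothetical.

```python
"""Hypothetical sketch of hierarchical multi-site MapReduce.

Assumptions (not the A-Brain code): each site reduces its map outputs locally
to one partial result; only per-site partials are sent to the global site.
"""
import operator
from functools import reduce


def hierarchical_mapreduce(site_inputs, map_fn, combine, global_site):
    """site_inputs: {site: [inputs]}. Returns (result, inter-site transfers)."""
    site_partials = {
        site: reduce(combine, (map_fn(x) for x in inputs))  # intra-site reduction
        for site, inputs in site_inputs.items()
        if inputs
    }
    transfers = sum(1 for site in site_partials if site != global_site)
    result = reduce(combine, site_partials.values())  # global reduce
    return result, transfers


def flat_transfers(site_inputs, global_site):
    """Transfers if every map output were shipped to the global site instead."""
    return sum(len(v) for site, v in site_inputs.items() if site != global_site)


if __name__ == "__main__":
    def square(x):
        return x * x

    inputs = {"eu-west": [1, 2, 3], "eu-north": [4, 5], "us-east": [6, 7, 8, 9]}
    print(hierarchical_mapreduce(inputs, square, operator.add, "eu-west"))  # (285, 2)
    print(flat_transfers(inputs, "eu-west"))                               # 6
```

With one partial result per remote site, the number of inter-site transfers grows with the number of sites rather than with the number of map tasks.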
18 Executing the A-Brain Application at Large Scale
- The TomusBlobs data-storage layer developed within the A-Brain project was shown to scale up to 1,000 cores on 3 Azure data centers (in the EU and the US)
- Gain compared to Azure BLOBs: close to 50%
- Experiment duration: ~14 days
- More than 210,000 hours of computation used
- Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic)
- 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs
Scientific discovery: provided the first statistical evidence of the heritability of functional signals in a failed stop task in the basal ganglia.
19 People Involved
- Gabriel Antoniu (Inria, project lead)
- Benoit Da Mota (Inria)
- Bertrand Thirion (Inria, project lead)
- Hakan Soncu (Microsoft Research)
- Pierre Louis Xech (Microsoft)
- Alexandru Costan (Inria)
- Götz-Philip Brasche (Microsoft Research, now at Huawei)
- Radu Tudoran (Inria, now at Huawei)
20 What's Next? Z-CloudFlow: Data-Intensive Workflows in the Cloud
21 Scientific Workflow Scenario
1. Data is generated and collected.
2. It is locally evaluated.
3. A large volume of data is produced...
4. ...which needs to be processed (HPC).
Final results are generated in a reasonable time.
[Diagram labels: provenance data, phylogenetic trees]
22 Why Use Multisite Clouds for Workflows?
Multisite cloud = a cloud with multiple data centers, each with its own cluster, data and programs.
This matches the requirements of scientific apps well, with different labs and groups at different sites.
23 Multisite Cloud Data Management: Challenges
- Which strategies should be used for efficient data transfers, and how?
- How to handle metadata across data centers?
- How to group tasks and datasets together to minimize data transfers?
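One simple way to approach the last question is a greedy heuristic: run each task at the site that already holds most of its input bytes. The sketch below assumes every task has at least one input, each dataset lives at exactly one site, and the cost to minimize is bytes moved across sites. It is an illustrative heuristic, not the Z-CloudFlow scheduler; `place_tasks` and the task, dataset and site names are hypothetical.

```python
"""Hypothetical sketch: place each task where most of its input bytes live.

Assumptions (a simple greedy heuristic, not the Z-CloudFlow scheduler): each
task has at least one input, each dataset lives at one site, and the cost is
the number of bytes moved across sites.
"""
from collections import defaultdict


def place_tasks(tasks, dataset_site, dataset_size):
    """tasks: {task: [dataset, ...]}. Returns ({task: site}, bytes moved)."""
    placement, moved = {}, 0
    for task, inputs in tasks.items():
        local_bytes = defaultdict(int)
        for dataset in inputs:
            local_bytes[dataset_site[dataset]] += dataset_size[dataset]
        # Site holding the most input bytes; ties broken by site name for determinism.
        best = max(local_bytes, key=lambda site: (local_bytes[site], site))
        placement[task] = best
        moved += sum(local_bytes.values()) - local_bytes[best]
    return placement, moved


if __name__ == "__main__":
    site = {"reads": "A", "reference": "B", "annotations": "B"}
    size = {"reads": 100, "reference": 30, "annotations": 20}
    tasks = {"align": ["reads", "reference"], "annotate": ["reference", "annotations"]}
    print(place_tasks(tasks, site, size))  # ({'align': 'A', 'annotate': 'B'}, 30)
```

A greedy per-task choice ignores dependencies between tasks; grouping whole chains of tasks with their datasets would need a global view of the workflow.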
25 Main obstacle: the network!
[Figure: metadata update latency, time (s) vs. number of files, for local, regional and remote metadata accesses]
26 Metadata management (for workflows on distributed clouds)
Workflow features (what we know):
- Many small files (for which striping makes no sense)
- Common data access patterns: pipeline, gather, scatter, reduce and broadcast; applications are a combination of them
- Typical scheme: write once, read many times
Design principles (how we handle it):
- Hybrid distributed/replicated DHT-based architecture
- In-memory caching
- Leverage workflow metadata for data provisioning
- Eventual consistency for geo-distributed metadata: lazy metadata updates
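The in-memory caching and lazy-update principles can be illustrated with a small sketch. It assumes each site keeps its metadata in memory, applies writes locally at once, and pushes batched updates to the other sites only when `flush()` runs, so remote reads may be stale until then (eventual consistency). It is not the Z-CloudFlow code; `SiteRegistry`, `flush` and the file and site names are hypothetical.

```python
"""Hypothetical sketch of lazy, eventually consistent metadata propagation.

Assumptions (not the Z-CloudFlow code): in-memory entries per site, local
writes applied immediately, batched propagation to other sites on flush().
"""


class SiteRegistry:
    def __init__(self, site):
        self.site = site
        self.entries = {}  # in-memory: file name -> metadata (e.g. its location)
        self.pending = []  # local updates not yet sent to other sites
        self.peers = []

    def put(self, name, metadata):
        self.entries[name] = metadata  # local write: no remote round trip
        self.pending.append((name, metadata))

    def get(self, name):
        return self.entries.get(name)  # served from local memory

    def flush(self):
        """Send all pending updates as one batch per peer; return messages sent."""
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        for peer in self.peers:
            peer.entries.update(batch)  # apply the batch at the remote site
        return len(self.peers)


if __name__ == "__main__":
    eu, us = SiteRegistry("eu"), SiteRegistry("us")
    eu.peers, us.peers = [us], [eu]
    for i in range(100):
        eu.put(f"part-{i}.dat", {"site": "eu"})
    print(us.get("part-0.dat"))  # None: not propagated yet
    print(eu.flush())            # 1 message carries 100 updates
    print(us.get("part-0.dat"))  # {'site': 'eu'}
```

Batching amortizes the high inter-site latency over many small-file updates, which suits the write-once, read-many pattern listed above.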
27 Four Strategies
- Centralized: baseline
- Replicated: local metadata accesses; synchronization agent
- Decentralized, non-replicated: metadata scattered across sites (DHT-based)
- Decentralized, replicated: metadata stored locally and replicated to a remote location (using hashing)
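The four strategies differ mainly in where a file's metadata is stored. The sketch below makes that explicit, assuming `origin` is the site where the file is written, `central` is the single metadata site of the centralized baseline, and a stable CRC32 hash plays the role of the DHT. It is not the Z-CloudFlow code; `metadata_locations`, `hashed_site` and the strategy and site names are hypothetical.

```python
"""Hypothetical sketch: where each strategy stores a file's metadata.

Assumptions (not the Z-CloudFlow code): `origin` is the writing site,
`central` is the baseline's metadata site, CRC32 stands in for the DHT.
"""
import zlib


def hashed_site(name, sites):
    """Stable hash placement (Python's built-in str hash is salted per process)."""
    return sites[zlib.crc32(name.encode("utf-8")) % len(sites)]


def metadata_locations(name, strategy, sites, origin, central):
    if strategy == "centralized":
        return [central]  # every access goes to one site
    if strategy == "replicated":
        return list(sites)  # local copy everywhere, kept in sync by an agent
    if strategy == "decentralized":
        return [hashed_site(name, sites)]  # one owner site per file
    if strategy == "decentralized-replicated":
        remote = hashed_site(name, sites)  # local copy + hashed remote copy
        return [origin] if remote == origin else [origin, remote]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    sites = ["eu-west", "eu-north", "us-east", "us-west"]
    for s in ("centralized", "replicated", "decentralized", "decentralized-replicated"):
        print(s, metadata_locations("part-7.dat", s, sites, "eu-west", "eu-west"))
```

Hashing lets any site compute where a file's metadata lives without a lookup service, while the local copy in the replicated variant keeps reads from the writing site local.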
28 Architecture and Implementation
- Communication and distributed synchronization manager
- In-memory metadata storage
- Optimistic concurrency model: no locks during operations
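An optimistic concurrency model can be sketched with version numbers and a compare-and-set. This sketch assumes the store offers an atomic conditional write, emulated here with a lock held only for the check-and-write step. Clients hold no lock while reading or computing their update; on a conflict they re-read and retry. It is not the Z-CloudFlow code; `OptimisticStore`, `compare_and_set` and `add_location` are hypothetical names.

```python
"""Hypothetical sketch of an optimistic concurrency model for metadata.

Assumptions (not the Z-CloudFlow code): versioned entries and an atomic
compare-and-set, emulated with a lock held only for the check-and-write step.
"""
import threading


class OptimisticStore:
    def __init__(self):
        self._data = {}                  # key -> (version, value)
        self._atomic = threading.Lock()  # emulates the store's atomic primitive

    def read(self, key):
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        with self._atomic:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False             # another client updated first
            self._data[key] = (current_version + 1, value)
            return True


def add_location(store, key, location):
    """Read-modify-write without holding a lock; retry on conflict."""
    while True:
        version, value = store.read(key)
        updated = (value or []) + [location]
        if store.compare_and_set(key, version, updated):
            return


if __name__ == "__main__":
    store = OptimisticStore()
    threads = [threading.Thread(target=add_location, args=(store, "f.dat", f"node{i}"))
               for i in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    version, locations = store.read("f.dat")
    print(version, len(locations))  # 20 20
```

Optimistic updates pay off when conflicts are rare, as with write-once workflow files; under heavy contention on a single key the retries would add up.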
29 Experimental Setup
- Azure cloud (PaaS): 4 data centers (2 EU, 2 US)
- Up to 128 nodes
- Each node: 1 CPU core, 1.75 GB memory, 127 GB disk
30 Impact of decentralization on makespan
[Figure: completion time and speedup, for small and large workflow provisioning]
- No significant gain in small settings
- 50% improvement at large scale
Impact of the local replication: speedup of …
31 Scalability
- Up to 128 nodes, 5,000 operations/node
- Degradation of the replicated approach at large scale
32 Support for real-life workflows
- BuzzFlow: pipeline-like; correlation in large scientific databases
- Montage: split + parallelized jobs + merge; astronomy application to create mosaics of the sky
Three scenarios:

Scenario                 | Small scale | Computation-intensive | Metadata-intensive
Operations / node        | …           | …                     | …,000
Computation time / node  | 1 s         | 5 s                   | 1 s
Total ops, BuzzFlow      | 7,200       | 14,400                | 72,000
Total ops, Montage       | 16,000      | 32,…                  | …,000
33 Matching strategies to workflows
- Centralized: still better at small scale
- Replicated: benefits from intensive computations on large files
- Decentralized approaches: suitable for large-scale, metadata-intensive apps handling a large number of small files
  - Non-replicated: for parallel jobs with linear metadata access
  - Replicated: for sequential, tightly dependent jobs, with data available locally
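The guidance on this slide can be encoded as a small decision rule. The inputs are coarse yes/no judgments made by the user; the slide gives no numeric thresholds, so none are invented here. `choose_strategy` and its parameter values are hypothetical.

```python
"""Hypothetical helper encoding the strategy-matching guidance as a decision rule.

Assumptions: inputs are coarse yes/no judgments; no numeric thresholds are
given on the slide, so none are used here.
"""


def choose_strategy(large_scale, metadata_intensive, job_pattern):
    """job_pattern: "parallel" or "sequential" (tightly dependent jobs)."""
    if not large_scale:
        return "centralized"
    if not metadata_intensive:
        return "replicated"  # computation-intensive runs on large files
    if job_pattern == "parallel":
        return "decentralized, non-replicated"
    if job_pattern == "sequential":
        return "decentralized, replicated"
    raise ValueError(f"unknown job pattern: {job_pattern}")


if __name__ == "__main__":
    print(choose_strategy(True, True, "parallel"))  # decentralized, non-replicated
```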
34 Overall Achievements
Publications:
- Book chapter in Cloud Computing for Data-Intensive Applications, Springer 2015
- Journal articles: Frontiers in Neuroinformatics 2014; Concurrency and Computation: Practice and Experience 2013; ERCIM Electronic Journal 2012
- International conference publications: IEEE Cluster; papers at IEEE/ACM CCGrid 2012 and 2014, IEEE SRDS 2014, IEEE Big Data 2013, ACM DEBS 2014, IEEE TrustCom/ISPA 2013
- Workshop papers, posters and demos: MapReduce in conjunction with ACM HPDC (rank A); CloudCP in conjunction with ACM EuroSys (rank A); IPDPSW in conjunction with IEEE IPDPS (rank A); Microsoft CloudFutures, ResearchNext, PhD Summer School; DEBS demo in conjunction with ACM DEBS
Software:
- PaaS data management middleware, available with Microsoft GenericWorker
- MapReduce engine for the Azure cloud
- Cloud service for bioinformatics
- SaaS for benchmarking the performance of data stage-in to cloud data centers, available on the Azure cloud
- Middleware for batch-based, high-performance streaming across cloud sites, with a binding to Microsoft StreamInsight
External collaborators: Microsoft Research ATLE, Cambridge; Argonne National Laboratory; Inria Saclay; Inria Sophia Antipolis
35 Future Directions
- Multi-site workflows across geographically distributed sites: integrate the metadata registry with a workflow execution engine to support multi-site scheduling
- Self-* processing: cost/performance/energy tradeoffs; one size does not fit all!
- Cloud stream processing: management of many small events; latency constraints for distributed queries
36 Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow Contact: team.inria.fr/kerdata Thank you!
More informationDIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationFederated Big Data for resource aggregation and load balancing with DIRAC
Procedia Computer Science Volume 51, 2015, Pages 2769 2773 ICCS 2015 International Conference On Computational Science Federated Big Data for resource aggregation and load balancing with DIRAC Víctor Fernández
More informationUse of Hadoop File System for Nuclear Physics Analyses in STAR
1 Use of Hadoop File System for Nuclear Physics Analyses in STAR EVAN SANGALINE UC DAVIS Motivations 2 Data storage a key component of analysis requirements Transmission and storage across diverse resources
More informationSo#ware Product Lines for Automa5c Mul5- Cloud Configura5on
So#ware Product Lines for Automa5c Mul5- Cloud Configura5on Université Lille 1 CRIStAL UMR CNRS 9189 Inria Lille - Nord Europe France Gustavo Sousa gustavo.sousa@inria.fr Encadrants: Walter Rudametkin
More informationScala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
More informationEmerging Technology for the Next Decade
Emerging Technology for the Next Decade Cloud Computing Keynote Presented by Charles Liang, President & CEO Super Micro Computer, Inc. What is Cloud Computing? Cloud computing is Internet-based computing,
More informationKeywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Survey on Load
More informationBuilding a Scalable Big Data Infrastructure for Dynamic Workflows
Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts
More informationEnhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
More informationThe data explosion is transforming science
Talk Outline The data tsunami and the 4 th paradigm of science The challenges for the long tail of science Where is the cloud being used now? The app marketplace SMEs Analytics as a service. What are the
More informationData Management in the Cloud: Limitations and Opportunities. Annies Ductan
Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management
More informationTackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
More informationHarnessing the Power of the Microsoft Cloud for Deep Data Analytics
1 Harnessing the Power of the Microsoft Cloud for Deep Data Analytics Today's Focus How you can operate your business more efficiently and effectively by tapping into Cloud based data analytics solutions
More informationDatacenters and Cloud Computing. Jia Rao Assistant Professor in CS http://cs.uccs.edu/~jrao/cs5540/spring2014/index.html
Datacenters and Cloud Computing Jia Rao Assistant Professor in CS http://cs.uccs.edu/~jrao/cs5540/spring2014/index.html What is Cloud Computing? A model for enabling ubiquitous, convenient, ondemand network
More informationBig Data Challenges in Bioinformatics
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?
More informationDatabase Server Configuration Best Practices for Aras Innovator 10
Database Server Configuration Best Practices for Aras Innovator 10 Aras Innovator 10 Running on SQL Server 2012 Enterprise Edition Contents Executive Summary... 1 Introduction... 2 Overview... 2 Aras Innovator
More informationCloud Computing with Red Hat Solutions. Sivaram Shunmugam Red Hat Asia Pacific Pte Ltd. sivaram@redhat.com
Cloud Computing with Red Hat Solutions Sivaram Shunmugam Red Hat Asia Pacific Pte Ltd sivaram@redhat.com Linux Automation Details Red Hat's Linux Automation strategy for next-generation IT infrastructure
More informationHPC Programming Framework Research Team
HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro
More informationDenis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity
Cloud computing et Virtualisation : applications au domaine de la Finance Denis Caromel, CEO Ac.veEon Orchestrate and Accelerate Applica.ons Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst
More informationIBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
More informationPerformance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms
Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms Elena Burceanu, Irina Presa Automatic Control and Computers Faculty Politehnica University of Bucharest Emails: {elena.burceanu,
More information