BlobSeer: Towards efficient data storage management on large-scale, distributed systems

Size: px
Start display at page:

Download "BlobSeer: Towards efficient data storage management on large-scale, distributed systems"

Transcription

1 : Towards efficient data storage management on large-scale, distributed systems Bogdan Nicolae University of Rennes 1, France KerData Team, INRIA Rennes Bretagne-Atlantique PhD Advisors: Gabriel Antoniu and Luc Bougé December 1, 2010 Bogdan Nicolae : Efficient data storage on distributed systems 1/ 45

2 Outline Context Related work and its limitations Contribution: Principles High level description Zoom on metadata management Synthetic benchmarks as a storage backend for Hadoop MapReduce providing virtual machine image storage for clouds as a QoS enabled storage service for applications running on the cloud Bogdan Nicolae : Efficient data storage on distributed systems 2/ 45

3 Information overload Large-scale computing infrastructure Data storage at large scale We live in exponential times... Every two years the amount of information doubles Making something useful out of it becomes incresingly difficult We depend more and more on large-scale computing infrastructure Bogdan Nicolae : Efficient data storage on distributed systems 3/ 45

4 Information overload Large-scale computing infrastructure Data storage at large scale Dealing with information overload: Enterprise datacenters Tens of thousands of machines in huge clusters Leveraged directly by the owner Commodity hardware: minimizes per unit cost Easy to add, upgrade and replace Data-intensive applications Bogdan Nicolae : Efficient data storage on distributed systems 4/ 45

5 Information overload Large-scale computing infrastructure Data storage at large scale Dealing with information overload: Clouds Computing as utility rather than capital investment Driven by pay-as-you-go model Several levels of abstraction: IaaS, PaaS, SaaS Several advantages: low entry cost, elasticity, rapid development Bogdan Nicolae : Efficient data storage on distributed systems 5/ 45

6 Context Information overload Large-scale computing infrastructure Data storage at large scale Dealing with the information overload: HPC infrastructures Complex scientific and engineering applications High-end hardware Manipulate information at petabyte-scale and beyond Bogdan Nicolae : Efficient data storage on distributed systems 6/ 45

7 Information overload Large-scale computing infrastructure Data storage at large scale Data storage and management is a key issue Requirements Easy manipulation of data High access throughput Scalability Data Metadata Bogdan Nicolae : Efficient data storage on distributed systems 7/ 45

8 Information overload Large-scale computing infrastructure Data storage at large scale Current approaches: Parallel file systems Mostly used in HPC infrastructures POSIX access interface Data striping Advanced caching Pros Distributed data MPI optimizations Cons Locking-based Too many small files Bogdan Nicolae : Efficient data storage on distributed systems 8/ 45

9 Information overload Large-scale computing infrastructure Data storage at large scale Current approaches: Data-intensive oriented file systems Huge files Writes at random offsets are seldom Files grow by atomic appends Fine grain concurrent reads Pros No locking Data location aware Cons Centralized metadata Expensive updates Bogdan Nicolae : Efficient data storage on distributed systems 9/ 45

10 Information overload Large-scale computing infrastructure Data storage at large scale Current approaches: Cloud data storage services Virtualize storage resources Pay for duration, size and traffic Flat naming scheme Simple access model Pros High data availability Versioning Cons Limited object size Limited concurrency control Bogdan Nicolae : Efficient data storage on distributed systems 10/ 4

11 Information overload Large-scale computing infrastructure Data storage at large scale Limitations of existing approaches Issue Parallel FS Data-intensive FS Cloud store Too many small files Centralized metadata No versioning support No fine grain writes Addressed Addressed Addressed Addressed Addressed Bogdan Nicolae : Efficient data storage on distributed systems 11/ 4

12 Information overload Large-scale computing infrastructure Data storage at large scale Limitations of existing approaches Issue Parallel FS Data-intensive FS Cloud store??? Too many small files Centralized metadata No versioning support No fine grain writes Addressed Addressed Addressed Addressed Addressed Addressed Addressed Addressed Addressed Bogdan Nicolae : Efficient data storage on distributed systems 12/ 4

13 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: Data Sharing at Large Scale Bogdan Nicolae : Efficient data storage on distributed systems 13/ 4

14 Principles Context Principles High level description Metadata management Design issues Synthetic benchmarks BLOBs Eliminate need to keep many small files Provide fine-grain R/W access Data striping Distributes I/O workload Enables user to configure distribution strategy Decentralized metadata Distributes metadata at fine granularity Brings scalability and high avalability Versioning is a key design principle Bogdan Nicolae : Efficient data storage on distributed systems 14/ 4

15 Versioning as a key principle Principles High level description Metadata management Design issues Synthetic benchmarks Clients mutate BLOBs by submitting diffs A BLOB is never overwritten: a new snapshot is generated Only diffs are stored Clients see whole, fully independent snapshots Fine-grain read access to any past snapshot is possible Bogdan Nicolae : Efficient data storage on distributed systems 15/ 4

16 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: a versioning-oriented access interface id = CREATE() v = APPEND(id, size, buffer) v = WRITE(id, offset, size, buffer) (v, size) = GET RECENT(id) READ(id, v, offset, size, buffer) new id = CLONE(id, v) dv = MERGE(sid, sv, soffset, ssize, did, doffset) Bogdan Nicolae : Efficient data storage on distributed systems 16/ 4

17 Consistency semantics Context Principles High level description Metadata management Design issues Synthetic benchmarks Writes: are atomic and totally ordered No guarantee when they become visible to clients Finish before becoming visible to clients Reads: require a version explicitly GET RECENT does not guarantee latest version Can read any version older than GET RECENT Read exposed writes only Bogdan Nicolae : Efficient data storage on distributed systems 17/ 4

18 Advantages of this proposal Principles High level description Metadata management Design issues Synthetic benchmarks Access to historic data Revert to previous snapshots Track changes to data Exploit data parallelism better Avoid synchronization to achieve better throughput Build complex workflows Bogdan Nicolae : Efficient data storage on distributed systems 18/ 4

19 Architecture Context Principles High level description Metadata management Design issues Synthetic benchmarks Data providers Metadata providers Provider manager Allocation strategy Version manager Guarantees total ordering and atomicity Bogdan Nicolae : Efficient data storage on distributed systems 19/ 4

20 How does a read work? Context Principles High level description Metadata management Design issues Synthetic benchmarks 1 Select a version (optionally ask version manager for the most recently exposed version) 2 Fetch the corresponding metadata from the metadata providers 3 Contact providers in parallel and fetch the chunks into the local buffer Bogdan Nicolae : Efficient data storage on distributed systems 20/ 4

21 How does a write work? Context Principles High level description Metadata management Design issues Synthetic benchmarks 1 Get a list of providers, one for each chunk 2 Contact providers in parallel and write the chunks 3 Get a version number for the update 4 Add new metadata to consolidate the new version 5 Report the new version is ready Version manager will eventually expose it Bogdan Nicolae : Efficient data storage on distributed systems 21/ 4

22 Principles High level description Metadata management Design issues Synthetic benchmarks How to guarantee total ordering and atomicity? Clients ask for a version number Version manager assigns version numbers Clients write metadata concurrently Clients confirm completion Version manager exposes versions in order of assignment Bogdan Nicolae : Efficient data storage on distributed systems 22/ 4

23 Principles High level description Metadata management Design issues Synthetic benchmarks Zoom on metadata management: Motivation Present fully independent snapshots in spite of writing only diffs Access performance should not degrade with increasing number of diffs Proposed so far: B-Trees, Shadowing Difficult to maintain in a distributed fashion Expensive synchronization for concurrent updates Our goals: Easy to manage in a distributed fashion Efficient concurrent updates Bogdan Nicolae : Efficient data storage on distributed systems 23/ 4

24 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: Versioning over Distributed Segment Trees Deals with distributing the metadata while avoiding expensive management Binary tree is associated to each BLOB snapshot Reads descend towards leaves, writes build new trees bottom-up Key ideas Keep metadata immutable Share whole sub-trees Bogdan Nicolae : Efficient data storage on distributed systems 24/ 4

25 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: Versioning over Distributed Segment Trees Deals with distributing the metadata while avoiding expensive management Binary tree is associated to each BLOB snapshot Reads descend towards leaves, writes build new trees bottom-up Key ideas Keep metadata immutable Share whole sub-trees Bogdan Nicolae : Efficient data storage on distributed systems 24/ 4

26 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: Versioning over Distributed Segment Trees Deals with distributing the metadata while avoiding expensive management Binary tree is associated to each BLOB snapshot Reads descend towards leaves, writes build new trees bottom-up Key ideas Keep metadata immutable Share whole sub-trees Bogdan Nicolae : Efficient data storage on distributed systems 24/ 4

27 Principles High level description Metadata management Design issues Synthetic benchmarks Contribution: Versioning over Distributed Segment Trees Deals with distributing the metadata while avoiding expensive management Binary tree is associated to each BLOB snapshot Reads descend towards leaves, writes build new trees bottom-up Key ideas Keep metadata immutable Share whole sub-trees Bogdan Nicolae : Efficient data storage on distributed systems 24/ 4

28 Metadata forward references Principles High level description Metadata management Design issues Synthetic benchmarks Solve the problem of efficient concurrent updates to the metadata Key idea: precalculate children of lower versions instead of waiting Example Initial BLOB Three concurrent writers finished writing their chunks Black is faster but knows about blue and links to the not-yet-existing node After red and blue finish, metadata is consistent Bogdan Nicolae : Efficient data storage on distributed systems 25/ 4

29 Metadata forward references Principles High level description Metadata management Design issues Synthetic benchmarks Solve the problem of efficient concurrent updates to the metadata Key idea: precalculate children of lower versions instead of waiting Example Initial BLOB Three concurrent writers finished writing their chunks Black is faster but knows about blue and links to the not-yet-existing node After red and blue finish, metadata is consistent Bogdan Nicolae : Efficient data storage on distributed systems 25/ 4

30 Metadata forward references Principles High level description Metadata management Design issues Synthetic benchmarks Solve the problem of efficient concurrent updates to the metadata Key idea: precalculate children of lower versions instead of waiting Example Initial BLOB Three concurrent writers finished writing their chunks Black is faster but knows about blue and links to the not-yet-existing node After red and blue finish, metadata is consistent Bogdan Nicolae : Efficient data storage on distributed systems 25/ 4

31 Metadata forward references Principles High level description Metadata management Design issues Synthetic benchmarks Solve the problem of efficient concurrent updates to the metadata Key idea: precalculate children of lower versions instead of waiting Example Initial BLOB Three concurrent writers finished writing their chunks Black is faster but knows about blue and links to the not-yet-existing node After red and blue finish, metadata is consistent Bogdan Nicolae : Efficient data storage on distributed systems 25/ 4

32 Design considerations Context Principles High level description Metadata management Design issues Synthetic benchmarks Event-driven, layered design Callbacks instead of blocking Asynchronous RPC for interprocess communication Metadata providers form a DHT Custom implementation Plugin-able allocation strategy on provider manager Round-robin load-balancing by default More adaptive solutions possible Bogdan Nicolae : Efficient data storage on distributed systems 26/ 4

33 Fault tolerance Context Principles High level description Metadata management Design issues Synthetic benchmarks Clients die during write Before asking for a version: no problem After asking for a version: delegate to metadata provider Data and/or metadata providers die Replication of chunks and metadata pieces Data and metadata immutable: no sync between replicas needed Version manager and/or provider manager dies Distributed state machine using leader election Bogdan Nicolae : Efficient data storage on distributed systems 27/ 4

34 Experimental platform: Grid 5000 Principles High level description Metadata management Design issues Synthetic benchmarks Experimental testbed distributed in 9 sites around France A total of more than 5000 cores Reservation system grants exclusive access for experiments x86 64 CPUs, >2 GB RAM, locally attached disks Interconnect: Gigabit Ethernet, Myrinet, Infiniband Bogdan Nicolae : Efficient data storage on distributed systems 28/ 4

35 Results: synthetic benchmarks Principles High level description Metadata management Design issues Synthetic benchmarks Data striping Metadata decentralization Bogdan Nicolae : Efficient data storage on distributed systems 29/ 4

36 Results: synthetic benchmarks (2) Principles High level description Metadata management Design issues Synthetic benchmarks Increase write pressure while reading Increase read pressure while writing Bogdan Nicolae : Efficient data storage on distributed systems 30/ 4

37 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Storage backend for Hadoop MapReduce Efficient VM image deployment and snapshotting on clouds QoS enabled storage for clouds Bogdan Nicolae : Efficient data storage on distributed systems 31/ 4

38 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications MapReduce Data-intensive oriented paradigm Covers a wide range of data-intensive application classes Users stick to a well-defined model Widely adopted: Google, Yahoo Popular open-source implementation: Hadoop Bogdan Nicolae : Efficient data storage on distributed systems 32/ 4

39 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications as a storage backend for Hadoop MapReduce Proposal replaces HDFS (default storage backend) Design issues Implement Hadoop API Hierarchic namespace for BLOBs Data prefetching Affinity scheduling: exposing data location Bogdan Nicolae : Efficient data storage on distributed systems 33/ 4

40 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Results: synthetic benchmarks Single writer Concurrent readers Bogdan Nicolae : Efficient data storage on distributed systems 34/ 4

41 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Results: Real MapReduce applications Write intensive Read intensive Bogdan Nicolae : Efficient data storage on distributed systems 35/ 4

42 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications In short Improvement of 11%-30% over HDFS for real MapReduce applications Potential to leverage versioning in Hadoop Further improve performance New features Bogdan Nicolae : Efficient data storage on distributed systems 36/ 4

43 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Virtual machine image storage for IaaS clouds On IaaS clouds users rent resources as VMs VM image customized with user application Two patterns: Multi-deployment: instantiate many VMs from the same image Multi-snapshotting: save state of VMs into independent images State-of-art: full pre-propagation, then copy images back to repository Our goal: reduce cost (execution time, storage space, network traffic) Bogdan Nicolae : Efficient data storage on distributed systems 37/ 4

44 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Proposal Store image in a striped fashion Leverage to store each image as a BLOB Mirror image contents locally Lazy scheme: only read from BLOB when needed Keep changes to image local Only the necessary parts are accessed Consolidate local changes into an independent image CLONE BLOB, then WRITE local changes to BLOB Provides illusion of independent images Only differences are stored Bogdan Nicolae : Efficient data storage on distributed systems 38/ 4

45 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Results: scalability of multi-deployment under concurrency VM boot: runtime speedup vs. pre-propagation VM boot: total network traffic vs. pre-propagation Bogdan Nicolae : Efficient data storage on distributed systems 39/ 4

46 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Results: performance of multi-snapshotting Avg. time to take snapshots Total network traffic (storage space) Bogdan Nicolae : Efficient data storage on distributed systems 40/ 4

47 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications In short Large speedup and network traffic savings over pre-propagation for multi-deployment Efficient multi-snapshotting Portable approach: does not depend on hypervisor to manage diffs Bogdan Nicolae : Efficient data storage on distributed systems 41/ 4

48 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Quality-of-service enabled storage for cloud applications Storage for data-intensive applications deployed on IaaS clouds QoS impacted by several factors Multiple customers share the same storage service Hardware components prone to failures Application access pattern We need: High aggregated throughput under concurrency Stable throughput for individual data accesses Bogdan Nicolae : Efficient data storage on distributed systems 42/ 4

49 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Proposal: QoS improvement methodology Methodology 1 Monitor storage service 2 Collect app feedback 3 Identify + classify behavior patterns using GloBeM 4 Prevent undesired patterns Input: MapReduce access patterns + faults Applied methodology to find bottlenecks Output: Improved allocation strategy Bogdan Nicolae : Efficient data storage on distributed systems 43/ 4

50 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications Results: improvement of throughput stability Clients separated from providers Clients co-deployed with providers Bogdan Nicolae : Efficient data storage on distributed systems 44/ 4

51 Storage backend for Hadoop MapReduce Virtual machine image storage for IaaS clouds Quality-of-service enabled storage for cloud applications In short General methodology to improve QoS for cloud storage Concretely: Reduction in standard deviation for read throughput of up to 25% Promising results for cloud providers to improve SLA for the same price Bogdan Nicolae : Efficient data storage on distributed systems 45/ 4

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline

More information

BlobSeer: Enabling Efficient Lock-Free, Versioning-Based Storage for Massive Data under Heavy Access Concurrency

BlobSeer: Enabling Efficient Lock-Free, Versioning-Based Storage for Massive Data under Heavy Access Concurrency BlobSeer: Enabling Efficient Lock-Free, Versioning-Based Storage for Massive Data under Heavy Access Concurrency Gabriel Antoniu 1, Luc Bougé 2, Bogdan Nicolae 3 KerData research team 1 INRIA Rennes -

More information

Going Back and Forth: Efficient Multideployment and Multisnapshotting on Clouds

Going Back and Forth: Efficient Multideployment and Multisnapshotting on Clouds Going Back and Forth: Efficient Multideployment and Multisnapshotting on Clouds Bogdan Nicolae INRIA Saclay France bogdan.nicolae@inria.fr John Bresnahan Argonne National Laboratory USA bresnaha@mcs.anl.gov

More information

Computing in clouds: Where we come from, Where we are, What we can, Where we go

Computing in clouds: Where we come from, Where we are, What we can, Where we go Computing in clouds: Where we come from, Where we are, What we can, Where we go Luc Bougé ENS Cachan/Rennes, IRISA, INRIA Biogenouest With help from many colleagues: Gabriel Antoniu, Guillaume Pierre,

More information

A Cost-Evaluation of MapReduce Applications in the Cloud

A Cost-Evaluation of MapReduce Applications in the Cloud 1/23 A Cost-Evaluation of MapReduce Applications in the Cloud Diana Moise, Alexandra Carpen-Amarie Gabriel Antoniu, Luc Bougé KerData team 2/23 1 MapReduce applications - case study 2 3 4 5 3/23 MapReduce

More information

Storage Architectures for Big Data in the Cloud

Storage Architectures for Big Data in the Cloud Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

J. Parallel Distrib. Comput. BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds

J. Parallel Distrib. Comput. BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds J. Parallel Distrib. Comput. 73 (2013) 698 711 Contents lists available at SciVerse ScienceDirect J. Parallel Distrib. Comput. journal homepage: www.elsevier.com/locate/jpdc BlobCR: Virtual disk based

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots

BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots Bogdan Nicolae INRIA Saclay, Île-de-France, France bogdan.nicolae@inria.fr Franck Cappello INRIA

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

More information

GeoGrid Project and Experiences with Hadoop

GeoGrid Project and Experiences with Hadoop GeoGrid Project and Experiences with Hadoop Gong Zhang and Ling Liu Distributed Data Intensive Systems Lab (DiSL) Center for Experimental Computer Systems Research (CERCS) Georgia Institute of Technology

More information

Google File System. Web and scalability

Google File System. Web and scalability Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might

More information

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind

More information

Sistemi Operativi e Reti. Cloud Computing

Sistemi Operativi e Reti. Cloud Computing 1 Sistemi Operativi e Reti Cloud Computing Facoltà di Scienze Matematiche Fisiche e Naturali Corso di Laurea Magistrale in Informatica Osvaldo Gervasi ogervasi@computer.org 2 Introduction Technologies

More information

Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29. Jean-Louis Pazat. IRISA/INSA Rennes, FRANCE MYRIADS Project Team

Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29. Jean-Louis Pazat. IRISA/INSA Rennes, FRANCE MYRIADS Project Team Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29 Jean-Louis Pazat IRISA/INSA Rennes, FRANCE MYRIADS Project Team Towards the Magic Green Broker Jean-Louis Pazat IRISA 2/29 OUTLINE Clouds and

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Performance Analysis of Mixed Distributed Filesystem Workloads

Performance Analysis of Mixed Distributed Filesystem Workloads Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt Motivation Hadoop-tailored filesystems (e.g. CloudStore)

More information

HPC performance applications on Virtual Clusters

HPC performance applications on Virtual Clusters Panagiotis Kritikakos EPCC, School of Physics & Astronomy, University of Edinburgh, Scotland - UK pkritika@epcc.ed.ac.uk 4 th IC-SCCE, Athens 7 th July 2010 This work investigates the performance of (Java)

More information

Will They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage

Will They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage Will They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage Ellis H. Wilson III 1,2 Mahmut Kandemir 1 Garth Gibson 2,3 1 Department of Computer Science and Engineering, The Pennsylvania

More information

Evaluation Methodology of Converged Cloud Environments

Evaluation Methodology of Converged Cloud Environments Krzysztof Zieliński Marcin Jarząb Sławomir Zieliński Karol Grzegorczyk Maciej Malawski Mariusz Zyśk Evaluation Methodology of Converged Cloud Environments Cloud Computing Cloud Computing enables convenient,

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Violin: A Framework for Extensible Block-level Storage

Violin: A Framework for Extensible Block-level Storage Violin: A Framework for Extensible Block-level Storage Michail Flouris Dept. of Computer Science, University of Toronto, Canada flouris@cs.toronto.edu Angelos Bilas ICS-FORTH & University of Crete, Greece

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Introduction

More information

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011 BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011 What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms

Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms Elena Burceanu, Irina Presa Automatic Control and Computers Faculty Politehnica University of Bucharest Emails: {elena.burceanu,

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

HDFS Users Guide. Table of contents

HDFS Users Guide. Table of contents Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow

Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow Lessons Learned and Future Directions Gabriel Antoniu, Inria Joint work with Radu Tudoran, Benoit Da Mota, Alexandru Costan,

More information

Cloud Computing Trends

Cloud Computing Trends UT DALLAS Erik Jonsson School of Engineering & Computer Science Cloud Computing Trends What is cloud computing? Cloud computing refers to the apps and services delivered over the internet. Software delivered

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensie Computing Uniersity of Florida, CISE Department Prof. Daisy Zhe Wang Map/Reduce: Simplified Data Processing on Large Clusters Parallel/Distributed

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS

On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS Wittawat Tantisiriroj Seung Woo Son Swapnil Patil Carnegie Mellon University Samuel J. Lang Argonne National Laboratory Garth

More information

Parallel Processing of cluster by Map Reduce

Parallel Processing of cluster by Map Reduce Parallel Processing of cluster by Map Reduce Abstract Madhavi Vaidya, Department of Computer Science Vivekanand College, Chembur, Mumbai vamadhavi04@yahoo.co.in MapReduce is a parallel programming model

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud) Open Cloud System (Integration of Eucalyptus, Hadoop and into deployment of University Private Cloud) Thinn Thu Naing University of Computer Studies, Yangon 25 th October 2011 Open Cloud System University

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com ` CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS Review Business and Technology Series www.cumulux.com Table of Contents Cloud Computing Model...2 Impact on IT Management and

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Mixing Hadoop and HPC Workloads on Parallel Filesystems

Mixing Hadoop and HPC Workloads on Parallel Filesystems Mixing Hadoop and HPC Workloads on Parallel Filesystems Esteban Molina-Estolano *, Maya Gokhale, Carlos Maltzahn *, John May, John Bent, Scott Brandt * * UC Santa Cruz, ISSDM, PDSI Lawrence Livermore National

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers BASEL UNIVERSITY COMPUTER SCIENCE DEPARTMENT Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers Distributed Information Systems (CS341/HS2010) Report based on D.Kassman, T.Kraska,

More information

Software Execution Protection in the Cloud

Software Execution Protection in the Cloud Software Execution Protection in the Cloud Miguel Correia 1st European Workshop on Dependable Cloud Computing Sibiu, Romania, May 8 th 2012 Motivation clouds fail 2 1 Motivation accidental arbitrary faults

More information

MaxDeploy Hyper- Converged Reference Architecture Solution Brief

MaxDeploy Hyper- Converged Reference Architecture Solution Brief MaxDeploy Hyper- Converged Reference Architecture Solution Brief MaxDeploy Reference Architecture solutions are configured and tested for support with Maxta software- defined storage and with industry

More information

Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures

Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures Ada Gavrilovska Karsten Schwan, Mukil Kesavan Sanjay Kumar, Ripal Nathuji, Adit Ranadive Center for Experimental

More information

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012 MapReduce and Hadoop Aaron Birkland Cornell Center for Advanced Computing January 2012 Motivation Simple programming model for Big Data Distributed, parallel but hides this Established success at petabyte

More information

Big Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center

Big Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Big Data Processing in the Cloud Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Data is ONLY as useful as the decisions it enables 2 Data is ONLY as useful as the decisions it enables

More information

Optimize the execution of local physics analysis workflows using Hadoop

Optimize the execution of local physics analysis workflows using Hadoop Optimize the execution of local physics analysis workflows using Hadoop INFN CCR - GARR Workshop 14-17 May Napoli Hassen Riahi Giacinto Donvito Livio Fano Massimiliano Fasi Andrea Valentini INFN-PERUGIA

More information

Lecture 02a Cloud Computing I

Lecture 02a Cloud Computing I Mobile Cloud Computing Lecture 02a Cloud Computing I 吳 秀 陽 Shiow-yang Wu What is Cloud Computing? Computing with cloud? Mobile Cloud Computing Cloud Computing I 2 Note 1 What is Cloud Computing? Walking

More information

Comparative analysis of Google File System and Hadoop Distributed File System

Comparative analysis of Google File System and Hadoop Distributed File System Comparative analysis of Google File System and Hadoop Distributed File System R.Vijayakumari, R.Kirankumar, K.Gangadhara Rao Dept. of Computer Science, Krishna University, Machilipatnam, India, vijayakumari28@gmail.com

More information

Cloud/SaaS enablement of existing applications

Cloud/SaaS enablement of existing applications Cloud/SaaS enablement of existing applications GigaSpaces: Nati Shalom, CTO & Founder About GigaSpaces Technologies Enabling applications to run a distributed cluster as if it was a single machine 75+

More information

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card

More information

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman A Very Brief Introduction To Cloud Computing Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman What is The Cloud Cloud computing refers to logical computational resources accessible via a computer

More information

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 1 PDA College of Engineering, Gulbarga, Karnataka, India rlrooparl@gmail.com 2 PDA College of Engineering, Gulbarga, Karnataka,

More information

Introduction to OpenStack

Introduction to OpenStack Introduction to OpenStack Carlo Vallati PostDoc Reseracher Dpt. Information Engineering University of Pisa carlo.vallati@iet.unipi.it Cloud Computing - Definition Cloud Computing is a term coined to refer

More information

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

More information

Cluster, Grid, Cloud Concepts

Cluster, Grid, Cloud Concepts Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent

More information

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates

More information

Hadoop@LaTech ATLAS Tier 3

Hadoop@LaTech ATLAS Tier 3 Cerberus Hadoop Hadoop@LaTech ATLAS Tier 3 David Palma DOSAR Louisiana Tech University January 23, 2013 Cerberus Hadoop Outline 1 Introduction Cerberus Hadoop 2 Features Issues Conclusions 3 Cerberus Hadoop

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Cloud Computing. Chapter 1 Introducing Cloud Computing

Cloud Computing. Chapter 1 Introducing Cloud Computing Cloud Computing Chapter 1 Introducing Cloud Computing Learning Objectives Understand the abstract nature of cloud computing. Describe evolutionary factors of computing that led to the cloud. Describe virtualization

More information

A Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services

A Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services A Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services Ronnie D. Caytiles and Byungjoo Park * Department of Multimedia Engineering, Hannam University

More information

Cloud Computing through Virtualization and HPC technologies

Cloud Computing through Virtualization and HPC technologies Cloud Computing through Virtualization and HPC technologies William Lu, Ph.D. 1 Agenda Cloud Computing & HPC A Case of HPC Implementation Application Performance in VM Summary 2 Cloud Computing & HPC HPC

More information