High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es



Overview
- High Throughput Computing
- Motivation
- All things distributed: peer-to-peer
- Non-structured overlays
- Structured overlays
- P2P Computing
- Cassandra
- HTC over Cassandra
- Eventual consistency
- Experiments
- Future Work
- Conclusions

High Throughput Computing
- Concept introduced by the Condor team in 1996
- In contrast to HPC, it optimizes the execution of a set of applications
- Figure of merit: the number of computational tasks per time unit
- Tasks are independent
- Examples: Condor, Oracle Grid Engine (Kalimero), BOINC

Functioning
- N worker nodes
- One master node
- Users interact with the master node
- The master manages pending tasks and idle workers using a queuing system
- Tasks are (usually) executed in FIFO order
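The master/worker arrangement above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the code of Condor or any other real scheduler; the class name `Master` and its methods are hypothetical.

```python
from collections import deque


class Master:
    """Toy master node: one FIFO queue of pending tasks, one of idle workers."""

    def __init__(self):
        self.pending = deque()   # tasks waiting to run, oldest first
        self.idle = deque()      # workers waiting for work
        self.assignments = []    # (task, worker) pairs in dispatch order

    def submit(self, task):
        self.pending.append(task)
        self._dispatch()

    def worker_idle(self, worker):
        self.idle.append(worker)
        self._dispatch()

    def _dispatch(self):
        # Pair the oldest pending task with the longest-idle worker.
        while self.pending and self.idle:
            self.assignments.append((self.pending.popleft(), self.idle.popleft()))


master = Master()
for task in ["task1", "task2", "task3"]:
    master.submit(task)
master.worker_idle("w1")
master.worker_idle("w2")
print(master.assignments)  # [('task1', 'w1'), ('task2', 'w2')]
```

Every submission and every idle notification passes through the single `Master` object, which is why it is both a bottleneck and a single point of failure.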

Motivations
- Limitations of this model:
- The master node may become a scalability bottleneck
- Failures in the master affect the whole system
- Is it possible to distribute the capabilities of the master node among all system nodes?
- How? (Which technology can help?)

All things distributed: peer-to-peer
- Distributed systems in which all nodes have the same role
- Nodes are interconnected, defining an application-level virtual network: an overlay network
- This overlay is used to locate other nodes and the information stored in them
- Two types of overlays: structured and non-structured

Non-structured overlays
- Nodes are interconnected randomly
- Searches in the overlay are made by flooding
- Efficient search of popular contents
- Cannot guarantee that every point in the system is reachable
- Not efficient in terms of number of messages
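A small sketch can make the two drawbacks of flooding concrete. The overlay below is a hypothetical five-node graph, and `flood_search` is an illustrative function rather than the protocol of any particular P2P system. Each forward over a link counts as one message, including duplicates that the receiver drops.

```python
from collections import deque


def flood_search(overlay, start, target, ttl):
    """TTL-limited flooding: return (found, number of messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    found = start == target
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted, the query is not forwarded further
        for neighbour in overlay[node]:
            messages += 1
            if neighbour in seen:
                continue  # duplicate query, dropped by the receiver
            seen.add(neighbour)
            if neighbour == target:
                found = True
            frontier.append((neighbour, hops_left - 1))
    return found, messages


# Hypothetical overlay: node -> list of neighbours.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood_search(overlay, "A", "E", ttl=2))  # (False, 6)
print(flood_search(overlay, "A", "E", ttl=3))  # (True, 9)
```

With a TTL of 2 the query never reaches node E, although E exists, and even this tiny overlay needs nine messages to find it with a TTL of 3.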

Non-structured overlays (II)

Structured overlays
- Nodes are interconnected using some kind of (regular) structure
- Each node has a unique ID of N bits, defining a keyspace of 2^N keys
- This keyspace is divided among the nodes

Structured overlays (II)
- Each object in the system has an ID and a position in the keyspace
- A distance-based routing protocol is used
- This permits reaching any point with O(log n) messages
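The slides do not name a specific overlay, so the following sketch assumes a Chord-style ring as one well-known example of a structured overlay. The node IDs and the 6-bit keyspace are illustrative. Each node knows the owners of the points at distance 1, 2, 4, ... from itself (its "fingers") and forwards a lookup to the farthest finger that does not overshoot the key.

```python
from bisect import bisect_left

M = 6                 # ID length in bits (illustrative)
RING = 2 ** M         # keyspace size: 64 keys
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])


def owner(key):
    """Node responsible for a key: the first node ID >= key, wrapping around."""
    i = bisect_left(NODES, key % RING)
    return NODES[i % len(NODES)]


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def fingers(node):
    """Owners of the points node + 2^i, for i = 0 .. M-1."""
    return [owner(node + 2 ** i) for i in range(M)]


def lookup(start, key):
    """Return (owner of key, list of nodes the query visits before reaching it)."""
    node, path = start, [start]
    while not between(key, node, owner(node + 1), inclusive_right=True):
        # Jump to the farthest finger that still precedes the key.
        node = next(f for f in reversed(fingers(node)) if between(f, node, key))
        path.append(node)
    return owner(key), path


print(lookup(8, 54))  # (56, [8, 42, 51])
```

Each jump roughly halves the remaining distance to the key, which is where the logarithmic message count comes from.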

Distributed Hash Tables
- Provide a hash-table-like user API:
- Put (ID, Object)
- Get (ID)
- Fast access to distributed information
- Used for file distribution, user communication, VoIP, and video streaming
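The Put/Get interface can be illustrated with a single-process toy in which each "node" is just a dictionary. The class `ToyDHT`, the 16-bit ring, and the node names are assumptions made for the example; a real DHT would route these calls over the network.

```python
import hashlib
from bisect import bisect_left


class ToyDHT:
    """Single-process stand-in for a DHT: keys are hashed onto a ring of nodes."""

    def __init__(self, node_names, bits=16):
        self.ring = 2 ** bits
        # One local dict per node, indexed by the node's position on the ring.
        self.stores = {self._hash(name): {} for name in node_names}
        self.ids = sorted(self.stores)

    def _hash(self, text):
        return int(hashlib.sha1(text.encode()).hexdigest(), 16) % self.ring

    def _owner(self, key):
        # First node clockwise from the key's position, wrapping around.
        i = bisect_left(self.ids, self._hash(key))
        return self.ids[i % len(self.ids)]

    def put(self, key, obj):
        self.stores[self._owner(key)][key] = obj

    def get(self, key):
        return self.stores[self._owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("job-1", {"name": "Task1", "owner": "User1"})
print(dht.get("job-1"))    # {'name': 'Task1', 'owner': 'User1'}
print(dht.get("missing"))  # None
```

The caller never says which node holds the object: the hash of the ID decides, so any node can serve any request.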

P2P Computing
- The system must be seen by the user as a single resource pool
- Users should be able to submit jobs from any node in the system
- The system stores job information, permitting progress even when the user is not connected
- A FIFO order should be guaranteed
- DHTs are suitable for this purpose

DHTs for P2P Computing
- Must provide scalability in adverse conditions
- Must provide persistence (using replication)
- Replicas are synchronized by consensus algorithms
- Load balancing algorithms are also needed

DHTs for P2P Computing (II)
- In 2007 Amazon presented Dynamo, a DHT-based P2P system with persistence, scalability, O(1) access, and eventual consistency
- Since Dynamo, many alternatives have been proposed: Riak, Scalaris, Memcached, ...
- Facebook proposed Cassandra in 2009, with the same capabilities as Dynamo and Google's BigTable data model

Cassandra
- Developed by Facebook and Twitter since 2009
- Has been released to the Apache Software Foundation
- Developed in Java, with client libraries for multiple languages
- Pros: fault tolerant, decentralized, scalable, durable
- Cons: eventual consistency

Cassandra's Data Model
- DHTs store (key, value) pairs
- Cassandra stores (key, (values...)) tuples across different tables
- The different tables are named ColumnFamilies (CF) or SuperColumnFamilies (SCF)
- CFs are 4-dimensional tables
- SCFs are 5-dimensional tables

Column Families
WaitingQueue ColumnFamily

JobID  Name   Owner  Binary
1      Task1  User1  URL
2      Task2  User2  URL
3      Task3  User1  URL
N      TaskN  User3  URL

SuperColumn Families
Queues SuperColumnFamily

Row      Job1          Job2          JobN
Waiting  Task1, User1  Task2, User2  TaskN, UserN
Running  Task1, User1  Task2, User2  TaskN, UserN
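One way to read the "4-dimensional" and "5-dimensional" description is as the number of coordinates needed to reach a single value. The nested dictionaries below are a sketch of that reading, not Cassandra's actual storage format. The keyspace name `HTC` and the column names in the SuperColumnFamily are assumptions; `URL` is the placeholder used on the slide.

```python
# "HTC" is a hypothetical keyspace name; "URL" is the slide's own placeholder.
store = {
    "HTC": {
        "WaitingQueue": {  # ColumnFamily: row key -> column -> value
            1: {"Name": "Task1", "Owner": "User1", "Binary": "URL"},
            2: {"Name": "Task2", "Owner": "User2", "Binary": "URL"},
            3: {"Name": "Task3", "Owner": "User1", "Binary": "URL"},
        },
        "Queues": {  # SuperColumnFamily: row key -> supercolumn -> column -> value
            "Waiting": {
                "Job1": {"Name": "Task1", "Owner": "User1"},
                "Job2": {"Name": "Task2", "Owner": "User2"},
            },
            "Running": {},
        },
    }
}

# Four coordinates address a value in a ColumnFamily ...
cf_value = store["HTC"]["WaitingQueue"][1]["Owner"]
# ... and five address a value in a SuperColumnFamily.
scf_value = store["HTC"]["Queues"]["Waiting"]["Job2"]["Owner"]
print(cf_value, scf_value)  # User1 User2
```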

HTC over Cassandra
- A batch queue system has been implemented over Cassandra's data model
- This lets idle workers decide which task to run, in FIFO order
- Users can:
- Submit jobs
- Check job status
- Retrieve job results
- The use of Cassandra as the underlying data storage allows for disconnected operation
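The scheduling step performed by an idle worker might look like the following sketch. It uses plain dictionaries in place of Cassandra tables and assumes that job IDs grow with submission order, so the smallest ID is the oldest job. The function name `schedule_next` is hypothetical and the slides do not show the real implementation.

```python
# In-memory stand-ins for the waiting and running queues (JobID -> task name).
waiting = {3: "Task3", 1: "Task1", 2: "Task2"}
running = {}


def schedule_next(worker):
    """Move the oldest waiting job to the running queue and return its ID."""
    if not waiting:
        return None
    job_id = min(waiting)  # assumption: smaller JobID means earlier submission
    running[job_id] = (waiting.pop(job_id), worker)
    return job_id


first = schedule_next("worker-A")
second = schedule_next("worker-B")
print(first, second)  # 1 2
print(running)        # {1: ('Task1', 'worker-A'), 2: ('Task2', 'worker-B')}
```

In this single-process sketch the read and the move happen atomically. On a distributed store they are separate operations, so two workers can both read the same oldest job before either of them moves it.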

HTC over Cassandra (II)
- The system stores:
- Job information: name, owner, binaries
- User information
- Queue information
- The system is fully reconfigurable at run time, permitting an unlimited number of queues with different policies

Eventual Consistency
- All changes to an object eventually reach all of its replicas
- The CAP theorem implies that it is not possible to have these three properties at the same time:
- Consistency
- Availability
- Partition tolerance
- Cassandra has chosen availability and partition tolerance over consistency
- In a failure-free scenario, Cassandra provides low latency

Eventual Consistency (II)
- This scenario implies the impossibility of atomic operations in Cassandra
- In our HTC system, collisions may happen when several nodes try to execute the same task
- We have implemented some partial solutions that reduce the probability of a collision:
- QUORUM consistency for all I/O operations
- An extra queue where idle nodes compete for the waiting task
- This reduces the collision probability from 30% to 4%
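The slides do not say how the competition in the extra queue is resolved. The sketch below shows one plausible reading, stated here as an assumption: each idle node writes a (timestamp, node ID) claim for the task, reads all claims back, and only the node with the smallest claim runs the task. The `quorum` helper uses the usual majority rule, floor(RF / 2) + 1, under which a QUORUM read always overlaps the latest QUORUM write.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def pick_winner(claims):
    """claims: (timestamp, node_id) tuples read back from the extra queue.

    The earliest timestamp wins; ties are broken by node ID (an assumption).
    """
    return min(claims)[1]


rf = 3
# Read quorum + write quorum > RF, so the two sets share at least one replica.
print(quorum(rf), quorum(rf) + quorum(rf) > rf)  # 2 True

# Hypothetical claims written by three idle nodes for the same waiting task.
claims = [(1005, "node7"), (1002, "node3"), (1002, "node9")]
print(pick_winner(claims))  # node3
```

A scheme like this narrows the window for collisions but cannot close it: a node may read the claims before a slower competitor's claim has been written, which is consistent with the residual 4% reported above.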

Experiments
- We have performed some experiments to evaluate our system
- A 20-node cluster has been used for this purpose
- Each node has:
- A P4 processor with hyperthreading
- 1.5 to 2 GB of RAM
- Each node represents one user in the system
- We have used a workload generator to produce a job list for each user

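Metrics
- Bounded slowdown: the waiting time of a job plus its running time, relative to the running time
  $\mathrm{bsd} = \max\left(1, \frac{w + r}{\max(10, r)}\right)$, where $w$ is the waiting time and $r$ the running time
- System utilization
- Scheduling time: time used by idle nodes to schedule a waiting job
- Collisions detected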
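As a worked example of the bounded slowdown metric, the function below evaluates it for a few hypothetical jobs. The parameter names are illustrative, and waiting and running times are assumed to be in the same unit as the threshold of 10.

```python
def bounded_slowdown(wait, run, threshold=10):
    """max(1, (w + r) / max(threshold, r)), with w and r in the same time unit."""
    return max(1.0, (wait + run) / max(threshold, run))


print(bounded_slowdown(wait=90, run=30))  # 4.0
print(bounded_slowdown(wait=20, run=2))   # 2.2
print(bounded_slowdown(wait=0, run=5))    # 1.0
```

The threshold in the denominator keeps very short jobs from dominating the average: without it, the second job would score (20 + 2) / 2 = 11 instead of 2.2.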

System Load

Bounded Slowdown

Scheduling Time

Collisions

Future Work
- Find a viable solution to the eventual consistency problem
- Develop a workflow system with MapReduce tasks
- Add reputation systems in order to classify node behavior

Conclusions
- HTC over P2P is possible
- A prototype has been developed
- Some preliminary experiments have been carried out, obtaining good performance levels

QUESTIONS?