Communication System Design Projects

Professor: Dejan Kostic
Presenter: Kirill Bogdanov

Project #5: KTH-DB, a Geo-Distributed Key Value Store

Design and develop a geo-distributed key value store. Deploy and test it on a network emulator.

What is a Key Value Store?

- It is a NoSQL database
- It does not support SQL queries
- It is not a relational database (such as Oracle, DB2, MySQL)
- It is a map<key, value>
- It has a simple API:
  Put(key, value)
  value = Get(key)
  Delete(key)

Key            Value (age, telephone, address...)
John Doe       {20, 777333222, London}
Robert Green   {54, 123456, New York}
Alex Murphy    {45, 31323311, Detroit}
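The three-call API above can be sketched as a thin wrapper around an in-memory map. This is only an illustration, not the KTH-DB design: the class name `KeyValueStore`, the `Record` tuple layout, and the choice to return `None` for a missing key are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical record layout, modelled on the example table:
# (age, telephone, address)
Record = Tuple[int, str, str]


class KeyValueStore:
    """In-memory map<key, value> exposing only Put, Get and Delete."""

    def __init__(self) -> None:
        self._data: Dict[str, Record] = {}

    def put(self, key: str, value: Record) -> None:
        # Insert the key, or overwrite the value if the key already exists.
        self._data[key] = value

    def get(self, key: str) -> Optional[Record]:
        # Return None when the key is absent (an assumed convention).
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Report whether the key was actually present.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("John Doe", (20, "777333222", "London"))
store.put("Robert Green", (54, "123456", "New York"))
store.put("Alex Murphy", (45, "31323311", "Detroit"))
```

Note that there is no query language: every access goes through an exact key.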

What is Geo-Distributed?

NoSQL Databases

- Google BigTable
- Facebook Cassandra
- Amazon Dynamo
- Yahoo PNUTS
- Many others

Properties of Geo-Distributed Key Value Stores

- Outstanding pieces of engineering
- High availability
- Disaster recovery
- Data management
- Highly scalable

Why Implement a Key Value Store

- Hands-on experience with NoSQL databases
- Object-oriented design
- Understanding the client/server model
- Algorithms, data structures and networking
- Compare performance to production industry systems
- Challenging back-end engineering project!

Distributed Databases

Master/Slave

Master:
- Accepts connections from clients
- Performs read/write (get/put) operations
- Sends updates to all Slaves

Slaves:
- Accept connections from clients
- Perform read-only (get) operations
- Receive updates from the Master
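The division of roles above can be sketched in a few lines. The class names and the `apply_update` method are hypothetical, and replication here is a synchronous in-process call; in the actual project the updates would travel over the network to nodes in other regions.

```python
from typing import Dict, List, Optional


class Node:
    """State and read path shared by both roles."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Reads are served locally by any node.
        return self.data.get(key)

    def apply_update(self, key: str, value: str) -> None:
        # Install a value; slaves are only ever called this way by the master.
        self.data[key] = value


class Slave(Node):
    def put(self, key: str, value: str) -> None:
        # Slaves perform reads only, so client writes are rejected.
        raise PermissionError("writes must be sent to the master")


class Master(Node):
    def __init__(self, name: str, slaves: List[Slave]) -> None:
        super().__init__(name)
        self.slaves = slaves

    def put(self, key: str, value: str) -> None:
        # Apply the write locally, then send the update to all slaves.
        self.apply_update(key, value)
        for slave in self.slaves:
            slave.apply_update(key, value)


slaves = [Slave("slave-1"), Slave("slave-2")]
master = Master("master", slaves)
master.put("John Doe", "London")
```

Because the master pushes every update before `put` returns, a read on any slave sees the new value in this sketch. With real network latency, the choice between waiting for slaves and replying early is a design decision that affects consistency.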

Scope - Architecture

[Figure: a node (Master or Slave) made up of Core Logic, File System, a Network Interface for Nodes and a Network Interface for Clients. The figure also shows other nodes and clients, one of which is the Yahoo Benchmark (YCSB) driver.]

Dynamic System

Any node can take the role of the master.

[Figure: one master and two slaves, with a link labelled 25 ms.]

Requirements

1. Using MiniNet HiFi (network emulator), deploy KTH-DB on 10 nodes and run the YCSB benchmark.
   1. Use MiniNet to simulate network latencies.
   2. Demonstrate performance of 500 writes/second and 2000 reads/second, store 10M keys, and handle 100 clients accessing the data store in parallel.
2. Adopt a Master/Slave architecture in which writes are executed only on the Master node and reads can be done on any node.
3. Create a self-aware system. Any node in the system can take the role of the Master. Measure network latency from the Master to the Clients and assign master status to maximize write performance.
4. Implement in C/C++/C11/Java.
5. Use network libraries, for example zeromq, POCO etc.
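One possible reading of requirement 3 is to give the master role to the node with the lowest average latency to the clients. The sketch below assumes that policy; the function name `choose_master`, the node and client names, and the latency figures are all made up for illustration, and other criteria (for example weighting clients by write rate) would be equally valid.

```python
from statistics import mean
from typing import Dict


def choose_master(latency_ms: Dict[str, Dict[str, float]]) -> str:
    """Pick the node whose mean latency to all clients is lowest.

    latency_ms maps node name -> {client name -> measured latency in ms}.
    Ties are broken by node name so that every node reaches the same answer.
    """
    return min(sorted(latency_ms), key=lambda node: mean(latency_ms[node].values()))


# Hypothetical measurements for three nodes and two clients.
measurements = {
    "node-a": {"client-1": 25.0, "client-2": 100.0},  # mean 62.5 ms
    "node-b": {"client-1": 40.0, "client-2": 40.0},   # mean 40.0 ms
    "node-c": {"client-1": 90.0, "client-2": 30.0},   # mean 60.0 ms
}
new_master = choose_master(measurements)
```

With these numbers `node-b` is selected, even though `node-a` is closest to one of the clients.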

Reading Material

- Douglas Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, and Marcos K. Aguilera. Transactions with Consistency Choices on Geo-Replicated Cloud Storage. No. MSR-TR-2013-82, September 2013.
- Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In 19th Symposium on Operating Systems Principles. Lake George, NY. 29-43.
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. 2006. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th Conference on USENIX Symposium on Operating Systems Design and Implementation - Volume 7 (Seattle, WA, November 06-08, 2006). USENIX Association, Berkeley, CA, 15-15.
- B. F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking Cloud Serving Systems with YCSB. In ACM Symposium on Cloud Computing, 2010.

Project #6: Global Distributed Snapshot

Develop a distributed transaction engine, collect its global snapshot, and use this snapshot to detect and resolve deadlocks in the system.

Distributed Systems

- Parallel and high performance computing
- Internet protocols
- Distributed databases
- P2P networks
- Critical systems that require high fault tolerance
- Wireless sensors
- Many more...

Properties

- A collection of independent processes
- Processes coordinate their work by sending messages over a network medium
- No shared memory
- No global synchronized clock
- Numerous event orderings due to the nature of the network, failures and external events
- Hard to design, develop and manage
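Without a global synchronized clock, processes can still agree on a partial order of events by using logical clocks, as described in Lamport's 1978 paper from the reading list. The sketch below is a minimal version of that idea; the class and method names are chosen for this example only.

```python
class LamportClock:
    """Logical clock: a counter that advances on every event."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time: int) -> int:
        # Jump past both the local clock and the sender's timestamp,
        # so the receive event is ordered after the send event.
        self.time = max(self.time, message_time) + 1
        return self.time


n1, n2 = LamportClock(), LamportClock()
n1.tick()                 # n1 local event, n1.time == 1
stamp = n1.send()         # n1 sends, stamp == 2
n2.tick()                 # n2 local event, n2.time == 1
arrival = n2.receive(stamp)  # max(1, 2) + 1 == 3
```

The receive timestamp is always larger than the send timestamp, which is the property the ordering relies on. The counters say nothing about wall-clock time.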

Global Distributed Snapshot

Used for:
- Distributed debugging
- Failure recovery
- Future state prediction
- Deadlock detection

[Figure: nodes N1 to N5, with two example recorded states: a = 10, b = 20, x = {1, 2, 3...} and a = 15, b = 35, x = {7, 6, 8...}.]

Scope

- Transaction Engine: a simple distributed database capable of performing atomic transactions
- Sample Workload: randomly generated workloads to simulate human behavior
- Collect Global Snapshots: periodically collect global system states
- Detect Deadlocks: analyse each snapshot and check for deadlocks
- Resolve Deadlocks: apply a deadlock resolution mechanism

Scope - Transaction Engine

Account Number   Value
100100           2 000 kr
100123           100 000 kr

Atomic transactions:
- Obtain a lock on a row
- Receive confirmation
- Send out the new value

[Figure: three transaction engines (TE) exchanging messages.]

Scope - Workload

- Simulate user behavior by generating random transactions
- Periodically, each node in the system selects two random accounts and transfers a random amount of money from one to the other
- Each transaction is atomic, and each change should be consistent across all databases
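The lock-then-update steps and the random transfer workload can be sketched in a single process as follows. This is an assumption-laden illustration: the class `TransactionEngine`, the helper `random_transfer`, and the abort-on-busy policy are invented for the example, and per-account `threading.Lock` objects stand in for row locks that would really be requested over the network.

```python
import random
import threading
from typing import Dict


class TransactionEngine:
    def __init__(self, balances: Dict[int, int]) -> None:
        self.balances = dict(balances)
        self.locks = {account: threading.Lock() for account in balances}

    def transfer(self, src: int, dst: int, amount: int) -> bool:
        """Move money atomically; return False if the transaction aborts."""
        if src == dst or amount <= 0:
            return False
        # Step 1: obtain a lock on each row. Non-blocking acquire means this
        # sketch aborts instead of waiting, so it cannot deadlock by itself.
        if not self.locks[src].acquire(blocking=False):
            return False
        try:
            if not self.locks[dst].acquire(blocking=False):
                return False
            try:
                # Step 2: both locks are confirmed. Step 3: write new values.
                if self.balances[src] < amount:
                    return False
                self.balances[src] -= amount
                self.balances[dst] += amount
                return True
            finally:
                self.locks[dst].release()
        finally:
            self.locks[src].release()


def random_transfer(engine: TransactionEngine, rng: random.Random) -> bool:
    # Select two distinct random accounts and a random amount.
    src, dst = rng.sample(sorted(engine.balances), 2)
    return engine.transfer(src, dst, rng.randint(1, 100))


engine = TransactionEngine({100100: 2000, 100123: 100000})
rng = random.Random(0)  # fixed seed so that runs are repeatable
for _ in range(100):
    random_transfer(engine, rng)
```

Whatever sequence of transfers runs, the sum of all balances stays at 102 000 kr, which is a simple way to check atomicity. A variant that blocks while waiting for the second lock is the one that can deadlock.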

Scope - Global Snapshot

- There is no global clock, so it is not possible to order every node to record its state at a time T
- Investigate this problem and choose an algorithm for obtaining global distributed snapshots
- Periodically collect snapshots
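One candidate algorithm is the marker-based scheme of Chandy and Lamport from the reading list. The following single-process simulation is a sketch under several assumptions: channels are FIFO queues, each process state is just an account balance, and the names `Network`, `Process`, `send_money` and `deliver` are invented for the example. Only one snapshot is taken.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union

MARKER = "MARKER"  # control message; every other message is a transfer amount
Message = Union[int, str]


class Process:
    def __init__(self, name: str, balance: int, peers: List[str]) -> None:
        self.name, self.balance, self.peers = name, balance, peers
        self.recorded_state: Optional[int] = None      # local state in the snapshot
        self.recording: Dict[str, List[int]] = {}      # incoming channels still open
        self.channel_state: Dict[str, List[int]] = {}  # finished channel recordings


class Network:
    def __init__(self, balances: Dict[str, int]) -> None:
        names = sorted(balances)
        self.procs = {n: Process(n, balances[n], [p for p in names if p != n])
                      for n in names}
        self.channels: Dict[Tuple[str, str], Deque[Message]] = {
            (a, b): deque() for a in names for b in names if a != b}

    def send_money(self, src: str, dst: str, amount: int) -> None:
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def _record_and_send_markers(self, p: Process) -> None:
        # Record the local state, start recording every incoming channel,
        # and send a marker on every outgoing channel.
        p.recorded_state = p.balance
        p.recording = {q: [] for q in p.peers}
        for q in p.peers:
            self.channels[(p.name, q)].append(MARKER)

    def start_snapshot(self, initiator: str) -> None:
        self._record_and_send_markers(self.procs[initiator])

    def deliver(self, src: str, dst: str) -> None:
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:   # first marker seen by this process
                self._record_and_send_markers(p)
            # A marker closes the recording of the channel it arrived on.
            p.channel_state[src] = p.recording.pop(src)
        else:
            p.balance += msg
            if src in p.recording:         # message was in flight at snapshot time
                p.recording[src].append(msg)

    def snapshot_complete(self) -> bool:
        return all(len(p.channel_state) == len(p.peers) for p in self.procs.values())

    def snapshot_total(self) -> int:
        in_flight = sum(sum(m) for p in self.procs.values()
                        for m in p.channel_state.values())
        return sum(p.recorded_state for p in self.procs.values()) + in_flight


net = Network({"A": 100, "B": 100, "C": 100})
net.send_money("A", "B", 10)
net.send_money("C", "A", 5)      # still in flight when A records its state
net.start_snapshot("A")
for src, dst in [("C", "A"), ("A", "B"), ("A", "B"), ("A", "C"),
                 ("C", "A"), ("B", "A"), ("B", "C"), ("C", "B")]:
    net.deliver(src, dst)
```

In this run A records a balance of 90 and the channel from C to A records the 5 that was in flight, so recorded balances plus in-flight money add up to the original 300. A check of that kind is one way to show that a snapshot is consistent.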

Scope - Deadlocks

- A deadlock occurs when two or more processes compete for shared resources and each waits for another to finish, so that none of them ever does
- Analyse the global snapshot and detect deadlocks

[Figure: Node1 and Node2 contending for Res1 and Res2.]
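A common way to detect deadlocks in a recorded state is to build a wait-for graph and look for a cycle. The sketch below assumes the snapshot can be reduced to two maps, which resource each node holds and which resource each node is waiting for. The function names and input format are hypothetical.

```python
from typing import Dict, List


def build_wait_for(holds: Dict[str, str], wants: Dict[str, str]) -> Dict[str, List[str]]:
    """holds: resource -> owning node. wants: node -> resource it waits for."""
    return {node: [holds[res]] for node, res in wants.items()
            if res in holds and holds[res] != node}


def find_deadlock(wait_for: Dict[str, List[str]]) -> List[str]:
    """Return the nodes of one cycle in the wait-for graph, or [] if none."""
    visiting, done = set(), set()
    path: List[str] = []

    def visit(node: str) -> List[str]:
        visiting.add(node)
        path.append(node)
        for nxt in wait_for.get(node, []):
            if nxt in visiting:
                # Back edge: everything from nxt onwards on the path is a cycle.
                return path[path.index(nxt):]
            if nxt not in done:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return []

    for start in sorted(wait_for):
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return []


# Node1 holds Res1 and waits for Res2; Node2 holds Res2 and waits for Res1.
holds = {"Res1": "Node1", "Res2": "Node2"}
wants = {"Node1": "Res2", "Node2": "Res1"}
cycle = find_deadlock(build_wait_for(holds, wants))
```

For the two-node example the result is the cycle containing Node1 and Node2. A chain of waits with no cycle returns an empty list.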

Scope - Deadlock Resolution

- Decide how a deadlock will be resolved
- Which transaction will be allowed to succeed?
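The slides leave the resolution policy open. As one illustrative option, the sketch below aborts the youngest transaction in the cycle, so that older transactions, which have usually done more work, are allowed to succeed. The function name `choose_victim` and the start-time map are assumptions for the example.

```python
from typing import Dict, List


def choose_victim(cycle: List[str], start_time: Dict[str, int]) -> str:
    """Pick the transaction to abort: the one that started most recently.

    Ties are broken by transaction name so that the choice is deterministic.
    """
    return max(cycle, key=lambda txn: (start_time[txn], txn))


# Hypothetical deadlock cycle with logical start times.
deadlocked = ["T1", "T2", "T3"]
started = {"T1": 5, "T2": 9, "T3": 7}
victim = choose_victim(deadlocked, started)
survivors = [txn for txn in deadlocked if txn != victim]
```

Here T2 is aborted and T1 and T3 can proceed. Other policies, such as aborting the transaction that holds the fewest locks, fit the same interface.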

Requirements

1. Using MiniNet HiFi (network emulator), deploy the Transaction Engine on 5 nodes.
2. Demonstrate that the system can collect distributed global snapshots.
3. Show that all snapshots are consistent.
4. Detect deadlocks at runtime and resolve them using an algorithm of your choice.
5. Implement in C/C++/C11/Java.
6. Use network libraries, for example zeromq, POCO etc.

Reading Material

- Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558-565.
- Chandy, K. M., and Lamport, L. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems 3, 1 (February 1985), 63-75.
- Kshemkalyani, A. D., Raynal, M., and Singhal, M. An introduction to snapshot algorithms in distributed computing. Distributed Systems Eng. J. 2, 4 (Dec. 1995), 224-233.
- Chandy, K. M., and Misra, J. A distributed algorithm for detecting resource deadlocks in distributed systems. In Proc. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing. ACM, New York, 1982, pp. 157-164.
- Chandy, K. M., Misra, J., and Haas, L. Distributed deadlock detection. ACM Trans. Comput. Syst. 1, 2 (May 1983), 144-156.

Global Distributed Snapshot
- Distributed transaction engine
- Atomic transactions
- Global distributed snapshot
- Deadlock detection/resolution
- Deployment on a network emulator

KTH-DB Geo-Distributed Key Value Store
- Master/Slave architecture
- Strong consistency
- Yahoo benchmark
- Deployment on a network emulator