Distributed Systems. Tutorial 12 Cassandra
1 Distributed Systems, Tutorial 12: Cassandra. Written by Alex Libov, based on a FOSDEM 2010 presentation. Winter semester,
2 Cassandra: In Greek mythology, Cassandra had the power of prophecy and the curse of never being believed.
3 Cassandra
- A massively scalable, decentralized, structured data store
- Developed by Facebook to power the inbox search
- Released as an open source project on Google Code in July 2008
- Became an Apache Incubator project in March 2009
- Graduated to a top-level project in February 2010
- Version 2.0, released Sep
4 Cassandra features
- Decentralized: every node in the cluster has the same role; no single point of failure
- Scalable: read and write throughput both increase linearly as new machines are added
- Fault-tolerant: data is automatically replicated to multiple nodes; replication across multiple data centers is supported
- Tunable consistency: from "writes never fail" to "block for all replicas to be readable"
- Query language: CQL (Cassandra Query Language) is an SQL-like alternative to the traditional RPC interface
5 Cassandra Structure
(figure: ring of nodes A, B, C, D, E, F with a key K placed on the ring)
- Nodes B, C and D store keys in the range A to B (R=3)
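The ring placement on this slide can be sketched in a few lines. This is a minimal illustration, not Cassandra's implementation: the integer tokens assigned to nodes A to F and the function name `replicas_for` are assumptions made for the example. A key is stored on the first node found clockwise from its position, plus the next R-1 nodes.

```python
from bisect import bisect_left

# Hypothetical ring: node name -> token (position on a ring of size 100).
RING = {"A": 0, "B": 17, "C": 34, "D": 51, "E": 68, "F": 85}


def replicas_for(key_token, ring=RING, r=3):
    """Return the r nodes that store a key, walking the ring clockwise."""
    # Sort nodes by token so that list order matches ring order.
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect_left(tokens, key_token) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(r)]


# A key that falls between A and B is stored on B, C and D when R=3.
print(replicas_for(10))
```

With these assumed tokens, a key at position 10 lies between A (0) and B (17), so the replicas are B, C and D, matching the slide.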
6 Vnodes: more nodes can be used when recovering from a node failure
7 Vnodes: easing the use of heterogeneous machines in a cluster
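Both vnode claims can be illustrated with a small simulation. The sketch below is an assumption-laden toy: the names `assign_vnodes` and `successors_of`, the capacity units, and the random token choice are all hypothetical. Each machine receives a number of tokens proportional to its capacity, and the tokens that follow a node's tokens show which neighbours would be involved if that node failed.

```python
import random
from collections import Counter


def assign_vnodes(capacities, tokens_per_unit=8, ring_size=2**16, seed=1):
    """Give each node tokens_per_unit * capacity random tokens on the ring."""
    rng = random.Random(seed)
    total = sum(capacities.values()) * tokens_per_unit
    tokens = rng.sample(range(ring_size), total)  # distinct ring positions
    ring, i = {}, 0
    for node, capacity in capacities.items():
        for _ in range(capacity * tokens_per_unit):
            ring[tokens[i]] = node
            i += 1
    return ring  # token -> owning node


def successors_of(ring, node):
    """Count, per neighbour, how often it owns the next token after one of node's."""
    ordered = sorted(ring)
    neighbours = Counter()
    for idx, token in enumerate(ordered):
        if ring[token] != node:
            continue
        # Walk clockwise to the next token owned by a different node.
        j = (idx + 1) % len(ordered)
        while ring[ordered[j]] == node:
            j = (j + 1) % len(ordered)
        neighbours[ring[ordered[j]]] += 1
    return neighbours


ring = assign_vnodes({"small": 1, "big": 2, "mid": 1})
print(Counter(ring.values()))        # "big" owns twice as many tokens
print(successors_of(ring, "small"))  # neighbours that would help recover "small"
```

A machine with twice the capacity simply gets twice as many tokens, which is how vnodes ease heterogeneous clusters in this toy model.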
8 Replication in CQL
- A keyspace is the highest-level container
- SimpleStrategy places replicas on successive nodes in the ring
- NetworkTopologyStrategy places replicas across different data centers (which are defined elsewhere)
- Within a data center, NetworkTopologyStrategy places replicas by walking the ring clockwise until reaching the first node in another rack
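The CQL statements this slide refers to are not preserved in the transcript. As a hedged illustration, the helper below builds `CREATE KEYSPACE` statements for the two strategies. The helper name `create_keyspace_cql`, the keyspace names, and the data center names `dc1` and `dc2` are hypothetical; only the general `CREATE KEYSPACE ... WITH replication = {...}` shape is taken from CQL.

```python
def create_keyspace_cql(name, strategy, **options):
    """Build a CREATE KEYSPACE statement for the given replication strategy."""
    settings = {"class": strategy, **options}
    # Render the replication map: string values are quoted, integers are not.
    body = ", ".join(f"'{k}': {v!r}" for k, v in settings.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


# Replicas on successive nodes of the ring.
print(create_keyspace_cql("demo", "SimpleStrategy", replication_factor=3))

# A replica count per data center (data center names are placeholders).
print(create_keyspace_cql("demo_multi", "NetworkTopologyStrategy", dc1=3, dc2=2))
```

The statements are only printed here; sending them to a cluster would require a driver, which is outside this sketch.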
9 Not only an O(1) DHT
- Values are structured and indexed
- Columns / column families
- Queries
10 Column families
- Key1: column, column, column
- Key2: column, column
- Column: name: byte[], value: byte[], timestamp: long
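The column layout above maps naturally onto a small data structure. The sketch below is an illustration only: the class `Column` mirrors the three fields on the slide, while the helper `put` and the nested-dictionary representation of a column family are assumptions. Keeping the column with the newer timestamp follows the "by server using timestamps" conflict resolution listed in the comparison slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int  # the slide's "long"


def put(family, key, column):
    """Store a column under a row key, keeping the newer timestamp on conflict."""
    row = family.setdefault(key, {})  # row key -> {column name -> Column}
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


users = {}  # a column family: rows may hold different sets of columns
put(users, b"Key1", Column(b"email", b"a@example.org", 10))
put(users, b"Key1", Column(b"email", b"old@example.org", 5))  # older, ignored
put(users, b"Key2", Column(b"city", b"Haifa", 7))
print(users[b"Key1"][b"email"].value)
```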
11 Why column families? (figure: two layouts compared side by side; not preserved in the transcript)
12 Write Consistency
The client can specify the desired consistency level:
- Any: will always succeed
- One: write to at least one replica node
- Two
- Three
- Quorum: (N/2)+1 replicas
- Local_one: at least one replica in the local datacenter
- Local_Quorum
- Each_Quorum: written to the commit log and memory table on a quorum in all data centers
- All
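To make the levels concrete, the sketch below computes how many replica acknowledgements each one needs. It is an assumption-based illustration: the function names and the `rf_per_dc` dictionary (data center name to replication factor) are hypothetical, and (N/2)+1 is read as integer division. Any is shown as needing zero replica acknowledgements on the assumption that a stored hint is enough.

```python
def quorum(n):
    """Quorum size for n replicas: (N/2)+1 with integer division."""
    return n // 2 + 1


def required_acks(level, rf_per_dc, local_dc):
    """Replica acknowledgements needed for a cluster-wide or local level."""
    total = sum(rf_per_dc.values())
    counts = {
        "ANY": 0,  # assumption: a stored hint is enough, no replica must answer
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(total),
        "LOCAL_ONE": 1,
        "LOCAL_QUORUM": quorum(rf_per_dc[local_dc]),
        "ALL": total,
    }
    return counts[level]


def each_quorum_acks(rf_per_dc):
    """Each_Quorum: a quorum is needed in every data center separately."""
    return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}


rf = {"dc1": 3, "dc2": 3}  # hypothetical two-datacenter setup
print(required_acks("QUORUM", rf, "dc1"))        # quorum over all 6 replicas
print(required_acks("LOCAL_QUORUM", rf, "dc1"))  # quorum over the 3 local ones
print(each_quorum_acks(rf))
```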
13 Write Path (figure: Write, Commitlog, Memtable, SSTables)
14 Writes
- No reads
- No seeks
- Sequential disk access
- Atomic within a column family
- Fast
- Any node: if the write doesn't belong to the node, it is proxied to where the write belongs
- Always writable (hinted hand-off): if the node where the write belongs is down, the write is given to another node with a hint that says to update the correct node when it comes back up
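The write path (commit log, memtable, SSTables) can be sketched as a toy in-memory model. Everything here is a simplification under stated assumptions: the class name `StorageNode`, the tiny `memtable_limit`, and clearing the commit log after a flush are choices made for the example. The point it illustrates is that a write only appends and updates memory, with no read and no seek.

```python
class StorageNode:
    """Toy write path: append to the commit log, update the memtable, flush."""

    def __init__(self, memtable_limit=2):
        self.commitlog = []   # append-only, stands in for sequential disk writes
        self.memtable = {}    # in-memory table: key -> (value, timestamp)
        self.sstables = []    # immutable, sorted tables produced by flushes
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        # No read of existing data is needed: log the write, then update memory.
        self.commitlog.append((key, value, timestamp))
        self.memtable[key] = (value, timestamp)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Assumption: logged writes that are now in an SSTable can be dropped.
        self.commitlog.clear()


node = StorageNode()
node.write("a", "1", timestamp=1)
node.write("b", "2", timestamp=2)  # reaches the limit and triggers a flush
print(node.sstables)
```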
15 Read consistency
- One: read from the closest replica
- Two: read from any two, return the most recent data
- Three
- Quorum
- Local_quorum
- Local_one: returns only if the replica is in the local datacenter
- Each_quorum
- All
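The rule "read from several replicas, return the most recent data" can be shown with a short sketch. It assumes, for illustration only, that each replica is a dictionary mapping a key to a `(value, timestamp)` pair and that the list of replicas is already ordered by closeness; the function name `read_at_level` is hypothetical.

```python
def read_at_level(replicas, key, replica_count):
    """Ask the first replica_count replicas and return the newest value."""
    answers = [r[key] for r in replicas[:replica_count] if key in r]
    if not answers:
        return None
    # Each answer is (value, timestamp); the highest timestamp wins.
    return max(answers, key=lambda answer: answer[1])[0]


# The closest replica is stale; the second one has seen a newer write.
replicas = [{"k": ("old", 1)}, {"k": ("new", 2)}, {"k": ("old", 1)}]
print(read_at_level(replicas, "k", 1))  # One: closest replica only
print(read_at_level(replicas, "k", 2))  # Two: most recent of two answers
```

In this example, reading at One returns the stale value while reading at Two returns the newer one, which is the trade-off the levels expose.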
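16 Request illustration (figure not preserved in the transcript)
17 Read path (figure: a read consults the memtable and, for each SSTable, a Bloom filter (Bf) and an index (Idx))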
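The read path in the figure can be approximated as follows. This is a toy model under assumptions: the `BloomFilter` here uses SHA-256 digests and arbitrary sizes, `SSTable.index` is a plain dictionary standing in for the on-disk index, and values are `(value, timestamp)` pairs. A Bloom filter never gives a false negative, so a table can be skipped safely whenever the filter says the key is absent.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all((self.field >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.index = dict(rows)  # stands in for the on-disk index (Idx)
        self.bloom = BloomFilter()
        for key in self.index:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Collect candidates from the memtable and SSTables, return the newest."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        # Skip tables whose Bloom filter rules the key out.
        if table.bloom.might_contain(key) and key in table.index:
            candidates.append(table.index[key])
    if not candidates:
        return None
    return max(candidates, key=lambda vt: vt[1])[0]


older = SSTable({"a": ("1", 1)})
newer = SSTable({"a": ("2", 5), "b": ("x", 2)})
print(read("a", {}, [older, newer]))
```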
18 Reads
- Any node
- Cassandra tracks which replicas respond fastest and prefers to route requests there
- Read repair
- Usual caching conventions apply
19 Hinted handoff
- When a write is performed and a replica is down, the coordinator node stores the request for some time
- After a node discovers from gossip that a node for which it holds hints has recovered, it sends the data row corresponding to each hint to the target
- If insufficient replica targets are alive to satisfy the requested consistency level, an exception is thrown, with or without hinted handoff
- Unlike Dynamo's replication model, Cassandra does not default to a sloppy quorum
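The hinted handoff rules above can be sketched as a small coordinator model. The names `Coordinator`, `Unavailable` and `on_recovered` are hypothetical, replicas are plain dictionaries, and timestamps and hint expiry are left out for brevity. The sketch shows the two behaviours the slide describes: hints are stored for down replicas and replayed on recovery, and a write fails up front when too few replicas are alive for the requested level.

```python
class Unavailable(Exception):
    """Raised when too few replicas are alive for the requested level."""


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas    # replica name -> dict acting as its store
        self.alive = set(replicas)  # liveness, as learned from gossip
        self.hints = []             # (target, key, value) for down replicas

    def write(self, key, value, required_acks):
        live = [name for name in self.replicas if name in self.alive]
        # Hints do not count towards the consistency level.
        if len(live) < required_acks:
            raise Unavailable(f"{len(live)} alive, {required_acks} required")
        for name, store in self.replicas.items():
            if name in self.alive:
                store[key] = value
            else:
                self.hints.append((name, key, value))

    def on_recovered(self, name):
        """Replay stored hints once gossip reports the target as recovered."""
        self.alive.add(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[target][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coordinator = Coordinator({"B": {}, "C": {}, "D": {}})
coordinator.alive.discard("D")              # D is down
coordinator.write("k", "v", required_acks=2)
coordinator.on_recovered("D")               # the hint is replayed to D
print(coordinator.replicas["D"])
```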
20 Lightweight transactions
- Two users attempting to create a unique user account in the same cluster could overwrite each other's work, with neither user knowing about it
- Using and extending the Paxos consensus protocol, Cassandra offers a way to ensure a transaction isolation level similar to the serializable level offered by RDBMSs
21 A modified Paxos
- A replica promises not to accept any proposals associated with any earlier ballot; along with that promise, it includes the most recent proposal it has already received
- Read the current value of the row to see if it matches the expected one
- If a majority of the nodes promise to accept the leader's proposal, it may proceed to the actual proposal
- Reset the Paxos state for subsequent proposals
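The four steps above can be strung together in a heavily simplified sketch of a compare-and-set, using the unique-account example from the previous slide. This is not Cassandra's implementation: the names `Replica` and `compare_and_set` are hypothetical, all replicas are assumed reachable, the current value is read from a single promising replica, and a leader that finds an unfinished earlier proposal simply gives up instead of completing it.

```python
class Replica:
    def __init__(self):
        self.promised = 0     # highest ballot this replica has promised
        self.accepted = None  # (ballot, value) accepted but not yet committed
        self.value = None     # committed value of the row

    def prepare(self, ballot):
        """Promise to reject earlier ballots; report any proposal in progress."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def propose(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

    def commit(self, value):
        self.value = value
        self.accepted = None  # reset the Paxos state for later proposals


def compare_and_set(replicas, ballot, expected, new_value):
    majority = len(replicas) // 2 + 1
    # 1. Prepare / promise.
    promises = [(r, r.prepare(ballot)) for r in replicas]
    promised = [r for r, (ok, _) in promises if ok]
    if len(promised) < majority:
        return False
    if any(in_progress for _, (ok, in_progress) in promises if ok):
        return False  # simplification: a real leader would finish it first
    # 2. Read the current value and compare it with the expected one.
    if promised[0].value != expected:
        return False
    # 3. Propose / accept.
    accepted = [r for r in promised if r.propose(ballot, new_value)]
    if len(accepted) < majority:
        return False
    # 4. Commit, which also resets the per-replica Paxos state.
    for r in replicas:
        r.commit(new_value)
    return True


cluster = [Replica() for _ in range(3)]
print(compare_and_set(cluster, 1, expected=None, new_value="alice"))
print(compare_and_set(cluster, 2, expected=None, new_value="bob"))
```

In this run the first account creation succeeds and the second one is rejected, because the row is no longer empty when its value is read in step 2.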
22 Datastore comparison

| | Google Bigtable | Amazon Dynamo | Microsoft Azure | Yahoo! PNUTS | Apache Cassandra |
|---|---|---|---|---|---|
| Consistency | Atomic appends; non-atomic writes + atomic transactions on a row basis | Eventual | Atomic appends | Tunable per request, eventual to timeline | Tunable per request, eventual to serializable |
| Master implementation | Chubby lock service + primary/backup in GFS | O(1) DHT | Stream master + Paxos cluster in storage layer | A single pair of active/standby servers | O(1) DHT + (modified Paxos for serializable requests) |
| Request handling | Assigned tablet server | Any node | LB to specific partition server | Router to closest tablet server | Any node |
| Conflict resolution | By client | By client | - | By client | By server using timestamps |
| Replication count | Tunable per file (GFS) | Tunable per service | 3 + geo-replication | Geo-replication | Tunable per keyspace |
| Write implementation | Directly with the corresponding tablet server | Coordinator sends to all + hinted handoffs + background read repairs | Master replica + sync with stream master on recovery | Uses pub/sub with guaranteed delivery to commit | Same as Dynamo |
23 For more info: dra/cassandrathenandnow.html