Cassandra in Action ApacheCon NA 2013


1 Cassandra in Action
  ApacheCon NA 2013
  Yuki Morishita, Software / Apache Cassandra Committer

3 eBay
  Application/Use Case
    Social Signals: like/want/own features for eBay product and item pages
    Hunch taste graph for eBay users and items
    Many time series use cases
  Why Cassandra?
    Multi-datacenter
    Scalable
    Write performance
    Distributed counters
    Hadoop support
  ACE

4 Time series data

5 Multi-Datacenter Support

6 Distributed counters

7 Hadoop support

8 Disney
  Application/Use Case
    Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform
  Why Cassandra?
    DataStax Enterprise can tackle real-time and search functions in the same cluster
    Scalability
    24x7 uptime
  NDI

9 Multi-tenancy

10 Multi-tenancy

11 Enterprise search

12 Netflix
  Application/Use Case
    General-purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming
  Why Cassandra?
    Highly available, highly robust, and no schema change downtime
    Highly scalable, optimized for SSD
    Much lower cost than previous Oracle and SimpleDB implementations
    Flexible data model
    Ability to directly influence/implement OSS feature set
    Supports local and wide-area distributed operations, spanning US and Europe
  RCE

13 Optimized for SSD

14 Open source

15 Use case patterns
  High performance
  Massively scalable
  Reliable/Available

16 Cassandra Scalability
  "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from 1 to 12 nodes."

17 Classic partitioning with SPOF
  [Diagram: client, router, master with two slaves, partitions 1 to 4]

18 Fully distributed, no SPOF
  [Diagram: client and ring nodes; labels p3, p1, p6, p1, p1]

19 Partitioning: Primary Key as Partition Key
  jim     age: 36   car: camaro   gender: M
  carol   age: 37   car: subaru   gender: F
  johnny  age: 12                 gender: M
  suzy    age: 10                 gender: F

20 PK      Hashed Value
  jim     5e
  carol   a9a
  johnny  f4eb27cea
  suzy    b421309e...
  MD5* hash operation yields a 128-bit number for keys of any size.
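
The hashing step on this slide can be sketched in Python. This is a minimal illustration, assuming a hypothetical helper `md5_token` that reads the MD5 digest as an unsigned 128-bit integer. Cassandra's own partitioner may differ in detail, and the hash values shown on the slide are not reproduced here.

```python
import hashlib


def md5_token(partition_key: str) -> int:
    """Hash a partition key of any size to a 128-bit integer.

    Hypothetical helper for illustration; not Cassandra's actual code.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")


if __name__ == "__main__":
    # Keys taken from the slide; the printed tokens are whatever MD5 yields.
    for key in ["jim", "carol", "johnny", "suzy"]:
        print(f"{key:>6} -> {md5_token(key):032x}")
```

Because the token depends only on the key, any client or node can compute it independently, which is what removes the need for a central router.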

21 The Token Ring
  [Diagram: ring of Node A, Node B, Node C, Node D]

22 Token ranges per node (hex values garbled in extraction)
  Node  Start  End
  A     0xc    x
  B     0x     x
  C     0x     x
  D     0x     xc
  jim 5e, carol a9a, johnny f4eb27cea, suzy b421309e...
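
Mapping a hashed key to the node whose range contains it can be sketched as a sorted lookup. The token values below are assumptions in a small 16-bit token space, chosen only for illustration (the ranges on the slide are not recoverable), and `owner` is a hypothetical helper. As on the slide, Node A's range wraps around the end of the token space.

```python
import bisect

# Assumed ring: each node owns the range (previous token, own token].
# Values are illustrative, not taken from the slide.
RING = [(0x2000, "A"), (0x6000, "B"), (0xA000, "C"), (0xE000, "D")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the node whose range contains the given token."""
    index = bisect.bisect_left(TOKENS, token)
    if index == len(RING):
        index = 0  # past the highest token: wrap around to the first node
    return RING[index][1]


if __name__ == "__main__":
    for token in (0x1000, 0x2001, 0x9FFF, 0xF000):
        print(f"token {token:#06x} -> node {owner(token)}")
```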


27 Replication
  [Diagram: ring of Node A, Node B, Node C, Node D; key carol, hash a9a]

28 Replication Factor = 2
  [Diagram: ring of Node A, Node B, Node C, Node D; key carol, hash a9a]

29 Replication Factor = 3
  [Diagram: ring of Node A, Node B, Node C, Node D; key carol, hash a9a]
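
Replica selection for a given replication factor can be sketched as a clockwise walk around the ring. This assumes simple, topology-unaware placement (the approach Cassandra's SimpleStrategy takes); the helper name `replicas` is hypothetical, and which node actually owns carol's hash on the slides is an assumption.

```python
NODES = ["A", "B", "C", "D"]  # ring order, clockwise


def replicas(owner: str, replication_factor: int, nodes=NODES) -> list:
    """Return the owning node plus the next nodes clockwise on the ring."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    start = nodes.index(owner)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    # Assume carol's hash falls in Node B's range.
    for rf in (1, 2, 3):
        print(f"RF={rf}: {replicas('B', rf)}")
```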

30 Tunable Consistency
  Consistency Level
    READ:  ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
    WRITE: ANY, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
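
The trade-off behind these levels can be worked through with a little arithmetic. The sketch below covers only ONE, QUORUM, and ALL in a single datacenter (ANY and the datacenter-aware levels are left out), and the helper names are hypothetical. A quorum is a majority of replicas, and a read is guaranteed to overlap the latest write when the replicas contacted for the read plus those for the write exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Replicas that must respond at each level (single-datacenter subset).
REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must share at least one node."""
    return REQUIRED[read_level](rf) + REQUIRED[write_level](rf) > rf


if __name__ == "__main__":
    rf = 3
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        result = read_sees_latest_write(read_level, write_level, rf)
        print(f"RF={rf} read={read_level} write={write_level}: overlap={result}")
```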

31 Virtual Nodes
  [Diagram: token ring without vnodes vs. with vnodes]

32 Virtual Nodes
  Cluster of heterogeneous machines

33 Virtual Nodes
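
One way to picture virtual nodes on heterogeneous machines is to give each machine many small token ranges, with more tokens for larger machines. The sketch below assumes randomly chosen tokens in a 64-bit space and hypothetical helper names; the per-node token counts are illustrative and not taken from the slides.

```python
import random
from collections import Counter

TOKEN_BITS = 64


def assign_vnodes(tokens_per_node: dict, seed: int = 42) -> list:
    """Give each node the requested number of random tokens; return the sorted ring."""
    rng = random.Random(seed)
    ring = []
    for node, count in tokens_per_node.items():
        for _ in range(count):
            ring.append((rng.getrandbits(TOKEN_BITS), node))
    ring.sort()
    return ring


def ownership(ring: list) -> dict:
    """Fraction of the token space owned by each node."""
    space = 1 << TOKEN_BITS
    widths = Counter()
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this is the last token (wrap-around)
        widths[node] += (token - previous) % space
    return {node: width / space for node, width in widths.items()}


if __name__ == "__main__":
    # Assumed cluster: one machine with four times the capacity of the other.
    ring = assign_vnodes({"small": 64, "big": 256})
    for node, share in sorted(ownership(ring).items()):
        print(f"{node}: {share:.1%} of the token space")
```

With many small ranges per machine, the share of data each machine holds tracks its token count rather than a single hand-picked range.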

34 CQL - Cassandra Query Language
  CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
  );
  INSERT INTO users (id, name, state, birth_date)
    VALUES ( f-11e2-9e c9a66, 'john', 'Texas', 1990);
  SELECT * FROM users
    WHERE state = 'Texas' AND birth_date > 1950;
  © 2012 DataStax

35 Strictly realtime focused
  No joins
  No subqueries
  No aggregation functions* or GROUP BY
  ORDER BY?

36 Collections
  CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
  );
  CREATE TABLE users_addresses (
    user_id uuid REFERENCES users,
    text
  );
  SELECT * FROM users NATURAL JOIN users_addresses;

37 Collections
  CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
  );
  X (crossed out on the slide):
  CREATE TABLE users_addresses (
    user_id uuid REFERENCES users,
    text
  );
  SELECT * FROM users NATURAL JOIN users_addresses;

38 Collections
  CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int,
    _addresses set<text>
  );
  UPDATE users
    SET _addresses = _addresses + { 'jbellis@gmail.com', 'jbellis@datastax.com' };

39 Questions?
  Feel free to contact me later if you have one: yukim (IRC, Twitter)

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg HDB++: HIGH AVAILABILITY WITH Page 1 OVERVIEW What is Cassandra (C*)? Who is using C*? CQL C* architecture Request Coordination Consistency Monitoring tool HDB++ Page 2 OVERVIEW What is Cassandra (C*)?

More information

Introduction to Cassandra

Introduction to Cassandra Introduction to Cassandra DuyHai DOAN, Technical Advocate Agenda! Architecture cluster replication Data model last write win (LWW), CQL basics (CRUD, DDL, collections, clustering column) lightweight transactions

More information

No-SQL Databases for High Volume Data

No-SQL Databases for High Volume Data Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram

More information

Distributed Systems. Tutorial 12 Cassandra

Distributed Systems. Tutorial 12 Cassandra Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse

More information

Real-Time Big Data in practice with Cassandra. Michaël Figuière @mfiguiere

Real-Time Big Data in practice with Cassandra. Michaël Figuière @mfiguiere Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere Speaker Michaël Figuière @mfiguiere 2 Ring Architecture Cassandra 3 Ring Architecture Replica Replica Replica 4 Linear Scalability

More information

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014 Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014 About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western

More information

Apache Cassandra for Big Data Applications

Apache Cassandra for Big Data Applications Apache Cassandra for Big Data Applications Christof Roduner COO and co-founder christof@scandit.com Java User Group Switzerland January 7, 2014 2 AGENDA Cassandra origins and use How we use Cassandra Data

More information

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra A Quick Reference Configuration Guide Kris Applegate kris_applegate@dell.com Solution Architect Dell Solution Centers Dave

More information

October 1-3, 2012 gotocon.com. Apache Cassandra As A BigData Platform Matthew F. Dennis // @mdennis

October 1-3, 2012 gotocon.com. Apache Cassandra As A BigData Platform Matthew F. Dennis // @mdennis October 1-3, 2012 gotocon.com Apache Cassandra As A BigData Platform Matthew F. Dennis // @mdennis Why Does BigData Matter? Effective Use of BigData Leads To Success And The Trends Continue (according

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

FAQs. Requirements in node Joins. CS535 Big Data Fall 2015 Colorado State University http://www.cs.colostate.edu/~cs535

FAQs. Requirements in node Joins. CS535 Big Data Fall 2015 Colorado State University http://www.cs.colostate.edu/~cs535 CS535 Big Data - Fall 215 W1.B. CS535 Big Data - Fall 215 W1.B.1 CS535 BIG DATA FAQs Zookeeper Installation [Step 1] Download the zookeeper package: $ wget http://apache.arvixe.com/zookeeper/stable/ zookeeper-3.4.6.tar.gz!

More information

Enabling SOX Compliance on DataStax Enterprise

Enabling SOX Compliance on DataStax Enterprise Enabling SOX Compliance on DataStax Enterprise Table of Contents Table of Contents... 2 Introduction... 3 SOX Compliance and Requirements... 3 Who Must Comply with SOX?... 3 SOX Goals and Objectives...

More information

NoSQL: Going Beyond Structured Data and RDBMS

NoSQL: Going Beyond Structured Data and RDBMS NoSQL: Going Beyond Structured Data and RDBMS Scenario Size of data >> disk or memory space on a single machine Store data across many machines Retrieve data from many machines Machine = Commodity machine

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

Evaluation of NoSQL databases for large-scale decentralized microblogging

Evaluation of NoSQL databases for large-scale decentralized microblogging Evaluation of NoSQL databases for large-scale decentralized microblogging Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Decentralized Systems - 2nd semester 2012/2013 Universitat Politècnica

More information

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise White Paper BY DATASTAX CORPORATION October 2013 1 Table of Contents Abstract 3 Introduction 3 The Growth in Multiple

More information

Use Your MySQL Knowledge to Become an Instant Cassandra Guru

Use Your MySQL Knowledge to Become an Instant Cassandra Guru Use Your MySQL Knowledge to Become an Instant Cassandra Guru Percona Live Santa Clara 2014 Robert Hodges CEO Continuent Tim Callaghan VP/Engineering Tokutek Who are we? Robert Hodges CEO at Continuent

More information

Distributed Storage Systems part 2. Marko Vukolić Distributed Systems and Cloud Computing

Distributed Storage Systems part 2. Marko Vukolić Distributed Systems and Cloud Computing Distributed Storage Systems part 2 Marko Vukolić Distributed Systems and Cloud Computing Distributed storage systems Part I CAP Theorem Amazon Dynamo Part II Cassandra 2 Cassandra in a nutshell Distributed

More information

Evaluating Apache Cassandra as a Cloud Database White Paper

Evaluating Apache Cassandra as a Cloud Database White Paper Evaluating Apache Cassandra as a Cloud Database White Paper BY DATASTAX CORPORATION October 2013 1 Table of Contents Abstract 3 Introduction 3 Why Move to a Cloud Database? 3 The Cloud Promises Transparent

More information

INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Going Native With Apache Cassandra. QCon London, 2014 www.datastax.com @DataStaxEMEA

Going Native With Apache Cassandra. QCon London, 2014 www.datastax.com @DataStaxEMEA Going Native With Apache Cassandra QCon London, 2014 www.datastax.com @DataStaxEMEA About Me Johnny Miller Solutions Architect www.datastax.com @DataStaxEU jmiller@datastax.com @CyanMiller https://www.linkedin.com/in/johnnymiller

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

Apache Cassandra Present and Future. Jonathan Ellis

Apache Cassandra Present and Future. Jonathan Ellis Apache Cassandra Present and Future Jonathan Ellis History Bigtable, 2006 Dynamo, 2007 OSS, 2008 Incubator, 2009 TLP, 2010 1.0, October 2011 Why people choose Cassandra Multi-master, multi-dc Linearly

More information

A Distributed Network Security Analysis System Based on Apache Hadoop-Related Technologies. Jeff Springer, Mehmet Gunes, George Bebis

A Distributed Network Security Analysis System Based on Apache Hadoop-Related Technologies. Jeff Springer, Mehmet Gunes, George Bebis A Distributed Network Security Analysis System Based on Apache Hadoop-Related Technologies Bingdong Li, Jeff Springer, Mehmet Gunes, George Bebis University of Nevada Reno FloCon 2013 January 7-10, Albuquerque,

More information

Simba Apache Cassandra ODBC Driver

Simba Apache Cassandra ODBC Driver Simba Apache Cassandra ODBC Driver with SQL Connector 2.2.0 Released 2015-11-13 These release notes provide details of enhancements, features, and known issues in Simba Apache Cassandra ODBC Driver with

More information

Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search

Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search Implementing Search in Web, Mobile, and IOT Applications An Overview of DataStax Enterprise Search Table of Contents Introduction... 3 Why Search?... 3 General Search Requirements... 3 Traditional Deployment

More information

Apache Cassandra 1.2

Apache Cassandra 1.2 Apache Cassandra 1.2 Documentation January 21, 2016 Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation 2016 DataStax, Inc. All rights reserved.

More information

NoSQL Database Options

NoSQL Database Options NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

More information

DD2471: Modern Database Systems and Their Applications Distributed data management using Apache Cassandra

DD2471: Modern Database Systems and Their Applications Distributed data management using Apache Cassandra DD2471: Modern Database Systems and Their Applications Distributed data management using Apache Cassandra Frej Connolly, Erik Ranby, and Alexander Roghult KTH CSC The School of Computer Science and Communication

More information

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER By DataStax Corporation August 2012 Contents Introduction...3 The Growth in Multiple Data Centers...3 Why

More information

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

NoSQL Databases. Nikos Parlavantzas

NoSQL Databases. Nikos Parlavantzas !!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success

The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success 1 Table of Contents Abstract... 3 Introduction... 3 Requirement #1 Smarter Customer Interactions... 4 Requirement

More information

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Comparing Oracle with Cassandra / DataStax Enterprise

Comparing Oracle with Cassandra / DataStax Enterprise Comparing Oracle with Cassandra / DataStax Enterprise Table of Contents Table of Contents... 2 Abstract... 3 Introduction... 3 Oracle and Today s Online Applications... 3 Architectural Limitations... 3

More information

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) White Paper BY DATASTAX CORPORATION August 2013 1 Table of Contents Abstract 3 Introduction 3 Overview of HDFS 4

More information

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER Evaluating Apache Cassandra as a Cloud Database WHITE PAPER By DataStax Corporation March 2012 Contents Introduction... 3 Why Move to a Cloud Database?... 3 The Cloud Promises Transparent Elasticity...

More information

Apache Cassandra 2.0

Apache Cassandra 2.0 Apache Cassandra 2.0 Documentation December 16, 2015 Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation 2015 DataStax, Inc. All rights reserved.

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

Cassandra A Decentralized Structured Storage System

Cassandra A Decentralized Structured Storage System Cassandra A Decentralized Structured Storage System Avinash Lakshman, Prashant Malik LADIS 2009 Anand Iyer CS 294-110, Fall 2015 Historic Context Early & mid 2000: Web applicaoons grow at tremendous rates

More information

Using Kafka to Optimize Data Movement and System Integration. Alex Holmes @

Using Kafka to Optimize Data Movement and System Integration. Alex Holmes @ Using Kafka to Optimize Data Movement and System Integration Alex Holmes @ https://www.flickr.com/photos/tom_bennett/7095600611 THIS SUCKS E T (circa 2560 B.C.E.) L a few years later... 2,014 C.E. i need

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Distributed Storage Systems

Distributed Storage Systems Distributed Storage Systems John Leach john@brightbox.com twitter @johnleach Brightbox Cloud http://brightbox.com Our requirements Bright box has multiple zones (data centres) Should tolerate a zone failure

More information

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER

Evaluating Apache Cassandra as a Cloud Database WHITE PAPER Evaluating Apache Cassandra as a Cloud Database WHITE PAPER Evaluating Apache Cassandra as a Cloud Database By DataStax Corporation November 2011 Contents Introduction... 3 Why Move to a Cloud Database?...

More information

[Hadoop, Storm and Couchbase: Faster Big Data]

[Hadoop, Storm and Couchbase: Faster Big Data] [Hadoop, Storm and Couchbase: Faster Big Data] With over 8,500 clients, LivePerson is the global leader in intelligent online customer engagement. With an increasing amount of agent/customer engagements,

More information

Case study: CASSANDRA

Case study: CASSANDRA Case study: CASSANDRA Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu Cassandra:

More information

Big Data: Beyond the Hype. Why Big Data Matters to You. White Paper

Big Data: Beyond the Hype. Why Big Data Matters to You. White Paper Big Data: Beyond the Hype Why Big Data Matters to You White Paper BY DATASTAX CORPORATION October 2013 Table of Contents Abstract 3 Introduction 3 Big Data and You 5 Big Data Is More Prevalent Than You

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER By DataStax Corporation September 2012 Contents Introduction... 3 Overview of HDFS... 4 The Benefits

More information

Understanding Neo4j Scalability

Understanding Neo4j Scalability Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the

More information

I Logs. Apache Kafka, Stream Processing, and Real-time Data Jay Kreps

I Logs. Apache Kafka, Stream Processing, and Real-time Data Jay Kreps I Logs Apache Kafka, Stream Processing, and Real-time Data Jay Kreps The Plan 1. What is Data Integration? 2. What is Apache Kafka? 3. Logs and Distributed Systems 4. Logs and Data Integration 5. Logs

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Big Data: Beyond the Hype

Big Data: Beyond the Hype Big Data: Beyond the Hype Why Big Data Matters to You WHITE PAPER By DataStax Corporation March 2012 Contents Introduction... 3 Big Data and You... 5 Big Data Is More Prevalent Than You Think... 5 Big

More information

Apache Cassandra 1.2 Documentation

Apache Cassandra 1.2 Documentation Apache Cassandra 1.2 Documentation January 13, 2013 2013 DataStax. All rights reserved. Contents Apache Cassandra 1.2 Documentation 1 What's new in Apache Cassandra 1.2 1 Key Improvements 1 Concurrent

More information

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island Big Data Principles and best practices of scalable real-time data systems NATHAN MARZ JAMES WARREN II MANNING Shelter Island contents preface xiii acknowledgments xv about this book xviii ~1 Anew paradigm

More information

Alternatives to HIVE SQL in Hadoop File Structure

Alternatives to HIVE SQL in Hadoop File Structure Alternatives to HIVE SQL in Hadoop File Structure Ms. Arpana Chaturvedi, Ms. Poonam Verma ABSTRACT Trends face ups and lows.in the present scenario the social networking sites have been in the vogue. The

More information

Table of Contents... 2

Table of Contents... 2 Why NoSQL? Table of Contents Table of Contents... 2 Abstract... 3 Introduction... 3 You Have Big Data... 3 How Does DataStax Helps Manage Big Data... 3 Big Data Performance... 4 You Need Continuous Availability...

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

NOSQL DATABASES AND CASSANDRA

NOSQL DATABASES AND CASSANDRA NOSQL DATABASES AND CASSANDRA Semester Project: Advanced Databases DECEMBER 14, 2015 WANG CAN, EVABRIGHT BERTHA Université Libre de Bruxelles 0 Preface The goal of this report is to introduce the new evolving

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,

More information

StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón

StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

ORACLE COHERENCE 12CR2

ORACLE COHERENCE 12CR2 ORACLE COHERENCE 12CR2 KEY FEATURES AND BENEFITS ORACLE COHERENCE IS THE #1 IN-MEMORY DATA GRID. KEY FEATURES Fault-tolerant in-memory distributed data caching and processing Persistence for fast recovery

More information

Putting Apache Kafka to Use!

Putting Apache Kafka to Use! Putting Apache Kafka to Use! Building a Real-time Data Platform for Event Streams! JAY KREPS, CONFLUENT! A Couple of Themes! Theme 1: Rise of Events! Theme 2: Immutability Everywhere! Level! Example! Immutable

More information

Hive Development. (~15 minutes) Yongqiang He Software Engineer. Facebook Data Infrastructure Team

Hive Development. (~15 minutes) Yongqiang He Software Engineer. Facebook Data Infrastructure Team Hive Development (~15 minutes) Yongqiang He Software Engineer Facebook Data Infrastructure Team Agenda 1 Introduction 2 New Features 3 Future What is Hive? A system for managing and querying structured

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information
