Apache Cassandra for Big Data Applications
1 Apache Cassandra for Big Data Applications. Christof Roduner, COO and co-founder. Java User Group Switzerland, January 7, 2014
2 AGENDA
  Cassandra origins and use
  How we use Cassandra
  Data model and query language
  Cluster organization
  Replication and consistency
  Practical experience
3 WHAT IS CASSANDRA? SQL
4 WHAT IS CASSANDRA? not only SQL
5 ORIGINS
  Dynamo: distributed storage
  BigTable: data model
6 USED BY
7 SCANDIT
  ETH Zurich startup company
  Our mission: provide the best mobile barcode scanning platform
  Customers include Bayer, Coop, CapitalOne, Saks 5th Avenue, NASA
  Barcode scanning SDKs for iOS, Android, PhoneGap, Titanium, Xamarin
8 SCANDIT
9 THE SCANALYTICS PLATFORM
  Two purposes:
  1. External tool for app publishers:
     App-specific real-time usage statistics
     Insights into user behavior
       What do users scan? Product categories? Groceries, electronics, books, cosmetics?
       Where do users scan? At home? Or while in a retail store?
     Top products and brands
  2. Internal tool for our algorithms team:
     Improve our image processing algorithms
     Detect devices and OS versions with camera issues
     Monitor scan performance of our SDK
12 BACKEND REQUIREMENTS
  Analysis of scans
  Accept and store high volumes of scans
  Keep history of billions of camera parameters
  Generate statistics over extended time periods
  Provide reports to developers
13 BACKEND DESIGN GOALS
  Scalability
    High-volume storage
    High-volume throughput
    Support large number of concurrent client requests (mobile devices)
  Availability
  Low maintenance
    Even as our customer base grows
  Multiple data centers
14 WHY DID WE CHOOSE CASSANDRA? Partitioning (diagram: key ranges A..J, K..R, S..Z)
15 WHY DID WE CHOOSE CASSANDRA? Simplicity (diagram: Master, Slave, Coordinator)
16 MORE REASONS
  Looked very fast
    Even when data is much larger than RAM
  Performs well in write-heavy environment
  Proven scalability
    Without downtime
  Tunable replication
  Data model
  YMMV
17 WHAT YOU HAVE TO GIVE UP
  Joins
  Referential integrity
  Transactions
  Expressive query language (nested queries, etc.)
  Consistency (tunable, but not by default)
  Limited support for secondary indices
18-20 HELLO CQL
  CREATE TABLE users ( username TEXT, TEXT, web TEXT, phone TEXT, PRIMARY KEY (username) );
  INSERT INTO users (username, , phone) VALUES ('alice', '[email protected]', ' ');
  INSERT INTO users (username, , web) VALUES ('bob', '[email protected]', '
  cqlsh:demo> SELECT * FROM users;
  username phone web
  bob [email protected] null
  alice [email protected] null
21 FAMILIAR BUT DIFFERENT
  CREATE TABLE users ( username TEXT, TEXT, web TEXT, phone TEXT, PRIMARY KEY (username) );
  No auto-increment columns (use a natural key or a UUID instead)
  A primary key is always mandatory
22 FAMILIAR BUT DIFFERENT
  cqlsh:demo> SELECT * FROM users;
  username phone web
  bob [email protected] null
  alice [email protected] null
23 FAMILIAR BUT DIFFERENT
  Sort order? (The query returns bob before alice.)
24 UNDER THE HOOD: CLUSTER ORGANIZATION
  Ring diagram: node 1 (token 0), node 2 (token 64), node 3 (token 128), node 4 (token 192)
  Range 1-64, stored on node 2
  Range , stored on node 3
25 STORING A ROW
  1. Calculate the md5 hash for the row key (the username field in the example above)
     Example: md5("alice") =
  2. Determine the data range for the hash
     Example: 48 lies within range 1-64
  3. Store the row on the node responsible for the range
     Example: store on node 2
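The three steps on slide 25 can be sketched in a few lines of Python. This is only a toy model of the four-node ring shown on the slides: the ring size of 256 and the reduction of the md5 digest modulo that size are assumptions made to fit the tokens 0, 64, 128 and 192, and the function names are invented for illustration. Real Cassandra partitioners use a much larger token space.

```python
import bisect
import hashlib

# Toy ring matching the slides: four nodes with tokens 0, 64, 128, 192.
# RING_SIZE = 256 is an assumption chosen to fit those tokens.
RING_SIZE = 256
TOKENS = [0, 64, 128, 192]
NODES = {0: "node 1", 64: "node 2", 128: "node 3", 192: "node 4"}


def toy_token(row_key: str) -> int:
    """Step 1: hash the row key with md5 and map it onto the toy ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def node_for_token(token: int) -> str:
    """Steps 2 and 3: find the range for the token and the node that stores it.

    The node with token T stores the range (previous token, T], so token 48
    falls into range 1-64 and is stored on the node with token 64.
    Tokens above the largest node token wrap around to the first node.
    """
    index = bisect.bisect_left(TOKENS, token)
    return NODES[TOKENS[index % len(TOKENS)]]


def node_for_key(row_key: str) -> str:
    """All three steps combined for a row key such as a username."""
    return node_for_token(toy_token(row_key))
```

Calling `node_for_token(48)` reproduces the slide's example and yields node 2. Because placement depends only on the hash, rows with neighboring keys usually end up on unrelated nodes.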
26 IMPLICATIONS
  Cluster automatically balanced
    Load is shared equally between nodes
    No hotspots
  Scaling out? Easy
    Divide data ranges by adding more nodes
    Cluster rebalances itself automatically
  Range queries not possible
    You can't retrieve «all rows from A-C»
    Rows are not stored in their «natural» order
    Rows are stored in order of their md5 hashes
28 UNDER THE HOOD: PHYSICAL STORAGE
  A physical row stores data in name-value pairs ("cells")
    Cell name is the CQL field name (e.g. )
    Cell value is the field data (e.g. [email protected] )
  Cells in a row are automatically sorted by name ( < phone < web )
  Cell names can differ between rows
  Up to 2 billion cells per row
  INSERT INTO users (username, , phone) VALUES ('alice', '[email protected]', ' ');
  INSERT INTO users (username, , web) VALUES ('bob', '[email protected]', '
  Diagram: physical row with row key alice: [email protected] phone:
  Diagram: physical row with row key bob: [email protected] web:
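A physical row as described on slide 28 can be modelled as a list of name-value pairs sorted by cell name. The sketch below is illustrative only: the cell name `email` and all values are assumptions, because the corresponding names and values did not survive in the transcription, and `physical_row` is a hypothetical helper.

```python
def physical_row(fields: dict) -> list:
    """Return the cells of one physical row, sorted by cell name.

    Fields that were never written (value None) simply have no cell,
    which is why different rows can carry different cell names.
    """
    return sorted(
        (name, value) for name, value in fields.items() if value is not None
    )


# Hypothetical data: "email" and the values below are placeholders.
alice = physical_row({"phone": "555-0100", "email": "alice@example.org", "web": None})
bob = physical_row({"web": "example.org", "email": "bob@example.org", "phone": None})
```

In this model the row for alice holds the cells `email` and `phone`, while the row for bob holds `email` and `web`. A CQL `SELECT *` shows the missing cells as null.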
30 TWO BILLION CELLS
  CREATE TABLE users ( username TEXT, TEXT, web TEXT, phone TEXT, address TEXT, spouse TEXT, hobbies TEXT, hair_color TEXT, favorite_dish TEXT, pet_name TEXT, favorite_bands TEXT, two_billionth_field TEXT, PRIMARY KEY (username) );
  Who needs 2 billion fields in a table?!?
31-32 2 BILLION CELLS: WIDE ROWS
  Use case: track logins of users
  Data model:
    One (wide) physical row per user
    User name as row key
    Login details (time, IP address, user agent) in cells
    Cells ordered and grouped ("clustered") by login timestamp
    Cells are now tuple-value pairs
  CREATE TABLE logins ( username TEXT, timestamp TIMESTAMP, ip_address TEXT, agent TEXT, PRIMARY KEY (username, timestamp) );
  Advantage: range queries!
  Physical rows (diagram):
    alice: [ , agent]: Firefox  [ , ip_address]:  [ , agent]: Firefox  [ , ip_address]:
    bob: [ , agent]: Chrome  [ , ip_address]:
33 QUERYING THE LOGINS
  INSERT INTO logins (username, timestamp, ip_address, agent) VALUES ('alice', ' :22: ', ' ', 'Firefox');
  cqlsh:demo> SELECT * FROM logins;
  username timestamp agent ip_address
  bob :12: Chrome
  alice :22: Firefox
  alice :48: Firefox
  alice :06: Firefox
  alice :37: Firefox
34 ONE CQL ROW FOR EACH CELL CLUSTER
  Physical rows (diagram):
    alice: [ , agent]: Firefox  [ , ip_address]:  [ , agent]: Firefox  [ , ip_address]:
    bob: [ , agent]: Chrome  [ , ip_address]:
  CQL rows:
    cqlsh:demo> SELECT * FROM logins;
    username timestamp agent ip_address
    bob :12: Chrome
    alice :22: Firefox
    alice :48: Firefox
    alice :06: Firefox
    alice :37: Firefox
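The mapping from wide physical rows to CQL rows can be illustrated with a small Python sketch. It is a simplified model, not Cassandra's storage code: each cell is keyed by a (timestamp, field name) tuple, and every group of cells sharing a timestamp becomes one CQL row. All timestamps and IP addresses are invented, since the originals were lost in the transcription, and `cql_rows` is a hypothetical helper.

```python
from itertools import groupby

# One wide physical row per user: cells keyed by (timestamp, field name).
# All timestamps and addresses are invented for illustration.
physical_rows = {
    "alice": {
        ("2014-01-05 10:22", "agent"): "Firefox",
        ("2014-01-05 10:22", "ip_address"): "192.0.2.10",
        ("2014-01-06 08:48", "agent"): "Firefox",
        ("2014-01-06 08:48", "ip_address"): "192.0.2.11",
    },
    "bob": {
        ("2014-01-05 09:12", "agent"): "Chrome",
        ("2014-01-05 09:12", "ip_address"): "192.0.2.20",
    },
}


def cql_rows(username: str, cells: dict) -> list:
    """Turn each cluster of cells that share a timestamp into one CQL row."""
    ordered = sorted(cells.items())  # cells are sorted by (timestamp, name)
    rows = []
    for timestamp, cluster in groupby(ordered, key=lambda cell: cell[0][0]):
        row = {"username": username, "timestamp": timestamp}
        for (_, field), value in cluster:
            row[field] = value
        rows.append(row)
    return rows
```

With this data, alice's single physical row yields two CQL rows and bob's yields one, each carrying `agent` and `ip_address` columns.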
35 RANGE QUERIES REVISITED
  Range queries involving the timestamp field are possible (because cells are ordered by timestamp):
    cqlsh:demo> SELECT * FROM logins WHERE username = 'bob' AND timestamp > ' ' AND timestamp < ' ';
    username timestamp agent ip_address
    bob :12: Chrome
  But you still have to provide a row key:
    cqlsh:demo> SELECT * FROM logins WHERE timestamp > ' ' AND timestamp < ' ';
    Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
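Why the row key is required, and why the timestamp range is then cheap, can be seen in a toy model: the row key selects one physical row, and inside that row the sorted timestamps allow a binary search. The sketch below assumes invented timestamps stored as sortable strings; `logins_between` is a hypothetical helper, not a driver API.

```python
import bisect

# Login timestamps per user, kept sorted as in a wide row (invented values).
logins = {
    "bob": ["2014-01-02 09:12", "2014-01-04 17:30", "2014-01-09 08:01"],
}


def logins_between(username: str, start: str, end: str) -> list:
    """Return the timestamps t with start < t < end for one row key."""
    timestamps = logins[username]                 # the row key picks the physical row
    lo = bisect.bisect_right(timestamps, start)   # index of first t > start
    hi = bisect.bisect_left(timestamps, end)      # index of first t >= end
    return timestamps[lo:hi]
```

For example, `logins_between("bob", "2014-01-01", "2014-01-05")` returns the first two logins. Without a username there is no single sorted list to search, which mirrors the "Bad Request" error on the slide.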
36 SECONDARY INDICES
  Queries involving a non-indexed field are not possible:
    cqlsh:demo> SELECT * FROM users WHERE = '[email protected]';
    Bad Request: No indexed columns present in by-columns clause with Equal operator
  Secondary indices can be defined for (single) fields:
    CREATE INDEX _key ON users ( );
    SELECT * FROM users WHERE = '[email protected]';
37 SECONDARY INDICES
  Secondary indices only support the equality predicate (=) in queries
  Each node maintains an index for the data it owns
    Request must be forwarded to all nodes
    Sometimes not the most efficient approach
  Often better to denormalize and manually maintain your own index
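The last point on slide 37, maintaining your own index, amounts to writing every row twice: once under its primary key and once into a lookup table keyed by the indexed value. The Python sketch below models the two tables as dictionaries; the field name `email`, the sample values, and the helpers `upsert_user` and `find_by_email` are assumptions made for illustration.

```python
users = {}           # main table: username -> fields
users_by_email = {}  # hand-maintained index table: email -> username


def upsert_user(username: str, email: str, **other_fields) -> None:
    """Write the user row and keep the index table in step with it."""
    previous = users.get(username)
    if previous is not None and previous["email"] != email:
        # The indexed value changed, so the stale index entry must go.
        users_by_email.pop(previous["email"], None)
    users[username] = {"email": email, **other_fields}
    users_by_email[email] = username


def find_by_email(email: str):
    """Look up a user through the index table; returns None if unknown."""
    username = users_by_email.get(email)
    return None if username is None else users[username]
```

The lookup then touches a single key in the index table instead of asking every node. The cost is that the application has to keep both tables consistent on every write.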
38 REPLICATION
  Tunable replication factor (RF)
    RF > 1: rows are automatically replicated to the next RF-1 nodes
  Tunable replication strategy
    «Ensure two replicas in different data centers, racks, etc.»
  Ring diagram: node 1 (token 0), node 2 (token 64), node 3 (token 128), node 4 (token 192), with replica 1 and replica 2 of row «foobar» marked
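The rule "replicate to the next RF-1 nodes" can be sketched on the same four-node toy ring. This models only the simple placement stated on the slide, not the data center or rack-aware strategies; the function name is hypothetical.

```python
import bisect

# Toy ring from the slides: tokens 0, 64, 128, 192 on nodes 1 to 4.
TOKENS = [0, 64, 128, 192]
NODES = ["node 1", "node 2", "node 3", "node 4"]


def replicas_for_token(token: int, replication_factor: int) -> list:
    """Return the node owning the token followed by the next RF-1 nodes."""
    first = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    count = min(replication_factor, len(NODES))  # cannot exceed the cluster size
    return [NODES[(first + i) % len(NODES)] for i in range(count)]
```

With RF = 2, a row whose hash is 48 is stored on node 2 and replicated to node 3. A row whose hash is 200 wraps around the ring to node 1 and node 2.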
39 CLIENT ACCESS
  Clients can send read and write requests to any node
    This node will act as coordinator
  Coordinator forwards the request to the nodes where the data resides
  Request: INSERT INTO users(username, ) VALUES ('alice', '[email protected]')
  Ring diagram: client, node 1 (token 0), node 2 (token 64), node 3 (token 128), node 4 (token 192), with replica 1 and replica 2 of row «alice» marked
40 CONSISTENCY LEVELS
  Cassandra offers tunable consistency
  For all requests, clients can set a consistency level (CL)
    For writes: CL defines how many replicas must be written before «success» is returned to the client
    For reads: CL defines how many replicas must respond before a result is returned to the client
  Consistency levels: ONE, QUORUM, ALL, plus data center-aware levels
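How the three basic levels translate into replica counts can be written down directly. The sketch below covers only ONE, QUORUM and ALL (the data center-aware levels are left out), and the function names are invented. A quorum is a majority of the replicas, that is floor(RF / 2) + 1.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a request at the given CL."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # majority of the replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def read_overlaps_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True if every read must hit at least one replica of every successful write.

    This holds when the two replica counts add up to more than RF.
    """
    writes = required_replicas(write_cl, replication_factor)
    reads = required_replicas(read_cl, replication_factor)
    return writes + reads > replication_factor
```

With RF = 3, QUORUM means two replicas, so QUORUM writes combined with QUORUM reads always overlap. ONE combined with ONE does not, which is where stale reads can occur.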
41 INCONSISTENT DATA
  Example scenario:
    Replication factor 2
    Two existing replicas for row «foobar»
    Client overwrites existing data in «foobar»
    Replica 2 is down
  What happens:
    Cells are updated in replica 1, but not in replica 2 (even with CL=ALL!)
  Timestamps to the rescue
    Every cell has a timestamp
    Timestamps are supplied by clients
    Upon read, the cell with the latest timestamp wins
    Use NTP
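The rule "the cell with the latest timestamp wins" can be sketched as a merge over the answers of several replicas. In this toy model each replica returns a dictionary mapping cell names to (value, timestamp) pairs; the cell names, values and timestamps are invented, and `merge_replicas` is a hypothetical helper.

```python
def merge_replicas(*replica_rows: dict) -> dict:
    """Merge replica answers cell by cell, keeping the newest timestamp."""
    merged = {}
    for row in replica_rows:
        for name, (value, timestamp) in row.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


# Replica 1 received the overwrite; replica 2 was down and still has old data.
replica_1 = {"phone": ("new number", 200), "web": ("example.org", 100)}
replica_2 = {"phone": ("old number", 100), "web": ("example.org", 100)}
result = merge_replicas(replica_1, replica_2)
```

Here `result["phone"]` holds the new number regardless of the order in which the replicas answer. Since clients supply the timestamps, a client with a wrong clock can make an older write win, hence the advice to use NTP.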
42 PREVENTING INCONSISTENCIES
  Read repair
  Hinted handoff
  Anti-entropy
43 EXPIRING DATA
  Data will be deleted automatically after a given amount of time
  INSERT INTO users (username, , phone) VALUES ('alice', '[email protected]', ' ') USING TTL 86400;
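The TTL in the statement above is given in seconds, so 86400 corresponds to one day (24 × 60 × 60). A toy expiry check in Python, assuming write times are plain Unix timestamps and treating a cell as expired once the full TTL has elapsed; `is_expired` is a hypothetical helper and says nothing about how Cassandra removes the data internally.

```python
ONE_DAY_SECONDS = 24 * 60 * 60  # 86400, the TTL used on the slide


def is_expired(written_at: float, ttl_seconds: int, now: float) -> bool:
    """True once at least ttl_seconds have passed since the write."""
    return now - written_at >= ttl_seconds
```

A cell written at time 0 with this TTL is still visible at second 86399 and gone from second 86400 onward in this model.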
44 DISTRIBUTED COUNTERS
  Useful for analytics applications
  Atomic increment operation
  UPDATE counters SET access = access + 1 WHERE url = '
45 PRODUCTION EXPERIENCE: CLUSTER AT SCANDIT
  We've had Cassandra in production use for almost 4 years
  Nodes in three data centers
  Linux machines
  Identical setup on every node
    Allows for easy failover
46 PRODUCTION EXPERIENCE
  Mature, no stability issues
  Very fast
  Language bindings don't always have the same quality
    Sometimes out of sync with server, buggy
  Data model is a mental twist
  Design-time decisions sometimes hard to change
  No support for geospatial data
47 TRYING OUT CASSANDRA
  Set up a single-node cluster
  Install binary:
    Debian, Ubuntu, RHEL, CentOS packages
    Windows 7 MSI installer
    Mac OS X (tarball)
    Amazon Machine Image
48 DOCUMENTATION
  DataStax website
    Company founded by Cassandra developers
  Apache website
  Mailing lists
49 THANK YOU!
  Questions?
  (By the way, we're hiring)
