NoSQL Systems for Big Data Management




NoSQL Systems for Big Data Management Venkat N Gudivada East Carolina University Greenville, North Carolina USA Venkat Gudivada NoSQL Systems for Big Data Management 1/28

Outline
1. An Overview of NoSQL Systems
2. Special Needs of Big Data Applications
3. Technical Enablers of NoSQL Systems
4. Data Models
5. Conclusions

RDBMS vs NoSQL
- RDBMS market revenue in 2011 was $26 billion (2013 IDC Report)
- RDBMS revenues predicted to reach $41 billion by 2016 (2013 IDC Report)
- Object-oriented DBMS (OODBMS) of the 1980s
- Native XML databases of the 1990s
- Special requirements of Web 2.0, Big Data, IoT, and mobile applications
- NoSQL products will generate $14 billion in revenues over the period 2013-2018 (Market Research Media)
- Worldwide NoSQL market is expected to reach $3.4 billion by 2018 (Market Research Media)

Database Requirements for Some Applications
- Flexible data models that evolve over time
- Near real-time query response times
- Concurrent users in the order of millions
- Terabyte scale data volumes spread across multiple data centers to improve locality
- High availability and elastic scaling without system downtime
- Simple data model, but fast inserts and lookups are critical for some applications
- In others, updates are almost non-existent and are implemented as a delete followed by an insert
- RDBMS for such applications: the proverbial square peg in a round hole

Explosive Growth of NoSQL Systems
- DB-Engines
- NoSQL Databases
- Breakthrough Analysis
- Data Blogger
- EMC Big Data
- IBM Big Data & Analytics Hub
- Forrester Big Data Blogs
- Top 8 Trends in Big Data for 2016
- http://www.dataversity.net/
- http://planetbigdata.com/

NoSQL Systems
- Performance at scale
- Do not provide all of the RDBMS features
- Principally focus on providing near real-time reads and writes for applications with millions of concurrent users
- Terminology: NoSQL, NewSQL, Not only SQL, and non-RDBMS
- Renaissance in data management: OO DBMS and native XML
- Consistency, availability, and partition tolerance (CAP)
- Impossible for any distributed system to achieve all three of these properties simultaneously
- View the CAP properties as spanning a spectrum rather than as binary values

SQL, NoSQL and NewSQL (figure slide)

SQL, NoSQL and NewSQL, continued (figure slide)

NoSQL Systems
- Bigtable, Hypertable, HBase, MongoDB, and Redis strive for consistency and partition tolerance
- DynamoDB, CouchDB, Cassandra, SimpleDB, Voldemort, and Riak emphasize availability and partition tolerance
- RDBMS provide consistency and availability, but not partition tolerance
- Some RDBMS vendors are introducing new features into their products to mitigate the potential risk posed by NoSQL
- Rivalry between the RDBMS and NoSQL vendors
- Exaggeration of features as well as confusion in the current DBMS product space

Web 2.0 Applications
- Web 2.0: a new generation of systems that allow users to interact and collaborate with each other and contribute content
- Social media applications, blogs, wikis, folksonomies, image and video sharing sites, and crowdsourcing
- MySpace, Twitter, Facebook, YouTube, and Flickr
- Critically depend on integrated and real-time data from diverse sources
- Preponderance of insert and retrieval operations on a very large scale
- Fraud and anomaly detection, dynamic pricing, real-time order status through supply chain visibility, superior customer service, real-time predictive analytics, user experience personalization, and Web server access log analysis

Big Data
- Data that is too big and complex to capture, store, process, analyze, and interpret using state-of-the-art tools and methods
- Five Vs: volume, velocity, variety, veracity, and value
- Data volume is in the order of terabytes (2^40 bytes) or even petabytes (2^50 bytes), and is rapidly heading towards exabytes (2^60 bytes)
- Stream data processing
- Data heterogeneity
- Secure data provenance
- Predictive models to answer what-if queries, discovering counterintuitive insights and actionable intelligence

Big Data
- Large Hadron Collider experiments: 50 million sensors capturing data about nearly 600 million collisions per second
- 2013 Nobel Prize in Chemistry: measuring and visualizing the behavior of 50,000 or more atoms in a reaction over the course of a fraction of a millisecond
- Square Kilometer Array telescope: the goal is to answer fundamental questions about the origin and evolution of the Universe

Mobile Computing and Location-based Services
- Location-based services have become an important channel for marketing, sales, and brand development
- Location-based services leverage users' geographic location information and context in providing relevant and timely services to users
- Mobile devices impose additional requirements on location-based services, such as disconnected operation and delayed synchronization
- Applications include locating a nearby business or service (e.g., retail store, ATM, restaurant), receiving alerts about traffic jams and severe weather, mobile advertising, and recommending social events in a city
- Facebook Places, Foursquare, Gowalla, and Where

Functional Inadequacy of RDBMS
- Accommodate increased workload and data volume by using faster CPUs, more memory, and bigger and faster disks: vertical scaling
- Oracle Real Application Clusters (RAC)
- Instead, horizontal scaling is needed for economic reasons
- Sparse data and partial record updates are the norm in some Big Data applications
- Complete database schema does not exist up front and evolves over time
- ACID-compliant transactions are overkill for applications that require only relaxed consistency
- Multiple query methods, ranging from a simple REST API to complex and ad hoc queries, are required
- Horizontal scaling using processors built from commodity hardware

Functional Inadequacy of RDBMS
- Interactive processing of ad hoc queries mandates massively parallel computation using schemes such as MapReduce
- Non-overlapping data partitioning (aka sharding) is necessary to address the data volumes
- Auto-sharding is highly desirable
- Simple data replication is mandatory to ensure high availability
- Built-in support for versioning and compression is needed
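The sharding and replication requirements above can be illustrated with a minimal Python sketch. It assumes hash-modulo placement and treats each shard as one node; the function names (`shard_for`, `replicas_for`, `partition`) are hypothetical and not taken from any particular NoSQL product. Real auto-sharding systems typically use consistent hashing or range splitting so that changing the number of shards does not move most keys.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to exactly one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard followed by the next shards, wrapping around."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


def partition(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys into non-overlapping shards (each key lands in one shard)."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)


if __name__ == "__main__":
    user_keys = [f"user:{i}" for i in range(10)]
    print(partition(user_keys, num_shards=4))
    print(replicas_for("user:1", num_shards=4, replication_factor=3))
```

Sharding decides where a key lives; replication decides how many copies exist. The sketch keeps the two functions separate to reflect that they are independent choices.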

NoSQL Systems
- NoSQL systems vary significantly in functionality from each other
- Riak is highly scalable and available
- MongoDB's defining characteristic is managing deeply nested structured documents and computing aggregates on the documents
- Neo4j excels at managing data that is rich in relationships
- Redis is known for its data structures and elegant query mechanisms
- HBase offers extreme scalability, reliability, and flexibility; however, these features come with considerable complexity

NoSQL Systems
- Use distributed computing architectures
- Feature an assortment of data models, query languages, and data consistency models
- Each class of systems targets a category of applications
- System architectures are optimized for achieving the primary goal of providing performance at scale
- Command line queries moving towards declarative query languages

A NoSQL Cluster (figure slide)

Shared-nothing Architecture (figure slide)

NoSQL Architectures
- Elastic scalability through Database as a Service (DBaaS)
- Amazon's Simple Storage Service (S3) and DynamoDB are two examples of DBaaS
- Sharding and replication are orthogonal concepts that contribute to high availability
- ACID vs. BASE
- Basic availability: support for disconnected client operation and delayed synchronization, and tolerance to temporary inconsistency and its implications
- Data consistency spanning a spectrum from strict to eventual
- Soft state refers to state change without input, which is required for eventual consistency
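The BASE ideas on this slide (temporary inconsistency, soft state, eventual consistency) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Replica` class and `anti_entropy` function are hypothetical, conflicts are resolved by last-write-wins on a caller-supplied timestamp, and real systems often use richer mechanisms such as vector clocks.

```python
class Replica:
    """A single replica holding key -> (timestamp, value) entries."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the entry only if it is newer than what we have.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Background sync: afterwards both replicas hold the newest version of each key."""
    for src, dst in ((a, b), (b, a)):
        for key, (timestamp, value) in list(src.store.items()):
            dst.write(key, value, timestamp)


if __name__ == "__main__":
    r1, r2 = Replica("r1"), Replica("r2")
    r1.write("cart", ["book"], timestamp=1)          # client writes to r1
    r2.write("cart", ["book", "pen"], timestamp=2)   # later write reaches only r2
    print(r1.read("cart"), r2.read("cart"))          # replicas temporarily disagree
    anti_entropy(r1, r2)                             # state changes without client input
    print(r1.read("cart"), r2.read("cart"))          # replicas have converged
```

The call to `anti_entropy` is where soft state shows up: the contents of `r1` change even though no client wrote to it.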

NoSQL Architectures
- Applications can benefit from choosing the right level of data consistency
- Choose a level closer to strong consistency for writes but a weaker consistency for reads: write operations will take longer to complete without contributing to read operation latency
- Type of architecture, data partitioning, and replication methods determine the degree of difficulty involved in implementing data consistency models
- Some NoSQL systems provide options for clients to choose a desired data consistency level: tunable consistency

NoSQL Architectures
- Cassandra provides options for choosing consistency levels for reads and writes separately
- Write consistency level specifies on how many replicas the write must succeed before an acknowledgment is returned to the client application that the write is successful
- Read consistency level specifies how many replicas must respond before a result is returned to the client application
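The interaction between read and write consistency levels can be worked through with a small Python sketch. It models only three of Cassandra's consistency levels (ONE, QUORUM, ALL) and applies the standard quorum-overlap rule, which is not stated on the slide: with N replicas, W write acknowledgments, and R read responses, a read is guaranteed to contact at least one replica holding the latest acknowledged write when R + W > N. The function names are hypothetical, and the model ignores failures and repair mechanisms.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1   # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """Quorum-overlap rule: read and write replica sets intersect when R + W > N."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE")]:
        print(write_level, read_level,
              reads_see_latest_write(write_level, read_level, replication_factor=3))
```

With a replication factor of 3, writing at ALL and reading at ONE matches the earlier advice to choose stronger consistency for writes and weaker consistency for reads: writes wait for three replicas so that reads need only one.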

Ad hoc and Compute-intensive Queries (figure slide)
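The deck names MapReduce as one scheme for running ad hoc queries in parallel over partitioned data. A minimal single-process Python sketch of the map, shuffle, and reduce phases follows. The record layout (a `status` field, as in a Web server access log) and all function names are assumptions made for illustration; a real framework would run the map tasks in parallel on the nodes that hold each shard.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (key, 1) pair per log record, keyed by HTTP status code."""
    yield record["status"], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def map_reduce(shards):
    """Run map over every record of every shard, then shuffle and reduce."""
    mapped = chain.from_iterable(
        map_phase(record) for shard in shards for record in shard
    )
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    shards = [
        [{"status": 200}, {"status": 404}],   # records stored on one shard
        [{"status": 200}],                    # records stored on another shard
    ]
    print(map_reduce(shards))
```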

Data Models
- Key-value: optimized for fast inserts and retrievals; updates are almost non-existent and are implemented as a delete followed by an insert
- Column-family: optimized for storing extremely large and sparse data with efficient support for primary key-based retrieval and partial record updates
- Document-oriented NoSQL systems: JSON-style document data model
- Graph model for relationships-rich data
- RDF triples
- XML data models
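To make the differences between these data models concrete, the Python sketch below writes similar user data in each shape using plain dictionaries, lists, and tuples. All keys, field names, and helper functions are hypothetical; no actual NoSQL client library is involved.

```python
import json

# Key-value: the store sees only an opaque value behind each key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "Greenville"})}

# Column-family: row key -> column family -> column -> value.
# Rows are sparse: each row stores only the columns it actually has.
column_family = {
    "user:42": {"profile": {"name": "Ada", "city": "Greenville"}},
    "user:43": {"profile": {"name": "Bob"}, "activity": {"last_login": "2016-01-05"}},
}


def update_column(table, row_key, family, column, value):
    """Partial record update: set one column without rewriting the whole row."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


# Document: a nested, JSON-style structure stored and queried as one unit.
document = {
    "_id": 42,
    "name": "Ada",
    "orders": [{"sku": "B-1", "qty": 2}, {"sku": "P-7", "qty": 1}],
}

# Graph: nodes with properties plus labeled, directed edges.
nodes = {42: {"name": "Ada"}, 43: {"name": "Bob"}, 44: {"name": "Cy"}}
edges = [(42, "FOLLOWS", 43), (43, "FOLLOWS", 44)]


def neighbors(node_id, label):
    """Follow outgoing edges with the given label from one node."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


# RDF triples: (subject, predicate, object) statements.
triples = [("ex:user42", "ex:follows", "ex:user43")]

if __name__ == "__main__":
    update_column(column_family, "user:42", "profile", "city", "Raleigh")
    print(column_family["user:42"])
    print(neighbors(42, "FOLLOWS"))
```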

Conclusions
- Data consistency, high availability, and partition tolerance are the three primary concerns that determine which data management system is suitable for a given application
- Scalability in NoSQL systems comes at the cost of transactional guarantees, application-maintained relationships, and simpler data models
- Trade-off between scaling and ad hoc queries: massive scaling entails simple queries; ad hoc queries typically require programming and parallel execution

Conclusions
- Many NoSQL systems hide the complexity of sharding from the application developer by providing built-in algorithms for auto-sharding; however, there is an implicit assumption that shards are independent and database transactions are confined to a single shard
- If a transaction spans multiple shards, some systems simply reject it
- If multi-shard transactions are prevalent, the suitability of NoSQL systems needs careful analysis
- It is quite difficult to determine which system is the right match
- This problem is further exacerbated by rapidly evolving features
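The single-shard transaction assumption can be sketched as follows. This is a hypothetical in-memory store, not the behavior of any specific product: keys are integer account ids, placement is `account_id % num_shards` for the sake of a deterministic example, and a transaction is modeled as a batch of writes that is rejected if its keys map to more than one shard.

```python
class CrossShardTransactionError(Exception):
    """Raised when a transaction touches keys on more than one shard."""


class ShardedStore:
    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, account_id):
        # Simplified placement rule chosen so the example is easy to follow.
        return account_id % len(self.shards)

    def transact(self, writes):
        """Apply all writes together, but only if they share a single shard."""
        shard_ids = {self.shard_for(account_id) for account_id in writes}
        if len(shard_ids) > 1:
            raise CrossShardTransactionError(
                f"transaction spans shards {sorted(shard_ids)}"
            )
        if shard_ids:
            self.shards[shard_ids.pop()].update(writes)

    def get(self, account_id):
        return self.shards[self.shard_for(account_id)].get(account_id)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.transact({1: 100, 5: 250})      # accounts 1 and 5 both map to shard 1
    try:
        store.transact({1: 90, 2: 10})    # accounts 1 and 2 map to different shards
    except CrossShardTransactionError as exc:
        print("rejected:", exc)
```

An application whose transactions routinely look like the second call would need either a different partitioning key or a system that supports distributed transactions, which is the analysis the slide calls for.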

Conclusions
- Type of data to be managed: structured, semi-structured, unstructured, or a mix of these types?
- Type of query language desired: declarative, procedural, or both?
- Are ad hoc queries so prevalent that a native MapReduce framework is required?
- Are the users geographically distributed enough to warrant data partitioning?
- Is an algorithmic approach needed for data partitioning, with automated redistribution to avoid hotspots or performance bottlenecks?
- Are the read or write throughput levels too high for a conventional RDBMS to handle?

Conclusions
- What is the minimal level of consistency essential for reads and writes?
- Is automated failover with no service interruption essential?
- Does the application load fluctuate unpredictably, so that resources must be provisioned dynamically without service interruption?
- What type and level of user authentication and authorization are required?
- What type of replication is needed?
- Is versioning of data required?
- Is cloud hosting a required deployment option?