1 SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab
2 Outline A brief history of DBMSs. OSs SQL NoSQL 1960/
3 Before Computers Database DBMS/Data Store
4 Digital Era Database File System/ Data Store
5 Before DBMSs: 1960/70s Developer 1 Application programs Data Developer 2 Application programs Data
6 After DBMSs Developer 1 Application programs DBMS Application programs Developer 2 Physical Data Independence. SQL as a "what"-oriented language.
7 SQL Data Stores Manage records/tuples A record/tuple is a row in a table where attribute names are pre-defined in a schema. Alternative physical designs: Column-store versus Row-store. Transactions with ACID properties
9 SQL IS OVERHYPED
10 Why? Marketing campaigns have become too exaggerated! Relational vendors claim RDBMS is the answer to all data management needs. What are some counter examples? Seltzer. Beyond Relational Databases. Communications of the ACM, July 2008.
11 Web Search Semi-structured data HTML pages instead of raw data. Queries are keyword lookups and the desired response is a sorted list of possible answers. Need for efficient inverted indices. Bulk updates, read mostly. Need for nontraditional indexing.
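A minimal sketch of the inverted index that the web search slide calls for, assuming whitespace tokenization and a simple "count of matching terms" ranking. The function names and the sample pages are hypothetical and chosen only for illustration; real search engines use far more elaborate indexing and ranking.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each lowercase term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def keyword_search(index, query):
    """Return page ids sorted by the number of query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for page_id in index.get(term, ()):
            scores[page_id] += 1
    # Highest score first; ties broken by page id for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))

# Hypothetical sample pages.
pages = {
    "p1": "relational database systems",
    "p2": "web search with inverted indices",
    "p3": "search engines index web pages",
}
index = build_inverted_index(pages)
print(keyword_search(index, "web search"))
```

The index is built once in bulk and then only read, which matches the "bulk updates, read mostly" workload described on the slide.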
12 Directory Services International organizations with distributed resources and personnel. Requirement: fast lookup of entities arranged in a hierarchical structure that corresponds to a hierarchy of the organization. LDAP standard. Core of identification and authentication system from a number of vendors, e.g., IBM Tivoli, Microsoft Active Directory Server, SUN ONE Directory Server. Bulk updates similar to data warehousing. Multi-valued attributes. Queries are single-row retrieval or lookups based on attribute values.
13 Other Examples Mobile device caching Your cell phone's directory as a transient cache of a global directory. Stream management Real-time filtering of streams for interesting patterns. Example: identify a hotly traded stock, or a stock that is not traded as heavily as expected. Filters look like SQL selection predicates, causing developers to mistake an RDBMS for the right choice. XML management
14 Summary Relational DBMSs were designed for transaction processing and for workloads consisting of ad hoc queries and a significant number of updates. 25 years ago, there was one market for DBMSs: business data processing. This has changed to include different applications with different requirements. The example applications are read-dominated, with no need for transactional guarantees. SQL is the wrong choice for stream processing. One software architecture will not support the diverse needs of these applications. Possible solutions: 1) each application rebuilds its own storage manager from scratch, 2) provide a flexible solution that can be tailored to the needs of a particular application.
15 Past 25 Years Two trends: 1. Bloated systems. Need for a specialist, a trained DBA, to keep a system and its applications running. 2. Few applications need all the features available in today's RDBMSs. The application must pay for all the features even though it requires only a small subset.
16 NOSQL DATA STORES
17 NoSQL Data Stores Scale horizontally for simple operations using many servers. Replicate and distribute (partition) data across many servers. Provide a simple call level interface or protocol. A weaker concurrency model than ACID: Basically Available, Soft state, Eventually consistent (BASE). Efficient use of distributed indexes and DRAM for data storage. Ability to dynamically add new attributes to data records. Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Record 39(4), Ghandeharizadeh, Boghrati, and Barahmand. An Evaluation of Graph Data Models. TPCTC 2014.
18 NoSQL Data Model A key-value store: A distributed hash table, A key/value may be an arbitrary sequence of bytes, E.g., memcached, Voldemort, Riak, Redis, Tokyo Cabinet, Membase, Membrain. A document store: A value may be a scalar, a list, or a nested document, Attribute names might be dynamically defined at runtime, E.g., SimpleDB, CouchDB, MongoDB, Terrastore. An extensible record store: A hybrid between a SQL store and a document store, Families of attributes are defined in a schema and new attributes can be added, Attributes may be list-valued, E.g., BigTable, HBase, HyperTable, Cassandra, PNUTS.
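The three data models above can be contrasted with plain Python structures. This is an illustrative sketch only: the dictionaries stand in for real stores, and `SCHEMA_FAMILIES` and `put_column` are hypothetical names, not the API of any system listed on the slide.

```python
# Key-value store: both key and value are opaque byte sequences.
kv_store = {b"user:42": b'{"name": "Ann"}'}

# Document store: the value has structure (scalars, lists, nested
# documents), and attribute names can be introduced at runtime.
doc_store = {
    "user:42": {
        "name": "Ann",
        "emails": ["ann@example.org"],
        "address": {"city": "Los Angeles"},
    }
}
doc_store["user:42"]["nickname"] = "A"  # new attribute, no schema change

# Extensible record store: attribute families are fixed by a schema,
# but new attributes may be added freely within a family.
SCHEMA_FAMILIES = {"profile", "activity"}  # hypothetical schema

def put_column(row, family, column, value):
    """Store a value under family:column, rejecting undeclared families."""
    if family not in SCHEMA_FAMILIES:
        raise KeyError(f"unknown attribute family: {family}")
    row.setdefault(family, {})[column] = value

row = {}
put_column(row, "profile", "name", "Ann")
put_column(row, "profile", "languages", ["en", "fa"])  # list-valued attribute
```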
19 MIDDLEWARE: CACHE AUGMENTED DATA STORES
20 Simple Operations Operations that read and write a small amount of data. Challenge: High volume of requests with a low latency requirement. Person-to-person service providers in 1 Minute: 100M queries 7K user visits 147K page views 347K Tweets Facebook, Google, Twitter, https://about.twitter.com/company Wikipedia,
21 How? Look up query result instead of query processing. Ideal for applications with workloads that exhibit a high read to write ratio. Key-value store as the cache manager. Query result caching: Key: query string, Value: result set Trillions of cached key-value pairs.
22 Cache Augmented DBMSs 1. Value = Get(Key) 2. If Value is found, go to Step 6 3. SQL queries 4. Query results; application constructs Value using the results 5. Put(Key, Value) 6. Use Value to generate HTML result page RDBMS Server Cache Server (KVS, e.g., memcached)
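The lookup steps above can be sketched as follows. This is a minimal illustration under stated assumptions: a Python dict stands in for the cache server (the slide names memcached), an in-memory SQLite database stands in for the RDBMS server, and `run_query` and the `users` table are hypothetical names.

```python
import sqlite3

cache = {}  # stand-in for the key-value cache server

def run_query(conn, sql, params=()):
    """Return the result of a query, consulting the cache first."""
    key = sql + "|" + repr(params)   # key: query string plus its parameters
    value = cache.get(key)           # Step 1: Value = Get(Key)
    if value is None:                # Step 2: on a miss, fall through
        rows = conn.execute(sql, params).fetchall()  # Steps 3-4: query RDBMS
        value = rows                 # application constructs Value
        cache[key] = value           # Step 5: Put(Key, Value)
    return value                     # Step 6: caller renders the page

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ann')")

query = "SELECT name FROM users WHERE id = ?"
first = run_query(conn, query, (1,))   # miss: goes to the database
second = run_query(conn, query, (1,))  # hit: served from the cache
```

Note that nothing in this read path notices a later change to the `users` table, so a cached value can become stale unless the write path also updates or deletes the affected key-value pairs.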
23 CADBMS: Update 1. SQL DML command: Insert, Delete, Update 2. Invalidate key-value pairs: Delete. Alternatives to invalidation include refill/refresh and incremental update. RDBMS Server Cache Server (KVS, e.g., memcached)
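A minimal sketch of the invalidate-on-update approach, assuming a dict as the cache, an in-memory SQLite database as the RDBMS, and a deliberately coarse rule: every cached result that read a table is deleted when that table is modified. The class `CacheAugmentedStore` and its bookkeeping are hypothetical and are not the mechanism of any particular system.

```python
import sqlite3

class CacheAugmentedStore:
    """Toy cache-augmented store with table-level invalidation."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")  # stand-in for the RDBMS
        self.cache = {}          # (sql, params) -> cached result set
        self.keys_by_table = {}  # table name -> keys that read that table

    def read(self, table, sql, params=()):
        key = (sql, params)
        if key not in self.cache:
            self.cache[key] = self.conn.execute(sql, params).fetchall()
            self.keys_by_table.setdefault(table, set()).add(key)
        return self.cache[key]

    def write(self, table, sql, params=()):
        self.conn.execute(sql, params)  # Step 1: SQL DML command
        # Step 2: invalidate (delete) every key-value pair that depends
        # on the modified table.
        for key in self.keys_by_table.pop(table, set()):
            self.cache.pop(key, None)

store = CacheAugmentedStore()
store.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
store.write("users", "INSERT INTO users VALUES (?, ?)", (1, "Ann"))

query = "SELECT name FROM users WHERE id = ?"
before = store.read("users", query, (1,))
store.write("users", "UPDATE users SET name = ? WHERE id = ?", ("Bob", 1))
after = store.read("users", query, (1,))  # recomputed after invalidation
```

Table-level invalidation is simple but discards more than necessary; the refill/refresh and incremental update alternatives named on the slide would instead recompute or patch the cached value.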
24 CADBMS Today Developer 1 Stale Application programs In-memory Copy of Data memcached Cache Server Developer 2 Application programs Persistent Data Data Store
25 Future CADBMSs Developer 1 Application programs Key Value Cache Server Application programs CADBMS Developer 2 Physical Data Independence. A "what"-oriented language. Data Store
26 KOSAR Developer 1 Application programs Key Value Cache Server Application programs KOSAR Developer 2 RDBMS Physical Data Independence. SQL as a "what"-oriented language. Ghandeharizadeh et al. A Demonstration of KOSAR. Middleware 2014.
27 Architecture A database driven application: Application Data Store Client Data Store Server
28 Architecture: Example An RDBMS driven application authored using Java: Application JDBC SQL Result Set MySQL Server
29 KOSAR: Transparent Caching Simply replace the client component of your application with KOSAR and see it run much faster. Application Data Store Client Data Store Server Ghandeharizadeh, Yap, and Nguyen. Strong Consistency in Cache Augmented SQL Systems. Middleware Ghandeharizadeh, Irani, Lam, Yap. CAMP: A Multi-Queue Eviction Policy for Key-Value Stores. Middleware 2014.
30 How? 1. Look up the query result instead of processing the query. Application Data Store Client Data Store Server memcached Servers Ideal for workloads that exhibit a high read to write ratio.
31 Client-Server Architecture SoAR (Actions/Second) CADBMS CADBMS SQL-X SQL-X 0.1% Write 10% Write SLA: 95% of actions to observe a response time faster than 100 msec. Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.
32 BG Benchmark BG is a macro benchmark for interactive social networking actions. BG quantifies the Social Action Rating (SoAR) of a data store: For a given workload, the maximum number of simultaneous actions performed by a data store while satisfying a pre-specified SLA. Ph.D. Fellowship Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR Barahmand and Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. SIGMOD DBTest Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM Alabdulkarim, Barahmand and Ghandeharizadeh. A Scalable Benchmark for Interactive Social Networking Actions.
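The SoAR definition can be illustrated with a small sketch. It assumes the SLA used elsewhere in this deck (95% of actions observe a response time faster than 100 msec) and assumes response times have already been measured at several load levels. The functions `satisfies_sla` and `soar` and the sample numbers are hypothetical; this is not BG's implementation.

```python
def satisfies_sla(response_times_ms, fraction=0.95, limit_ms=100.0):
    """True if at least `fraction` of actions finished faster than `limit_ms`."""
    if not response_times_ms:
        return False
    fast = sum(1 for t in response_times_ms if t < limit_ms)
    return fast / len(response_times_ms) >= fraction

def soar(measurements):
    """Highest measured rate (actions/second) whose run satisfied the SLA.

    `measurements` maps a rate to the response times (msec) observed
    while the data store sustained that rate.
    """
    passing = [rate for rate, times in measurements.items()
               if satisfies_sla(times)]
    return max(passing, default=0)

# Made-up measurements for illustration only.
measurements = {
    1000: [10.0] * 100,                # all actions fast
    2000: [10.0] * 96 + [150.0] * 4,   # 96% fast: satisfies the SLA
    3000: [10.0] * 90 + [150.0] * 10,  # 90% fast: violates the SLA
}
rating = soar(measurements)
```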
33 Client-Server Architecture SoAR (Actions/Second) CADBMS CADBMS SQL-X SQL-X 0.1% Write 10% Write SLA: 95% of actions to observe a response time faster than 100 msec.
34 Shared Address Space 1. Avoid overhead of serialization and network communication Application Data Store Client Data Store Server
35 Shared Address Space SoAR (Actions/Second) CADBMS CADBMS SQL-X 0.1% Write SQL-X 10% Write SLA: 95% of actions to observe a response time faster than 100 msec.
37 Why? 1. CPU overhead of query processing is more than 85% [1, 2]. Application Data Store Client Data Store Server Cache Servers Harizopoulos et al. OLTP: Through the Looking Glass and What We Found There. SIGMOD Stonebraker and Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. CACM 2011.
38 Architectures Client-Server, Shared-Address Space, and Hybrids. Client-Server Shared-Address Space Ghandeharizadeh and Yap. Cache Augmented Data Stores. SIGMOD DBSocial 2013.
39 NON VOLATILE MEMORY
40 Non Volatile Memory Flash CPU CPU DRAM HDD NVM Flash CPU DRAM HDD Flash CPU DRAM HDD Traditional DRAM (late 2016)
41 Non-Volatile Memory Byte-addressable Time to rewrite the key-value stores & database engine! Configurable: DRAM CPU CPU Emulated Flash Emulated HDD Emulated DRAM Emulated Flash Emulated HDD NVM Time to re-design algorithms NVM
42 Digital Era Database File System/ Data Store
43 Future (Biological) Computers Database DBMS/Data Store