NO:SQL NOT ONLY SQL. DOAG-Regionaltreffen München/Südbayern Donnerstag, 17. Februar 2011 um 17:00 Uhr



Similar documents
Lecture Data Warehouse Systems

Cloud Scale Distributed Data Storage. Jürmo Mehine

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

Introduction to NOSQL

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Practical Cassandra. Vitalii

nosql and Non Relational Databases

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Cloud Computing with Microsoft Azure

Cloud Computing Is In Your Future

An Approach to Implement Map Reduce with NoSQL Databases

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

NoSQL Data Base Basics

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Distributed Data Stores

Using Object Database db4o as Storage Provider in Voldemort

Advanced Data Management Technologies

GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

Do Relational Databases Belong in the Cloud? Michael Stiefel

So What s the Big Deal?

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Big Data With Hadoop

NoSQL in der Cloud Why? Andreas Hartmann

Challenges for Data Driven Systems

Structured Data Storage

The CAP theorem and the design of large scale distributed systems: Part I

NoSQL Databases. Nikos Parlavantzas

A programming model in Cloud: MapReduce

How To Scale Out Of A Nosql Database

NoSQL Database - mongodb

these three NoSQL databases because I wanted to see a the two different sides of the CAP


NoSQL Database Options

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

How To Improve Performance In A Database

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Transactions and ACID in MongoDB

Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

Domain driven design, NoSQL and multi-model databases

bigdata Managing Scale in Ontological Systems

Can the Elephants Handle the NoSQL Onslaught?

How To Write A Database Program

Big Data Storage, Management and challenges. Ahmed Ali-Eldin

NoSQL Evaluation. A Use Case Oriented Survey

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Integrating Big Data into the Computing Curricula

NoSQL and Hadoop Technologies On Oracle Cloud

Introduction to Big Data Training

How To Use Big Data For Telco (For A Telco)

Analysis and Classication of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Big Data Management and NoSQL Databases

Comparing SQL and NOSQL databases

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

A survey of big data architectures for handling massive data

NoSQL. Thomas Neumann 1 / 22

Preparing Your Data For Cloud

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

In Memory Accelerator for MongoDB

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Slave. Master. Research Scholar, Bharathiar University

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

CloudDB: A Data Store for all Sizes in the Cloud

NoSQL Databases. Polyglot Persistence

Cassandra A Decentralized, Structured Storage System

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Apache Cassandra for Big Data Applications

Benchmarking and Analysis of NoSQL Technologies

Cloud data store services and NoSQL databases. Ricardo Vilaça Universidade do Minho Portugal

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

Hypertable Architecture Overview

Data sharing in the Big Data era

An Open Source NoSQL solution for Internet Access Logs Analysis

NOSQL DATABASES AND CASSANDRA

Cloud Computing at Google. Architecture

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Big Systems, Big Data

NoSQL Systems for Big Data Management

Hadoop and Map-Reduce. Swati Gore

NoSQL. What Is NoSQL? Why NoSQL?

How graph databases started the multi-model revolution

Cassandra A Decentralized Structured Storage System

Chapter 7. Using Hadoop Cluster and MapReduce

MASTER PROJECT. Resource Provisioning for NoSQL Datastores

How To Handle Big Data With A Data Scientist

Transcription:

DOAG-Regionaltreffen München/Südbayern Donnerstag, 17. Februar 2011 um 17:00 Uhr NOSQL Only a Hype? Or the trend for the next future? NO:SQL NOT ONLY SQL Seite 1

Agenda 1 Why? 2 ACID versus BASE Horizontal versus vertical scalability 3 CAP Theorem 4 NoSQL most important ideas 5 Some NoSQL DB examples 6 Oracle NoSQL features 7 Summary and discussion Timeline: ~ 40 min + 5min discussion Seite 2

The Big Battle? NO:SQL Seite 3

NoSQL Market Forecast 2011-2015.. According to Market Research Media s latest forecasts, the worldwide NoSQL market is expected to reach $1.8 Billion by 2015 at a CAGR of 32% between 2011 and 2015. NoSQL market will generate $5.6 Billion revenues over the period 2011 2015... See: http://www.marketresearchmedia.com/2010/11/11/nosql-market/ Seite 4

NoSQL Not Only SQL Why? Current popular problems in the database world Excessive growth of data Performance and scalability Consequences: Increasing hardware costs (especially the increasing higher requirements of storage hardware)! Unbelievable high license cost for strictly required features for commercial high-volume database like EE partioning, inmemory TimesTen NoSQL i I m cheaper! Seite 5

NoSQL Not Only SQL Why? Current popular problems in the programming world Lack of experience with SQL databases Unsolved problem of easy object to relational mapping Complex test automation for SQL statements Mindset to SQL => old-fashioned, to declarative Consequences: Unbelievable complex persistence layers to unlink java from the database Thousands of ideas to avoid the usage of SQL Lean programming! Relationali Seite 6

No:SQL? Rolle Rechte Schema The "no:sql(east)" conference 2009 in Atlanta Seite 7

NoSQL Not Only SQL Why? Current popular problems in the distributed server world How we can grantee unlimited scalability? How to a upgrade to a new software version? Consequences: Unbelievable high license and hardware costs. NOSQL i I m cheaper! Seite 8

NoSQL Not Only SQL Why? Current popular problems in the 24/7 world How change a database schema without downtime? How to change/delete a huge amount of values at runtime? Consequences: Again unbelievable high license and hardware costs. NOSQL i I m easyer! Seite 9

NOSQL Definition A pure NoSQL DB s is: non relational schema free from the scratch distributed multi-node architecture Goal: Shared nothing cluster solutions easy replication mechanisms Simple API ( Not only SQL = NoSQL) No ACID consistency model Open Source BUT! The big disadvantage at the moment => mostly no easy language to search => Complex Joins? Not implemented Seite 10

NOSQL Polyglot Persistence Fight for free Database Software Use the right database for your problem Not only the default one! More effort to evaluate the requirement and search after the right solution for your problem Seite 11

ACID versus BASE Horizontal versus vertical scalability Seite 12

Our Oracle religion - ACID ACID are the transactional properties of database management systems Atomicity All of the operations in the transaction will complete, or none will. Consistency The database will be in a consistent state when the transaction begins and ends. Isolation The transaction will behave as if it is the only operation being performed upon the database. Durability Upon completion of the transaction, the operation will not be reversed. Seite 13

Horizontal scalability One huge central database Bigger hardware helps to solve the performance issues Only scalable to one absolute value Costs Performance 1 B Z One Database 2 3 n A C F G ONE DATABASE! n Storage cell machines n Instance n App. Server Seite 14

BUT? Do we really need strong ACID this in every Situation? GPS Tracking Xing Facebook Amazon Ebay IT Monitoring E-Mail Archive Data ware house traffic guidance systems Toll collect syslog Online News Spiegel.de Seite 15

The persistency of the data in a system All data is visible to all users Partly visible in the system 16:00 17:00 Time frame Now! Seite 16

Vertical scalability Small components working together in parallel All Components are independent Costs nearly endlessly scalable Performance 1. Database 2. Database N. Database 1 2 3 n Z F G C B A N databases N Instance N App. Server Seite 17

CAP Theorem Multi node environments Seite 18

CAP Theorem Start situation - Two nodes Requirements: Consistency Availability Partition Tolerance A B X 0 X 0 Eric Brewer : ACM Symposium 2000 Seite 19

Brewer's CAP Theorem for a distributed model Is it possible to solve all this requirements? Consistency The client perceives that a set of operations has occurred all at once. Only two of Consistency, Availability and Partition Tolerance can be guaranteed! Partition Tolerance Operations will complete, even if individual components are unavailable. Availability Every operation must terminate in an intended response. See: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem Seite 20

Theory of the CAP Theorem (1) Sunny Day Scenario for a update 1 2 3 A Write 1 X= 1 X 0 A X 1 A X 1 X= 1 B X 0 B X 1 B X= 1 X 1 Read 1 Seite 21

Theory of the CAP Theorem (2) Error Scenario for a update 1 2 3 A Write 1 X= 1 X 0 A X 1 A X 1 X= 1 B X 0 B X 0 B X= 1 X 0 Read 0 Seite 22

Theory of the CAP Theorem (3) How we solve the problem? Drop Partition Tolerance Only Node survived Side effect => we drop Availability Drop Consistency Live with inconsistencie Seite 23

A other consistency model - BASE A latency tolerant alternative to ASCI Basically Available Soft state Eventual consistency Given a sufficiently long period of time, over which no updates are sent, we can expect that during this period, all updates will, eventually, propagate through the system and all the replicas will be consistent Seite 24

ACID vs. BASE ACID BASE Strong consistency Isolation Focus on commit Nested transactions Availability? Conservative (pessimistic) Difficult evolution (e.g. schema) Weak consistency stale data OK Availability first Best effort Approximate answers OK Aggressive (optimistic) Simpler! Faster Easier evolution Do the right thing at each application layer Our application Source: http://www.cs.berkeley.edu/~brewer/cs262b-2004/podc-keynote.pdf Seite 25

NoSQL architecture Some base ideas Seite 26

Some basic ideas and concepts Map and Reduce Consistent Hashing MVCC-Protocol Vector Clocks Paxos Wide Column Stores Seite 27

Map and Reduce Base ideas from the functional programming List processing Programing model two functions Processes input key/value pair Produces set of intermediate pairs map (input) -> list(intermediate_value) Combines all intermediate values for a particular key Produces a set of merged output values (usually just one) reduce (out_key, list(intermediate_value)) -> list(out_value) Source:http://labs.google.com/papers/mapreduce-osdi04-slides Seite 28

Map and Reduce Example Count of Tokens --data token1 = 'BMW, AUDI,VW,BMW,AUDI,SKODA' token2 = 'KIA, VW,VW,BMW,AUDI,VW' select company, count(*) from cars_in_stock group by company; map (token) : f() {count word} => List[] -- first Run for each token list map( tocken1 ) => list1 [ ('BMW,1,1'), ('AUDI,1,1'), ('VW,1'), ('SKODA,1') ] map( tocken2 ) => list2 [ ('BMW,1'), ('AUDI,1'), ('VW,1,1,1'),, ('KIA,1') ] -- Intermediate values map (list1+list2) => list3 [ ('BMW,1,1,1'), ('AUDI,1,1'), ('VW,1,1,1,1'), ('SKODA,1'), ('KIA,1') ] reduce : f() { ++ } => List [] -reduce to the result reduce(list3) => result [ ('BMW,3'), ('AUDI,3'), (VW,4), ('SKODA,1'), ('KIA,1') ] Seite 29

Map and Reduce Very easy to implement in parallel DATA Map Intermediate Reduce Result BMW, AUDI,VW, BMW,AUDI,SKODA KIA,VW, VW,BMW,A UDI,VW MAP MAP Reduce Result KIA,VW,VW,BMW, AUDI,VW KIA,VW,VW,BMW, AUDI,VW MAP MAP Reduce Seite 30

Consistent Hashing Consistent Hashing A hash function to find the data in the right slot The addition or removal of one slot does not significantly change the mapping of keys to slots. Adress data Slots Server X 99 X 60 X 5 X 4 192.168.1.90 X 50 Hash(data) X 3 192.168.1.30 X 26 X 25 X 2 X 1 192.168.1.10 See: http://en.wikipedia.org/wiki/consistent_hashing Seite 31

Consistent Hashing (1) Distributed Servers hold the data Hash(server) => Range of the storage location hash(data) => storage location Example: 1. hash(192.168.1.10) = 15 2. hash(192.168.1.30) = 30 3. hash(192.168.1.90) = 80 X 99 1 X 25 X 60 3 X 26 2 X 50 Seite 32

Consistent Hashing (2) Distributed servers hold the data If one server leave the cluster, only his data moved to the next server The addressing at self is not changing! X 99 X 60 3 1 Move to 1 X 25 X 26 X 50 2 Leave Cluster Seite 33

MVCC-Protocol Multiversion concurrency control (abbreviated MCC or MVCC) provide concurrent access to the database a database will implement updates not by deleting an old piece of data and overwriting it with a new one instead marking the old data as obsolete and adding the newer "version." multiple versions where stored, but only one is the latest. See: http://en.wikipedia.org/wiki/multiversion_concurrency_control Seite 34

Vector clock algorithm for generating a partial ordering of events in a distributed system and detecting causality violations. Contain the state of the sending process's logical clock. Sender ID Sender ID Sender ID Time Counter Time Counter Time Counter See: http://en.wikipedia.org/wiki/vector_clock Seite 35

Vector clock Example A C:1 A:1 300 C:1 A:2 300 B C C:1 C:1 B:1 200 C:2 B:1 A:0 C:1 B:0 A:2 Use the counter stamp to identify the status and solve problem with this information 200 200 300 Seite 36

Paxos Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. See: http://en.wikipedia.org/wiki/paxos_algorithm Opposite to the ACID two phase Commit (2PC => all processes can hang until all commit) Seite 37

Paxos Roles in a consensus Model proposer acceptor learner quorum Make a proposal with the value v Accepts a proposal Learn chosen proposal Seite 38

Wide Column Stores Opposite of the row oriented table structure (for example in a oracle database) Column oriented Example: Wide Column stores ID:NAME:CITY:SALARY 1,2,3 : Scott,hugo,tiger : Huston,Munic,New York : 1000,4500,6000 Seite 39

Examples NoSQL Some Examples Seite 40

NOSQL Categories for each problem a solution Document store Graph Key Store Key/value store on disk Key/value cache in RAM Eventually consistent key value store Key-value stores implementing the Paxos algorithm Hierarchical key-value store Ordered key-value store Multivalue databases Object database Tabular Tuple store Source: http://en.wikipedia.org/wiki/nosql Seite 41

Document store -CouchDB Apache CouchDB cluster of unreliable commodity hardware Data Base document-oriented database queried and indexed in a MapReduce fashion using JavaScript. incremental replication with bi-directional conflict detection and resolution. http://couchdb.apache.org/ Store documents Example of a document: Surname: Gunther Adress : Munich Hobbies: Oracle DB Seite 42

Graph database Neo4J Neo4j is a graph database fully transactional database stores data structured as graphs. http://neo4j.org/ Seite 43

Key-Value Database Redis open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. Key Value Key Value Key Value Key Value In Memory Database http://redis.io/ Seite 44

ColumnFamily-based data model Cassandra The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Implemented in Java Example: Facebook, Twitter Schema less Only entities (column families are defined at startup) Add column at runtime http://cassandra.apache.org/ Seite 45

Oracle? Seite 46

Oracle Berkeley DB 11g Oracle Berkeley DB 11g designed to store data as opaque byte arrays of data in key/value pairs indexed in one of the available access methods. Create, read, update and delete (CRUD) operations on these key/value pairs is done using the BDB get/put or cursor position-based APIs. http://www.oracle.com/technetwork/database/be rkeleydb/overview/index.html Seite 47

Where we find this ideas in an Oracle RDBMS? Map and Reduce??? Consistent Hashing Hash partioning MVCC-Protocol Default since V3 Vector Clocks May be in the inter process mechanism Paxos??? Seite 48

Oracle NoSQL Feature Mostly some Storage Feature in the database Paritioning Clustering Bitmap Join Index But where is the scalability? Cluster? Only Horizontal See: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::p11_question_id:2664632900346253817 Seite 49

Summary Relational databases => ALL Features First Solutions Best for feature-rich solutions Proven technologies Traditional applications NoSQL => Solve one problem First solution Solves the problem, for that the DB is designed In some situations=> the best solution to solve the problem One Question on mass data Seite 50

Conclusion From the technical view: If we use the technologies to solve real problems both, strong ACID and BASE DB s will live together Trend of the future! From the business view: Save education for Java developers Save license costs Fight against the arrogance of the established companies More a hype then a advance Seite 51

Discussion Fragen NO:SQL Seite 52

More Information and Sources The NoSQL Websites: http://nosql-database.org/ CAP http://en.wikipedia.org/wiki/cap_theorem http://www.julianbrowne.com/article/viewer/brewers-captheorem ACID versus Base http://www.cs.berkeley.edu/~brewer/cs262b-2004/podckeynote.pdf Map And Reduce http://labs.google.com/papers/mapreduce-osdi04-slides Seite 53

More Information Paxos http://informatik.unibas.ch/lehre/hs10/cs341/_downloads/wo rkshop/reports/2010-hs-dis-p_fath-paxos-report.pdf Seite 54

Books NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken Stefan Edlich, Achim Friedland, Jens Hampe, Benjamin Brauer Verlag: Hanser Fachbuchverlag (7. Oktober 2010) Sprache: Deutsch ISBN-10: 3446423559 Seite 55

Referent Technology Consultant Gunther Pippèrr Dipl. Ing. Technische Informatik (FH) >10 Jahre IT Beratungserfahrung Freiberuflich tätig Schwerpunkte der Beratungstätigkeit IT System Architekt und technische Projektleitung Web-Technologien Design und Implementierung von Datenbank-basierten Internet- /Intranet-Anwendungen Entwurf und Umsetzung von IT Infrastrukturen zum Datenmanagement mit Oracle Basis Technologien Training rund um die Oracle Datenbank Beruflicher Hintergrund Dipl. Ing. Technische Informatik (FH) Weingarten Mehr als10 Jahre Erfahrung in komplexen IT Projekten zum Thema Datenhaltung/Datenmanagement Freiberufliche Projektarbeit 8 Jahre Geschäftsführer Consultant bei großen Datenbank Hersteller Projekterfahrung Architekt für ein System zur Massendaten-Erfassung für Monitoring in der Telekommunikation Architekt und technische Projektverantwortung für ein Smart Metering Portal für das Erfassen von Energiezählerdaten und Asset Management Architekt und Projektleitung, Datenbank Design und Umsetzung für die Auftragsverwaltung mit Steuerung von externen Mitarbeitern für den Sprachdienstleister von deutschen Technologiekonzern Architekt und technische Projektverantwortung für IT Infrastrukturprojekte, z.b.: - Zentrale Datenhaltung für Münchner Hotelgruppe mit über 25 Hotels weltweit, - Messdaten Erfassung für russischen Kabelnetzbetreiber - Redundante Cluster Datenbank Infrastrukturen für diverse größere Web Anwendungen wie Fondplattform und Versicherungsportale CRM- und Ausschreibungsportal für großen Münchner Bauträger Kontaktdaten: E-Mail: gunther@pipperr.de Mobil:+49- (0)17180656113 Internet : www.pipperr.de Seite 56