NOSQL DATABASE SYSTEMS



Big Data Technologies. Prof. Dr. Uta Störl, Hochschule Darmstadt, Department of Computer Science, Summer Semester 2015


Big Data Technologies: NoSQL DBMS - SoSe 2015

NoSQL Database Systems
- Categorization
  - Data Model
  - Storage Layout
  - Query Models
  - Solution Architectures
- Data Modeling
- Application Development
- Scalability, Availability and Consistency
  - Partitioning, Replication
  - Consistency Models and Transactions
- Select the Right DBMS
  - Performance and Benchmarks
  - Polyglot Persistence

NoSQL Database Systems
Considered categories of NoSQL database systems:
- Key-Value Database Systems
- Document Database Systems
- Column Family Database Systems

Key-Value Database Systems
Data Model
- Key-value pairs
- Unique keys
- Values: arbitrary type (serialized byte arrays) or strings, lists, sets, ordered sets (of strings)
- Schema-free
Storage Layout
- Hash maps, B-trees
Indexes
- Primary indexes (hash, B-tree) on the key
- Secondary indexes on values?

Key-Value Database Systems (Cont.)
Query Models
- Simple API
  - set(key, value)
  - value = get(key)
  - delete(key)
- Operations on values?
- More complex operations
  - Language bindings
  - MapReduce (later in this chapter)
Systems
- Oracle Berkeley DB (mid-90s)
- Caches (Ehcache, Memcached)
- Amazon Dynamo/S3, Redis, Riak, Voldemort, and others
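The simple set/get/delete API can be sketched in a few lines. This is only an illustrative in-memory toy, not the interface of any of the listed systems; the class name `KeyValueStore` and the sample keys are assumptions made for the example.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # A hash map plays the role of the primary index on the key.
        self._data = {}

    def set(self, key, value):
        # Keys are unique, so writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key):
        # The store does not interpret the value; it returns it as stored.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.set("session:42", b'{"user": "alice"}')      # serialized byte array
store.set("cart:42", ["football boot", "socks"])   # list value
session = store.get("session:42")
store.delete("session:42")
```

Because the store only sees opaque values, any filtering on the content of a value (the "operations on values?" question) has to happen in the application unless the system offers richer value types.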

Document Store Database Systems
Data Model
- Key-value pairs with documents as value
- Document format: JSON or BSON (Binary JSON)
  - Loosely structured name(key)-value pairs
  - Hierarchical
- Additionally, MongoDB uses collections
  - Arbitrary documents could be grouped together
  - Documents in a collection should be similar to facilitate effective indexing
Example document:
    {
      "id": 1,
      "name": "football boot",
      "price": 199,
      "stock": { "warehouse": 120, "retail": 10 }
    }
Storage Layout
- B-trees to store the documents
- MongoDB: documents in a single collection are stored together

Document Store Database Systems (Cont.)
Indexes
- Primary indexes on the document id (key)
- Secondary indexes on JSON names
  - Default or user defined
  - Composite indexes may be supported
Query Models
- Simple API: set/get/delete
- Further query support differs widely
  - Powerful ad-hoc queries with an integrated query language (MongoDB)
  - No ad-hoc queries, only predefined views with indexes (CouchDB & Couchbase)
- Language bindings
- MapReduce (later in this chapter)
Systems
- MongoDB, CouchDB, Couchbase, and others
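The difference between the primary index on the document id and a secondary index on a JSON name can be illustrated with a small sketch. The class `DocumentCollection` and its methods are hypothetical and only mimic the idea; real document stores maintain such indexes in B-trees and support many more options.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy collection with one secondary index (hypothetical sketch)."""

    def __init__(self, indexed_field):
        self.docs = {}                 # primary index: document id -> document
        self.indexed_field = indexed_field
        self.index = defaultdict(set)  # secondary index: field value -> ids

    def insert(self, doc):
        self.docs[doc["id"]] = doc
        # Documents are schema-free, so the indexed name may be missing.
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(doc["id"])

    def find_by_field(self, value):
        # Lookup via the secondary index instead of scanning all documents.
        return [self.docs[i] for i in sorted(self.index.get(value, ()))]


shop = DocumentCollection(indexed_field="price")
shop.insert({"id": 1, "name": "football boot", "price": 199,
             "stock": {"warehouse": 120, "retail": 10}})
shop.insert({"id": 2, "name": "football", "price": 25})
shop.insert({"id": 3, "name": "goalkeeper gloves", "price": 199})
matches = shop.find_by_field(199)
```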

Column Family Database Systems
Data Model
- Loosely structured by columns and column families ("set of nested maps")
- Column family
  - Set of columns grouped together into a bundle
  - Column families have to be predefined
- Column
  - Not predefined; any type of data (can be nested)
[Figure: a table whose rows (Row Key1, Row Key2) each hold a varying number of columns, grouped into two column families]

Column Family Database Systems (Cont.)
Data Model (Cont.)
Example:
    Row Key: title | Column Family text                                          | Column Family revision
    "NoSQL"        | text:content: "A NoSQL database provides a mechanism ..."   | revision:author: "Mendel", revision:comment: "changed ..."
    "Redis"        | text:content: "Redis is an open-source, networked ..."      | revision:author: "Torben", revision:comment: "initial ..."
Column family database systems support multiple versions of each cell by timestamps:
    Row Key: title | Time Stamp | Column Family text  | Column Family revision
    "NoSQL"        | t5         | text:content: "..." | revision:author: "Mendel", revision:comment: "changed ..."
    "NoSQL"        | t4         |                     | revision:author: "Torben", revision:comment: "there ..."
    "Redis"        | t3         | text:content: "..." | revision:author: "Torben", revision:comment: "initial ..."

Column Family Database Systems (Cont.)
Storage Layout
- Data is stored by column family
Logical table:
    Row Key: title | Time Stamp | Column Family text  | Column Family revision
    "NoSQL"        | t5         | text:content: "..." | revision:author: "Mendel", revision:comment: "changed view ..."
    "NoSQL"        | t4         |                     | revision:author: "Torben", revision:comment: "there should be ..."
    "Redis"        | t3         | text:content: "..." | revision:author: "Torben", revision:comment: "initial ..."
Stored column family text:
    Row Key: title | Time Stamp | column: content
    NoSQL          | t5         | A NoSQL database provides a mechanism ...
    Redis          | t3         | Redis is an open-source, networked ...
Stored column family revision:
    Row Key: title | Time Stamp | column: author | column: comment
    NoSQL          | t5         | Mendel         | changed view ...
    NoSQL          | t4         | Torben         | there should be ...
    Redis          | t3         | Torben         | initial ...

Column Family Database Systems (Cont.)
Classical example: Web table
    Row Key       | Time Stamp | Column Family contents | Column Family anchor
    "com.cnn.www" | t9         |                        | anchor:anchor: "cnnsi.com", anchor:anchortext: "cnn"
    "com.cnn.www" | t8         |                        | anchor:anchor: "my.look.ch", anchor:anchortext: "CNN.com"
    "com.cnn.www" | t6         | "<html> ..."           |
    "com.cnn.www" | t5         | "<html> ..."           |

Column Family Database Systems (Cont.)
Query Models
- Simple API
  - set(table, row, column, value)
  - value = get(table, row, column)
  - delete(table, row, column)
  - Timestamp optional
- Language bindings
- More powerful query engines, either integrated (Cassandra Query Language) or as additional software products (e.g. Google App Engine / Google Datastore for BigTable, Hive for data warehousing on HBase)
- MapReduce (later in this chapter)
Indexes
- Primary indexes (B-trees, sorted order)
- Default or user-defined secondary indexes
Systems
- Google BigTable, HBase, Cassandra, Amazon SimpleDB, and others
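The "set of nested maps" view and the optional timestamp in the simple API can be sketched as follows. The class `ColumnFamilyTable` is a hypothetical toy that represents a single table (so the table argument of the API is implicit), uses integer timestamps, and returns the newest version at or before the requested timestamp; actual systems differ in these details.

```python
class ColumnFamilyTable:
    """Toy column family table: row -> column -> timestamp -> value."""

    def __init__(self, column_families):
        # Column families are predefined; columns inside them are not.
        self.column_families = set(column_families)
        self.rows = {}

    def set(self, row, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Every write adds a new version of the cell instead of overwriting.
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def delete(self, row, column):
        self.rows.get(row, {}).pop(column, None)


wiki = ColumnFamilyTable(["text", "revision"])
wiki.set("NoSQL", "revision:author", "Torben", timestamp=4)
wiki.set("NoSQL", "revision:author", "Mendel", timestamp=5)
wiki.set("Redis", "revision:comment", "initial", timestamp=3)
latest_author = wiki.get("NoSQL", "revision:author")
older_author = wiki.get("NoSQL", "revision:author", timestamp=4)
```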

NoSQL (Not only SQL): Definition
NoSQL definition:
"Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above."
Source: S. Edlich, nosql-database.org

NoSQL (Not only SQL): Definition
Next Generation Databases mostly addressing some of the points:
- non-relational
- schema-free
- simple API (remark: more complex APIs are currently under development)
- distributed and horizontally scalable
- easy replication support
- eventually consistent / BASE (not ACID) (remark: BASE as well as ACID are supported nowadays)
- open-source (remark: ???)

NoSQL: The Essence
Data Model
- non-relational
- schema-free
Scalability
- distributed and horizontally scalable
- easy replication support

NoSQL Database Systems: Use Cases
    Category                       | Suitable Use Cases                                                                                    | Examples
    Key-Value Database Systems     | Storing session information; user profiles, preferences; shopping cart data                         | Amazon (shopping carts), Temetra (meter data)
    Document Store Database Systems| Event logging; content management systems; blogging platforms; web analytics or real-time analytics | Forbes (CMS), MTV (CMS)
    Column Family Database Systems | Event logging; content management systems; blogging platforms                                        | Google (web pages), Facebook (messaging), Twitter (places of interest)

NoSQL Family Tree
[Figure: NoSQL family tree]
Source: cloudant.com

Solution Architectures (Examples)
[Figure: Google stack and Hadoop stack]
Source: Saake/Schallehn:2011


Data Modeling
- Object-relational impedance mismatch
- Example: blog, blogpost, comment, author
  - Object-oriented modeling
  - Mapping to a relational database

Data Modeling Decisions
- Primary decision: embedding vs. referencing
- Points to consider, however:
  - There are no join operations within NoSQL database systems!
  - There are no distributed transactions within NoSQL!
- Advantages and disadvantages of embedding
- Advantages and disadvantages of referencing
- Martin Fowler: Aggregate-Oriented Modeling

Data Modeling: Document Store DBS
- How to realize references?
- Direction of references?
- Embedding: what about denormalization and redundancy?
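The embedding vs. referencing decision can be made concrete with the blog example. The sketch below uses plain Python dictionaries as stand-ins for documents; the field names (`comments`, `comment_ids`) and the helper `load_post_with_comments` are assumptions for illustration. With referencing, the application has to resolve the references itself, because the data store offers no join.

```python
# Variant A: embedding. The comments live inside the blogpost document,
# so one read returns the whole aggregate.
embedded_post = {
    "_id": "042",
    "title": "NoSQL",
    "comments": [
        {"_id": "c1", "author": "Torben", "text": "initial"},
        {"_id": "c2", "author": "Mendel", "text": "changed view"},
    ],
}

# Variant B: referencing. Blogposts and comments are separate documents;
# here the blogpost points to its comments (one possible direction).
posts = {"042": {"_id": "042", "title": "NoSQL", "comment_ids": ["c1", "c2"]}}
comments = {
    "c1": {"_id": "c1", "author": "Torben", "text": "initial"},
    "c2": {"_id": "c2", "author": "Mendel", "text": "changed view"},
}


def load_post_with_comments(post_id):
    """Resolve references in the application (no join in the data store)."""
    post = dict(posts[post_id])                # first read: the blogpost
    comment_ids = post.pop("comment_ids")
    post["comments"] = [comments[c] for c in comment_ids]  # further reads
    return post


resolved_post = load_post_with_comments("042")
```

Both variants yield the same aggregate, but embedding needs one read and duplicates data if a comment is shared, while referencing needs several reads and keeps each comment in one place.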

Data Modeling: Column Family DBS
- How to implement embedded objects in column family database systems?
  - Variant 1: using run-time named column qualifiers
  - Variant 2: using timestamps (or other IDs)
  - New (Cassandra CQL3): using collection types (map, set, list)
- What about column families?
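Variant 1 can be pictured as flattening the embedded objects into columns whose qualifiers are only created at run time. The following sketch is one possible reading of that idea; the function name, the `comments` column family and the `<position>_<field>` naming scheme are all assumptions, not a convention prescribed by HBase or Cassandra.

```python
def embed_comments_as_columns(comments):
    """Flatten embedded comment objects into columns of one column family."""
    columns = {}
    for position, comment in enumerate(comments, start=1):
        for field, value in comment.items():
            # The qualifier "<position>_<field>" is generated at run time;
            # only the column family "comments" has to be predefined.
            columns[f"comments:{position}_{field}"] = value
    return columns


row = embed_comments_as_columns([
    {"author": "Torben", "text": "initial"},
    {"author": "Mendel", "text": "changed view"},
])
```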

Data Modeling
- What about data modeling in key-value database systems?
Data Modeling: Conclusion
- More degrees of freedom
  - Embedding vs. referencing
  - Denormalization and redundancy


Application Development for NoSQL
- Simple command line APIs
- REST API
- (Some) more powerful query languages / query engines
- Language bindings
  - Java, Ruby, C#, Python, Erlang, PHP, Perl, and others
  - REST
  - Thrift

Application Development for NoSQL
Example: retrieve title and content of the blogpost with id = 042

    // HBase
    get 'blogposts', '042', { COLUMN => ['blogpost_data:title', 'blogpost_data:content'] }

    // Cassandra
    SELECT title, content FROM blogposts WHERE id = '042';

    // MongoDB
    db.blogposts.find( { _id : '042' }, { title: 1, content: 1 } )

    // Couchbase
    function (doc) {
      if (doc._id == '042') {
        emit(doc._id, [doc.title, doc.content]);
      }
    }

Application Development for NoSQL
Challenge
- Big data
- Data distributed over several hundred nodes (remember: scale out)
- Data-to-code or code-to-data?
- Executing jobs in parallel over several nodes
There is a need for appropriate algorithms and frameworks!

MapReduce: Basic Idea
- Old idea from functional programming (LISP, ML, Erlang, Scala etc.)
  - Divide tasks into small discrete tasks and run them in parallel
  - Never change the original data (pipe concept)
  - Different operations on the same data do not influence each other
    - No concurrency conflicts
    - No deadlocks
    - No race conditions
- MapReduce: basic idea and framework introduced by Google in 2004
  J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04. 2004
  http://labs.google.com/papers/mapreduce.html

MapReduce: Basic Idea & WordCount Example
Developers should implement two primary methods:
- Map: (key1, val1) -> [(key2, val2)]
- Reduce: (key2, [val2]) -> [(key3, val3)]
WordCount on four documents (Doc1 to Doc4), processed by two mappers:
    Input of mapper 1: "Sport, Handball, Soccer" and "Soccer, FIFA"
    Input of mapper 2: "Sport, Gym, Money" and "Soccer, FIFA, Money"
    MAP output of mapper 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
    MAP output of mapper 2 (key, value): (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)
    REDUCE output of reducer 1 (key, value): (Sport, 2), (Handball, 1), (Soccer, 3)
    REDUCE output of reducer 2 (key, value): (FIFA, 2), (Gym, 1), (Money, 2)
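The two signatures and the WordCount figure can be reproduced with a single-process sketch. It only imitates the data flow (map, group by key, reduce) and has none of the distribution or fault tolerance of a real framework; the function names and the assignment of the four texts to Doc1 to Doc4 are assumptions.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map: (key1, val1) -> [(key2, val2)], here (doc id, text) -> [(word, 1)]."""
    return [(word.strip(), 1) for word in text.split(",")]


def reduce_counts(word, counts):
    """Reduce: (key2, [val2]) -> [(key3, val3)], here (word, [1, ...]) -> [(word, n)]."""
    return [(word, sum(counts))]


def run_mapreduce(documents, mapper, reducer):
    groups = defaultdict(list)
    for key1, val1 in documents.items():
        for key2, val2 in mapper(key1, val1):
            groups[key2].append(val2)   # shuffle: group values by key
    result = {}
    for key2 in sorted(groups):
        for key3, val3 in reducer(key2, groups[key2]):
            result[key3] = val3
    return result


documents = {
    "Doc1": "Sport, Handball, Soccer",
    "Doc2": "Soccer, FIFA",
    "Doc3": "Sport, Gym, Money",
    "Doc4": "Soccer, FIFA, Money",
}
word_counts = run_mapreduce(documents, map_words, reduce_counts)
```

The input documents are never modified; every phase only produces new key-value pairs, which is what makes the map calls safe to run in parallel.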

MapReduce: Architecture and Phases
[Figure: MapReduce architecture and phases]
Source: https://developers.google.com/appengine/docs/python/dataprocessing/overview

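Hadoop Example
Map & Reduce Functions (Example)

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

      // Emits (word, 1) for every token of an input line.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
          }
        }
      }

      // Sums up all counts collected for one word.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }
    }

Source: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html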

MapReduce: Optional Combine Phase
- Decrease the shuffling cost
- Reduce the result size of map functions
- Perform a reduce-like function on each machine
WordCount with combine phase:
    Input of mapper 1: "Sport, Handball, Soccer" and "Soccer, FIFA"
    Input of mapper 2: "Sport, Gym, Money" and "Soccer, FIFA, Money"
    MAP output of mapper 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
    MAP output of mapper 2 (key, value): (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)
    COMBINE output on machine 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 2), (FIFA, 1)
    COMBINE output on machine 2 (key, value): (Sport, 1), (Gym, 1), (Money, 2), (Soccer, 1), (FIFA, 1)
    The combined pairs are then passed to the REDUCE phase.
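The effect of the combine phase on the amount of shuffled data can be checked with a small local simulation. This is a sketch with hypothetical function names, not Hadoop's combiner API; it runs the reduce-like summation once per mapper before the global reduce.

```python
from collections import Counter


def map_split(texts):
    """Map phase of one machine: emit (word, 1) for every word of its documents."""
    return [(word.strip(), 1) for text in texts for word in text.split(",")]


def combine(pairs):
    """Reduce-like pre-aggregation executed locally on the mapper's machine."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


split_1 = ["Sport, Handball, Soccer", "Soccer, FIFA"]
split_2 = ["Sport, Gym, Money", "Soccer, FIFA, Money"]

mapped_1, mapped_2 = map_split(split_1), map_split(split_2)
combined_1, combined_2 = combine(mapped_1), combine(mapped_2)

# Global reduce over the (smaller) combined outputs.
total = Counter()
for word, count in combined_1 + combined_2:
    total[word] += count

pairs_without_combine = len(mapped_1) + len(mapped_2)
pairs_with_combine = len(combined_1) + len(combined_2)
```

In this tiny example the number of pairs to shuffle drops from 11 to 9; the saving grows with the number of repeated keys per mapper. A combiner is only safe when the reduce function is associative and commutative, as summation is.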

MapReduce Frameworks
MapReduce frameworks take care of
- Scaling
- Fault tolerance
- (Load balancing)
MapReduce frameworks
- Google (however, Google now promotes Dataflow)
- Apache Hadoop
  - Standalone or integrated in NoSQL (and SQL) DBMS
  - Also commercial distributors: Cloudera, MapR, HortonWorks, and others
- Proprietary MapReduce frameworks integrated in NoSQL DBMS

MapReduce and Query Languages
- The MapReduce paradigm is too low-level
  - Only two declarative primitives (map + reduce)
  - Custom code for simple operations like projection and filtering
  - Code is difficult to reuse and maintain
- Combination of high-level declarative querying and low-level programming with MapReduce
- Dataflow programming languages
  - HiveQL
  - Pig
  - (Jaql)

Hadoop Stack
[Figure: Hadoop stack]
Source: Saake/Schallehn:2011

HiveQL
- Hive: data warehouse infrastructure built on top of Hadoop, providing:
  - Data summarization
  - Ad hoc querying
- Simple query language: HiveQL (based on SQL)
- Extendable via custom mappers and reducers
- Developed by Facebook, now subproject of Hadoop
- http://hadoop.apache.org/hive/

HiveQL: Example
[Figure: HiveQL example]
Source: Saake/Schallehn: Data Management in the Cloud, 2011

Pig
- A platform for analyzing large data sets
- Pig consists of two parts:
  - Pig Latin: a data processing language
  - Pig infrastructure: an evaluator for Pig Latin programs
    - Pig compiles Pig Latin into physical plans
    - Plans are to be executed over Hadoop
- Interface between the declarative style of SQL and the low-level, procedural style of MapReduce
- http://hadoop.apache.org/pig/

Pig: Example
[Figure: Pig Latin example]
Source: Saake/Schallehn: Data Management in the Cloud, 2011

MapReduce in Practice
- VLDB 2012: Chen, Alspaugh, Katz: Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads
- Study of 7 Hadoop deployments (Cloudera = Hadoop with commercial services)

MapReduce in Practice (Cont.)
[Figure: results of the workload study]
Source: Chen, Alspaugh, Katz. Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads; VLDB 2012

MapReduce Trends
- Hadoop 2.0 with YARN (abstraction from MapReduce)
  [Figure: Hadoop 2.0 architecture. Source: HortonWorks]
- Apache ...: In-Memory, Hadoop, Performance!, written in Scala

Application Development for NoSQL
- MapReduce: concept and frameworks
- State-of-the-art application development
  - With relational database systems: Object-Relational Mapping (ORM) frameworks and standards (Java Persistence API etc.)
  - Frameworks for Object-NoSQL mapping?!

Object-NoSQL Mapper: Architecture
Application:

    SELECT b.titel, b.text FROM blogpost b WHERE b.id = '042'

The Object-NoSQL Mapper translates the query for the respective data store:

    // MongoDB
    db.blogposts.find( { _id : '042' }, { titel: 1, text: 1 } )

    // Cassandra
    SELECT titel, text FROM blogposts WHERE id = '042';

    // HBase
    get 'blogposts', '042', { COLUMN => ['blogpost_daten:titel', 'blogpost_daten:text'] }

Stored data: a document { "id" : "042", "titel" : ... } in MongoDB; tables with the columns id and titel (row 042) in Cassandra and HBase.

Object-NoSQL Mapper: Market Overview
- Mappers for different programming languages
  - Java, .NET, Python, Ruby
- Volatile market
- Main focus: Object-NoSQL Mappers for Java
  - Standardization: Java Persistence API (JPA) with Java Persistence Query Language (JPQL)
  - Categorization
    - Multi Data Store Mapper
    - Single Data Store Mapper

Java Multi Data Store Mapper
Support for document store, column family, and graph database systems in Java Multi Data Store Mappers
[Table: support matrix; the individual entries are not recoverable from the transcription]
- Mappers (columns): DataNucleus, EclipseLink, Hibernate OGM, Kundera, PlayORM, Spring Data
- Data stores (rows):
  - Document store: Couchbase, CouchDB, MongoDB
  - Column family DBMS: Cassandra, HBase
  - Graph DBMS: Neo4J

Java Multi Data Store Mapper
Support for key-value database systems in Java Multi Data Store Mappers
[Table: support matrix; the individual entries are not recoverable from the transcription]
- Mappers (columns): DataNucleus, EclipseLink, Hibernate, Kundera, PlayORM, Spring Data
- Key-value DBMS (rows): Amazon DynamoDB, Apache Solr, Ehcache, Elasticsearch, GemFire, Infinispan, Oracle NoSQL, Redis

Java Object-NoSQL Mapper: Supported Functionality
[Table: supported functionality, including Single Data Store Mappers]
* Limited functionality (depending on the underlying NoSQL data store)
Source: Störl/Hauf/Klettke/Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015

Object-NoSQL Mapper: Query Language Support
Challenge: different query language interfaces
- Examples:
  - Most systems do not support any JOINs
  - Many systems do not offer aggregate functions, the LIKE operator, the NOT operator, and so on
Approaches
1. Offer only the particular subset of features that is implemented by all supported NoSQL data stores, i.e. the intersection of features
2. Distinguish by data store and offer only the set of features implemented by a particular NoSQL data store
3. Offer the same set of features for all supported NoSQL data stores, possibly complementing missing features by implementing them inside the Object-NoSQL Mapper

Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Drawback: restricted portability
- Systems: Hibernate OGM, Kundera, EclipseLink
- Example: JPQL operators (selection) in Kundera
  https://github.com/impetus-opensource/kundera/wiki/jpql

Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Extension: use third-party libraries to offer more functionality for some but not for all supported NoSQL data stores
- Systems: Hibernate OGM (Hibernate Search), Kundera; each with Apache Lucene
[Figure: Application -> Object-NoSQL Mapper -> NoSQL-DBMS and search engine index]

Object-NoSQL Mapper: Query Language Support
Approach 3: offer the same set of features for all supported NoSQL data stores
- Complementing missing features by implementing them inside the Object-NoSQL Mapper
- Benefit: portability
- Drawback: performance
- Systems: DataNucleus, Hibernate OGM (announced)
[Figure: Application -> Object-NoSQL Mapper -> NoSQL-DBMS]
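The trade-off of approach 3 can be sketched with a toy mapper that pushes supported operators down to the data store and evaluates missing ones itself. Everything here is hypothetical (`EqualityOnlyStore`, `ObjectNoSQLMapper`, the `query` signature); it does not reflect the internals of DataNucleus or Hibernate OGM.

```python
import re


class EqualityOnlyStore:
    """Hypothetical data store that supports only equality lookups and scans."""

    def __init__(self, rows):
        self.rows = list(rows)

    def find_equal(self, field, value):
        return [r for r in self.rows if r.get(field) == value]

    def scan(self):
        return list(self.rows)


def like_to_regex(pattern):
    """Translate a LIKE pattern ('%' = any string, '_' = one character)."""
    parts = [".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
             for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)


class ObjectNoSQLMapper:
    def __init__(self, store):
        self.store = store

    def query(self, field, operator, operand):
        if operator == "=":
            # Supported natively: push the predicate down to the store.
            return self.store.find_equal(field, operand)
        if operator == "LIKE":
            # Missing in the store: fetch all rows and filter in the mapper.
            # Portable, but every row crosses the network (the drawback).
            regex = like_to_regex(operand)
            return [r for r in self.store.scan()
                    if regex.fullmatch(str(r.get(field, "")))]
        raise ValueError(f"unsupported operator: {operator}")


mapper = ObjectNoSQLMapper(EqualityOnlyStore([
    {"id": "042", "title": "NoSQL"},
    {"id": "043", "title": "Redis"},
    {"id": "044", "title": "NoSQL Mappers"},
]))
exact = mapper.query("id", "=", "042")
fuzzy = mapper.query("title", "LIKE", "NoSQL%")
```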

Object-NoSQL Mapper: Query Language Support
Outlook: combination of approaches 2 and 3
- Systems: Hibernate OGM (announced)
[Figure: Application -> Object-NoSQL Mapper -> NoSQL-DBMS and search engine index]

Conclusion: Java Object-NoSQL Mapper
- Vendor independence / portability
  - Standardized query language (JPQL)
  - Support for different NoSQL data stores
  - Supported query operators often depend on the capabilities of the underlying NoSQL data stores
- Performance (as of end of 2014)
  - In reading data, there is only a small gap between native access and the Object-NoSQL Mappers for the majority of the evaluated products
  - Yet in writing, object mappers introduce a significant overhead
- Further reading: U. Störl, Th. Hauf, M. Klettke and S. Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015
