The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms

ICOODB 2010 - Frankfurt, Germany
Leon Guzenda - Objectivity, Inc.

AGENDA
- Historical Overview
- Inherent Advantages of ODBMSs
- Technology Evolution
- Leveraging Technologies
- Graph Databases
- Summary

Historical Overview

The ODBMS Players
[Timeline figure spanning 1986 to 2010; vendor names and renamings in parentheses]
- Matisse
- VBase (Ontologic), later Ontos
- GemStone (Servio Logic, then Servio, then GemStone)
- Objectivity/DB, InfiniteGraph
- ObjectStore (Object Design, then Excelon, then Progress)
- GBase (Graphael)
- Versant Object Database (Versant)
- O2 (Ardent, then Ascential)
- UniSQL
- Poet, later FastObjects
- Berkeley DB (Sleepycat, then Oracle)
- db4objects
Note that many of these companies were founded earlier, e.g. Objectivity, Inc. was founded in June 1988.

ODBMS Evolution
1980s
- Performance, Performance, Performance!
- Primarily scientific and engineering applications
1990s
- Reliability and Scalability
- New languages and Operating Systems
- Large deployments in the scientific domain
2000s
- Ease of use and instrumentation
- Query languages
- Performance and scalability
- Grids and Clouds
- Embedded systems, government and more...
DATA MANIPULATION: Applications tended to generate data and relationships
RELATIONSHIP ANALYTICS: Applications ingest and correlate data and relationships

Inherent ODBMS Advantages

Faster Navigation
PROBLEM: Find all of the Suspects linked to a chosen Incident
Relational solution (Incident_Table, Join_Table and Suspect_Table, accessed through B-Trees):
- N * 2 B-Tree lookups
- N * 2 logical reads
Objectivity/DB solution (Incident linked directly to its Suspects):
- 1 B-Tree lookup
- 1 + N logical reads
Significantly faster navigation of relationships
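The counts on this slide can be reproduced with a small Python cost model. This is only a sketch: `CountingIndex`, the sample data and the way join rows are enumerated are all hypothetical, and a plain dictionary stands in for a B-Tree. It is not Objectivity/DB or RDBMS code.

```python
class CountingIndex:
    """A dict standing in for a B-Tree index; it counts every lookup."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self._entries[key]


def relational_suspects(join_keys, join_index, suspect_index):
    """Resolve each join row, then the suspect row it points to."""
    names = []
    for join_key in join_keys:
        suspect_id = join_index.get(join_key)         # lookup in Join_Table
        names.append(suspect_index.get(suspect_id))   # lookup in Suspect_Table
    return names


class Suspect:
    def __init__(self, name):
        self.name = name


class Incident:
    def __init__(self, suspects):
        self.suspects = suspects  # direct references, no join table


def navigated_suspects(incident_id, incident_index):
    """One index lookup for the Incident, then follow its references."""
    incident = incident_index.get(incident_id)
    logical_reads = 1                 # reading the Incident itself
    names = []
    for suspect in incident.suspects:
        logical_reads += 1            # reading one linked Suspect
        names.append(suspect.name)
    return names, logical_reads


# Hypothetical data: one incident linked to N = 3 suspects.
join_keys = [("inc-1", 0), ("inc-1", 1), ("inc-1", 2)]
join_index = CountingIndex(zip(join_keys, ["s1", "s2", "s3"]))
suspect_index = CountingIndex({"s1": "Ann", "s2": "Bob", "s3": "Cy"})
relational_names = relational_suspects(join_keys, join_index, suspect_index)
relational_lookups = join_index.lookups + suspect_index.lookups

incident_index = CountingIndex(
    {"inc-1": Incident([Suspect("Ann"), Suspect("Bob"), Suspect("Cy")])}
)
object_names, object_reads = navigated_suspects("inc-1", incident_index)
```

With N = 3 the relational path performs 2N index lookups, while the navigational path performs one index lookup and 1 + N reads, matching the slide's formulas under this simplified model.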

Lower Query Latency
[Figure: two step sequences drawn along a Time axis]
Relational: O/R Mapping, Send Request, Receive & Interpret Request, Optimize, Access Indices, Qualify, Read Data, Create View, Return Result, Loop
Objectivity/DB: Initialize Iterator, Start Loop, Access Indices, Qualify, Open Object, Loop
Qualified objects are returned as soon as they are found.
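The difference between building a complete result and returning qualified objects as they are found can be illustrated with a Python generator. This is a generic sketch of the idea, not the Objectivity/DB iterator API; the function names and the sample data are made up.

```python
def materialized_query(objects, qualifies, examined):
    """Build the whole result before returning anything."""
    result = []
    for obj in objects:
        examined.append(obj)
        if qualifies(obj):
            result.append(obj)
    return result


def iterator_query(objects, qualifies, examined):
    """Yield each qualified object as soon as it is found."""
    for obj in objects:
        examined.append(obj)
        if qualifies(obj):
            yield obj


def is_even(n):
    return n % 2 == 0


data = list(range(1, 11))

# Materialized: every object is examined before the first result is usable.
eager_examined = []
eager_first = materialized_query(data, is_even, eager_examined)[0]

# Iterator: the first result is available after examining only two objects.
lazy_examined = []
lazy_first = next(iterator_query(data, is_even, lazy_examined))
```

Both calls return the same first match, but the iterator version has looked at only the objects up to that match when it hands the result back.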

Technology Evolution

Grids and Clouds
- 1996: CERN started looking for a DBMS for the LHC.
- The RD45 team verified that a distributed ODBMS could handle complexity, performance and Petabyte+ scale.
- This led to grid deployments at SLAC and Brookhaven.
- Meanwhile, developers migrated from CORBA to SOA.
- Then, our grid and SOA experience made the migration to cloud environments very easy.

The NoSQL Movement
Some web application developers found RDBMSs too restrictive. They were dealing with:
- Huge parallel ingest streams
- Applications that scan or navigate rather than query
- Unconventional transaction models
So... they re-invented the wheel!
- Sharding (Hadoop and Big Table)
- Key-Value tables (Big Table)
- Dynamo...

Sharding
Splits large tables into groups of related rows and puts them onto separate servers. They each have a schema.
[Diagram labels: a Client with a Split/Combine layer; three Servers, each with Schema 1, holding the Product Catalog for Africa & Antarctica, for the Americas, and for Australasia and Asia; ETC.]
- Big tables can be split
- Small hot tables may be replicated
- The client has to figure out which server to use.
- Data has to be compatible with the client hardware, OS and language.
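A minimal Python sketch of the client-side routing that this slide describes follows. The region grouping comes from the slide; the server names, the sample rows and the function names are hypothetical, and in-memory lists stand in for real servers.

```python
# Hypothetical shard map: region -> server holding that slice of the catalog.
SHARD_MAP = {
    "Africa": "server-1",
    "Antarctica": "server-1",
    "Americas": "server-2",
    "Australasia": "server-3",
    "Asia": "server-3",
}

# In-memory stand-ins for the Product Catalog rows held by each server.
SERVERS = {
    "server-1": [{"region": "Africa", "product": "tent"}],
    "server-2": [
        {"region": "Americas", "product": "kayak"},
        {"region": "Americas", "product": "tent"},
    ],
    "server-3": [{"region": "Asia", "product": "stove"}],
}


def server_for(region):
    """The client, not the database, decides which server to use."""
    return SHARD_MAP[region]


def query_region(region, product):
    """Single-shard query: only the owning server is contacted."""
    rows = SERVERS[server_for(region)]
    return [r for r in rows if r["region"] == region and r["product"] == product]


def query_all_regions(product):
    """Split the query across every shard, then combine the partial results."""
    combined = []
    for server in sorted(SERVERS):
        combined.extend(r for r in SERVERS[server] if r["product"] == product)
    return combined
```

For example, `query_region("Americas", "kayak")` touches one server, whereas `query_all_regions("tent")` has to visit all three and merge what they return.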

Hadoop's HDFS
Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of 64+ Megabyte data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
[Diagram labels: Client with HDFS client; Name Node with Directory; two Data Nodes holding data blocks 1a, 1b, 1c, 2, 3 and 4]
- Each data block is replicated 3 times: twice on the same rack and once on another rack.
- The client needs to know the name of the block to be operated on.
- The block is copied to the client for processing.
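The block and replica arithmetic above can be sketched in Python. This is a simplified illustration of "twice on the same rack and once on another rack", not the actual HDFS placement policy; the 64 MB block size, the rack and node names and both function names are assumptions.

```python
import math

BLOCK_SIZE_MB = 64  # the slide says "64+ Megabyte"; exactly 64 is assumed here


def block_count(file_size_mb):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)


def place_replicas(block_id, racks):
    """Pick three (rack, node) slots: two on one rack, one on another.

    Assumes at least two racks, each with at least two nodes.
    """
    rack_names = sorted(racks)
    home = rack_names[block_id % len(rack_names)]
    away = rack_names[(block_id + 1) % len(rack_names)]
    home_nodes = racks[home]
    away_nodes = racks[away]
    return [
        (home, home_nodes[block_id % len(home_nodes)]),
        (home, home_nodes[(block_id + 1) % len(home_nodes)]),
        (away, away_nodes[block_id % len(away_nodes)]),
    ]


# Hypothetical two-rack cluster and a 200 MB file.
RACKS = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2"]}
placements = {
    block: place_replicas(block, RACKS) for block in range(block_count(200))
}
```

A 200 MB file needs four 64 MB blocks, so this toy cluster stores twelve block replicas in total.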

Leveraging Technologies

Objectivity/DB
- Federated object-oriented database platform
- Single Logical View across distributed persistent objects
- Eliminates the OO language to database mapping layer
- Ultra-fast object navigation
- Customizable, distributed query engine
- Dynamic schema changes and low administration overheads
- Interoperability across multiple languages and platforms
- High Availability
- Grid and cloud environment enabled

Distributed Architecture
[Diagram labels: Client Application with Objectivity/DB and Smart Cache; Simple, Distributed Servers: Lock Server, Query Server, Data Server]
Enhances scalability and availability

Parallel Query Engine
[Diagram labels: Application with Objectivity/DB, Smart Cache, PQE Task Splitter and Filter/Gateway; Lock Server, Query Server, Data Server]
- The Task Splitter aims queries at specific databases and containers.
- Filters can run complex qualification methods.
- Gateways can access other databases or search engines.
Replaceable components for smarter optimization
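The split, filter and combine flow can be sketched with Python's standard thread pool. This is a loose analogy, not the Objectivity/DB Parallel Query Engine: the container names, the sample objects and the three functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical federation: "database/container" name -> objects stored there.
CONTAINERS = {
    "calls/2009": [
        {"caller": "ann", "seconds": 30},
        {"caller": "bob", "seconds": 400},
    ],
    "calls/2010": [{"caller": "ann", "seconds": 900}],
    "emails/2010": [{"caller": "cy", "seconds": 0}],
}


def split_tasks(containers, database):
    """Task splitter: aim the query only at containers of one database."""
    prefix = database + "/"
    return [name for name in sorted(containers) if name.startswith(prefix)]


def run_filter(objects, qualifies):
    """Filter: apply an arbitrary qualification method to each object."""
    return [obj for obj in objects if qualifies(obj)]


def parallel_query(containers, database, qualifies):
    """Search the targeted containers concurrently and combine the results."""
    targets = split_tasks(containers, database)
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(
            pool.map(lambda name: run_filter(containers[name], qualifies), targets)
        )
    return [obj for partial in partials for obj in partial]


long_calls = parallel_query(CONTAINERS, "calls", lambda call: call["seconds"] > 300)
```

Swapping `split_tasks` or `run_filter` for a smarter version changes how the query is routed or qualified without touching `parallel_query`, which is the kind of replaceability the slide points to.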

Objectivity/DB Advantages
- Fully Distributed with Client-Side Smart Caching
- Highly efficient storage and navigation of relationships
- Flexible Clustering
- Scalable Collections
- Customizable Parallel Query Engine
- Quorum-based Replication and High Availability
- Flexible, Multi-mode Indexing
- Fully Interoperable Across Platforms and Languages

Objectivity/DB 10.1
- User-replaceable Parallel Query Engine search agents
- Page level and partial backups
- Eclipse RCP
- Visual Studio 2010 support
- Mac OS X support

Graph Databases

The Link Hunter

Graph Databases
[Diagram labels: Vertex, Edge]
- Nodes are represented as Vertices
- Relationships are represented as Edges
- Edges may be weighted
- Both are regular objects:
  - Properties
  - Methods
  - Inheritance
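A minimal Python sketch of vertices and edges as regular objects with properties, methods and inheritance follows. The class names `Person` and `Call` and their fields are invented for illustration and do not come from any graph database API.

```python
class Vertex:
    """A node in the graph; arbitrary properties are kept in a dict."""

    def __init__(self, **properties):
        self.properties = properties
        self.edges = []  # outgoing edges


class Edge:
    """A relationship between two vertices, optionally weighted."""

    def __init__(self, source, target, weight=1.0, **properties):
        self.source = source
        self.target = target
        self.weight = weight
        self.properties = properties
        source.edges.append(self)


class Person(Vertex):
    """Inheritance: a Person is a Vertex with its own method."""

    def display_name(self):
        return self.properties["name"].title()


class Call(Edge):
    """Inheritance: a Call is an Edge whose weight is its duration in seconds."""

    def is_long(self):
        return self.weight > 60


ann = Person(name="ann")
bob = Person(name="bob")
call = Call(ann, bob, weight=95.0, day="Monday")
```

Here `call.weight` carries the edge weight, `call.properties` its ordinary properties, and `ann.edges` gives direct access to the relationships that start at `ann`.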

InfiniteGraph...
- Dedicated Graph API
- Easy to use and deploy
- Java now, C# soon...
- Built on Objectivity/DB
- Fully interoperable

...InfiniteGraph...
- Create and update graph structures
- Traversal with constraints:
  - Return only designated Edge types
  - Return if not an excluded Edge type
  - Direction matches intent
  - Properties do not match a provided predicate
  - Maximum path depth has been reached...
- Optional Event notification:
  - Target vertex was found
  - Path is being abandoned
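A constrained traversal with event notification can be sketched generically in Python. This is not the InfiniteGraph API: the adjacency-list graph, the `traverse` function, its parameters and the callback names are all assumptions, and only the edge-type, depth and notification constraints from the list are modeled.

```python
from collections import deque


def traverse(graph, start, target=None, allowed_types=None,
             excluded_types=frozenset(), max_depth=3,
             on_found=None, on_abandoned=None):
    """Breadth-first traversal that follows only edges passing the constraints."""
    visited = [start]
    seen = {start}
    queue = deque([(start, (start,))])
    while queue:
        vertex, path = queue.popleft()
        if len(path) - 1 == max_depth:
            # Maximum path depth reached: stop extending this path.
            if on_abandoned is not None:
                on_abandoned(path)
            continue
        for edge_type, neighbor in graph.get(vertex, []):
            if allowed_types is not None and edge_type not in allowed_types:
                continue  # not a designated edge type
            if edge_type in excluded_types:
                continue  # explicitly excluded edge type
            if neighbor in seen:
                continue
            seen.add(neighbor)
            visited.append(neighbor)
            new_path = path + (neighbor,)
            if neighbor == target and on_found is not None:
                on_found(new_path)
            queue.append((neighbor, new_path))
    return visited


# Hypothetical graph: vertex -> list of (edge type, neighbor).
graph = {
    "alice": [("CALLED", "bob"), ("EMAILED", "carol")],
    "bob": [("CALLED", "dave")],
    "carol": [("CALLED", "dave")],
    "dave": [("CALLED", "erin")],
}

found, abandoned = [], []
visited = traverse(graph, "alice", target="dave", allowed_types={"CALLED"},
                   max_depth=2, on_found=found.append,
                   on_abandoned=abandoned.append)
```

In this run only `CALLED` edges are followed, so `carol` is never visited, and the depth limit of 2 stops the traversal at `dave` before it can reach `erin`.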

...InfiniteGraph
Search and Query Processing
- Supports Objectivity/DB and Lucene indexing
- Key and range queries
- Full text searching
- Regular expression search of string-based keys
Path Finding
- Start and end vertices
- Maximum depth
- Vertex type inclusion and exclusion lists
- Edge type list
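The path finding parameters listed above map naturally onto a depth-limited search. The Python sketch below is a generic illustration rather than InfiniteGraph code; the function signature, the vertex types and the sample graph are hypothetical.

```python
def find_paths(edges, vertex_types, start, end, max_depth, edge_types=None,
               include_vertex_types=None, exclude_vertex_types=frozenset()):
    """Return every simple path from start to end within max_depth edges."""
    paths = []

    def vertex_allowed(vertex):
        vertex_type = vertex_types[vertex]
        if include_vertex_types is not None and vertex_type not in include_vertex_types:
            return False
        return vertex_type not in exclude_vertex_types

    def extend(path):
        current = path[-1]
        if current == end:
            paths.append(path)
            return
        if len(path) - 1 == max_depth:
            return  # depth budget used up
        for edge_type, neighbor in edges.get(current, []):
            if edge_types is not None and edge_type not in edge_types:
                continue
            if neighbor in path or not vertex_allowed(neighbor):
                continue
            extend(path + [neighbor])

    extend([start])
    return paths


# Hypothetical graph with typed vertices and typed edges.
vertex_types = {"alice": "Person", "bob": "Person",
                "acme": "Company", "dave": "Person"}
edges = {
    "alice": [("KNOWS", "bob"), ("WORKS_AT", "acme")],
    "bob": [("KNOWS", "dave")],
    "acme": [("EMPLOYS", "dave")],
}

all_paths = find_paths(edges, vertex_types, "alice", "dave", max_depth=2)
people_only = find_paths(edges, vertex_types, "alice", "dave", max_depth=2,
                         exclude_vertex_types={"Company"})
```

Excluding the `Company` vertex type removes the route through `acme`, leaving only the person-to-person path.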

InfiniteGraph Licensing
- 60-day free trial
- Free GoGrid cloud development and deployment environment for qualified startups and non-profits
- Licenses must be procured after a pre-agreed annual revenue level is reached
- Open Source framework licenses
- Standard commercial licenses

Case Studies

Objectivity/DB & Hadoop...
HDFS Weaknesses
- HDFS cannot be directly mounted by an existing operating system.
  Getting data into and out of the HDFS file system, an action that often needs to be performed before and after executing a job, can be inconvenient. A filesystem in userspace has been developed to address this problem, at least for Linux and some other Unix systems.
- It moves 64+ MB blocks and doesn't support POSIX file operations.
- Replicating data three times is costly.
  However, there is a version that uses a parity block, decreasing the physical storage requirements from 3x to around 2.2x.
- The Name Node is a single point of failure.
  If the name node goes down, the filesystem is offline. When it comes back up the name node must replay all outstanding operations. This replay process can take over half an hour for a big cluster.
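The slide does not say which parity scheme produces the "around 2.2x" figure. The Python arithmetic below shows one configuration that happens to reproduce it, and every number in that configuration is an assumption: stripes of 10 data blocks protected by 1 parity block, with data and parity each kept in 2 copies.

```python
def storage_overhead(stripe_blocks, parity_blocks, data_replicas, parity_replicas):
    """Physical blocks stored per logical data block."""
    stored = stripe_blocks * data_replicas + parity_blocks * parity_replicas
    return stored / stripe_blocks


# Plain triple replication: no parity, three copies of every block.
triple_replication = storage_overhead(
    stripe_blocks=10, parity_blocks=0, data_replicas=3, parity_replicas=0
)

# Assumed parity layout: 10 data blocks + 1 parity block, two copies of each.
with_parity = storage_overhead(
    stripe_blocks=10, parity_blocks=1, data_replicas=2, parity_replicas=2
)
```

Under these assumptions the overhead falls from 3 stored blocks per data block to (10 * 2 + 1 * 2) / 10 = 2.2.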

...Objectivity/DB & Hadoop...
[Diagram labels: Client with Objectivity/DB, Security and OOFS + (HDFS client); HDFS with 64+ MB container files/blocks; AMS; PAGE SERVER; Cache]
Problem: HDFS works best when transferring 64+ MB blocks of data. Objectivity/DB works best with 16-64 KB blocks.
Solution: Implement a memory cache for the HDFS blocks.
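The proposed solution can be sketched as a small least-recently-used block cache that serves page-sized reads in Python. This is an illustration of the idea only, not the actual Objectivity/DB or HDFS client code; `BlockCache`, `fetch_block` and the tiny demo sizes are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Serve small page reads out of large cached blocks (LRU eviction).

    Assumes the page size divides the block size evenly.
    """

    def __init__(self, fetch_block, block_size=64 * 1024 * 1024,
                 page_size=16 * 1024, capacity=2):
        self._fetch_block = fetch_block  # callable: block index -> bytes
        self._block_size = block_size
        self._page_size = page_size
        self._capacity = capacity
        self._blocks = OrderedDict()
        self.fetches = 0                 # number of expensive block transfers

    def read_page(self, page_number):
        offset = page_number * self._page_size
        block_index, start = divmod(offset, self._block_size)
        block = self._get_block(block_index)
        return block[start:start + self._page_size]

    def _get_block(self, block_index):
        if block_index in self._blocks:
            self._blocks.move_to_end(block_index)  # mark as recently used
            return self._blocks[block_index]
        self.fetches += 1
        block = self._fetch_block(block_index)
        self._blocks[block_index] = block
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)       # evict least recently used
        return block


# Demo with tiny sizes so it runs instantly: 64-byte blocks, 16-byte pages.
def fake_fetch(block_index):
    return bytes([block_index]) * 64


cache = BlockCache(fake_fetch, block_size=64, page_size=16, capacity=2)
first_pages = [cache.read_page(n) for n in (0, 1, 3)]  # all within block 0
fetches_after_block_0 = cache.fetches
page_5 = cache.read_page(5)                             # lives in block 1
```

Three page reads from the same block cost a single block transfer, which is the effect the cache is meant to achieve when small pages sit on top of large blocks.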

...Objectivity/DB & Hadoop
[Diagram labels: Client with HDFS client and Objectivity/DB; two Name Node FDs; two Data Nodes holding data blocks 1a, 1b, 1c, 2, 3 and 4]
- Removes the single point of failure by using an Objectivity/DB Name Node Federation (replicated).
- Could provide other services too, such as content tagging and connectivity.

InfiniteGraph and Cassandra
Apache Cassandra provides a structured key-value store with eventual consistency. It is a resilient, distributed DBMS.
The prototype uses social network data. It extracts data from Cassandra, then finds the shortest paths between people.
[Diagram labels: Application, InfiniteGraph, Cassandra]
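The second step of the prototype, finding a shortest path between two people, can be sketched with a breadth-first search in Python. A plain dictionary stands in for rows extracted from the key-value store, and no Cassandra or InfiniteGraph call is used; the data and the function name are hypothetical.

```python
from collections import deque

# Hypothetical rows as they might look after extraction: person -> friends.
rows = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(friends, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in friends.get(person, []):
            if friend in previous:
                continue
            previous[friend] = person
            if friend == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None


path = shortest_path(rows, "alice", "erin")
```

Breadth-first search visits people in order of distance from the start, so the first time the goal is reached the path is guaranteed to be a shortest one in an unweighted graph.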

Summary

NoSQL?
All of these features could have been obtained from Commercial Off The Shelf ODBMSs:
- Unique object/document IDs
- Sharding
- Shared-Nothing
- Fully distributed
- No lock and novel transaction modes
- Iterators and fast, predictable traversals
- Fast scans
- Optimization for random access
- In Memory Database configurability
- Flexible object clustering
- Effectivity (data in a relationship)
- Geospatial and multi-dimensional indexing
- Hash table (key-value) lookups
- Hyperspace == single logical view of a federation
- Multi-way replication
- High Availability
- Text searching

Summary
- We need to re-examine the reasons that the NoSQL movement didn't just use ODBMSs.
- The NoSQL movement helps strengthen our argument that RDBMSs aren't always the best choice.
- Graph DBMSs can supplement RDBMS, NoSQL and ODBMS technologies.
- The best Graph DBMSs are built on ODBMSs.
- ODBMSs are here to stay.

Questions?
- objectivity.com - White papers, downloads etc. EMAIL: info @ objectivity.com
- infinitegraph.com - WPs, downloads etc. EMAIL: info @ infinitegraph.com
- Presenter: leon @ objectivity.com
If they give you paper with lines on, write sideways!