NoSQL and Graph Database




NoSQL and Graph Database. Biswanath Dutta, DRTC, Indian Statistical Institute, 8th Mile Mysore Road, R. V. College Post, Bangalore 560059. International Conference on Big Data, Bangalore, 9-20 March 2015.

Outline. NoSQL: problem; NoSQL properties and types; motivation; various solutions and why. Graph database: graph and graph database; graph analytics; various graph databases. Graph and RDF triple: RDF; graph database and RDF triple store. Conclusion.

Introduction: Big Data. Immense processing and storage requirements. Varieties of applications.

RDBMS Problem: Reading and writing slow down as the data size increases, and the database becomes prone to deadlocks. Limited capacity: existing SQL solutions do not scale big enough. Expansion is difficult. Database technology is becoming increasingly important.

Various facets of demand: high concurrency of reading and writing with low latency; efficient big data storage and access; high scalability and availability; lower management and operational costs. Source: [4]

NoSQL: Not only SQL. A non-relational database system that tends to be inherently distributed, schema-less, and horizontally scalable (sharding). A means of storing and retrieving data other than the tabular relations used in relational databases. A very wide category of persistence solutions which do not follow the relational data model and which do not use SQL as the query language.

NoSQL Properties: Scale horizontally. Simple and flexible non-relational data models (schema-less). Data replication and distribution over multiple machines, for coping with failure and achieving eventual consistency. Simple interfaces for searching the data and calling procedures. Most NoSQL stores lack ACID (Atomicity, Consistency, Isolation, Durability) transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner, FoundationDB and OrientDB, have made them central to their designs. NoSQL is sometimes referred to as a BASE system (Basically Available, Soft-state, Eventually consistent). High availability: many NoSQL stores compromise consistency (in terms of the CAP theorem: Consistency, Availability and Partition tolerance).
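As a rough illustration of horizontal scaling with replication, the following Python sketch maps each key to a primary shard and a few replica shards. The function names and the "next shards in ring order" placement rule are assumptions made for this example, not the scheme of any particular NoSQL store.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash.

    hashlib is used because Python's built-in hash() is salted per process.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> list:
    """Return the primary shard followed by the next shards in ring order.

    This placement rule is a hypothetical simplification.
    """
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]
```

With 5 shards and a replication factor of 3, `replicas_for("user:42", 5, 3)` returns three distinct shard numbers, so the record survives the loss of any two machines holding it.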

The key motivations for NoSQL: simplicity of design, horizontal scaling, and finer control over availability.

Types of NoSQL Databases (a classification based on data model): 1. Column: Accumulo, Cassandra, Druid, HBase, Vertica. 2. Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB. 3. Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB. 4. Graph: AllegroGraph, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog. 5. Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB.

Why so many NoSQL Solutions? One-size-fits-all solutions cannot be provided, because of the variety of data and their processing requirements. E.g., blogging data, transportation data, social relations, road maps. Blogging data requires a document-type NoSQL solution. Transportation data, social relations and road maps require a graph solution. The suitability of a given NoSQL database depends on the problem it must solve.

NoSQL Barriers: Barriers to the greater adoption of NoSQL stores include: lack of a standard query language (such as SQL) and use of low-level query languages; lack of ACID transactions [7]; lack of standardized interfaces; huge investments already made in SQL by enterprises.

Graph Databases

What is a Graph? An abstract representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by abstractions called vertices or nodes. The links that connect some pairs of vertices are called edges or arcs.

Types of Graphs: directed graph, undirected graph, mixed graph, multigraph, hypergraph. (Each type is illustrated by a figure on the slide.)

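Types of Graphs (contd.): Weighted graph (figure: an edge with weight 0.5). Labeled graph (figure: John and Mary connected by an edge labeled "knows"). Property graph (figure: a node with name: John, age: 32 and a node with name: Peter, age: 30, connected by an edge with type: knows).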
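A property graph can be sketched in Python as nodes and edges that each carry a dictionary of key-value properties. The class names, the numeric ids and the direction of the "knows" edge are assumptions for illustration; the property values are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyNode:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyEdge:
    source: int  # id of the start node
    target: int  # id of the end node
    properties: dict = field(default_factory=dict)


john = PropertyNode(1, {"name": "John", "age": 32})
peter = PropertyNode(2, {"name": "Peter", "age": 30})
# The edge direction (John -> Peter) is assumed.
knows = PropertyEdge(john.node_id, peter.node_id, {"type": "knows"})

nodes = {n.node_id: n for n in (john, peter)}


def describe(edge, nodes):
    """Render an edge as '<start name> <edge type> <end name>'."""
    start = nodes[edge.source].properties["name"]
    end = nodes[edge.target].properties["name"]
    return f"{start} {edge.properties['type']} {end}"
```

Here `describe(knows, nodes)` reads the properties of both the edge and its end nodes, which is what distinguishes a property graph from a plain labeled graph.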

What is a Graph Database? A database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. Key characteristic: Provides index-free adjacency. (an index is unnecessary as each node knows the location of its adjacent nodes)
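The idea of index-free adjacency can be sketched in Python by contrasting two ways of finding a node's neighbours: scanning a global edge table (the relational style) versus following references stored on the node itself. The names `AdjNode`, `neighbors_via_index` and `neighbors_via_adjacency` are hypothetical and do not reflect the storage layout of any real graph database.

```python
class AdjNode:
    """A node that keeps direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def connect(self, other):
        # Undirected link: each node records the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


def neighbors_via_index(name, edge_table):
    """Relational style: search a global table of (from, to) pairs."""
    return sorted(b for a, b in edge_table if a == name)


def neighbors_via_adjacency(node):
    """Index-free adjacency: follow the references held by the node."""
    return sorted(n.name for n in node.neighbors)


a, b, c = AdjNode("a"), AdjNode("b"), AdjNode("c")
a.connect(b)
a.connect(c)
edge_table = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]
```

Both lookups return the same neighbours, but the cost of the second depends only on the number of neighbours of the node, not on the size of the whole edge table.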

Why Graph Databases? Graph databases are designed to: Store interconnected data, e.g., the relationships between people in social networks, between people and artifacts, and between items and attributes in recommendation engines. Make it easy to make sense of the data. Make it easy to evolve the database. Enable high-performance operations: discovery of connected data patterns; relatedness queries of arbitrary length. Ironically, relational databases do not store relations.

Why do people use Graph Databases? Problems with join performance. Continuously evolving data sets. The shape of the domain is naturally a graph. Graphs are everywhere, e.g., social networks, biological networks, the interstate highway system, the hyperlink structure of the Web.

Graph Analytics. Example query: sushi restaurants in Trento that my friends like most. (Figure: a graph with the nodes John, Bob, Mary, Restaurant: Japoni, Restaurant: ishushi, Cuisine: Sushi and City: Trento, connected by edges labeled IsFriendOf, likes, serves and located_in.)
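The example query can be sketched in Python over a small list of labeled edges. The node and edge labels are taken from the slide, but which node is connected to which is an assumption made for this example, as is the helper naming.

```python
from collections import Counter

# (subject, label, object) edges. Labels come from the slide;
# the concrete connections below are assumed.
EDGES = [
    ("John", "IsFriendOf", "Bob"),
    ("John", "IsFriendOf", "Mary"),
    ("Bob", "likes", "Japoni"),
    ("Mary", "likes", "Japoni"),
    ("Mary", "likes", "ishushi"),
    ("Japoni", "serves", "Sushi"),
    ("ishushi", "serves", "Sushi"),
    ("Japoni", "located_in", "Trento"),
    ("ishushi", "located_in", "Trento"),
]


def objects(subject, label):
    """All nodes reached from subject over edges with the given label."""
    return {o for s, l, o in EDGES if s == subject and l == label}


def restaurants_friends_like(person, cuisine, city):
    """Count, per restaurant, how many friends of person like it."""
    votes = Counter()
    for friend in objects(person, "IsFriendOf"):
        for restaurant in objects(friend, "likes"):
            if (cuisine in objects(restaurant, "serves")
                    and city in objects(restaurant, "located_in")):
                votes[restaurant] += 1
    return votes.most_common()
```

Under the assumed edges, `restaurants_friends_like("John", "Sushi", "Trento")` ranks Japoni (liked by two friends) ahead of ishushi (liked by one). The query is a chain of hops along edges, which is the access pattern graph databases are built for.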

Graph Analytics (contd.): Transportation network: return the shortest or cheapest flight/road from one city to another. Social network: determine whether there is a path of fewer than 4 steps connecting two users in a social network; find the movies featuring actor X that people like most. Financial network: fraud detection, money laundering; find the path connecting two suspicious transactions. Temporal network: compute the number of computers that were affected by a particular computer virus within three days or thirty days of its discovery. Other uses: recommendation; network impact analysis; information/document usage patterns. Source: [5]
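The social-network question (is there a path of fewer than 4 steps between two users?) can be sketched in Python as a breadth-first search with a depth limit. The function name and the sample `FOLLOWS` data are hypothetical.

```python
from collections import deque


def within_steps(graph, start, goal, max_steps):
    """True if goal is reachable from start using at most max_steps edges."""
    if start == goal:
        return True
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


# Hypothetical "follows" chain: ann -> bob -> cat -> dan -> eve
FOLLOWS = {"ann": ["bob"], "bob": ["cat"], "cat": ["dan"], "dan": ["eve"], "eve": []}
```

"Fewer than 4 steps" corresponds to `max_steps=3`: `within_steps(FOLLOWS, "ann", "dan", 3)` is true, while `within_steps(FOLLOWS, "ann", "eve", 3)` is false because eve is four steps away.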

Graph-based NoSQL Solutions. Graph databases: Neo4j (open source, Java, property graph model); Sones (closed source, .NET focused); HyperGraphDB (open source, Java, hypergraph model); FlockDB (open source, JVM-based, written in Scala). RDF databases (triple stores): AllegroGraph (closed source, RDF quad store); Virtuoso (closed source, RDF focused); 4store (RDF based).

Neo4j: Made up of nodes, relationships and properties. Nodes contain properties in the form of key-value pairs. Relationships connect and structure nodes; a relationship consists of a label, a start node and an end node. Relationships can also have properties, like nodes. Source: http://neo4j.com/product/

Neo4j (contd.) Properties: One of the most popular graph databases. Based on the property graph model. Open source (enterprise edition licensed under AGPL). ACID compliant. Java based, with bindings for other languages, e.g., Ruby and Python. Highly scalable, up to several billion nodes and relationships. Flexible schema. Query language: Cypher.

FlockDB: A distributed and fault-tolerant graph database. FlockDB was created by Twitter. Licensed under the Apache License, Version 2.0. Useful for large and shallow graphs. Properties: a high rate of add/update/remove operations; potentially complex set arithmetic queries; paging through query result sets containing millions of entries; ability to "archive" and later restore archived edges; horizontal scaling including replication; online data migration. Source: https://github.com/twitter/flockdb

FlockDB (contd.): Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom). The major difference between FlockDB and other graph databases like Neo4j is graph traversal. Twitter's model has no need for traversing the social graph. Twitter is only concerned with the direct edges (relationships) on a given node (account). For example, Twitter doesn't want to know who follows a person you follow. Instead, it is only interested in the people you follow. By trimming off graph traversal functions, FlockDB is able to allocate resources elsewhere. (Source: http://readwrite.com/2011/04/20/5-graphdatabases-to-consider)
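The "direct edges only" model can be sketched in Python as a store that answers one-hop questions and leaves everything else to set arithmetic and paging on the client side. The `EdgeStore` class and `page` helper are a toy written for this example; they are not FlockDB's API.

```python
from collections import defaultdict


class EdgeStore:
    """Toy adjacency store answering only one-hop queries (not FlockDB's API)."""

    def __init__(self):
        self._forward = defaultdict(set)   # source -> destinations
        self._backward = defaultdict(set)  # destination -> sources

    def add(self, source, destination):
        self._forward[source].add(destination)
        self._backward[destination].add(source)

    def remove(self, source, destination):
        self._forward[source].discard(destination)
        self._backward[destination].discard(source)

    def following(self, user):
        return set(self._forward[user])

    def followers(self, user):
        return set(self._backward[user])


def page(items, offset, limit):
    """Return one page of a result set in a stable (sorted) order."""
    return sorted(items)[offset:offset + limit]


store = EdgeStore()
for src, dst in [("alice", "bob"), ("alice", "carol"),
                 ("dave", "bob"), ("dave", "carol"), ("dave", "erin")]:
    store.add(src, dst)

# Set arithmetic on direct edges: accounts followed by both alice and dave.
both_follow = store.following("alice") & store.following("dave")
```

No call in this sketch walks more than one edge away from a node, which mirrors the point made above: intersections, unions and paging over direct edges are enough for this workload.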

AllegroGraph: A graph database built around the W3C specification for the Resource Description Framework. A proprietary product of Franz Inc. 100% ACID, supporting transactions: commit, rollback, and checkpointing. 100% read concurrency, near full write concurrency. Dynamic and automatic indexing: all committed triples are always indexed. Advanced text indexing: text indexing per predicate. SOLR and MongoDB integration. Supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications. Source: http://franz.com/agraph/allegrograph/

AllegroGraph (contd 2) The company claims Pfizer, Ford, Kodak, NASA and the Department of Defence among its AllegroGraph customers. Source: http://franz.com/agraph/success/

Graph Database and RDF Triple Store

RDF RDF is a decentralized directed labeled graph wherein the arcs start with subject URIs, are labeled with predicate URIs, and end up pointing to object URIs or scalar values.
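As a minimal sketch of this model, RDF statements can be held in Python as (subject, predicate, object) triples and queried by pattern matching. The `http://example.org/` namespace, the sample statements and the `match` helper are hypothetical; a real triple store would be queried with SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a URI
    predicate: str  # a URI labeling the arc
    obj: str        # a URI or a scalar value


EX = "http://example.org/"  # hypothetical namespace

TRIPLES = [
    Triple(EX + "John", EX + "knows", EX + "Mary"),
    Triple(EX + "John", EX + "age", "32"),
    Triple(EX + "Mary", EX + "knows", EX + "Peter"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

For example, `match(TRIPLES, predicate=EX + "knows")` returns the two "knows" arcs, and `match(TRIPLES, subject=EX + "John", predicate=EX + "age")` returns the arc that ends in the scalar value "32".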

Graph database and RDF triple store (similarities): Both graph databases and RDF triple stores are designed to store linked data. Both focus on the relationships between the data. A web of nodes and edges can be put together into interesting visualizations, a defining characteristic of graph databases.

Why an RDF-based graph solution? Simple and uniform data model. Powerful standard query language. Standardized NoSQL solution (built upon W3C Linked Data technology). No vendor or product lock-in (ensures portability and tool chain interoperability). Standardized data interchange (import/export) formats. Inferences on data, e.g., from ":human rdfs:subClassOf :mammal." and ":man rdfs:subClassOf :human." the RDF database can infer a new triple: ":man rdfs:subClassOf :mammal."
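The subclass inference in this example can be sketched in Python by applying the transitivity rule for rdfs:subClassOf until no new triples appear. This is a naive fixed-point loop written for illustration, not how any particular RDF database implements reasoning; the function name is hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"


def infer_subclass_closure(triples):
    """Add (a subClassOf c) whenever (a subClassOf b) and (b subClassOf c) hold."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass pairs so the set can grow safely.
        pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for a, b in pairs:
            for c, d in pairs:
                if b == c and (a, SUBCLASS, d) not in inferred:
                    inferred.add((a, SUBCLASS, d))
                    changed = True
    return inferred


facts = {
    (":human", SUBCLASS, ":mammal"),
    (":man", SUBCLASS, ":human"),
}
new_triples = infer_subclass_closure(facts) - facts
```

Starting from the two stated facts, the only new triple produced is `(":man", "rdfs:subClassOf", ":mammal")`.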

Why an RDF-based graph solution? (contd.) Future proof: it is far from evident that any solution other than an RDF-based one will still be available 20-30 years from now. An RDF-based solution is future proof in the sense that it rests on a very basic technology, the URI. Support for globally addressable row identifiers and property names. Data modeling standards and tooling for creating and publishing schemas. Meta-standards for declaratively specifying that one piece of information entails another. Inference engines that implement such data transformation rules.

Conclusion: NoSQL is inescapable. NoSQL is not an analytical tool, but it can play an indispensable role in analytics. There are reasons to use RDF-based graph databases.

References: 1. Why SPARQL and RDF for graph analytics. http://sparqlcity.com/why-sparql 2. Use cases. http://sparqlcity.com/use-cases 3. Graph Databases, NOSQL and Neo4j. http://www.infoq.com/articles/graph-nosqlneo4j 4. Jing Han, Haihong E, Guan Le and Jian Du (2011). Survey on NoSQL databases. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6106531 5. Max De Marzi. Graph Databases Use Cases. http://www.slideshare.net/maxdemarzi/graph-database-use-cases 6. RDF meets NoSQL (2010). http://decentralyze.com/2010/03/09/rdf-meets-nosql/ 7. Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari and Miriam A. M. Capretz (2013). Journal of Cloud Computing: Advances, Systems and Applications 2013, 2:22. http://www.journalofcloudcomputing.com/content/2/1/22

Thank you very much for your kind attention! Questions?