AllegroGraph. a graph database. Gary King gwking@franz.com

Similar documents
InfiniteGraph: The Distributed Graph Database

Understanding the Value of In-Memory in the IT Landscape

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

NoSQL and Graph Database

How To Handle Big Data With A Data Scientist

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Big Data Analytics. Rasoul Karimi

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

How to Choose Between Hadoop, NoSQL and RDBMS

The Sierra Clustered Database Engine, the technology at the heart of

Trafodion Operational SQL-on-Hadoop

Comparison of Triple Stores

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Introduction to Ontologies

Hypertable Architecture Overview

Chapter 7. Using Hadoop Cluster and MapReduce

Adobe Insight, powered by Omniture

HadoopRDF : A Scalable RDF Data Analysis System

How To Improve Performance In A Database

MySQL for Beginners Ed 3

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Oracle Big Data SQL Technical Update

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

12 File and Database Concepts 13 File and Database Concepts A many-to-many relationship means that one record in a particular record type can be relat

Lecture Data Warehouse Systems

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.

Cassandra A Decentralized, Structured Storage System

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

Crack Open Your Operational Database. Jamie Martin September 24th, 2013

A survey of big data architectures for handling massive data

The Game of Big Data! Analytics Infrastructure at KIXEYE

Columnstore in SQL Server 2016

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK tel:

Citrix XenServer Backups with SEP sesam

Investigating Hadoop for Large Spatiotemporal Processing Tasks

Bigdata : Enabling the Semantic Web at Web Scale

Big Data Technology Pig: Query Language atop Map-Reduce

In-Memory Data Management for Enterprise Applications

Data Integrator Performance Optimization Guide

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

Apache HBase. Crazy dances on the elephant back

Introduction to Databases

Data processing goes big

Big Data Course Highlights

In-Memory Databases MemSQL

Big Data and Big Data Modeling

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

DATABASE MANAGEMENT SYSTEM

Big Data Technology CS , Technion, Spring 2013

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Hadoop Ecosystem B Y R A H I M A.

Hadoop and Map-Reduce. Swati Gore

The Internet of Things and Big Data: Intro

In Memory Accelerator for MongoDB

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Memory-Centric Database Acceleration

Graph Database Performance: An Oracle Perspective

Graph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang

BIG DATA TECHNOLOGY. Hadoop Ecosystem

SharePoint Server 2010 Capacity Management: Software Boundaries and Limits

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

American International Journal of Research in Science, Technology, Engineering & Mathematics

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Databases What the Specification Says

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

The Entity-Relationship Model

Data Management in the Cloud

DATABASE VIRTUALIZATION AND INSTANT CLONING WHITE PAPER

Big Data: Tools and Technologies in Big Data

CitusDB Architecture for Real-Time Big Data

CSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Leveraging Big Data Technologies to Support Research in Unstructured Data Analytics

Apache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack

Big Data With Hadoop

Combining Unstructured, Fully Structured and Semi-Structured Information in Semantic Wikis

Introduction to Big Data Training

Understanding traffic flow

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS PART 4 BEYOND MAPREDUCE...385

Comparing SQL and NOSQL databases

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

Practical Cassandra. Vitalii

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Guideline for stresstest Page 1 of 6. Stress test

Transcription:

AllegroGraph a graph database Gary King gwking@franz.com

Overview What we store How we store it the possibilities Using AllegroGraph

Databases Put stuff in Get stuff out quickly safely

Stuff things with attributes and connections Reasoning, rules, inference Lots of things. Really. Lots. Lots of change. all the time.

What stuff in? Modeling knowledge of assets in an Enterprise Modeling an extensive river network Representing 1000 s of different types of objects Managing biological knowledge Multimedia Metadata Bug and version tracking Collaborative Workspace for Analyst

NASA Constellation project Deals with 1000s of different types of objects: Machine parts Processes Software People skills Drawings Documents In 100s of distributed databases Coordinated through registries To provide meaningful search

A River Network Given the polluted segment S1 find all the upstream segments within 50 miles of City1200 Given the polluted drainage D1 find all the schools in the rectangle <x1, y1, x2, y2> that might be influenced

Semantic Web...

What stuff out? Things like this Things like this only with that Things like this only with that and the other thing sorted by that Things like this linked to that linked to that linked to that and that and back to things like this Things like this where that can be inferred from this other stuff

In particular We want to ask for What Attributes Where geospatial Find the people I know that share my taste and have traveled to Hawaii during the last year? When events and temporal logic Whom Social networks

Data Dissected Documents (unstructured mostly) Key/value subject/predicate/object Tuples (by row, by column)

System of Analysis The main data is stored safely away somewhere else Batch & Bulk oriented loads Materialize types and other inferences Do queries & analysis Few simultaneous users

System of Record Data changes on a second to second basis You care about the long time persistence of the data You care about transactions and recoverability You care about concurrent access You care about continuous querying and instant reasoning

How? Relational Database Systems Object Oriented Databases Key-value Databases Graph databases (Triple Stores) Essentially equivalent; the devil is in the details.

RDBMS Tables Columns Indices Joins

RDBMS Mature and Standardized (SQL) Robust, safe, scalable But... Great for simple queries that touch only a few tables once Modeling the world in tables is hard Table schema is inflexible; early design lock-in One-to-many and many-to-many relationships add extra tables Lousy for queries that follow transitive relationships across many tables (or the same table many times)

Graph DBMS Subject Predicate Object

GDBMS Easy to put stuff in No Schema, everything indexed But... Young technology Less robust, less standardized

Our problem Continually accrue massive interconnected information with an evolving schema (or no schema) including text, events, relationships, locations Query this data using description logics, custom rule sets and ask for information on moving objects, events, and social networks, in real-time

In particular We want to ask for What Attributes Where geospatial Find another truck that can pick up package X at location Y so that I can pick up package A at location B so that we both will arrive at P before time T. When events and temporal logic Whom Social networks

RDBMS is not the answer A graph database looks like an relational database with only one table so start with an RDBMS and add triple-store features The relational model is too complex for triplestores The relational model is too simplistic for rapidly evolving schemas and massive transitive relations

Hadoop is not the answer Yes, it is a great way to store billions of triples Hadoop can be used for work that is batchoriented rather than real-time, very dataintensive, and parallelizable. But what about! Deeply nested SPARQL or rule based queries (e.g., Prolog) Graph & Social network analysis. Reasoning and inference

Building a triplestore Start fresh and add enterprise features adding triples (with five parts) Emphasis is on addition (not updates, not deletion)

Really Simple Diagram Triples In Triples processes to Transaction Log checkpointer Merge index merge in-memory Index Chunk Indexing agents Sorted Sorted Index Sorted Chunk Index Sorted Chunk Index Sorted Chunk Index Chunk Index Chunk text index text indexing agent string lookup agent process strings Queries

AllegroGraph 4.0 ACID Transactions and Recoverability page management checkpointing every x-minutes or y-triples Read/write concurrency 100 % read concurrency at all times Dynamic and automatic indexing with column based compression Resource management Use all disks, all memory and all processors (one box) Automatic, or user configurable

AllegroGraph 4.0 Per-predicate Lucene style text indexing 2D and 3D geo-temporal indexing for moving objects Social networking toolkit with path finding, importance measures, etc. REST protocol for all client interaction Franz supported: Sesame, Jena, Python, Community supported: Ruby, Perl, C#

2D and 3D details

AllegroGraph

Performance: Input $5000 quad-core machine with 32 Gigabytes RAM dataset Size (Billions) Time LUBM 8000 1.1 3:48 Billion Triples Challenge 1.15 5:13 2000 Census data 0.99 2:00 3.2 11:01 with full-text indexing on all strings

Thanks gwking@franz.com http://www.franz.com