Big Graph Data Management
1 Big Graph Data Management. Antonio Maccioni, Big Data Course, Rome, 14 May 2015.
2 this talk is about: Graph Databases (models, languages, use cases); Graph Database Management Systems; Graph Processing Systems (intro); Open Problems in Graph Data Management; ...and research projects around us.
3 scenario: Oracle is working on PGX (Parallel Graph Analytics), has launched Green-Marl and has implemented an RDF layer on top of Oracle NoSQL; HP is developing a layer, VERTEXICA, for graph mining on top of the analytical database; Facebook is moving towards graph search capabilities (Tao, Unicorn, Open Graph protocol, etc.); Google has been working on the Knowledge Graph for a while, has created Pregel and has launched Cayley; Teradata launched the language interface SQL-GR to run graph analysis on top of its analytical database; Twitter has open-sourced its graph database FlockDB; Microsoft has been working on Trinity, an in-memory graph database; DataStax has just acquired Aurelius (TitanDB).
4 scenario: "Over 25 percent of enterprises will use graph databases by ..." (Enterprise DBMS, Q..., Forrester Research). "Graph databases are catching on commercially" - Michael Stonebraker (2014 ACM Turing Award).
5 nosql scenario: simple data models. "Graph Databases are an odd fish in the NoSQL pond" - P.J. Sadalage, M. Fowler, NoSQL Distilled.
6 nosql scenario: simple data models. But if we want to represent connections, we may opt for a graph database management system.
7 a graph: it can be stored as adjacency lists or as an adjacency matrix (rows "from", columns "to"). Adjacency lists: [1]->2->4->5; [2]->3; [3]->5; [4]->5->6; [5]->6.
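The two representations on this slide can be sketched in a few lines of Java. This is only an illustrative sketch: the class and method names (AdjacencyDemo, adjacencyLists, toMatrix) are hypothetical, and node 6, which appears on the slide only as a target, is given an empty list by assumption.

```java
import java.util.*;

/** Two in-memory representations of the directed graph from the slide. */
class AdjacencyDemo {
    // Adjacency lists: node -> list of successors.
    static Map<Integer, List<Integer>> adjacencyLists() {
        Map<Integer, List<Integer>> adj = new LinkedHashMap<>();
        adj.put(1, Arrays.asList(2, 4, 5));
        adj.put(2, Arrays.asList(3));
        adj.put(3, Arrays.asList(5));
        adj.put(4, Arrays.asList(5, 6));
        adj.put(5, Arrays.asList(6));
        adj.put(6, Collections.<Integer>emptyList()); // assumption: node 6 has no out-edges
        return adj;
    }

    // Adjacency matrix: m[from][to] is true when the edge from -> to exists (ids are 1-based).
    static boolean[][] toMatrix(Map<Integer, List<Integer>> adj) {
        int n = adj.size();
        boolean[][] m = new boolean[n + 1][n + 1];
        for (Map.Entry<Integer, List<Integer>> e : adj.entrySet()) {
            for (int to : e.getValue()) {
                m[e.getKey()][to] = true;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = adjacencyLists();
        boolean[][] m = toMatrix(adj);
        System.out.println("successors of 4: " + adj.get(4));
        System.out.println("edge 1->5? " + m[1][5] + ", edge 5->1? " + m[5][1]);
    }
}
```

The lists use space proportional to the number of edges, while the matrix answers "is there an edge from a to b?" with a single array access at the price of quadratic space.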
8 graph + data = graph database [Figure: example graph whose edges are labeled admin, belongs, follows, likes, works, married, friends, worked]
9 why graph databases? More natural modeling; manage connections explicitly; run algorithms of network science (e.g., PageRank).
10 use case 1: semantic web. A Web-scale architecture (Web of Data) for metadata and data management together, and for interoperability of data and services. Compatible with other Web technologies. Based on a set of W3C standards (HTTP, IRI/URI, RDF, SPARQL, OWL). Web 3.0: make information understandable by machines.
11 use case 1: semantic web. HTTP request of data by URI. You can follow links (the edges of the graph).
12 use case 1: semantic web Semantic Web + Open Data = Linked Open Data
13 use case 1: semantic web
14 sparql: Which are the names of the people married to people born in Honolulu?
    SELECT ?name1 { ?person1 married ?person2 . ?person2 born ?city . ?city name "Honolulu" . ?person1 name ?name1 . }
[Figure: the query as a graph pattern over ?person1, ?person2, ?city and ?name1, with edges married, born and name]
15 sparql: pattern matching style of querying. The graph pattern of the query is matched against the data graph. [Figure: data graph with nodes p1, p2, c1, c2, n1 and the values Barack Obama, Michelle Obama, Honolulu, Chicago, USA, connected by married, born, name and locatedin edges]
17 triple stores. G:
    s    p        o
    p1   married  p2
    p1   name     Michelle Obama
    p2   born     c2
    c2   name     Honolulu
    us   name     USA
Indexes on: (s,p,o), (s,o,p), (p,s,o), (p,o,s), (o,s,p), (o,p,s)
18 query processing [Figure: query plan for the SPARQL query. The leaves are index scans over the triple table, G(P=name), G(P=name), G(P=married) and G(P=name and O=Honolulu); their results are joined on the shared variables ?person1, ?person2 and ?city]
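The triple table, its permutation indexes and the join-based evaluation of the query can be sketched as follows. This is a toy sketch, not how any particular triple store is implemented: it keeps only two of the six indexes, (p,o,s) and (s,p,o), and every name in it (TripleStoreDemo, subjects, objects, namesMarriedToBornIn) is hypothetical. The data are the five triples of slide 17.

```java
import java.util.*;

/** Toy triple store that keeps two of the six permutation indexes. */
class TripleStoreDemo {
    // (p,o,s) index: predicate -> object -> subjects.
    private final Map<String, Map<String, List<String>>> pos = new HashMap<>();
    // (s,p,o) index: subject -> predicate -> objects.
    private final Map<String, Map<String, List<String>>> spo = new HashMap<>();

    void add(String s, String p, String o) {
        pos.computeIfAbsent(p, k -> new HashMap<>()).computeIfAbsent(o, k -> new ArrayList<>()).add(s);
        spo.computeIfAbsent(s, k -> new HashMap<>()).computeIfAbsent(p, k -> new ArrayList<>()).add(o);
    }

    // Index scan G(P=p and O=o): all subjects s such that (s, p, o) is in the store.
    List<String> subjects(String p, String o) {
        return pos.getOrDefault(p, Collections.<String, List<String>>emptyMap())
                  .getOrDefault(o, Collections.<String>emptyList());
    }

    // Index scan G(S=s and P=p): all objects o such that (s, p, o) is in the store.
    List<String> objects(String s, String p) {
        return spo.getOrDefault(s, Collections.<String, List<String>>emptyMap())
                  .getOrDefault(p, Collections.<String>emptyList());
    }

    /** Names of the people married to someone born in a city with the given name. */
    List<String> namesMarriedToBornIn(String cityName) {
        List<String> names = new ArrayList<>();
        for (String city : subjects("name", cityName)) {           // ?city name "Honolulu"
            for (String person2 : subjects("born", city)) {        // ?person2 born ?city
                for (String person1 : subjects("married", person2)) { // ?person1 married ?person2
                    names.addAll(objects(person1, "name"));        // ?person1 name ?name1
                }
            }
        }
        return names;
    }

    static TripleStoreDemo sample() {
        TripleStoreDemo g = new TripleStoreDemo();
        g.add("p1", "married", "p2");
        g.add("p1", "name", "Michelle Obama");
        g.add("p2", "born", "c2");
        g.add("c2", "name", "Honolulu");
        g.add("us", "name", "USA");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().namesMarriedToBornIn("Honolulu"));
    }
}
```

The nested loops are index nested-loop joins on the shared variables, started from the most selective pattern (the one with a constant object). A real engine would choose the join order and the index for each pattern with an optimizer.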
19 use case 2: social networks [Figure: social graph whose edges are labeled admin, belongs, follows, likes, works, married, friends, worked]
20 use case 2: social networks. Find common friends for every single profile visit. With a relational table (FRIEND 1, FROM_DAY, FRIEND 2) this means querying ~1.2 billion * average number of friends per person tuples.
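When each person keeps its own set of neighbors, the common-friends query becomes a set intersection that touches only two adjacency sets. A minimal sketch follows; the class CommonFriends, its methods and the sample names are all hypothetical.

```java
import java.util.*;

/** Common friends as the intersection of two adjacency sets. */
class CommonFriends {
    // Friendship is symmetric, so each edge is stored in both directions.
    static void addFriendship(Map<String, Set<String>> friends, String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Only the neighbors of a and b are read, never the whole friendship relation.
    static Set<String> common(Map<String, Set<String>> friends, String a, String b) {
        Set<String> result = new TreeSet<>(friends.getOrDefault(a, Collections.<String>emptySet()));
        result.retainAll(friends.getOrDefault(b, Collections.<String>emptySet()));
        return result;
    }

    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> friends = new HashMap<>();
        addFriendship(friends, "ann", "bob");
        addFriendship(friends, "ann", "carl");
        addFriendship(friends, "ann", "dana");
        addFriendship(friends, "bob", "carl");
        addFriendship(friends, "bob", "dana");
        addFriendship(friends, "carl", "eve");
        return friends;
    }

    public static void main(String[] args) {
        System.out.println(common(sample(), "ann", "bob"));
    }
}
```

The cost depends on the number of friends of the two people involved, not on the size of the whole table, which is the point the slide makes against the relational self-join.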
22 graph database management systems Three main properties: 1. Property Graph (as data model) 2. Index-free Adjacency (as physical level organization) 3. Path-traversal (as query language)
23 property graph data model. A property graph is a directed multigraph g = (N, E) where every node n ∈ N and every edge e ∈ E is associated with a set of pairs <key, value>, called properties. It is a schema-less data model.
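The definition maps almost directly onto a small data structure. The sketch below is only an illustration of the model, with hypothetical names (PropertyGraph, Element, addNode, addEdge); the second, parallel edge is an invented example to show that a multigraph allows several edges between the same pair of nodes.

```java
import java.util.*;

/** Minimal property graph: a directed multigraph whose nodes and edges carry key/value pairs. */
class PropertyGraph {
    // Both nodes and edges are associated with a set of <key, value> properties.
    static class Element {
        final Map<String, Object> properties = new HashMap<>();
    }

    static final class Node extends Element { }

    static final class Edge extends Element {
        final Node source;
        final Node target;
        Edge(Node source, Node target) { this.source = source; this.target = target; }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>(); // a list, so parallel edges are allowed

    Node addNode() { Node n = new Node(); nodes.add(n); return n; }

    Edge addEdge(Node source, Node target) {
        Edge e = new Edge(source, target);
        edges.add(e);
        return e;
    }

    static PropertyGraph sample() {
        PropertyGraph g = new PropertyGraph();
        Node rome = g.addNode();
        rome.properties.put("name", "Rome");
        Node italy = g.addNode();
        italy.properties.put("name", "Italy");
        g.addEdge(rome, italy).properties.put("type", "locatedin");
        g.addEdge(rome, italy).properties.put("type", "capitalof"); // hypothetical parallel edge
        return g;
    }

    public static void main(String[] args) {
        for (Edge e : sample().edges) {
            System.out.println(e.source.properties.get("name") + " -" + e.properties.get("type")
                    + "-> " + e.target.properties.get("name"));
        }
    }
}
```

No schema is declared anywhere: any element can carry any keys, which is what "schema-less" means here.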
24 index-free adjacency. We say that a (graph) database g satisfies index-free adjacency if the existence of an edge between two nodes n1 and n2 in g can be tested on those nodes and does not require access to an external, global index. GOAL: make the cost of a basic traversal independent of the size of the database, ideally keeping it O(1).
25 index-free adjacency...trying to keep the cost of a basic traversal O(1)
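The difference between the two ways of testing adjacency can be sketched as below. This is an assumption-laden illustration, not the storage layout of any real system: the class IndexFreeAdjacency and its methods are hypothetical, and a sorted set of "from->to" strings stands in for the external, global index.

```java
import java.util.*;

/** Adjacency tested on the node itself versus through a global edge index. */
class IndexFreeAdjacency {
    static final class Node {
        final String id;
        final List<Node> out = new ArrayList<>(); // direct references to the neighbors, stored with the node
        Node(String id) { this.id = id; }
    }

    // Index-free: only n1's own record is inspected; the cost depends on n1's degree,
    // not on the size of the database, and following one stored reference is a single hop.
    static boolean adjacentLocal(Node n1, Node n2) {
        return n1.out.contains(n2);
    }

    // Contrast: a global index over all edges; a lookup in a sorted index grows with the number of edges.
    static boolean adjacentViaIndex(SortedSet<String> globalEdgeIndex, Node n1, Node n2) {
        return globalEdgeIndex.contains(n1.id + "->" + n2.id);
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.out.add(b);
        SortedSet<String> index = new TreeSet<>();
        index.add("a->b");
        System.out.println(adjacentLocal(a, b) + " " + adjacentViaIndex(index, a, b));
    }
}
```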
27 neo4j physical layer. Store files for different parts of the graph: node store, relationship store, property store. Each record contains 4 properties; the properties of an element may use more than one record. Property values can be stored either in the property store or in a dynamic string store. (I. Robinson, J. Webber, E. Eifrem, Graph Databases, 2013.)
28 titan, infinitegraph, levelgraph. Titan: built above the extensible column store Apache Cassandra. InfiniteGraph: built above the object-oriented database Objectivity/DB. LevelGraph: built on node.js above the key-value store LevelDB, but pluggable to different stores.
29 building a neo4j graph db: server mode
1. Go to download Neo4J Server and unzip it
2. Run the command ./bin/neo4j start (use bin\neo4j.bat on Windows)
3. Find a graphical dashboard at
4. You can also use it with REST API:
30 building a neo4j graph db: embedded mode
1. Import in your Java project the library neo4j-kernel-*-*-*.jar and its classes:
    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
2. Create the database:
    GraphDatabaseService gdb = new GraphDatabaseFactory().newEmbeddedDatabase("/home/...");
3. Create nodes and edges (EdgeType is an enum implementing RelationshipType):
    Node n1 = gdb.createNode();
    Node n2 = gdb.createNode();
    Relationship e12 = n1.createRelationshipTo(n2, EdgeType.TYPE);
4. Set the properties:
    n1.setProperty("name", "Rome");
    n2.setProperty("name", "Italy");
    e12.setProperty("type", "locatedin");
31 tinkerpop stack
BLUEPRINTS: a property graph model interface with provided implementations.
GREMLIN: a domain-specific language for traversing property graphs.
FRAMES: exposes the elements of a Blueprints graph as Java objects, so that software is written in terms of domain objects and their relationships to each other.
FURNACE: a property graph algorithms package.
REXSTER
PIPES
32 building a graph db with blueprints
1. Import in your Java project the libraries blueprints-core-*.*.*.jar and blueprints-neo4j-graph-*.*.*.jar with their classes:
    import com.tinkerpop.blueprints.*;
    import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;
2. Create the database:
    Graph gdb = new Neo4jGraph("/home/...");
3. Create nodes and edges:
    Vertex n1 = gdb.addVertex(null);
    Vertex n2 = gdb.addVertex(null);
    Edge e12 = gdb.addEdge(null, n1, n2, "locatedin");
4. Set the properties:
    n1.setProperty("name", "Rome");
    n2.setProperty("name", "Italy");
    e12.setProperty("type", "locatedin");
33 querying a graph database. Gremlin: imperative query language, descendant of languages such as XPath. Cypher: declarative query language, descendant of languages such as SQL.
34 gremlin: a path traversal QL
    gremlin> g = new Neo4jGraph("/home/...");
    gremlin> g.V.outE.filter{it.edgeid == 'e1'}
    ==>e[2][1-EDGE->4]
    ==>e[1][1-EDGE->3]
    ==>e[0][1-EDGE->2]
    ==>e[7][6-EDGE->2]
35 gremlin: a path traversal QL
    gremlin> g.V.outE.filter{it.edgeid == 'e1'}.inV.outE.filter{it.edgeid == 'e2'}.inV.nodeid
    ==>F
    ==>E
    ==>E
36 cypher: a pattern matching QL
START: starting point in the graph
MATCH: the pattern to match, bound to the starting point
WHERE: filtering criteria
RETURN: what to return
37 cypher: a pattern matching QL. MATCH: the pattern to match, bound to the starting point:
    node1 -edge1-> node2 -edge2-> node3
    node1 -[?]-> node2 -[?]-> node3
    node1 -[*]-> node3
Live hands-on a graph database about beers at
38 cypher: a pattern matching QL
    START n = node(*)
    MATCH n-[r1:edge]->x-[r2:edge]->m
    WHERE (r1.edgeid = 'e1') and (r2.edgeid = 'e2')
    RETURN m.nodeid
    ==> F
    ==> E
    ==> E
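Both the Gremlin traversal and the Cypher pattern above ask for the same thing: from any node, follow an edge whose edgeid is 'e1', then an edge whose edgeid is 'e2', and return the node reached. A plain-Java sketch of that two-hop traversal is given below. The class TwoHopTraversal and the sample graph are hypothetical (the graph on the slides is not recoverable from the transcription), so the result differs from the one shown on the slides.

```java
import java.util.*;

/** Two-hop path traversal: follow an edge with id `first`, then an edge with id `second`. */
class TwoHopTraversal {
    static final class Edge {
        final String from;
        final String edgeId;
        final String to;
        Edge(String from, String edgeId, String to) { this.from = from; this.edgeId = edgeId; this.to = to; }
    }

    static List<String> traverse(List<Edge> edges, String first, String second) {
        // Out-edges grouped by source node, so the second hop does not rescan all edges.
        Map<String, List<Edge>> out = new HashMap<>();
        for (Edge e : edges) {
            out.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
        }
        List<String> result = new ArrayList<>();
        for (Edge e1 : edges) {
            if (!e1.edgeId.equals(first)) continue;            // first hop: edgeid == first
            for (Edge e2 : out.getOrDefault(e1.to, Collections.<Edge>emptyList())) {
                if (e2.edgeId.equals(second)) result.add(e2.to); // second hop: edgeid == second
            }
        }
        return result; // one entry per matching path, so end nodes may repeat
    }

    static List<Edge> sample() {
        return Arrays.asList(
                new Edge("A", "e1", "B"),
                new Edge("B", "e2", "E"),
                new Edge("C", "e1", "D"),
                new Edge("D", "e2", "E"),
                new Edge("D", "e2", "F"));
    }

    public static void main(String[] args) {
        System.out.println(traverse(sample(), "e1", "e2"));
    }
}
```

As in the query results on the slides, a node is reported once per matching path rather than once overall.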
39 other features. Secondary indexes: defined on properties. Transactions: graph databases usually support ACID properties. In Neo4j all operations have to be performed in a transaction:
    try ( Transaction tx = gdb.beginTx() ) { tx.success(); }
Other programming language wrappers:
40 graph processing systems. Frameworks to compute (distributed) graph analysis on large graphs. They share the motivations of Hadoop, Spark, etc.; help programmers to focus on the algorithm rather than on the implementation; support different types of graphs; provide a variety of algorithms already implemented. Examples: Pregel/Giraph, GraphLab/Dato, Pegasus.
41 google pregel/apache giraph. The user specifies a vertex program. The computation runs as a sequence of supersteps. In each superstep the program is executed over all the vertices. The program can use the messages received from the neighbors in the previous superstep and can send messages to them for the next superstep. A vertex can deactivate itself, and the computation halts when all the vertices are deactivated.
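The superstep model can be sketched on a single machine with the classic "propagate the maximum value" vertex program. This is a sequential toy under stated assumptions, not the Pregel or Giraph API: the class PregelMaxValue and its methods are hypothetical, and a deactivated vertex is assumed to be reactivated when a message arrives, as in the Pregel model.

```java
import java.util.*;

/** Sequential sketch of vertex-centric supersteps: every vertex ends up with the maximum value. */
class PregelMaxValue {
    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<Integer>());
        return boxes;
    }

    /** Runs supersteps until no vertex is active; returns the number of supersteps executed. */
    static int run(int[] value, int[][] neighbors) {
        int n = value.length;
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        List<List<Integer>> inbox = emptyInboxes(n);
        int superstep = 0;
        while (true) {
            List<List<Integer>> outbox = emptyInboxes(n); // messages for the next superstep
            boolean anyActive = false;
            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (!messages.isEmpty()) active[v] = true; // a message reactivates a vertex
                if (!active[v]) continue;
                anyActive = true;
                // Vertex program: adopt the largest value seen and forward it if it changed.
                boolean changed = (superstep == 0);
                for (int m : messages) {
                    if (m > value[v]) { value[v] = m; changed = true; }
                }
                if (changed) {
                    for (int u : neighbors[v]) outbox.get(u).add(value[v]);
                }
                active[v] = false; // the vertex deactivates itself
            }
            if (!anyActive) return superstep; // all vertices deactivated: halt
            inbox = outbox;
            superstep++;
        }
    }

    public static void main(String[] args) {
        int[] value = {3, 6, 2, 1};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}}; // chain 0 - 1 - 2 - 3
        run(value, neighbors);
        System.out.println(Arrays.toString(value));
    }
}
```

In a distributed system the inner loop over the vertices runs in parallel on the workers and the exchange of inboxes is the synchronization barrier between supersteps.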
42 research problems about graph DBs How to model a graph database? How to migrate data and queries from existing databases? How to scale queries over large graphs?
43 modeling graph databases. Three strategies: compact, sparse, dense. Trade-offs: reduces the number of data accesses; accesses and updates can be inefficient; reduces the number of joins; can violate property graph constraints; needs human intervention for a semantic enrichment.
44 modeling graph databases. Orienting the ER: [Figure: relationships between ENTITY 1 and ENTITY 2 with cardinalities (0:1) and (0:N) are turned into directed edges labeled RELATIONSHIP : 0, RELATIONSHIP : 1 and RELATIONSHIP : 2] R. De Virgilio, A. Maccioni, R. Torlone, Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
45 modeling graph databases. Oriented-ER: R. De Virgilio, A. Maccioni, R. Torlone, Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
46 modeling graph databases. Partitioning. Rule 1: if a node n is disconnected then it forms a group by itself. Rule 2: if a node n has w-(n) > 1 and w+(n) > 0 then n forms a group by itself. Rule 3: if a node n has w-(n) < 2 and w+(n) < 2 then n is added to the group of a node m such that there exists the edge (m, n) in the O-ER diagram. R. De Virgilio, A. Maccioni, R. Torlone, Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
47 modeling graph databases. Partitioning: R. De Virgilio, A. Maccioni, R. Torlone, Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
48 modeling graph databases. Graph Database Template. R. De Virgilio, A. Maccioni, R. Torlone, Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
49 R2G: from relations to graphs. SQL: select * from T where T.A1 = v1. R. De Virgilio, A. Maccioni, R. Torlone, R2G: a Tool for Migrating Relations to Graphs, EDBT International Conference on Extending Database Technology, 2014.
50 R2G: unifiability of data values. Joinable tuples t1 ∈ R1 and t2 ∈ R2: there is a foreign key constraint between R1.A and R2.B and t1[A] = t2[B]. Unifiability of data values t1[A] and t2[B]: (i) t1 = t2 and both A and B do not belong to a multi-attribute key; (ii) t1 and t2 are joinable and A belongs to a multi-attribute key; (iii) t1 and t2 are joinable, A and B do not belong to a multi-attribute key and there is no other tuple t3 that is joinable with t2. R. De Virgilio, A. Maccioni, R. Torlone, R2G: a Tool for Migrating Relations to Graphs, EDBT International Conference on Extending Database Technology, 2014.
51 R2G: schema graph. Full schema paths: FR.fuser → US.uid → US.uname; FR.fuser → FR.fblog → BG.bid → BG.bname; FR.fuser → FR.fblog → BG.bid → BG.admin → US.uid → US.uname; ... R. De Virgilio, A. Maccioni, R. Torlone, R2G: a Tool for Migrating Relations to Graphs, EDBT International Conference on Extending Database Technology, 2014.
52 R2G: data migration. R. De Virgilio, A. Maccioni, R. Torlone, R2G: a Tool for Migrating Relations to Graphs, EDBT International Conference on Extending Database Technology, 2014.
53 R2G: query migration. R. De Virgilio, A. Maccioni, R. Torlone, R2G: a Tool for Migrating Relations to Graphs, EDBT International Conference on Extending Database Technology, 2014.
54 scalability over real-world graphs: > 500 million users; > 1.2 billion active users; > 500 million users. Graph databases are hard to scale.
55 real-world graphs. 10% of the users follow the same five users. Graph databases are hard to scale: power-law graphs, preferential attachment, scale-free graphs.
56 real-world graphs. 10% of the users follow the same five users. Graph databases are very hard to scale: replication, partitioning.
57 real-world graphs. A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
58 real-world graphs. [Figure: many "follows" edges pointing to the same nodes] A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
59 real-world graphs. A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
60 any redundancy? A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
61 sparsification. [Figure: sparsification of a graph; legend: high-degree node, low-degree node, compressor node] A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
62 compression via sparsification. [Figure: an example graph compressed by greedy sparsification and by space-aware sparsification] A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
63 graph pattern matching: with Greedy Compressed Graphs; with Space-aware Compressed Graphs. A. Maccioni, D. J. Abadi, Scalable Pattern Matching over Compressed Graphs via Sparsification.
64 open problems. How to shard/partition a graph database? How to visualize large graphs? (Specialized startups are addressing the problem.) Standardization of a query language. Graph processing with GPUs (Medusa-gpu, MapGraph). What is the best way to implement a graph layer on top of SQL/NoSQL systems? (IBM, Oracle, Teradata, HP, ...)
65 conclusion. Graphs are used in many fields (Social Networks, Bioinformatics, Semantic Web, Geo-informatics, ...). Graphs are more complicated to manage than other types of data and need different considerations. When we need to store a big graph we have many options, each one with both advantages and disadvantages. Scaling queries over graph databases is still an infant area of database industry and research.
66 thanks for the attention
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Big Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop Jason Plurad Software Engineer, IBM Committer, Apache TinkerPop Project Update Graph Landscape A Graph Problem Hands-On Graph http://tinkerpop.apache.org About Me
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
Cloud Computing and Advanced Relationship Analytics
Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 [email protected]
Big Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tu-berlin.de/marklv [email protected] Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
Software Life-Cycle Management
Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
Infrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
Relational Database Basics Review
Relational Database Basics Review IT 4153 Advanced Database J.G. Zheng Spring 2012 Overview Database approach Database system Relational model Database development 2 File Processing Approaches Based on
Rule-Based Engineering Using Declarative Graph Database Queries
Rule-Based Engineering Using Declarative Graph Database Queries Sten Grüner, Ulrich Epple Chair of Process Control Engineering, RWTH Aachen University MBEES 2014, Dagstuhl, 05.03.14 Motivation Every plant
E6895 Advanced Big Data Analytics Lecture 4:! Data Store
E6895 Advanced Big Data Analytics Lecture 4:! Data Store Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data Analytics,
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
Big Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is
Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.
Database Management Systems Chapter 1 Mirek Riedewald Many slides based on textbook slides by Ramakrishnan and Gehrke 1 Logistics Go to http://www.ccs.neu.edu/~mirek/classes/2010-f- CS3200 for all course-related
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Enterprise Operational SQL on Hadoop Trafodion Overview
Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
