Big Graph Data Management
Transcription
1 Big Graph Data Management. Antonio Maccioni, Big Data Course, Rome, 14 May 2015 [title slide drawn as a small graph; edge labels: held-by, topic, locatedin, where, when, is-a, affiliated]
2 this talk is about: Graph Databases (models, languages, use cases); Graph Database Management Systems; Graph Processing Systems (intro); Open Problems in Graph Data Management; ...and research projects around us
3 scenario [Oracle] is working on PGX (Parallel Graph Analytics), has launched Green-Marl and has implemented an RDF layer on top of Oracle NoSQL; [logo not transcribed] is developing a layer, VERTEXICA, for graph mining on top of the analytical database; [Facebook] is going towards graph search capabilities (Tao, Unicorn, Open Graph protocol, etc.); [Google] has been working on the Knowledge Graph for a while, has created Pregel and has launched Cayley; [Teradata] launched the language interface SQL-GR to run graph analysis on top of its analytical database; [Twitter] has open-sourced its graph database FlockDB; [Microsoft] has been working on Trinity, an in-memory graph database; [DataStax] has just acquired Aurelius (TitanDB)
4 scenario "Over 25 percent of enterprises will use graph databases by [...]" (Enterprise DBMS, Q[...], Forrester Research). "Graph databases are catching on commercially" (Michael Stonebraker, 2014 ACM Turing Award)
5 nosql scenario Simple data models. "Graph Databases are an odd fish in the NoSQL pond" (P.J. Sadalage, M. Fowler, NoSQL Distilled)
6 nosql scenario Simple data models. But if we want to represent connections we may opt for a graph database management system:
7 a graph. Adjacency lists: [1]->2->4->5, [2]->3, [3]->5, [4]->5->6, [5]->6. Adjacency matrix (from\to): [matrix not transcribed]
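The two representations named on this slide can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up, and node 6 (which appears on the slide only as a target) is assumed to have no outgoing edges. The matrix is rebuilt from the adjacency lists, since the original matrix did not survive the transcription.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AdjacencyDemo {
    // Adjacency lists from the slide: node -> list of outgoing neighbours.
    static Map<Integer, List<Integer>> adjacencyLists() {
        Map<Integer, List<Integer>> adj = new TreeMap<>();
        adj.put(1, List.of(2, 4, 5));
        adj.put(2, List.of(3));
        adj.put(3, List.of(5));
        adj.put(4, List.of(5, 6));
        adj.put(5, List.of(6));
        adj.put(6, List.of());   // assumption: node 6 has no outgoing edges
        return adj;
    }

    // Equivalent from\to matrix: m[i][j] is true iff there is an edge (i+1) -> (j+1).
    static boolean[][] toMatrix(Map<Integer, List<Integer>> adj, int n) {
        boolean[][] m = new boolean[n][n];
        for (Map.Entry<Integer, List<Integer>> e : adj.entrySet())
            for (int to : e.getValue())
                m[e.getKey() - 1][to - 1] = true;
        return m;
    }

    public static void main(String[] args) {
        // Print the matrix row by row, one row per source node.
        for (boolean[] row : toMatrix(adjacencyLists(), 6)) {
            StringBuilder sb = new StringBuilder();
            for (boolean cell : row) sb.append(cell ? "1 " : "0 ");
            System.out.println(sb.toString().trim());
        }
    }
}
```

The lists take space proportional to the number of edges, while the matrix always takes n*n cells but answers "is there an edge from i to j?" with a single lookup.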
8 graph + data = graph database [figure: example graph whose edges are labelled admin, belongs, follows, likes, works, married, friends, worked]
9 why graph databases? More natural modeling Manage connections explicitly Run algorithms of network science (e.g., PageRank)
10 use case 1: semantic web A Web-scale architecture (the Web of Data) for metadata and data management (together) and for interoperability of data and services. Compatible with other Web technologies. Based on a set of W3C standards (HTTP, IRI/URI, RDF, SPARQL, OWL). Web 3.0: make information understandable by machines
11 use case 1: semantic web HTTP request of data by URI You can follow links (the edges of the graph)
12 use case 1: semantic web Semantic Web + Open Data = Linked Open Data
13 use case 1: semantic web
14 sparql Which are the names of the people married to people born in Honolulu? SELECT ?name1 { ?person1 married ?person2 . ?person2 born ?city . ?city name "Honolulu" . ?person1 name ?name1 . } [figure: the query drawn as a graph pattern over ?person1, ?person2, ?city and ?name1]
15 sparql Pattern matching style of querying: SELECT ?name1 { ?person1 married ?person2 . ?person2 born ?city . ?city name "Honolulu" . ?person1 name ?name1 . } [figure: the query pattern matched against a data graph with person nodes p1 (name Michelle Obama) and p2 (name Barack Obama) linked by married, city nodes c1 (name Chicago) and c2 (name Honolulu) reached by born edges, and locatedin edges to n1 (name USA)]
16 sparql Pattern matching style of querying [figure: same query and data graph as slide 15]
17 triple stores G is a table of triples (s, p, o): (p1, married, p2); (p1, name, Michelle Obama); (p2, born, c2); (c2, name, Honolulu); (us, name, USA). Indexes on: (s,p,o), (s,o,p), (p,s,o), (p,o,s), (o,s,p), (o,p,s)
18 query processing [figure: the query graph over ?person1, ?person2, ?city and ?name, and a join plan whose leaves are the index scans G(P=name), G(P=name), G(P=married) and G(P=name and O=Honolulu)]
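As a rough sketch of how such a query can be answered over a table of triples, the Java below evaluates the four triple patterns of the Honolulu query with nested joins. The class and method names are hypothetical, the triple (p2, name, Barack Obama) is taken from the figure of slide 15, and plain scans stand in for the permutation indexes a real triple store would use.

```java
import java.util.ArrayList;
import java.util.List;

class TripleStoreDemo {
    // Each triple is {subject, predicate, object}.
    static final List<String[]> G = List.of(
        new String[] {"p1", "married", "p2"},
        new String[] {"p1", "name", "Michelle Obama"},
        new String[] {"p2", "name", "Barack Obama"},
        new String[] {"p2", "born", "c2"},
        new String[] {"c2", "name", "Honolulu"});

    // Return the triples matching a pattern; null plays the role of a variable.
    static List<String[]> match(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : G)
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2])))
                out.add(t);
        return out;
    }

    // Names of the people married to people born in the given city.
    static List<String> namesOfPeopleMarriedToBornIn(String cityName) {
        List<String> names = new ArrayList<>();
        for (String[] city : match(null, "name", cityName))          // ?city name "Honolulu"
            for (String[] born : match(null, "born", city[0]))       // ?person2 born ?city
                for (String[] m : match(null, "married", born[0]))   // ?person1 married ?person2
                    for (String[] n : match(m[0], "name", null))     // ?person1 name ?name1
                        names.add(n[2]);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesOfPeopleMarriedToBornIn("Honolulu")); // [Michelle Obama]
    }
}
```

Starting from the most selective pattern (the one with a constant object) keeps the intermediate results small, which is the same reason the plan on the slide begins from G(P=name and O=Honolulu).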
19 use case 2: social networks [figure: example social graph whose edges are labelled admin, belongs, follows, likes, works, married, friends, worked]
20 use case 2: social networks Find common friends for every single profile visit. Friendship table with columns FRIEND 1, FROM_DAY, FRIEND 2: ~1.2 billion tuples * average number of friends per person
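When each person's friends are reachable directly from that person, the common-friends lookup reduces to intersecting two neighbour sets instead of self-joining one very large friendship table. A minimal Java sketch, with made-up names and data:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CommonFriends {
    // Hypothetical friendship data: person -> set of friends.
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> f = new HashMap<>();
        f.put("ann", Set.of("bob", "carl", "dana"));
        f.put("bob", Set.of("ann", "carl", "dana", "eve"));
        f.put("carl", Set.of("ann", "bob"));
        f.put("dana", Set.of("ann", "bob"));
        f.put("eve", Set.of("bob"));
        return f;
    }

    // Intersect the two neighbour sets; the cost depends on the two degrees only.
    static Set<String> commonFriends(Map<String, Set<String>> friends, String a, String b) {
        Set<String> common = new TreeSet<>(friends.getOrDefault(a, Set.of()));
        common.retainAll(friends.getOrDefault(b, Set.of()));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonFriends(sample(), "ann", "bob")); // [carl, dana]
    }
}
```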
22 graph database management systems Three main properties: 1. Property Graph (as data model) 2. Index-free Adjacency (as physical level organization) 3. Path-traversal (as query language)
23 property graph data model A property graph is a directed multigraph g = (N, E) where every node n ∈ N and every edge e ∈ E is associated with a set of <key, value> pairs, called properties. It is a schema-less data model
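The definition can be read almost literally as a data structure. The Java sketch below is one possible in-memory encoding, not the layout of any particular system; all class and field names are assumptions. Edges are kept in a list so that parallel edges between the same two nodes are allowed (multigraph), and no schema restricts which keys a node or edge may carry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PropertyGraphDemo {
    static class Node {
        final Map<String, Object> props = new HashMap<>();   // <key, value> pairs
    }

    static class Edge {
        final Node from, to;                                  // directed: from -> to
        final Map<String, Object> props = new HashMap<>();
        Edge(Node from, Node to) { this.from = from; this.to = to; }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();               // a list, so parallel edges are possible

    Node addNode() { Node n = new Node(); nodes.add(n); return n; }

    Edge addEdge(Node from, Node to) { Edge e = new Edge(from, to); edges.add(e); return e; }

    // The Rome/Italy example used later in the deck.
    static PropertyGraphDemo romeExample() {
        PropertyGraphDemo g = new PropertyGraphDemo();
        Node rome = g.addNode();
        Node italy = g.addNode();
        rome.props.put("name", "Rome");
        italy.props.put("name", "Italy");
        g.addEdge(rome, italy).props.put("type", "locatedin");
        return g;
    }

    public static void main(String[] args) {
        PropertyGraphDemo g = romeExample();
        Edge e = g.edges.get(0);
        System.out.println(e.from.props.get("name") + " -" + e.props.get("type") + "-> " + e.to.props.get("name"));
    }
}
```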
24 index-free adjacency We say that a (graph) database g satisfies index-free adjacency if the existence of an edge between two nodes n1 and n2 in g can be tested on those nodes alone, without accessing an external, global index. GOAL: make the cost of a basic traversal independent of the size of the database, ideally O(1)
25 index-free adjacency ...trying to keep the cost of a basic traversal O(1)
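To make the contrast concrete, the Java sketch below stores each node's outgoing neighbours with the node itself and, for comparison, checks the same edge through a global sorted index. Names are hypothetical and this is an in-memory illustration, not how any particular system lays out its files.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class IndexFreeAdjacency {
    static class Node {
        final String id;
        final Set<Node> out = new HashSet<>();   // direct references to neighbours, kept with the node
        Node(String id) { this.id = id; }
        void connectTo(Node other) { out.add(other); }
        // Touches only this node's own data: the cost does not grow with the size of the graph.
        boolean hasEdgeTo(Node other) { return out.contains(other); }
    }

    // For contrast: one global, sorted index over all edges; a lookup costs O(log |E|).
    static boolean hasEdgeViaGlobalIndex(TreeSet<String> edgeIndex, String from, String to) {
        return edgeIndex.contains(from + "->" + to);
    }

    static boolean demo() {
        Node rome = new Node("Rome"), italy = new Node("Italy"), paris = new Node("Paris");
        rome.connectTo(italy);
        TreeSet<String> index = new TreeSet<>();
        index.add("Rome->Italy");
        return rome.hasEdgeTo(italy) && !rome.hasEdgeTo(paris) && !italy.hasEdgeTo(rome)
            && hasEdgeViaGlobalIndex(index, "Rome", "Italy")
            && !hasEdgeViaGlobalIndex(index, "Italy", "Rome");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```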
27 neo4j physical layer Store files for different parts of the graph: node store, relationship store, property store. Each record contains 4 properties; the properties of an element may span several records. Property values can be stored either in the property store or in a dynamic string store. (I. Robinson, J. Webber, E. Eifrem, Graph Databases, 2013.)
28 titan, infinitegraph, levelgraph Titan: built above the extensible column store Apache Cassandra. InfiniteGraph: built above the object-oriented database Objectivity/DB. LevelGraph: built on node.js above the key-value store LevelDB, but pluggable to different stores
29 building a neo4j graph db Server mode 1. Go to [...], download Neo4j Server and unzip it 2. Run the command ./bin/neo4j start (use bin\neo4j.bat on Windows) 3. Find a graphical dashboard at [...] 4. You can also use it with the REST API: [...]
30 building a neo4j graph db Embedded mode 1. Import in your Java project the library neo4j-kernel-*-*-*.jar and its classes: import org.neo4j.graphdb.*; import org.neo4j.graphdb.factory.GraphDatabaseFactory; 2. Create the database: GraphDatabaseService gdb = new GraphDatabaseFactory().newEmbeddedDatabase("/home/..."); 3. Create nodes and edges (EdgeType is an enum implementing RelationshipType): Node n1 = gdb.createNode(); Node n2 = gdb.createNode(); Relationship e12 = n1.createRelationshipTo(n2, EdgeType.TYPE); 4. Set the properties: n1.setProperty("name", "Rome"); n2.setProperty("name", "Italy"); e12.setProperty("type", "locatedin");
31 tinkerpop stack BLUEPRINTS: Blueprints is a property graph model interface with provided implementations. GREMLIN: Gremlin is a domain-specific language for traversing property graphs. FRAMES: Frames exposes the elements of a Blueprints graph as Java objects, so that software is written in terms of domain objects and their relationships to each other. FURNACE: Furnace is a property graph algorithms package. Also in the stack: REXSTER, PIPES
32 building a graph db with blueprints 1. Import in your Java project the libraries blueprints-core-*.*.*.jar and blueprints-neo4j-graph-*.*.*.jar with their classes: import com.tinkerpop.blueprints.*; import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph; 2. Create the database: Graph gdb = new Neo4jGraph("/home/..."); 3. Create nodes and edges: Vertex n1 = gdb.addVertex(null); Vertex n2 = gdb.addVertex(null); Edge e12 = gdb.addEdge(null, n1, n2, "locatedin"); 4. Set the properties: n1.setProperty("name", "Rome"); n2.setProperty("name", "Italy"); e12.setProperty("type", "locatedin");
33 querying a graph database Gremlin: imperative query language, descendant of languages such as XPath. Cypher: declarative query language, descendant of languages such as SQL
34 gremlin: a path traversal QL gremlin> g = new Neo4jGraph("/home/..."); gremlin> g.V.outE.filter{it.edgeid == 'e1'} ==>e[2][1-EDGE->4] ==>e[1][1-EDGE->3] ==>e[0][1-EDGE->2] ==>e[7][6-EDGE->2]
35 gremlin: a path traversal QL gremlin> g.V.outE.filter{it.edgeid == 'e1'}.inV.outE.filter{it.edgeid == 'e2'}.inV.nodeid ==>F ==>E ==>E
36 cypher: a pattern matching QL START: starting point in the graph MATCH: the pattern to match, bound to the starting point WHERE: filtering criteria RETURN: what to return
37 cypher: a pattern matching QL MATCH: the pattern to match, bound to the starting point. Example patterns: node1-[edge1]->node2-[edge2]->node3; node1-[?]->node2-[?]->node3; node1-[*]->node3. Live hands-on with a graph database about beers at [...]
38 cypher: a pattern matching QL START n = node(*) MATCH n-[r1:edge]->x-[r2:edge]->m WHERE (r1.edgeid = 'e1') and (r2.edgeid = 'e2') RETURN m.nodeid ==> F ==> E ==> E
39 other features Secondary indexes: defined on properties. Transactions: graph databases usually support ACID properties. In Neo4j all operations have to be performed in a transaction: try ( Transaction tx = gdb.beginTx() ) { /* operations on the graph */ tx.success(); } Other programming language wrappers: [...]
40 graph processing systems Frameworks to compute (distributed) graph analysis on large graphs. They share the motivations of Hadoop, Spark, etc.; help programmers focus on the algorithm rather than on the implementation; support different types of graphs; provide a variety of algorithms already implemented. Examples: Pregel/Giraph, GraphLab/Dato, Pegasus
41 google pregel/apache giraph The user specifies a vertex program. The computation runs as a sequence of supersteps. In each superstep the program is executed over all the vertices. The program can use the messages received from the neighbors in the previous superstep and can send messages to them for the next superstep. A vertex can deactivate itself, and the computation halts when all the vertices are deactivated.
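The superstep model can be imitated on a single machine in a few dozen lines. The Java sketch below runs a "propagate the maximum value" vertex program; it is not the Pregel or Giraph API, and all names are made up. It also follows the Pregel convention that a halted vertex is reactivated when it receives a message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PregelMaxValue {
    static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    // outEdges[v] lists the vertices that v sends messages to.
    static int[] run(int[] initial, int[][] outEdges) {
        int n = initial.length;
        int[] value = initial.clone();
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        List<List<Integer>> inbox = emptyInboxes(n);

        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> nextInbox = emptyInboxes(n);
            boolean anyActive = false;
            for (int v = 0; v < n; v++) {
                if (!inbox.get(v).isEmpty()) active[v] = true;   // a message wakes the vertex up
                if (!active[v]) continue;
                anyActive = true;
                // Vertex program: keep the largest value seen so far.
                boolean changed = superstep == 0;                // everyone announces itself once
                for (int msg : inbox.get(v))
                    if (msg > value[v]) { value[v] = msg; changed = true; }
                if (changed)
                    for (int to : outEdges[v]) nextInbox.get(to).add(value[v]);
                active[v] = false;                               // vote to halt
            }
            if (!anyActive) return value;                        // all halted, no messages in flight
            inbox = nextInbox;
        }
    }

    public static void main(String[] args) {
        int[][] outEdges = { {1}, {0, 2}, {1, 3}, {2} };         // a path 0 - 1 - 2 - 3, both directions
        System.out.println(Arrays.toString(run(new int[] {3, 6, 2, 1}, outEdges))); // [6, 6, 6, 6]
    }
}
```

In a real deployment the vertices are spread over many workers and the messages travel over the network between supersteps; the loop structure stays the same.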
42 research problems about graph DBs How to model a graph database? How to migrate data and queries from existing databases? How to scale queries over large graphs?
43 modeling graph databases Compact: Sparse: Dense: Reduces the number of data accesses Accesses and updates can be inefficient Reduces number of joins Can violate property graph constraints Needs human intervention for a semantic enrichment
44 modeling graph databases Orienting the ER: [figure: ER relationships between ENTITY 1 and ENTITY 2 with cardinalities (0:1) and (0:N) turned into directed, weighted edges RELATIONSHIP : 0, RELATIONSHIP : 1 and RELATIONSHIP : 2] R. De Virgilio, A. Maccioni, R. Torlone Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
45 modeling graph databases Oriented-ER: R. De Virgilio, A. Maccioni, R. Torlone Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
46 modeling graph databases Partitioning: Rule 1: if a node n is disconnected then it forms a group by itself. Rule 2: if a node n has w-(n) > 1 and w+(n) > 0 then n forms a group by itself. Rule 3: if a node n has w-(n) < 2 and w+(n) < 2 then n is added to the group of a node m such that there exists the edge (m, n) in the O-ER diagram. R. De Virgilio, A. Maccioni, R. Torlone Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
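A rough Java sketch of the three rules follows. It rests on assumptions the slide does not state: w-(n) and w+(n) are read as the total weight of the incoming and outgoing edges of n, a node not covered by any rule (or a Rule 3 node with no incoming edge) stays in its own group, and when several candidate nodes m exist the first one encountered is used. The entity names and weights in the example are made up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OerPartitioning {
    // A directed, weighted edge of the O-ER diagram.
    static class Edge {
        final String from, to;
        final int weight;
        Edge(String from, String to, int weight) { this.from = from; this.to = to; this.weight = weight; }
    }

    // Follow parent links up to the representative of a group.
    static String find(Map<String, String> parent, String x) {
        while (!parent.get(x).equals(x)) x = parent.get(x);
        return x;
    }

    // Returns node -> representative of its group.
    static Map<String, String> partition(List<String> nodes, List<Edge> edges) {
        Map<String, Integer> wIn = new HashMap<>(), wOut = new HashMap<>();
        Map<String, String> pred = new HashMap<>();       // one node m with an edge (m, n)
        for (Edge e : edges) {
            wOut.merge(e.from, e.weight, Integer::sum);
            wIn.merge(e.to, e.weight, Integer::sum);
            pred.putIfAbsent(e.to, e.from);
        }
        Map<String, String> parent = new HashMap<>();
        for (String n : nodes) parent.put(n, n);          // Rules 1 and 2: a group by itself
        for (String n : nodes) {
            int in = wIn.getOrDefault(n, 0), out = wOut.getOrDefault(n, 0);
            if (pred.containsKey(n) && in < 2 && out < 2) // Rule 3: join the group of m
                parent.put(find(parent, n), find(parent, pred.get(n)));
        }
        Map<String, String> group = new LinkedHashMap<>();
        for (String n : nodes) group.put(n, find(parent, n));
        return group;
    }

    // Hypothetical diagram: User -(2)-> Blog -(1)-> Category, plus an isolated Tag.
    static Map<String, String> example() {
        return partition(List.of("User", "Blog", "Category", "Tag"),
                List.of(new Edge("User", "Blog", 2), new Edge("Blog", "Category", 1)));
    }

    public static void main(String[] args) {
        System.out.println(example()); // {User=User, Blog=Blog, Category=Blog, Tag=Tag}
    }
}
```

Here Tag falls under Rule 1, Blog under Rule 2 (incoming weight 2, outgoing weight 1), and Category under Rule 3, so it joins Blog's group.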
47 modeling graph databases Partitioning: R. De Virgilio, A. Maccioni, R. Torlone Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
48 modeling graph databases Graph Database Template R. De Virgilio, A. Maccioni, R. Torlone Model-driven design of graph databases, ER International Conference on Conceptual Modeling, 2014.
49 R2G: from relations to graphs SQL select * from T where T.A1 = v1 R. De Virgilio, A. Maccioni, R. Torlone R2G: a Tool for Migrating Relations to Graphs EDBT International Conference on Extending Database Technology, 2014
50 R2G: unifiability of data values Joinable tuples t1 ∈ R1 and t2 ∈ R2: there is a foreign key constraint between R1.A and R2.B and t1[A] = t2[B]. Unifiability of data values t1[A] and t2[B]: (i) t1 = t2 and both A and B do not belong to a multi-attribute key; (ii) t1 and t2 are joinable and A belongs to a multi-attribute key; (iii) t1 and t2 are joinable, A and B do not belong to a multi-attribute key and there is no other tuple t3 that is joinable with t2. R. De Virgilio, A. Maccioni, R. Torlone R2G: a Tool for Migrating Relations to Graphs EDBT International Conference on Extending Database Technology, 2014
51 R2G: schema graph Full Schema Paths: FR.fuser US.uid US.uname FR.fuser FR.fblog BG.bid BG.bname FR.fuser FR.fblog BG.bid BG.admin US.uid US.uname... R. De Virgilio, A. Maccioni, R. Torlone R2G: a Tool for Migrating Relations to Graphs EDBT International Conference on Extending Database Technology, 2014
52 R2G: data migration R. De Virgilio, A. Maccioni, R. Torlone R2G: a Tool for Migrating Relations to Graphs EDBT International Conference on Extending Database Technology, 2014
53 R2G: query migration R. De Virgilio, A. Maccioni, R. Torlone R2G: a Tool for Migrating Relations to Graphs EDBT International Conference on Extending Database Technology, 2014
54 scalability over real-world graphs [three networks shown by logo: > 500 million users; > 1.2 billion active users; > 500 million users] Graph Databases are hard to scale
55 real-world graphs 10% of the users follow the same five users Graph Databases are hard to scale power-law graphs preferential attachment scale-free graphs
56 real-world graphs 10% of the users follow the same five users Graph Databases are very hard to scale Replication Partitioning
57 real-world graphs A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
58 real-world graphs [figure: graph with many "follows" edges] A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
59 real-world graphs A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
60 any redundancy? A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
61 sparsification SPARSIFICATION high-degree node low-degree node compressor node A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
62 compression via sparsification [figure: an example graph compressed by GREEDY SPARSIFICATION and by SPACE-AWARE SPARSIFICATION] A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
63 graph pattern matching with Greedy Compressed Graphs with Space-aware Compressed Graphs A. Maccioni, D. J. Abadi Scalable Pattern Matching over Compressed Graphs via Sparsification
64 open problems How to shard/partition a graph database? How to visualize large graphs? (Specialized startups are addressing the problem.) Standardization of a query language. Graph processing with GPUs (Medusa-gpu, MapGraph). What is the best way to implement a graph layer on top of SQL/NoSQL systems? (IBM, Oracle, Teradata, HP, ...)
65 conclusion Graphs are used in many fields (social networks, bioinformatics, Semantic Web, geo-informatics, ...). Graphs are more complicated to manage than other types of data and call for different considerations. When we need to store a big graph we have many options, each with both advantages and disadvantages. Scaling queries over graph databases is still an infant area of database industry and research
66 thanks for the attention
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationEvaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
More informationEvaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
More informationBig Data and Scripting Systems beyond Hadoop
Big Data and Scripting Systems beyond Hadoop 1, 2, ZooKeeper distributed coordination service many problems are shared among distributed systems ZooKeeper provides an implementation that solves these avoid
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationSemantic Web Success Story
Semantic Web Success Story Practical Integration of Semantic Web Technology Chris Chaulk, Software Architect EMC Corporation 1 Who is this guy? Software Architect at EMC 12 years, Storage Management Software
More informationTeradata s Big Data Technology Strategy & Roadmap
Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationA Brief Study of Open Source Graph Databases
A Brief Study of Open Source Graph Databases Rob McColl David Ediger Jason Poovey Dan Campbell David Bader Georgia Tech Research Institute, Georgia Institute of Technology Abstract With the proliferation
More informationGraph Database Applications and Concepts with Neo4j
with Neo4j Justin J. Miller Georgia Southern University jm10197@georgiasouthern.edu ABSTRACT Graph databases (GDB) are now a viable alternative to Relational Database Systems (RDBMS). Chemistry, biology,
More informationStratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón
StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationINTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop Jason Plurad Software Engineer, IBM Committer, Apache TinkerPop Project Update Graph Landscape A Graph Problem Hands-On Graph http://tinkerpop.apache.org About Me
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationCloud Computing and Advanced Relationship Analytics
Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 brian.clark@objectivity.com
More informationBig Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tu-berlin.de/marklv volker.markl@tu-berlin.de Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
More informationSoftware Life-Cycle Management
Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationInfrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
More informationRelational Database Basics Review
Relational Database Basics Review IT 4153 Advanced Database J.G. Zheng Spring 2012 Overview Database approach Database system Relational model Database development 2 File Processing Approaches Based on
More informationRule-Based Engineering Using Declarative Graph Database Queries
Rule-Based Engineering Using Declarative Graph Database Queries Sten Grüner, Ulrich Epple Chair of Process Control Engineering, RWTH Aachen University MBEES 2014, Dagstuhl, 05.03.14 Motivation Every plant
More informationE6895 Advanced Big Data Analytics Lecture 4:! Data Store
E6895 Advanced Big Data Analytics Lecture 4:! Data Store Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data Analytics,
More informationScalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
More informationBig Data and Scripting Systems build on top of Hadoop
Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is
More informationCloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
More informationLogistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.
Database Management Systems Chapter 1 Mirek Riedewald Many slides based on textbook slides by Ramakrishnan and Gehrke 1 Logistics Go to http://www.ccs.neu.edu/~mirek/classes/2010-f- CS3200 for all course-related
More informationBig Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
More informationOracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.
Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationEnterprise Operational SQL on Hadoop Trafodion Overview
Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development
More informationA COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More information