Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

1 Big Data, Fast Data, Complex Data Jans Aasman Franz Inc

2 Franz Inc: Who We Are. Private, founded 1984. AI, Semantic Technology, professional services. Now in Oakland.

4 (1 (2 3) (4 5) (6 7) (8 9) (10 11) (12 13) (14 15)(16 17) ( ) (29 30))

6 No Schema. How is it different from an RDB and why is it more flexible? Say whatever you want to say, but ontologies may constrain what you put in the triple store. No Link Tables, because you can do one-to-many relationships directly. No Indexing Choices: you can add new data attributes (predicates) on the fly that will be available for querying in real time, because everything is automatically indexed. Takes anything you give it: it is trivial to consume rows and columns from an RDB, XML, RDF(S), OWL, text, and extracted entities.
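To make the "no schema, no link tables, no indexing choices" point concrete, here is a minimal sketch of a toy in-memory triple store. It is an illustration only, not how AllegroGraph is implemented: the class name `TinyTripleStore`, its three lookup indexes, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy store (hypothetical): every triple is indexed on insert, no schema needed."""

    def __init__(self):
        self.triples = set()
        self.by_s = defaultdict(set)  # subject   -> triples
        self.by_p = defaultdict(set)  # predicate -> triples
        self.by_o = defaultdict(set)  # object    -> triples

    def add(self, s, p, o):
        t = (s, p, o)
        self.triples.add(t)
        self.by_s[s].add(t)
        self.by_p[p].add(t)
        self.by_o[o].add(t)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self.by_s.get(s, set())
        elif p is not None:
            candidates = self.by_p.get(p, set())
        elif o is not None:
            candidates = self.by_o.get(o, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
# One-to-many relationship stated directly, without a link table.
store.add("alice", "knows", "bob")
store.add("alice", "knows", "carol")
# A brand-new predicate added on the fly is queryable immediately.
store.add("alice", "twitter-handle", "@alice")

friends = store.match(s="alice", p="knows")
handles = store.match(p="twitter-handle")
```

In this sketch `friends` holds both `knows` triples and `handles` holds the single triple for the predicate that did not exist a moment earlier; no table definition or index declaration was needed for either query.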

7 We in the Semantic Community call what we do Complex Data

8 Complex data is good at: Knowledge (instead of data), RDF and logic. Built to share information about objects; think Linked Open Data Cloud (public and enterprise). Complex ad hoc queries, rules, and graph algorithms. Getting more and more scalable by the day. And all built on standards.

9 But they keep asking: Shouldn't we do this with big data/NoSQL solutions? Or with a fast in-memory graph database?

11 Big Data is really good at: insane amounts of data; relatively flexible data structures; finding a single object very fast; rudimentary analytics using map/reduce.
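As a rough illustration of what "rudimentary analytics using map/reduce" can look like, the sketch below counts triples per predicate with a map, shuffle, and reduce step. It imitates the programming model in plain Python and does not use any Hadoop API; the function names and sample triples are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical sample data for the illustration.
triples = [
    ("Franz", "send-money", "A"),
    ("A", "send-money", "B"),
    ("Franz", "located-in", "Oakland"),
]


def map_phase(triple):
    """Emit (predicate, 1) for each triple."""
    _, predicate, _ = triple
    yield predicate, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


mapped = [pair for t in triples for pair in map_phase(t)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Each record is processed independently in the map step, which is why this style parallelizes well for simple aggregates and much less well for queries that must follow links between records.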

12 Hadoop brought parallel data processing to the masses, but this is what we do in our labs. Notice the sparse graph problem. And here is where map/reduce fails. http://cacm.

13 And what about fast data? A new OVUM marketing term for in-memory triple stores or in-memory graph databases. Do we need them? Well, if you have problems expressed as graphs.

14 Q1: A reasonably hard query for horizontally scaling stores and RDBs, a straightforward query for a graph database.
Select ?a ?b ?c ?d ?e where {
  Franz send-money ?a .
  ?a send-money ?b .
  ?b send-money ?c .
  ?c send-money Cray .
  Cray send-money ?d .
  Not (?d = ?c) .
  ?d send-money ?e .
  Not (?e = ?b) .
  ?e send-money Franz }
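The pattern in this query can be sketched as a chain of adjacency lookups, which is why it is cheap for a graph database and expensive as a sequence of self-joins. The following is a minimal sketch under assumptions: the function `money_cycles`, the edge list, and the account names other than Franz and Cray are invented for the illustration.

```python
def money_cycles(edges, start="Franz", via="Cray"):
    """Find (a, b, c, d, e) with start->a->b->c->via->d->e->start, d != c, e != b."""
    out = {}
    for sender, receiver in edges:
        out.setdefault(sender, set()).add(receiver)

    results = []
    for a in out.get(start, ()):
        for b in out.get(a, ()):
            for c in out.get(b, ()):
                if via not in out.get(c, ()):
                    continue  # ?c must send money to Cray
                for d in out.get(via, ()):
                    if d == c:
                        continue  # Not (?d = ?c)
                    for e in out.get(d, ()):
                        if e == b:
                            continue  # Not (?e = ?b)
                        if start in out.get(e, ()):
                            results.append((a, b, c, d, e))
    return sorted(results)


# Hypothetical send-money edges: one clean cycle plus a return route through C and B.
edges = [
    ("Franz", "A"), ("A", "B"), ("B", "C"), ("C", "Cray"),
    ("Cray", "D"), ("D", "E"), ("E", "Franz"),
    ("Cray", "C"), ("C", "B"), ("B", "Franz"),
]
cycles = money_cycles(edges)
```

With these edges `cycles` contains only `("A", "B", "C", "D", "E")`; the return route through C is rejected by the `d != c` filter.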

15 Q1: A very hard query for NoSQL stores and RDBs, a straightforward query for a graph database. Find a money trail from Franz to Cray that is more than two steps, find another money trail from Franz to Cray that is more than two steps, where the two trails are completely different.
(Select (?path1 ?path2)
  (path Franz Cray <send-money> >= 2 ?path1)
  (path Cray Franz <send-money> >= 2 ?path2)
  (empty (intersection ?path1 ?path2)))
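A minimal sketch of this path query follows, written under several stated assumptions. It follows the query form as written, so the second trail runs from Cray back to Franz and a trail needs at least two steps (`>= 2`). "Completely different" is interpreted here as sharing no intermediate accounts. The helper names, the `max_steps` bound, and the sample edges are hypothetical.

```python
def simple_paths(out, src, dst, min_steps=2, max_steps=6):
    """Yield simple paths (tuples of nodes) from src to dst with >= min_steps edges."""
    stack = [(src,)]
    while stack:
        path = stack.pop()
        steps = len(path) - 1
        if path[-1] == dst:
            if steps >= min_steps:
                yield path
            continue  # a simple path ends once it reaches dst
        if steps == max_steps:
            continue  # assumed bound to keep the search finite and small
        for nxt in sorted(out.get(path[-1], ())):
            if nxt not in path:
                stack.append(path + (nxt,))


def disjoint_trails(edges, a="Franz", b="Cray"):
    """Pair each a->b trail with each b->a trail that shares no intermediate node."""
    out = {}
    for sender, receiver in edges:
        out.setdefault(sender, set()).add(receiver)
    pairs = []
    for p1 in simple_paths(out, a, b):
        for p2 in simple_paths(out, b, a):
            if set(p1[1:-1]).isdisjoint(p2[1:-1]):
                pairs.append((p1, p2))
    return sorted(pairs)


# Hypothetical send-money edges.
edges = [
    ("Franz", "A"), ("A", "B"), ("B", "C"), ("C", "Cray"),
    ("Cray", "D"), ("D", "E"), ("E", "Franz"),
    ("Cray", "C"), ("C", "B"), ("B", "Franz"),
]
trails = disjoint_trails(edges)
```

Here `trails` holds one pair: Franz, A, B, C, Cray going out and Cray, D, E, Franz coming back. The return route Cray, C, B, Franz is dropped because it reuses B and C.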

19 You have billions of same-type objects and you need to retrieve them extremely fast. Or you need simple analytics. You have a fixed-size, static data set and you need fast graph computations and pattern matching. You need all the features of an enterprise database, but you also need to work with an ontology-driven knowledge base and rules, with the flexibility of a graph database.

20 Are there applications where we need all three? Track customers, insurance customers, credit cards, employees, parts, etc. in real time. Always have a 360 view on every entity.

22 Big Data: Hadoop would be great for storing all triples about a customer, but map/reduce wouldn't get you anywhere when dealing with individual triples or detailed analysis. And it certainly won't help you with single-user updates. Fast Data: graph databases are currently not dynamic enough and their memory footprint is too big. Triple stores: we currently solve the problem in AG4 with partitioning on account id and device id. Get an object by graph, create a memory cache, apply rules and the prediction engine, store changes.

23 AG Horizontal: distributed triple store. Uses Hadoop principles. Automatic SPARQL to MapReduce translation. AG Vertical: mostly in-memory triple store. 500% more triples per Gig, including all strings and indices. Programmable as a graph database.

24 AG Horizontal: Uses BigData hashing ideas for partitioning. Redundant storage for multiple indices (slices). We have a SPARQL 1.0 and partial SPARQL 1.1 implementation where we translate a query into a query flow graph and pipeline.
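The general idea of hash partitioning with redundant storage per index can be sketched as follows. This is a generic illustration, not the AG Horizontal design: the functions `shard_for` and `partition`, the choice of SHA-1, the shard count, and the sample triples are all assumptions.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash (Python's hash() varies per process)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(triples, num_shards, key_position):
    """Split triples into shards by hashing one component (0=subject, 2=object)."""
    shards = [[] for _ in range(num_shards)]
    for triple in triples:
        shards[shard_for(triple[key_position], num_shards)].append(triple)
    return shards


# Hypothetical sample data.
triples = [
    ("Franz", "send-money", "A"),
    ("Franz", "located-in", "Oakland"),
    ("A", "send-money", "B"),
    ("B", "send-money", "Cray"),
]

# Redundant storage: the same triples are stored twice, once per access pattern.
by_subject = partition(triples, num_shards=4, key_position=0)
by_object = partition(triples, num_shards=4, key_position=2)

# All triples about one subject land on one shard, so a lookup touches one machine.
franz_shard = by_subject[shard_for("Franz", 4)]
```

Storing a second copy partitioned by object costs space, but it lets a lookup by object also go to a single shard instead of being broadcast to all of them.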

35 AG Vertical: We have a new graph-based database kernel called AIMS: Almost In Memory Store, Almost In a Micro Second. Total disk size for 1 B triples = 35 Gig, including all strings and inverse indices: 35 bytes per triple. 25% is for the spogi index = 8.75 bytes per triple. A breakthrough in terms of speed and size.
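The per-triple figures on this slide follow from simple division. The short check below assumes that "Gig" means 10^9 bytes and "1 B" means 10^9 triples; the variable names are chosen for this example.

```python
# Assumption: 1 Gig = 10**9 bytes and 1 B triples = 10**9 triples.
total_disk_bytes = 35 * 10**9
triple_count = 10**9

bytes_per_triple = total_disk_bytes / triple_count   # total footprint per triple
spogi_bytes_per_triple = 0.25 * bytes_per_triple     # 25% attributed to the spogi index

print(bytes_per_triple, spogi_bytes_per_triple)
```

This reproduces the 35 bytes per triple overall and 8.75 bytes per triple for the spogi index stated on the slide.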

36 A simple memory footprint test: X ?a ?y ?z

37 Memory Footprint Results. *Test data and SPARQL, Cypher, and Prolog code are available on our website.

38 Thanks!
