Big Data, Fast Data, Complex Data Jans Aasman Franz Inc
Franz Inc: Who We Are. Private, founded 1984. AI, Semantic Technology, professional services. Now in Oakland.
No Schema. How is it different from an RDB and why is it more flexible? Say whatever you want to say, although ontologies may constrain what you put in the triple store. No Link Tables, because you can express one-to-many relationships directly. No Indexing Choices: you can add new data attributes (predicates) on the fly, and they are available for querying in real time because everything is automatically indexed. Takes anything you give it: it is trivial to consume rows and columns from an RDB, XML, RDF(S), OWL, text and extracted entities.
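To illustrate the "no schema, no link tables, no indexing choices" idea, here is a minimal Python sketch of a toy triple store. The class name `TinyTripleStore`, its methods, and the sample data are hypothetical and are not AllegroGraph's API; the sketch only shows that a new predicate needs no schema change and is queryable as soon as it is added, because every triple is indexed on all three positions.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy schema-free store: every triple is indexed by subject, predicate and object."""

    def __init__(self):
        self.triples = set()
        self.by_s = defaultdict(set)
        self.by_p = defaultdict(set)
        self.by_o = defaultdict(set)

    def add(self, s, p, o):
        # No schema check: any predicate is accepted and indexed immediately.
        t = (s, p, o)
        self.triples.add(t)
        self.by_s[s].add(t)
        self.by_p[p].add(t)
        self.by_o[o].add(t)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        candidates = []
        for index, key in ((self.by_s, s), (self.by_p, p), (self.by_o, o)):
            if key is not None:
                candidates.append(index.get(key, set()))
        if not candidates:
            return set(self.triples)
        result = set(candidates[0])
        for other in candidates[1:]:
            result &= other
        return result


store = TinyTripleStore()
# One-to-many relationship stated directly, without a link table.
store.add("alice", "knows", "bob")
store.add("alice", "knows", "carol")
# A brand-new predicate, added on the fly and queryable right away.
store.add("alice", "shoeSize", 38)

friends = store.match(s="alice", p="knows")
sizes = store.match(p="shoeSize")
```

A real triple store keeps sorted, persistent indices rather than Python dictionaries; the point here is only the absence of a schema step between adding a predicate and querying it.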
We in the Semantic Community call what we do Complex Data
Complex data is good at: Knowledge (instead of data), using RDF and logic. Built to share information about objects; think Linked Open Data Cloud (public and enterprise). Complex ad hoc queries, rules and graph algorithms. Getting more and more scalable by the day. And all built on standards.
But they keep asking: Shouldn't we do this with big data/NoSQL solutions? Or with a fast in-memory graph database?
Big Data is really good at: insane amounts of data; relatively flexible data structures; finding a single object very fast; rudimentary analytics using map/reduce.
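As a rough illustration of what "rudimentary analytics using map/reduce" means, the following Python sketch counts triples per predicate with an explicit map, shuffle and reduce phase. It runs in a single process and the function names and sample records are made up for the example; a real Hadoop job would distribute the same three phases across machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (predicate, 1) for each input triple."""
    _subject, predicate, _obj = record
    yield predicate, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


# Hypothetical sample data.
records = [
    ("cust1", "bought", "book"),
    ("cust2", "bought", "lamp"),
    ("cust1", "lives-in", "Oakland"),
]

pairs = [kv for record in records for kv in map_phase(record)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Aggregations of this shape fit map/reduce well because each record is processed independently; queries that must follow links from one record to another do not decompose this way.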
Hadoop brought parallel data processing to the masses, but this is what we do in our labs. Notice the Sparse Graph problem. And here is where map/reduce fails: http://cacm.acm.org/blogs/blog-cacm/149074-possible-hadoop-trajectories/fulltext
And what about fast data? A new OVUM marketing term for in-memory triple stores or in-memory graph databases. Do we need them? Well, if you have problems expressed as graphs.
Q1: A reasonably hard query for horizontally scaling stores and RDBs, a straightforward query for a graph database.
Select ?a ?b ?c ?d ?e where {
  Franz send-money ?a
  ?a send-money ?b
  ?b send-money ?c
  ?c send-money Cray
  Cray send-money ?d
  Not (?d = ?c)
  ?d send-money ?e
  Not (?e = ?b)
  ?e send-money Franz
}
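To show why this query is natural for a graph engine, here is a minimal Python sketch that evaluates the same pattern by walking an adjacency map, one hop per nested loop. The function name `money_cycles` and the account names are hypothetical, and the two inequality tests are read as ?d ≠ ?c and ?e ≠ ?b.

```python
def money_cycles(edges, start="Franz", target="Cray"):
    """Find (a, b, c, d, e) such that
    start -> a -> b -> c -> target -> d -> e -> start,
    with d != c and e != b."""
    out = {}
    for src, dst in edges:
        out.setdefault(src, set()).add(dst)

    def succ(node):
        return sorted(out.get(node, ()))

    results = []
    for a in succ(start):
        for b in succ(a):
            for c in succ(b):
                if target not in out.get(c, ()):
                    continue
                for d in succ(target):
                    if d == c:
                        continue
                    for e in succ(d):
                        if e != b and start in out.get(e, ()):
                            results.append((a, b, c, d, e))
    return results


# Hypothetical send-money edges.
edges = [
    ("Franz", "acct1"), ("acct1", "acct2"), ("acct2", "acct3"),
    ("acct3", "Cray"), ("Cray", "acct4"), ("acct4", "acct5"),
    ("acct5", "Franz"), ("Cray", "acct3"), ("acct3", "acct2"),
]
cycles = money_cycles(edges)
```

Each loop is a cheap neighbor lookup. In a relational or horizontally partitioned store, each hop would instead be a self-join, possibly across machines.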
Q2: A very hard query for NoSQL stores and RDBs, a straightforward query for a graph database. Find a money trail from Franz to Cray that is more than two steps, then find another money trail from Cray back to Franz that is more than two steps, where the two trails are completely different.
(Select (?path1 ?path2)
  (path Franz Cray <send-money> >= 2 ?path1)
  (path Cray Franz <send-money> >= 2 ?path2)
  (empty (intersection ?path1 ?path2)))
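The path query can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's algorithm: the helper names are hypothetical, the length test follows the query's `>= 2`, a `max_steps` bound is added to keep the search finite, and "completely different" is taken to mean that the two trails share no intermediate accounts (the endpoints Franz and Cray are necessarily shared).

```python
def simple_paths(out, src, dst, min_steps=2, max_steps=6):
    """Yield simple paths (lists of nodes) from src to dst using
    between min_steps and max_steps edges."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        for nxt in sorted(out.get(path[-1], ())):
            if nxt == dst:
                # path + [dst] has len(path) edges.
                if len(path) >= min_steps:
                    yield path + [nxt]
            elif nxt not in path and len(path) < max_steps:
                stack.append(path + [nxt])


def disjoint_trails(edges, a="Franz", b="Cray", min_steps=2):
    """Pairs (path a->b, path b->a) that share no intermediate node."""
    out = {}
    for src, dst in edges:
        out.setdefault(src, set()).add(dst)
    pairs = []
    for p1 in simple_paths(out, a, b, min_steps):
        for p2 in simple_paths(out, b, a, min_steps):
            if not set(p1[1:-1]) & set(p2[1:-1]):
                pairs.append((p1, p2))
    return pairs


# Hypothetical send-money edges.
edges = [
    ("Franz", "acct1"), ("acct1", "acct2"), ("acct2", "acct3"),
    ("acct3", "Cray"), ("Cray", "acct4"), ("acct4", "acct5"),
    ("acct5", "Franz"), ("Cray", "acct3"), ("acct3", "acct2"),
]
trails = disjoint_trails(edges)
```

Variable-length path search like this has no fixed number of joins, which is what makes it awkward to express in SQL or as a single map/reduce job.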
You have billions of same-type objects and you need to retrieve them extremely fast, or you need simple analytics. You have a fixed-size, static data set and you need fast graph computations and pattern matching. You need all the features of an enterprise database, but you also need to work with an ontology-driven knowledge base and rules, with the flexibility of a graph database.
Are there applications where we need all three? Track customers, insurance customers, credit cards, employees, parts, etc. in real time. Always have a 360 view on every entity.
Big Data: Hadoop would be great for storing all triples about a customer, but map/reduce wouldn't get you anywhere when dealing with individual triples or detailed analysis. And it certainly won't help you with single-user updates. Fast Data: graph databases are currently not dynamic enough and their memory footprint is too big. Triple stores: we currently solve the problem in AG4 by partitioning on account id and device id: get an object by graph, create a memory cache, apply rules and the prediction engine, store the changes.
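The triple-store workflow on this slide (get an object by graph, cache it in memory, apply rules, store changes) might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the store is a plain dict keyed by account id, and `high_spender_rule` is a made-up rule standing in for the rules and prediction engine.

```python
def process_account(store, account_id, rules):
    """store maps a graph name (here: an account id) to a set of triples."""
    # 1. Get the object by graph and copy it into a memory cache.
    cache = set(store.get(account_id, set()))
    # 2. Apply each rule; a rule returns the new triples it infers.
    for rule in rules:
        cache |= rule(cache)
    # 3. Store the changes back into the partition for this account.
    store[account_id] = cache
    return cache


def high_spender_rule(triples):
    """Hypothetical rule: tag a customer who spent more than 1000 in total."""
    spent = [(s, o) for s, p, o in triples if p == "spent"]
    if spent and sum(amount for _, amount in spent) > 1000:
        return {(spent[0][0], "segment", "high-spender")}
    return set()


store = {
    "acct-42": {("cust1", "spent", 700), ("cust1", "spent", 600)},
    "acct-43": {("cust2", "spent", 50)},
}
updated = process_account(store, "acct-42", [high_spender_rule])
```

Because each account lives in its own graph, an update touches one partition only, which is what makes per-customer real-time processing feasible.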
AG Horizontal: Distributed triple store. Uses Hadoop principles. Automatic SPARQL to MapReduce translation. AG Vertical: Mostly in-memory triple store. 500% more triples per Gig, including all strings and indices. Programmable as a graph database.
AG Horizontal: Uses Big Data hashing ideas for partitioning. Redundant storage for multiple indices (slices). We have SPARQL 1.0 and partial SPARQL 1.1 support, where we translate a query into a query flow graph and pipeline.
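A generic sketch of hash partitioning with redundant index slices is given below in Python. It is not a description of AG Horizontal's internals: the choice of SHA-256, the number of partitions, and the decision to store each triple once by subject hash and once by object hash are all assumptions made for the example.

```python
import hashlib


def partition_of(key, n_partitions):
    """Stable partition number for a string key (same on every machine and run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def distribute(triples, n_partitions=4):
    """Store every triple twice: in a subject-hashed slice and an object-hashed slice."""
    by_subject = [[] for _ in range(n_partitions)]
    by_object = [[] for _ in range(n_partitions)]
    for s, p, o in triples:
        by_subject[partition_of(s, n_partitions)].append((s, p, o))
        by_object[partition_of(o, n_partitions)].append((s, p, o))
    return by_subject, by_object


# Hypothetical sample data.
triples = [
    ("acct1", "send-money", "acct2"),
    ("acct1", "send-money", "acct3"),
    ("acct2", "send-money", "acct3"),
]
by_subject, by_object = distribute(triples)
acct1_slice = by_subject[partition_of("acct1", 4)]
```

With this layout a lookup by subject or by object goes to exactly one partition, at the cost of storing each triple more than once.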
AG Vertical: We have a new graph-based database kernel called AIMS: Almost In Memory Store, Almost In a Micro Second. Total disk size for 1 B triples = 35 Gig, including all strings and inverse indices: 35 bytes per triple. 25% of that is for the spogi index = 8.75 bytes per triple. A breakthrough in terms of speed and size.
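The per-triple figures follow directly from the totals on the slide. A short Python check, assuming "Gig" means 10^9 bytes and "1 B" means 10^9 triples:

```python
total_disk_bytes = 35 * 10**9   # 35 Gig, assuming 1 Gig = 10^9 bytes
n_triples = 10**9               # 1 B triples

bytes_per_triple = total_disk_bytes / n_triples
spogi_bytes_per_triple = 0.25 * bytes_per_triple   # 25% goes to the spogi index
```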
A simple memory footprint test: X ?a ?y ?z
Memory Footprint Results*
* Test data and SPARQL, Cypher, and Prolog code available on our website.
Thanks!