# MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3

Save this PDF as:

Size: px
Start display at page:

Download "MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3"

## Transcription

1

2 MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3 Copyright 2012 Manning Publications For more information on this and other Manning titles go to

3 brief contents PART 1: INTRODUCTION TO NEO4J 1. A case for Neo4j database 2. Starting development with Neo4j 3. The power of traversals 4. Indexing the data PART II: APPLICATION DEVELOPMENT WITH NEO4J 5. Cypher Neo4j query language 6. Neo4j transactions 7. Traversals in depth 8. Neo4j with Spring data PART III: NEO4J IN PRODUCTION 9. Neo4j: embedded vs server 10. Monitoring Neo4j 11. Maintaining Neo4j 12. Highly available Neo4j

4 1 1 A Case for Neo4j Database Computer science is closely related to mathematics, with a lot of its concepts coming originally from the philosophy of mathematics. Algorithms, cryptography, computation and automation, and even basic theories of mathematical logic and Boolean algebra are all mathematical concepts that closely couple these two disciplines. Another mathematical topic can often be found in the computer science books and articles graph theory. In computer science, graphs are used to represent specific data structures, for example organization hierarchy, social network, processing flow. Typically, during software design phase, the structures, flows and algorithms are described as graph diagrams on the whiteboard. The object-oriented structure of the computer system is modeled as graph as well, with inheritance, composition and object members. However, while the graphs are used extensively during software development process, when it comes to data persistence, we usually forget about graphs. We try to fit our data to relational tables and columns, normalize and renormalize its structure until it looks completely different to what it is trying to represent. Take one example access control list, problem solved over and over again in many enterprise applications. We would typically have tables for users, roles and resources. Then we would have many-to-many tables to map users to roles, and roles to resources. In the end we would have at least five relational tables to represent rather simple data structure, which is actually a graph. Then we would use an Object Relational Mapping tool to map this data to our object model, which is also actually a graph. Wouldn t it be nice if we could represent the data in its natural form, so that we can make mappings more intuitive, and skip the repeating process of translating our data to and from storage engine? Thanks to graph databases, we can. Graph databases use the graph model to store data as graph, with structure consisting of vertices and edges, the two entities used to model any graph.

5 2 In addition, we can use all algorithms from the long history of graph theory, to solve graph problems more efficiently and in less time then using relational database queries. Once you have read this book you will be familiar with Neo4j, one of the most prominent graph databases currently available. You will learn how the Neo4j graph database helps you model and solve graph problems in a better performing and more elegant way, even when working with large data sets. 1.1 Why Neo4j Why would we use graph database, or more specifically Neo4j, as database of choice? To try to answer that question, let s take a look at en example of a graph domain problem and compare the Neo4j-based solution to the traditional usage of relational database as data storage technology. The example we are going to explore is a social network a set of users that can be friends with each other. Figure 1.1 illustrates the social network where users connected with arrows are representing friends. Figure 1.1 User and their friends represented as graph data structure

6 3 Let s take a look at the relational model that would use to store data about users and their friends. 1.2 Graph Data in a Relational Database In a relational database, we would typically have two relational tables for storing social network data - one for storing user information, and another one that stores the relationships between users (see relational diagram on figure 1.2). Figure 1.1 SQL diagram of tables representing user and friends data Listing 1.1 shows the actual SQL script for the creating tables using MySQL database. Listing 1.1 SQL script defining tables for social network data create table t_user ( #A id bigint not null, name varchar(255) not null, primary key (id) ); create table t_user_friend ( #B

8 5 could have a performance problem. As we will see, this can make a huge strain our relational database engine. To illustrate the performance of such queries, we have run the friends of friends query on the small data set of 1000 users, where each user has on average 50 friends. This means that table t_user contains 1000 records, while table t_user_friend contains 1000 X 50 = 50,000 records. No additional database performance tuning has been performed apart form column indices defined in the SQL script from Listing 1-1. We run the friends of friends query for depths two, three and four and five. We measured the execution time, repeating toe query ten times in order to warm up any caches available that would improve performance. Table 1.1 shows the results of our experiment. NOTE All experiments we executed on the Intel i7 powered commodity laptop with 8GB of RAM, same one that is used to write this book. Table 1.1 Execution times for multiple join queries using MySQL database engine on data set of 1,000 users Depth Execution time (seconds) 1,000 users Records returned ~ ~ ~ , ~999 NOTE Note that with depth 3, 4 and 5 we return 999 records. Due to the small data set, any user in our database is connected to all others, instead himself. As we can see, MySQL handles queries with depths 2 and 3 quite well. That s not unexpected join operations are common in the relational world, so most databases engines are designed and tuned with this in mind. In addition, we used databases indexes on relevant columns to increase performance of join queries. With depths 4 and 5, however, we see significant degradation of performance: query involving four joins takes over 10 seconds to executes, while depth 5 execution takes way too long over minute and a half, although the number of rows returned does not change. This just illustrates the limitation of MySQL when modelling graph data deep graphs require multiple joins, which relational databases typically don t handle too well.

9 6 Inefficiency of SQL Joins In order to find all user s friends on depth 5, relational database engine needs to generate Cartesian product of t_user_friend table five times, With 50,000 record in the table, the resulting set will have 50,000 5 rows (102,4 X ), which takes quite a lot of time and computing power to calculate. Then, we discard more the 99% of that, to return just under 1,000 records we re interested in! So, let s take a look how does Neo4j perform with the same data set. Instead of tables, columns and foreign keys, we are going to model users as nodes, and friendships as relationships between nodes. 1.3 Graph Data in Neo4j Neo4j stores data as vertices and edges, or, in Neo4j terminology, nodes and relationships. Users will be represented as nodes, and friendships will be represented as relationships between user nodes. If you take another look at our social network on Figure 1-1, you will see that it represents nothing more then a graph, with users as nodes and friendship arrows as relationships. There is one key difference between relational database and Neo4j, which we will come across straight away data querying. We don t have table and columns in Neo4j, no SQL with select and join commands. How do we query graph database? And no, the answer is not write a distributed map-reduce function. Neo4j, like all graph databases, takes a powerful mathematical concept from graph theory and uses it as powerful and efficient engine for querying data. This concept is graph traversal, and it s one of the main tools that make Neo4j so powerful it dealing with large-scale graph data Traversing the Graph The traversal is the operation of visiting a set of nodes in the graph by moving between nodes connected with relationships. It s a fundamental operation for data retrieval in a graph, and as such it is unique to the graph model. The key concept of traversals is that they are localized querying the data using traversal only takes into account the data which is required, without the need to perform expensive grouping operations on the entire data set, like we did with join operations on relational data. Neo4j provides rich traversal API, which you can employ to navigate through the graph. In addition, you can use REST API, or Neo4j query languages to traverse your data. We will dedicate much of this book to teach you the principles and best practices when traversing data with Neo4j. In order to get all friends of user s friends we will run the code in Listing 3.2 Listing 3-2 Neo4j Traversal API code for finding all friends on depth two. TraversalDescription traversaldescription = [CA]Traversal.description()[CA].relationships( IS_FRIEND_OF,

10 7 Direction.OUTGOING) [CA].evaluator(Evaluators.atDepth(2)) [CA].uniqueness(Uniqueness.NODE_GLOBAL); Iterable<Node> nodes = traversaldescription.traverse(nodebyid).nodes(); Don t worry if you don t understand the syntax of the above code snippet everything will be explained slowly and thoroughly in the next few chapters. Figure 1.3 illustrates the traversal of our social network graph, based on the traversal description above. Figure 1.3 Traversing the social network graph data Before the traversal starts, we select the starting node from which the traversal will start (Node X of Figure 1.3). Then we follow all friendship relationships (arrows) and collect the visited nodes as result. While traversal are satisfied, the traversal continues it s journey from one node to another, via relationships that connect them. When the rules stop applying, the traversal stops. For example, the rule can be visit only nodes that are on depth one from the starting node, in which case once all nodes on depth one are visited, the traversal stops. Traversal can also be configured to stop once it visits all nodes or all relationships in a graph. Our friends traversal example will finish when all nodes in the graph have been visited at least once.

11 8 Table 1.2 shows the performance metrics for functionally same queries we executed on MySQL database, with data set of 1,000 users and 50 friends per user. Table 1.2 Execution times for graph traversal using Neo4j on data set of 1,000 users Depth Execution time (seconds) 1,000 users Records returned ~ ~ ~ ~999 NOTE Similarly to MySQL setup, no additional performance tuning has been performed on Neo4j instance. Neo4j was running in embedded mode, with default configuration. First thing to notice is that Neo4j performance is significantly better for all queries, except the simplest one. Only when looking for friends of friends the MySQL performance is comparable to the performance of Neo4j traversals. Traversal of friends on depth 3 is four times faster then the SQL counterpart. When performing traversal on depth 4, the results are 5 orders of magnitude times better! Finally, the depth 5 results are ten million times faster with Neo4j traversal compared to SQL queries! Another conclusion that can be made from the results from table 1.2 is that performance of the query does degrade with the depth of the traversal when the number of nodes returned remains the same. MySQL query performance degrades with the depth of the query because of the Cartesian product operations that are executed before most of the results are discarded. Neo4j, on the other hand, keeps track of nodes visited, so it can skip the nodes already visited, and therefore significantly improve performance. To find all friends on depth 5, MySQL will perform Cartesian product on t_user_friend table 5 times, resulting in 50,000 5 records, out of which all but 1,000 are discarded. Neo4j, on the other hand, will simply visit nodes in the database, and when there is no more nodes to visit, it will stop the traversal. That is why Neo4j can keep constant performance as long as the number of nodes returned remains the same, opposed to significant degradation in performance when using MySQL queries. NOTE Graph traversals perform significantly better then the equivalent SQL queries (thousands times better with traversal depth of 4 and 5!). At the same time, the performance does not decrease dramatically with the depth the traversal on depth 5 is

12 9 only 0.03 seconds slower then the traversal on depth 2. The perfoemance of the most complex SQL queries is more then 10,000 times slower comparing to the simple ones. But how does this approach scale? To get the answer to that question, let s repeat the experiment with the data set of one million users. 1.4 SQL Joins vs Graph Traversal on Large Scale We are going to use exactly the same data structures as before, the only difference will be the amount of data. In MySQL we will have 1,000,000 records in t_user table, and approximately 1,000,000 X 50 = 50,000,000 records in t_user_friend table. We have run the same four queries against this data set (friends on depth 2,3,4 and 5). Table 1.3 shows the collected results for the performance of SQL queries in this case. Table 1.3 Execution times for multiple join queries using MySQL database engine on data set of 1,000,000 users Depth Execution time (seconds) 1,000 users Records returned ~2, ~125, , ~600,000 5 NOT FINISHED - Comparing to the SQL results for a 1,000 users data set, we can see that the performance of depth-two query has stayed the same, which can be explained by the design of the MySQL engine to handle table joins efficiently using indexes. Queries with depth 3 and 4 (which use 3 and 4 multiple join operations. respectively) demonstrate much worse results, by at least two orders of magnitude. SQL query for all friends on depth 5 did not finish in one hour we have been running the script! These results clearly show that the MySQL relational database is optimized for a single join queries, even on the large data sets. However, the performance of multiple join queries on large data sets degrades significantly, to the point that some queries are not even executable (for example friends on depth 5 on one million users data set) Why are are relational database queries so slow? The results in the Table 3.3 are somewhat expected given the way joins work. As we discussed earlier, each join creates a Cartesian product of all potential combinations of rows, and then filters out those that are not matching the where clause. With one million users, the Cartesian product of 5 joins (equivalent of query on depth 5) contains huge

13 10 number of rows billion billion billions way to many zeros to be readable. Filtering out all the records that don t match the query is too expensive such that the SQL query on depth 5 never finishes in a reasonable time. Let s now repeat the same experiment with Neo4j traversals. We will have one million nodes representing users and approximately 50 million relationships stored in Neo4j. We run exactly the same four traversals as in the previous example, and collected the performance results in Table 1.4. Table 1-4 Execution times for graph traversal using Neo4j on data set of 1,000,000 users Depth Execution time (seconds) 1,000 users Records returned ~2, ~110, ~600, ~800,000 As you can see from results in table 1.4, the increase in data by thousand times did not affect Neo4j performance significantly. The traversal get slower as we increase the depth, however the main reason for that is the increased number of results that are returned. The performance is slowing linearly with the increase of the result size, and is predictable even with larger data set and depth level. In addition, this is at least hundred times better then the corresponding MySQL performance. Main reason for this predictability of Neo4j is the localized nature of the graph traversal irrespective how many nodes and relationships there are in the graph, traversal will only visit ones that are connected to the starting node, according to the traversal rules. Remember, the relational join operations will compute Cartesian product before discarding irrelevant results, affecting performance exponentially with the growth of the data set. Neo4j, however, visits only nodes that are relevant to the traversal, so it can keep predictable performance irrespective of the total data set. The more nodes traversal has to visit, the slower the traversal, as we have seen while increasing the traversal depth. However, this increase is linear and still doesn t depend on the total graph size. What is the secret of Neo4j s speed? No, Neo4j developers haven t invented super-fast algorithm for the military. Nor is the Neo4j speed product of fantastic speed of the technologies it relies on it s implemented in Java after all!

15 12 semantic rules and syntax trees. Network analysis is based on graph theory as well, with applications in traffic modeling, telecommunications and topology. In computer science a lot of problems are modeled as graphs: social networks, access control lists, network topologies, taxonomy hierarchies, recommendations engines, etc. Traditionally, all of these problems would use graph object models (implemented by the development team), and store the data in the normalized form in a set of tables with foreign key associations. In the previous section we have seen how social network is stored in the relational model using two tables and a foreign key constraint. Using Neo4j graph database, we have preserved the natural graph structure of the data, without compromising performance or storage space improving the query performance significantly. When using Neo4j graph database to solve graph problems the amount of work for the developer is reduced, as the graph structures are already available via the database APIs. The data is stored in its natural format as graph nodes and relationships, so the effects of object-relational mismatch are minimized. And, the graph query operations are executed in an efficient and performant way by using Neo4j s traversal APIs. Throughout this book we will demonstrate the power of Neo4j by solving problems such as access control lists and social networks with recommendation engines. If you are not familiar with graph theory, don t worry; we will introduce some of the graph theory concepts and algorithms, as well as their applicability to graph storage engines such as Neo4j when it becomes relevant in the later chapters. In the next section we are going to the discuss the reasons why have graph databases (and other NoSQL technologies) started to gain popularity on the computer world. We will also discuss different categories of NoSQL systems with the focus on their differences and applicability. 1.6 Neo4j in NoSQL space Since the beginning of computer software, the data that the applications had to deal with has grown in complexity enormously. Complexity of the data included not only its size, but also the inter-connectedness of the data, it s ever-changing structure and concurrent access to the data. With all these aspects of changing data, it has been recognized that relational databases, which have for a long time been de-facto standard for data storage, are not the best fit for all problems that increasingly complex data required. From that recognition, a number of new storage technologies have been created, with one common goal to solve the problems relational databases were not good at. All these new storage technologies are known under umbrella term NoSQL. NOTE While the NoSQL name stuck, it doesn t reflect accurately the nature of the movement, giving the (wrong) impression that it s against SQL as a concept. The better name would probably be non-relational databases, as the relational/non-relational

16 13 paradigm was the subject of discussion, where SQL is just a language used with relational technologies. The NoSQL movement was born as the acknowledgment that new technologies were required to cope with the data changes. Neo4j, and graph databases in general, are part of the NoSQL movement, together with a lot of other more or less related storage technologies. With rapid developments in NoSQL space, its growing popularity and a lot of different solutions and technologies to choose from, anyone new coming into the NoSQL world faces a lot of confusion when selecting the right technology. That s why in this section we will try to clarify categorization of NoSQL technologies, with the special view on the applicability of each category. In addition, we will make sure to explain the place of graph databases and Neo4j within NoSQL at the same time. KEY-VALUE STORES Key-value stores represent the simplest, yet very powerful category, designed from the need of handling high volume concurrent access to the data. Caching is a typical use case keyvalue based technology. Key-value stores allow storing data in very simple structures, often in-memory, for very fast access even in highly concurrent environment. The data is stored in a huge hash table and is accessible by its key. The data structure is in the form of key/value pairs, and the operations are mostly limited to simple put (write) and get (read) operations. The values support only simple data structures like text or binary content, although some more recent key-value stores support a limited set of complex data structures (Redis for example supports lists and maps as values). Key-value stores are the simplest NoSQL technologies. All other NoSQL categories build on the simplicity, performance and scalability of key-value stores to fit some specific use cases better. COLUMN-FAMILY STORES The need for some sort of data structure, while keeping the distributed key-value model, which scales very well, brought the column-family store category to the NoSQL scene. The idea was to group similar values (or columns) together, by keeping them in the same column-family (for example, user data, or books information). The column families are stored in a single file enabling better reads and writes performance on related data. The main goal of this approach was high performance and high availability when working with big data therefore it s no surprise that the leading technologies in this space are Google s BigTable and Cassandra, originally developed by Facebook. DOCUMENT-ORIENTED DATABASES A lot of the real problems required the data structure that looks like a document (content management systems, user registration data, CRM data ), and document-oriented databases used that structure to describe it in a simple, yet efficient schema-less manner.

17 14 The document model adds additional complexity to the data structure, by adding the selfcontained documents and associative relationships to the document data. This makes it easier to model the data for the common software problems, however it comes at the expense of slightly lower performance and scalability compared to key-value and columnfamily stores. But the convenience of the object model built into the storage system is usually a good trade-off for all but massively concurrent use cases (most of us are not trying to build another Google or Facebook after all). GRAPH DATABASES The graph databases were designed with the view that a lot of the time, we build graph-like structures in our applications, but still store the data in the un-natural way either in tables and columns of relational databases, or even in other NoSQL storage systems. As we mentioned before, problems like ACL lists, social networks, or indeed any kind of networks are natural graph problems. The idea of the graph data model is at the core of graph databases we are finally able to store our object model which represents graph data as a persisted graph! The data model that can naturally represent a lot of complex software requirements, and the efficiency and performance of the graph traversal querying are main strength of graph databases. Table 1.4 illustrates the use cases and representative technologies for each of the NoSQL categories. Table 1.4 Overview of NoSQL categories NoSQL Category Typical Use Cases Best-Known Technologies Key-Value Stores - Caches - Simple domain with fast read access - Massively concurrent systems - Redis - Memcached - Tokyo Cabinet Column-Family Stores - Write on a big scale - Co-located data access (for reading and writing) - Cassandra - Google BigTable - Apache HBase Document-Oriented Databases - When domain model is document by nature - To simplify development using natural document data structures - MongoDb - CounchDb

18 15 Graph Databases - Highly scalable systems (although on the lower level then the key-value and column-family stores) - With inteconnected data - Domain can be represented with nodes and relationships naturally - Social Networks - Recommendation engines - Access Control Lists - Neo4j - AllegroGraph - OrientDB NOTE You can read about NoSQL movement, and Neo4j s place in it, in more detail in Appendix A So far in this chapter we ve seen some example of efficient usage of Neo4j to solve graph-related problems, illustrated how common real world problems can be naturally modeled as graphs, and learned where graph databases and Neo4j in particular sit within the wider NoSQL space. There is one key aspect of Neo4j that none of the other NoSQL stores have, and which is very important when it comes in adoption of new storage technologies in the enterprise world its transactional behavior. 1.7 Neo4j the ACID-compliant database Transaction management has been the talking point of NoSQL technologies since they started to gain popularity. Trade-off of transactional attributes for performance and scalability has been the common theme in non-relational technologies that targeted big data. For example, some (BigTable, Cassandra, CouchDB) opted to trade off consistency, allowing the clients to read stale data in some cases in distributed system (eventual consistency). Or, in key-value stores that concentrated on read performance, durability of the data wasn t of too much interest (Memcached). Or, atomicity on a single operation level, without possibility to wrap multiple database operations within single transaction, typical for document oriented databases. While each of approaches mentioned are valid in specific use cases (for example caching, high data read volumes, high load and concurrency), they are usually the first hurdles when it comes to introducing non-relational databases to any enterprise or corporate environment.

19 16 Although devised a long time ago for the relational databases, transaction attributes are still important in the most practical use cases. Neo4j has taken a different approach here. Neo4j s goal is to be a graph database, with the emphasis on database. This means that you will get full ACID support from the Neo4j database: atomicity (A), where you can wrap multiple database operations within the single transaction, and make sure they are all executed atomically if one of the operations fails, the entire transaction will be rolled back consistency (C), when you write data to the Neo4j, you can be sure that every client accessing the databases afterwards will read the latest updated data Isolation (I), where you can be sure that operations within the single transaction will be isolated one from another, so that writes in one transaction, won t affect reads in another transaction durability (D), so you are certain that the data you write to Neo4j will be written to disk and available after database restart or server crash The ACID transactional support provides seamless transition to Neo4j for anyone used to relational databases, and offers safety and convenience in working with graph data. Transactional support is one of the strong points of Neo4j, which differentiates it from the majority of NoSQL solutions, and makes it a good option not only for NoSQL enthusiasts, but in enterprise environments as well. 1.8 Summary In this chapter we learned how much more efficient and performant can Neo4j be compared to relational databases to solve specific problems, like social network in our example. We illustrated performance and scalability benefits of graph databases represented by Neo4j, showing clear advantage of Neo4j to relational database when dealing with graph related problems. We also learned about some of the key characteristics of Neo4j, as well as discussed its place within the larger NoSQL movement of related data storage technologies. In the next chapter we are going to move straight to the hands-on, practical usage of Neo4j, and start modeling a specific graph problem with Neo4j Java API.

### Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

### Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

### NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect

### Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

### Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options

### Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

### How graph databases started the multi-model revolution

How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the

### CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3

INTRODUCTION xvii PART I: GETTING STARTED CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3 Definition and Introduction 4 Context and a Bit of History 4 Big Data 7 Scalability 9 Defi nition and Introduction

### Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

### NoSQL Database Options

NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

### extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

### Objectives. Introduce some key concepts behind the NoSQL family of databases

NoSQL Source: Pramod J. Sadalage and Martin Fowler NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Pearson Education, 2013 Objectives Introduce some key concepts behind the

### Domain driven design, NoSQL and multi-model databases

Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra

### InfiniteGraph: The Distributed Graph Database

A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

### III Big Data Technologies

III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

### Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

### NoSQL Data Base Basics

NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

### White Paper. Optimizing the Performance Of MySQL Cluster

White Paper Optimizing the Performance Of MySQL Cluster Table of Contents Introduction and Background Information... 2 Optimal Applications for MySQL Cluster... 3 Identifying the Performance Issues.....

### Big Data Analytics. 6. NoSQL Databases. Lars Schmidt-Thieme

Big Data Analytics 6. NoSQL Databases Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany original slides by Lucas Rego

### NoSQL Databases. Polyglot Persistence

The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler

### OPERATIONAL SCENARIOS USING THE MICROSOFT DATA PLATFORM

David Chappell OPERATIONAL SCENARIOS USING THE MICROSOFT DATA PLATFORM A GUIDE FOR IT LEADERS Sponsored by Microsoft Corporation Copyright 2016 Chappell & Associates Contents Microsoft s Data Platform:

### Understanding NoSQL Technologies on Windows Azure

David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure

### Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

### The Quest for Extreme Scalability

The Quest for Extreme Scalability In times of a growing audience, very successful internet applications have all been facing the same database issue: while web servers can be multiplied without too many

### Structured Data Storage

Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

### Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

### Some issues on Conceptual Modeling and NoSQL/Big Data

Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed

### SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

### In the same spirit, our QuickBooks 2008 Software Installation Guide has been completely revised as well.

QuickBooks 2008 Software Installation Guide Welcome 3/25/09; Ver. IMD-2.1 This guide is designed to support users installing QuickBooks: Pro or Premier 2008 financial accounting software, especially in

### Application of NoSQL Database in Web Crawling

Application of NoSQL Database in Web Crawling College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China doi:10.4156/jdcta.vol5.issue6.31 Abstract

### D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

D61830GC30 for Developers Summary Duration Vendor Audience 5 Days Oracle Database Administrators, Developers, Web Administrators Level Technology Professional Oracle 5.6 Delivery Method Instructor-led

### Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

### Understanding NoSQL on Microsoft Azure

David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick

### Big Systems, Big Data

Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

### Graph Database Proof of Concept Report

Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

### Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

### NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without

### NoSQL. Thomas Neumann 1 / 22

NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,

### Comparing SQL and NOSQL databases

COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

### A survey of big data architectures for handling massive data

CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context

### NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE <GEOFF@ZORCHED.NET> @GEOFFLANE

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE @GEOFFLANE WHAT IS NOSQL? NON-RELATIONAL DATA STORAGE USUALLY SCHEMA-FREE ACCESS DATA WITHOUT SQL (THUS... NOSQL) WIDE-COLUMN / TABULAR

### Performance Management of SQL Server

Performance Management of SQL Server Padma Krishnan Senior Manager When we design applications, we give equal importance to the backend database as we do to the architecture and design of the application

### An Approach to Implement Map Reduce with NoSQL Databases

www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

### White Paper: Big Data and the hype around IoT

1 White Paper: Big Data and the hype around IoT Author: Alton Harewood 21 Aug 2014 (first published on LinkedIn) If I knew today what I will know tomorrow, how would my life change? For some time the idea

### Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置

### Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

### Using Object Database db4o as Storage Provider in Voldemort

Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how

### Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

### Big Data Analytics. Rasoul Karimi

Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Introduction

### INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

### You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

### these three NoSQL databases because I wanted to see a the two different sides of the CAP

Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the

### Applications for Big Data Analytics

Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

### 2.1.5 Storing your application s structured data in a cloud database

30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from

ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15

### INTRODUCTION TO K2VIEW FABRIC

INTRODUCTION TO K2VIEW FABRIC A WHITE PAPER BY K2VIEW. ABSTRACT Across industries, the amount of data to be managed is exponentially growing, and with it the need for modern, fully-distributed and scalable

### NoSQL a view from the top

Red Stack Tech Ltd James Anthony Technology Director NoSQL a view from the top Part 1 1 Contents Introduction...Page 3 Key Value Stores..... Page 4 Column Family Data Stores.. Page 6 Document Data Stores...Page

wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350

### BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

### Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

### Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

### A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where

### Challenges for Data Driven Systems

Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

### MySQL Storage Engines

MySQL Storage Engines Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities, locking levels

### Department of Software Systems. Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012

1 MongoDB Department of Software Systems Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012 2 Contents Motivation : Why nosql? Introduction : What does NoSQL means?? Applications

### HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

### NoSQL Databases. Nikos Parlavantzas

!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

### Microsoft Azure Data Technologies: An Overview

David Chappell Microsoft Azure Data Technologies: An Overview Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Blobs... 3 Running a DBMS in a Virtual Machine... 4 SQL Database...

### Spring Data. Modern Data Access for Enterprise Java. Jon Brisbin, and Michael Hunger O'REILLY* Mark Pollack, Oliver Gierke, Thomas Risberg, Cambridge

Spring Data Modern Data Access for Enterprise Java Mark Pollack, Oliver Gierke, Thomas Risberg, Jon Brisbin, and Michael Hunger O'REILLY* Beijing Cambridge Farnham Koln Sebastopol Tokyo Table of Contents

### Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science

### Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction

### nosql and Non Relational Databases

nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases

### Analytic Applications With PHP and a Columnar Database

AnalyticApplicationsWithPHPandaColumnarDatabase No matter where you look these days, PHP continues to gain strong use both inside and outside of the enterprise. Although developers have many choices when

### Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

### SQL, NoSQL, and Next Generation DBMSs. Shahram Ghandeharizadeh Director of the USC Database Lab

SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store

### NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

NOSQL, BIG DATA AND GRAPHS Technology Choices for Today s Mission- Critical Applications 2 NOSQL, BIG DATA AND GRAPHS NOSQL, BIG DATA AND GRAPHS TECHNOLOGY CHOICES FOR TODAY S MISSION- CRITICAL APPLICATIONS

### The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through

### Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

### MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

### A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward

### ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

### Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch September 30, 2013 29-09-2013 1

Big Data Management Big Data Management (BDM) Autumn 2013 Povl Koch September 30, 2013 29-09-2013 1 Overview Today s program 1. Little more practical details about this course 2. Recap from last time 3.

### NoSQL Database Systems and their Security Challenges

NoSQL Database Systems and their Security Challenges Morteza Amini amini@sharif.edu Data & Network Security Lab (DNSL) Department of Computer Engineering Sharif University of Technology September 25 2

### The Sierra Clustered Database Engine, the technology at the heart of

A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel

### Study concluded that success rate for penetration from outside threats higher in corporate data centers

Auditing in the cloud Ownership of data Historically, with the company Company responsible to secure data Firewall, infrastructure hardening, database security Auditing Performed on site by inspecting

### Raima Database Manager Version 14.0 In-memory Database Engine

+ Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

### Ques 1. Define dbms and file management system? Ans- Database management system (DBMS) A file management system

UNIT-1 Ques 1. Define dbms and file management system? Ans- Database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. Some of the very well known

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

### Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

### Cloud Computing and Advanced Relationship Analytics

Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 brian.clark@objectivity.com

### The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

### Data sharing in the Big Data era

www.bsc.es Data sharing in the Big Data era Anna Queralt and Toni Cortes Storage System Research Group Introduction What ignited our research Different data models: persistent vs. non persistent New storage

### Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

### Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology

### Introducing DocumentDB

David Chappell Introducing DocumentDB A NoSQL Database for Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Why DocumentDB?... 3 The DocumentDB Data Model...

### Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

### REAL-TIME BIG DATA ANALYTICS

www.leanxcale.com info@leanxcale.com REAL-TIME BIG DATA ANALYTICS Blending Transactional and Analytical Processing Delivers Real-Time Big Data Analytics 2 ULTRA-SCALABLE FULL ACID FULL SQL DATABASE LeanXcale

### bigdata Managing Scale in Ontological Systems

Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural