A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data
Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA Office:
Table of Contents
High-level Overview Open Source Graph Database Architecture
High-level Overview InfiniteGraph Architecture
Performance Benchmarks
  Graph Construction (Ingesting Vertices and Building Relationships)
  Sampling for 64 Search Keys Based on a Unique Indexed Property (Vertex Lookup)
  Reading Graph Objects from the Database
  Graph Navigation/Traversal Using Breadth-first-search and Depth-first-search
Distributed Performance Benchmark of InfiniteGraph
  Graph Construction
  Ingesting Vertices
  Ingesting Edges
  Finding Vertices
  Graph Traversal
Conclusion
Introduction

In our expanding digital ecosystem, the data gathered by companies is not only increasingly complex and interconnected; there is also much more of it. In fact, IBM estimates that 90% of the data in the world today has been created in the last two years alone. Corporate databases have gone from being measured in gigabytes to terabytes and even petabytes, driven by web traffic, social media, financial transactions, phone calls, IT logs and more. Buried in this mountain of data is intelligence that can be used to shape strategy, improve business processes and increase profits. Unfortunately, understanding and quickly acting on extremely large datasets, or Big Data, has proven to be a big headache for many companies. Gartner, a leading analyst firm, projects that global enterprise data assets will grow by an additional 650 percent.

Technologies such as those in the NoSQL space were developed to deal with large-scale data needs and the storage capacity limitations of traditional relational databases. Using a NoSQL approach, data can be interconnected in ways that, when organized properly, unlock valuable information. These solutions can be divided into four technology categories: key-value databases, column family databases, document databases and graph databases.

Key-value Databases. A key-value database, or key-value store, resembles a relational table that has rows but only two columns. The indexing system uses a single string (the key) to retrieve the data (the value).
- Very fast for direct look-ups
- Schema-less, meaning the value could be anything, such as an object or a pointer to data in another data store

Column Family Databases. Column family databases also have rows and columns like a relational database, but storage on disk is organized so that columns of related data are grouped together in the same file. As a result, attributes (columns) can be accessed without having to access all of the other columns in the row.
- Results in very fast operations on attributes, such as calculating an average age
- Performs poorly in regular OLTP applications where the entire row is required

Document Databases. Document databases are similar to object databases, but without the need to predefine an object's attributes (i.e., no schema is required).
- Provides flexibility to store new types or unanticipated sizes of data/objects during operation
Graph Databases. Graph databases are also similar to object databases, but the objects and the relationships between them are all represented as objects with their own respective sets of attributes.
- Enable very fast queries when the value of the data lies in the relationships between people or items
- Identify a relationship between people or items, even when there are many degrees of separation
- Where the relationships represent costs, identify the optimal combination of groups of people or items

The more closely a data model matches the data store structure, the faster queries can be executed and the easier it is to write applications. Of course, one size doesn't fit all needs, and multiple tools may be necessary to fully solve a problem.

Selecting a Graph Database

InfiniteGraph, the distributed and scalable graph database from Objectivity, Inc., was designed specifically to traverse complex relationships and provide the framework for a new set of products that deliver real-time business decision support and Big Data connection analytics across copious stores of disparate, unstructured data. Using InfiniteGraph, companies can visualize data relationships and "connect the dots" on a global scale.

This benchmark compares InfiniteGraph and a well-known open source graph database. A high-level view of the two database engines is presented, with a comparison of functionality and performance metrics for some of the most common graph database operations in a distributed environment.

High-level Overview Open Source Graph Database Architecture

The open source graph database is designed to operate in a single-machine, single-disk environment. Access to the database from a remote network is achieved using an open source REST web service. Even when the database is accessed locally, concurrent write access is controlled at the database level, which prevents two different processes from simultaneously opening the database in write mode.

Single-server architectures are suitable for environments where access to a distributed or external set of data is not required, but they present barriers to scaling for organizations with multiple, distributed data sources and real-world graphs representing complex data relationships.
The following figure shows a common use model for the open source graph database. Several local processes can access the database in read-only mode. When multiple applications, or clients, of the database require write operations, or when the data is being consumed from a remote machine, the database application has no choice but to use the REST-based web service.

High-level Overview InfiniteGraph Architecture

InfiniteGraph is built on a highly scalable, distributed database architecture where both data and processing are distributed across the network. A single graph database can be partitioned and distributed across multiple disk volumes and machines, with the ability to query data across machine boundaries. The same database client program can access the graph database locally or across a network in a native manner.

The lock server handles lock requests from database applications, allowing for concurrent read-write access to the graph database. Unlike the open source alternative, database access is not controlled when an instance of the database is created, but rather at the transaction level. The data servers handle remote database application requests for nodes and edges from the distributed graph database.

Although a REST interface is useful for interactive access to a database from a browser, it is not suitable for high performance access, ingest (data loading) of very large graphs or navigation across very large graphs. While an InfiniteGraph developer could provide a REST interface if desired, one is not provided with the product, as it is designed for high performance and very large graph problems that leverage InfiniteGraph's unique distributed capabilities.
The following figure shows a common deployment model for the InfiniteGraph database. The InfiniteGraph deployment includes a dedicated lock server process that manages access to the complete database. There is also a separate data server process on each machine where the database is distributed. These processes provide both local and remote access to the data residing on the disk volumes. The same database application code can be run locally or remotely. A set of runtime configuration files assists the application code in determining how to interact with the distributed graph database.

Performance Benchmarks

The benchmarking tests outlined below are similar to those used by the Graph 500 (http://www.graph500.org/) benchmark for evaluating data intensive applications. The tests involve consuming a set of statistically generated graph data and performing various graph operations on the consumed data. In addition to these tests, the Objectivity team performed common operations such as reading elements from the graph. The benchmark operations are:
- Graph construction (ingesting vertices and building relationships)
- Sampling for 64 search keys based on a unique indexed property (vertex lookup)
- Graph navigation/traversal using breadth-first-search and depth-first-search
- Reading the graph from the database

The size of the graph is specified in terms of scale and edge-factor (degree). Scale determines the number of vertices, while edge-factor specifies the average number of relationships per vertex. Given a scale and edge-factor, the number of graph elements (graph size) is determined by:

Number of vertices = 2^scale
Number of edges = Number of vertices * edge-factor

Graph Construction (Ingesting Vertices and Building Relationships)

The graph construction operation involves adding a number of automatically indexed vertices with a single property and building the relationships between these vertices. We measure the rate of ingestion for both vertices and edges, and examine the ingestion rates as a function of the transaction size, the number of threads/processes and the graph size.

Vertex Ingestion as a Function of Transaction Size

The following figure shows how the database performs when ingesting vertices for a scale of 24 while varying the size of the transaction and the number of concurrent ingest threads. Note that transaction size equals the number of nodes (and edges) per transaction.
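For the scale of 24 used in this test, the graph-size formulas work out as in the short sketch below. The function name `graph_size` is hypothetical, and the edge-factor of 16 is only an illustrative assumption; the paper does not state the edge-factor that was used.

```python
def graph_size(scale, edge_factor):
    """Return (vertices, edges) for a graph of the given scale and edge-factor."""
    num_vertices = 2 ** scale                  # Number of vertices = 2^scale
    num_edges = num_vertices * edge_factor     # Number of edges = vertices * edge-factor
    return num_vertices, num_edges


# Scale 24 matches the vertex-ingestion test; edge_factor=16 is an assumed value.
vertices, edges = graph_size(scale=24, edge_factor=16)
print(f"{vertices:,} vertices, {edges:,} edges")
```

A scale of 24 gives a little over 16 million vertices, which is consistent with the vertex count reported for the distributed ingest test.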
There are two key observations that can be made:
1. InfiniteGraph can handle large transactions much more efficiently than the open source graph database.
2. The ingest rate for InfiniteGraph is proportional to the number of threads.

When using 4 concurrent threads and ingesting 10^5 vertices in a transaction, InfiniteGraph can ingest at a rate greater than 70K vertices per second. Under any condition, the best rate for the open source graph database is less than 20K vertices per second. This is primarily due to InfiniteGraph's strong support for concurrency and lock management.

The figure below compares the InfiniteGraph ingest rate to that of the open source graph database when the open source graph database is deployed using the REST web service (note that this is a log-log plot). We can see that InfiniteGraph's ingest rate is at least two orders of magnitude larger. The transaction size for the REST service was controlled using the batch REST API. If the batch REST API is not used, each vertex insert is performed as a single transaction, which would make the open source graph database at least 2 orders of magnitude slower than InfiniteGraph.
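The idea of a transaction size, and of spreading transactions over several ingest threads, can be sketched as follows. This is a minimal illustration only: `transaction_batches` and `assign_to_threads` are hypothetical helpers, not part of either database's API, and the round-robin assignment is an assumption rather than the scheme the benchmark actually used.

```python
def transaction_batches(total_vertices, transaction_size):
    """Yield (first_id, end_id) ranges, one per transaction; end_id is exclusive."""
    for first in range(0, total_vertices, transaction_size):
        yield first, min(first + transaction_size, total_vertices)


def assign_to_threads(batches, num_threads):
    """Distribute the transaction ranges round-robin across ingest threads."""
    plan = [[] for _ in range(num_threads)]
    for i, batch in enumerate(batches):
        plan[i % num_threads].append(batch)
    return plan


# 250,000 vertices in transactions of 100,000, spread over 4 threads.
plan = assign_to_threads(transaction_batches(250_000, 100_000), num_threads=4)
for thread_id, ranges in enumerate(plan):
    print(thread_id, ranges)
```

Each worker would then open one transaction per range, create the vertices in that range and commit, so a larger transaction size means fewer commits for the same number of vertices.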
Vertex Ingestion as a Function of Graph Size

The following figure shows the relationship between ingestion rate and the graph scale. In this case the maximum transaction size, or number of vertices ingested per transaction, was 100,000. When the number of vertices to be added was less than 100,000, the transaction size was simply the total number of vertices to be added; for example, for 10,000 vertices the transaction size was 10,000. When adding 100,000 or more vertices, the transaction size was held at 100,000. In all cases 4 threads were used to create and index the vertices.

For graphs with a very small number of vertices (fewer than 1,000), the open source graph database has a slight advantage. However, for graphs with a large number of vertices, InfiniteGraph can be as much as 4x to 5x faster than the open source graph database.

Edge Ingestion (Building Relationships)

InfiniteGraph has two modes of ingesting edges: accelerated and standard ingestion. Accelerated ingestion is a concurrent, non-blocking ingest operation where the graph is made to be eventually consistent. The rate of ingestion can be several orders of magnitude faster than standard ingestion because the operation is multi-tasked and distributed. Accelerated ingestion, in which transactions are ACID compliant, solves a locking issue that is common when updates are performed on the vertex objects. It resolves these locking issues by creating a set of secondary processes, which efficiently coordinate the relationship building process.

The figure below depicts InfiniteGraph in accelerated ingest mode using the following parameters:
- A single machine
- 4 disk volumes
- 4 agents (not counting the monitoring agents)
- 50MB cache size for the agents

Observations:
- The open source embedded database can only ingest using a single process. When multiple threads are used to build the relationships, locking issues are common.
- While the open source graph database performs well when the number of relationships is kept small, its performance degrades significantly when a large number of relationships is required.
- Using the accelerated ingestion feature, InfiniteGraph allows multiple threads and processes to create relationships. The ingest rate for InfiniteGraph scales with the number of processing units (threads/processes).
- When the REST API is used to build a large number of relationships, the ingest rate is generally 3 orders of magnitude slower than InfiniteGraph.

Sampling for 64 Search Keys Based on a Unique Indexed Property (Vertex Lookup)

Before starting the graph traversal, the search tests look up vertices that were indexed using a unique property key. Automatic indexing is used for both databases; the index provider for the open source graph database is Lucene, and the built-in Graph Index is used for InfiniteGraph. The following figure shows the results of this benchmark for various graphs.
Observations:
- InfiniteGraph is slightly faster for lookups compared to the embedded open source graph database.
- The REST-based web service can have a rate that is an order of magnitude slower than InfiniteGraph.

Reading Graph Objects from the Database

The reading test iterates over the vertex list in the database and reads the properties of each vertex object and the list of edges for each vertex. There isn't an equivalent iterator method for the open source graph database REST service, but fetching the graph elements using the REST API is on the order of hundreds of reads per second.

Graph Navigation/Traversal Using Breadth-first-search and Depth-first-search
The traversal test navigates the paths between two given vertices and counts the number of paths that are visited in the process of getting from the first vertex to the second. The search list is the same list as in the query test, which the Graph 500 edge tool generates. It is also worth noting that there is a higher level of control when using the native database interface for navigation versus the REST service API.

The tests involve 64 search paths, which were run as cold reads, i.e. with none of the graph elements cached in memory at the start of the test. The following table shows the average rate of traversal for a large graph.

Database                              Average traversal rate
InfiniteGraph                         68K edges/second
Open Source Graph DB (embedded)       61K edges/second
Open Source Graph DB (REST service)   920 edges/second
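The kind of measurement behind these figures can be illustrated with a minimal breadth-first search that counts the edges it examines on the way from one vertex to another. This is a sketch under stated assumptions: the in-memory adjacency dictionary and the function `bfs_edges_visited` are hypothetical stand-ins and do not reflect the navigation API of InfiniteGraph or of the open source graph database.

```python
from collections import deque


def bfs_edges_visited(adjacency, start, goal):
    """Breadth-first search from start to goal.

    Returns (found, edges_visited), where edges_visited counts every edge
    examined before the goal was reached or the search was exhausted.
    """
    if start == goal:
        return True, 0
    seen = {start}
    queue = deque([start])
    edges_visited = 0
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            edges_visited += 1
            if neighbor == goal:
                return True, edges_visited
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False, edges_visited


# Tiny example graph as an adjacency dictionary (vertex -> list of neighbors).
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(bfs_edges_visited(graph, 0, 4))
```

Dividing the edge count by the elapsed wall-clock time of the search gives a rate in edges per second, the unit used in the table above.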
Distributed Performance Benchmark of InfiniteGraph

This section presents the benchmark results for InfiniteGraph operating in a distributed environment. At the time of this benchmark, no other distributed graph database was on the market or available for comparisons of distributed performance.

The storage of the graph is distributed across several identical machines that are connected over a network. Each machine has several local hard disks connected to it. In the diagram above, one of the machines contains the system and index databases, while the actual graph objects (vertices/edges) are equally distributed across all the other machines. A single lock server, located on one of the machines, manages access to the complete database. Each machine has an AMS server that serves the data stored on that machine over the network. The ingest agent processes are used when accelerated ingest is performed to create the edges of the graph.

The benchmark operations are:
- Graph construction
- Finding vertices
- Reading vertices
- Graph traversal

Graph Construction

Constructing the graph involves ingesting a number of vertices and then building the relationships (edges) between these vertices. Each vertex has a single property and is automatically indexed using the InfiniteGraph Graph Index.
Ingesting Vertices

An application that ingests the vertices using multiple threads runs concurrently on each machine. In total, a little over 16 million vertices, equally distributed across all the machines, were ingested. Each vertex has a unique key and is indexed using the Graph Index. The plot below shows the effective rate as a function of the number of machines that were used. The rate is linearly proportional to the number of machines used.
Ingesting Edges

Similar to ingesting vertices, an application that ingests the edges using multiple threads runs concurrently on each machine. The edge-ingest application builds the edges given a set of pairs of vertex ids. The graph below shows the rate of ingest as a function of the number of machines. The rate is linearly proportional to the number of machines used.
Finding Vertices

Searching for vertices, which are uniformly distributed across several machines, involved looking up randomly selected vertices based on unique identifiers and reading the vertex properties. When the properties of the vertex objects are not read, the query rate is constant regardless of the number of machines used. Note that the query operation is performed on a single machine while the storage of the graph is distributed.

The following plot shows the query rate as a function of distributing the graph across several machines, when all the properties of the vertices are read. Network bandwidth is the key factor that determines the query rate. As the number of machines increases, the amount of data that has to be transferred over the network also increases, which is the reason for the drop in the query rate.

With InfiniteGraph, clients can run the query operation simultaneously on each of the machines that host the data, and the search rate per machine would be constant, as shown below.
The cumulative effect, using InfiniteGraph in an ideal environment, would be N times faster, where N is the number of machines. For example, if you have 8 machines and each machine sustains 120 queries/sec, then performing the operation on all 8 machines simultaneously gives an expected overall rate of 8 x 120 = 960 queries/sec, an order of magnitude faster than other solutions.

Read Graph Vertex

The reading operation involves iterating over the vertex list in the database and reading all the properties of the vertex objects. The plot below shows the rate of this operation as a function of distributing the graph across several machines. Unlike the ingest operations, the rate is not proportional to the number of machines used because, as with the query operations, the computation is performed on a single machine while the storage of data is distributed. Network bandwidth is the key factor that determines the rate of this operation.
With InfiniteGraph, the read operation may also be distributed, and depending on the network bandwidth, a more constant read rate is possible, allowing for consistently faster read results.
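The ideal-case estimate used for distributed queries, where each machine runs the operation locally at a constant rate, is simple arithmetic and can be written down as a small sketch. The function name `ideal_aggregate_rate` is hypothetical, and the model deliberately ignores network and coordination overhead, so it is an upper bound rather than a measured result.

```python
def ideal_aggregate_rate(per_machine_rate, num_machines):
    """Ideal overall rate when every machine runs the operation locally.

    Assumes the per-machine rate stays constant as machines are added.
    """
    return per_machine_rate * num_machines


# The example from the text: 8 machines at 120 queries/sec each.
print(ideal_aggregate_rate(per_machine_rate=120, num_machines=8), "queries/sec")
```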
Graph Traversal

The graph traversal operation traverses the paths between two given vertices and keeps track of the number of paths that were visited in the process of getting from the first vertex to the second. Two sets of tests were performed for this operation. In the first case, the operation was performed when the cache was completely empty (cold traversal), and the rate of traversal degrades as the number of machines increases. This is primarily due to the increased network traffic. In the second case, the same traversal is repeated with the graph objects cached in the local memory of the machine where the operation was performed, resulting in a traversal rate that is an order of magnitude faster and independent of the number of machines used.

Conclusion

InfiniteGraph and the Open Source Graph Database are both NoSQL graph databases that target organizations with large amounts of unstructured data. While similar in some regards, the products have significant differences that have been highlighted in this paper and underscored by a series of benchmark tests.
The Open Source Graph Database is a single-server solution that works well when dealing with smaller graphs. InfiniteGraph, with a distributed data and process model, has been demonstrated to scale both out and up while maintaining high performance. With support for a true concurrent client connection model, InfiniteGraph has been shown to outperform the Open Source Graph Database by orders of magnitude when it comes to ingest and traversal rates.

The InfiniteGraph solution enables organizations to identify and understand complex relationships between distributed data with many degrees of separation. Use cases such as fraud detection, surveillance tools for finding the bad guys, prescription analytics, and network security information and event management (SIEM), which require real-time discovery of connections between n degrees of distributed data, will require the use of a robust and proven distributed graph database.

The benchmark findings illustrate some of the reasons why InfiniteGraph is a leading choice in large-scale graph processing and analytics for organizations seeking valuable connections in data and information. Learn more about InfiniteGraph at
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
Desktop Analytics Table of Contents Contact Center and Back Office Activity Intelligence... 3 Cicero Discovery Sensors... 3 Business Data Sensor... 5 Business Process Sensor... 5 System Sensor... 6 Session
NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING
The IBM Cognos Platform for Enterprise Business Intelligence Highlights Optimize performance with in-memory processing and architecture enhancements Maximize the benefits of deploying business analytics
Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems Brian McCarson Sr. Principal Engineer & Sr. System Architect, Internet of Things Group, Intel Corp Mac Devine
Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies
Cassandra High Performance Cookbook Over 150 recipes to design and optimize large-scale Apache Cassandra deployments Edward Capriolo [PACKT] publishing"1 open source community experience distilled BIRMINGHAM
Data Driven Success Comparing Log Analytics Tools: Flowerfire s Sawmill vs. Google Analytics (GA) In business, data is everything. Regardless of the products or services you sell or the systems you support,
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation Forward-Looking Statements During our meeting today we may make forward-looking
F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (email@example.com) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords
MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
The Power of Pentaho and Hadoop in Action Demonstrating MapReduce Performance at Scale Introduction Over the last few years, Big Data has gone from a tech buzzword to a value generator for many organizations.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information