Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks




WHITE PAPER
July 2014

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Contents
Executive Summary
Background
InfiniteGraph
High Performance Networks
Test Environment and Results
Vertex Ingest Rate
Vertex Query Rate
Distributed Graph Navigation Processing
Edge Ingest Rate
Conclusion

Executive Summary

Companies today are finding that the size and growth of stored data is becoming overwhelming. As databases grow, the challenge is to create value by discovering insights and connections in them in as close to real time as possible. In this paper, we describe a combination of technologies from Objectivity, Inc. and Mellanox Technologies that addresses this need.

Objectivity, Inc. provides tools, such as InfiniteGraph, that enable users to find hidden insights in their Big Data. To do that, Objectivity's tools must scale to ingest and analyze very large data sets. That scalability is achieved by having many computers work on the data simultaneously, distributing the data and processing across multiple machines. Because InfiniteGraph is distributed, performance depends not only on the speed of each machine in the network, but also on the speed of the network itself. Understanding this need, Objectivity has partnered with Mellanox, a provider of high performance networking elements.

In this paper, we showcase several examples of InfiniteGraph performance and illustrate the improvement achieved by using a Mellanox-enabled network. Each example is based on an element of a typical Big Data analysis solution. The first example, Vertex Ingest Rate, shows the value of using high performance equipment to enhance real-time data availability. Vertex objects represent nodes in a graph, such as customers, so this test is representative of the most basic operation: loading new customer data into the graph. The second example, Vertex Query Rate, highlights the improvement in the time needed to receive results, such as finding a particular customer record or a group of customers. The third example, Distributed Graph Navigation Processing, starts at a vertex and explores its connections to other vertices. This is representative of traversing social networks, finding optimal transportation or communications routes, and similar problems. The final example, Edge Ingest Rate, shows the performance improvement when loading the connections between vertices. This is similar to entering orders for products, transit times over a communications path, and so on. Each of these elements is an important part of a Big Data analysis solution. Taken together, they show that InfiniteGraph can be made significantly more effective when combined with Mellanox interconnect technology.

Solving the Big Data problem is about being able to go massively parallel on both data storage and processing on commodity hardware. This is typically achieved by deploying in a distributed environment across multiple servers with many processors/cores, and multiple disks with local and remote access. Objectivity's distributed architecture is ideally suited for these types of deployment. In such deployments you have a choice of moving the data to the processing, moving the processing to the data, or a combination of both. In any case, the performance of the network can have a significant impact on the overall throughput of the system. Deploying InfiniteGraph in a distributed environment with Mellanox end-to-end 40GbE solutions can improve overall system performance without requiring large amounts of shared memory. In this benchmark we show improvements of several hundred percent using Mellanox's 40GbE infrastructure with a simple graph model and InfiniteGraph.

Being able to utilize the increased bandwidth is a key factor for solutions that require immediate response, such as security applications, geo-location-based recommendation engines, manufacturing/product lifecycle management applications, healthcare, finance, and mobile applications, as well as all types of networking applications.

Background

Graph databases provide direct connectivity between adjacent data elements without the use of indexes, which makes them a powerful tool for graph-like queries in social media applications, network diagnostics, and more. By treating data and the connections between data as first-class citizens in the database, graph technology can leverage the information in the connections between nodes (vertices) to understand relationships and derive business value from the behavior of users, entities, groups, and their interactions.

Figure 1: Graph Database

InfiniteGraph

InfiniteGraph is the leading distributed graph database that enables Big Data analytics in real time. Using InfiniteGraph as an embedded technology, organizations can build applications that discover hidden connections and relationships in their data on a global scale, helping to reduce analyst time, improve decision support, and maximize ROI on existing infrastructure and investments.

InfiniteGraph is implemented in Java and belongs to the class of NoSQL ("Not Only SQL") data technologies focused on graph data structures. Graph data typically consists of vertices and the various relationships (edges) that connect them. Developers may use InfiniteGraph to build applications and services that solve graph problems or answer complex analytical questions such as "Who are the key influencers in my network and their followers?", "How am I connected to my customers, partners, and prospects?", and "Which customer accounts are at risk, and how can I customize services and marketing to keep those customers satisfied and loyal?"

Unlike other similar technologies, InfiniteGraph is able to effectively handle graphs spread across multiple physical nodes, using its unique Advanced Multithreaded Server (AMS) to access data when applications connect to a graph database with multiple distributed storage locations. AMS is an alternative to native file servers such as NFS or Microsoft Windows Network. The following figure shows two InfiniteGraph applications, each with its own property file, accessing a distributed graph database. One lock server handles contention for the graph, and an AMS server on each remote machine serves data. Each application has an XML rank file that designates its preferred storage locations.

Figure 2: InfiniteGraph Application
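To make the vertex/edge model concrete, the following minimal Java sketch represents a property graph as an adjacency list and answers a basic connectivity question by following edges directly, with no index lookup. The class and method names (SimpleGraph, addVertex, addEdge, neighbors) are illustrative assumptions for this paper, not the InfiniteGraph API.

import java.util.*;

// Minimal property-graph sketch: vertices connected by labeled edges,
// mirroring the vertex/edge model described above. Illustrative only;
// this is not the InfiniteGraph API.
public class SimpleGraph {
    // Adjacency list: vertex id -> list of (edge label, target vertex id)
    private final Map<String, List<Map.Entry<String, String>>> adjacency = new HashMap<>();

    public void addVertex(String id) {
        adjacency.putIfAbsent(id, new ArrayList<>());
    }

    public void addEdge(String from, String label, String to) {
        addVertex(from);
        addVertex(to);
        adjacency.get(from).add(new AbstractMap.SimpleEntry<>(label, to));
    }

    // "Who am I connected to?" -- the basic question a graph database
    // answers by following edges directly, without index lookups.
    public List<String> neighbors(String id) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : adjacency.getOrDefault(id, List.of())) {
            out.add(e.getKey() + " -> " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        g.addEdge("Alice", "follows", "Bob");
        g.addEdge("Alice", "customerOf", "Acme");
        System.out.println(g.neighbors("Alice")); // [follows -> Bob, customerOf -> Acme]
    }
}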

Mellanox's High Performance Networks

The ever-increasing demand for higher performance, efficiency, and scalability in data centers drives the need for faster server and storage connectivity. 40Gb/s Ethernet is one of the leading standardized interconnect solutions, providing high bandwidth, lower latency, and lower power consumption for maximizing compute-system productivity and overall return on investment. Mellanox 40GbE combined with low-latency Remote Direct Memory Access (RDMA) ensures that applications taking advantage of the RDMA protocol can achieve the lowest application-to-application communication latency. Requirements for memory-to-memory transactions and solid state drive throughput can be met by the 40GbE network, which delivers the needed data bandwidth without becoming the data-pipe bottleneck.

ConnectX-3 Pro adapter cards with Virtual Protocol Interconnect (VPI), supporting InfiniBand and Ethernet connectivity with hardware offload engines for overlay networks (tunneling), provide the highest-performing and most flexible interconnect solution for PCI Express Gen3 servers used in public and private clouds, enterprise data centers, and high performance computing. Public and private cloud clustered databases, parallel processing, transactional services, and high-performance embedded I/O applications will achieve significant performance improvements, resulting in reduced completion time and lower cost per operation. ConnectX-3 Pro with VPI also simplifies system development by serving multiple fabrics with one hardware design.

Mellanox's SX1036 switch systems provide the highest-performing fabric solutions in a 1RU form factor by delivering 4.032Tb/s of non-blocking bandwidth to High-Performance Computing and Enterprise Data Centers, with 200ns port-to-port latency. Built with Mellanox's latest SwitchX-2 Ethernet switch device, the SX1036 provides up to 56Gb/s full bidirectional bandwidth per port. This stand-alone switch is an ideal choice for top-of-rack leaf connectivity or for building small to medium sized clusters. It is designed to carry converged LAN and SAN traffic with the combination of assured bandwidth and granular Quality of Service (QoS).
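To put these line rates in perspective, a back-of-the-envelope calculation shows how the fabric bounds data movement in a distributed deployment. The Java sketch below is illustrative only: it ignores protocol overhead, assumes the link rather than storage is the bottleneck, and the 100GB working-set size is a hypothetical figure, not one taken from the benchmark.

// Back-of-the-envelope transfer-time estimate at different line rates.
// A sketch, not a benchmark: ignores protocol overhead and assumes the
// link, not the storage, is the bottleneck.
public class TransferTime {
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        return gigabytes * 8 / gigabitsPerSecond; // GB -> Gb, divide by line rate
    }

    public static void main(String[] args) {
        double workingSetGB = 100.0; // hypothetical distributed graph partition
        System.out.printf("10GbE: %.0f s%n", seconds(workingSetGB, 10)); // 80 s
        System.out.printf("40GbE: %.0f s%n", seconds(workingSetGB, 40)); // 20 s
    }
}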

Test Environment and Results

Our test environment consists of a cluster of four server nodes and one client node. Each server is configured as follows:

CPU: 2x Intel E5-2680
DRAM: 128GB
Media: 1x 256GB SSD (boot drive); 6x 600GB 10K RPM hard drives; 1x 800GB PCIe Gen2 x4 SSD card
Network: Mellanox ConnectX-3 Pro dual-port 10/40Gb Ethernet NIC (MCX314A-BCBT)
OS: Red Hat Enterprise Linux 6.3
Switching solution: Mellanox MSX1036B, based on SwitchX-2, 36 ports, 40GbE QSFP

Figure 3: Testing Environment

Vertex Ingest Rate

The first set of tests attempted to ingest the highest possible number of vertices, using remote database servers running at the location where the data storage resides. Separate tests were run on both a solid state drive (SSD) and a standard spinning disk.

Figure 4: Indexed Vertex Ingest Results

The results show that for ingest to both the spinning disk and the SSD, the rate at which new data can be written to storage increases significantly with the 40GbE fabric. The spinning-disk ingest rate is limited by the ability of the SAS controller to drive data to the slower disk media. It is worth noting that the improvement with the SSD is almost threefold, maximizing the return on the more expensive storage medium.

Vertex Query Rate

The second test measures the ability to perform database-wide queries (leveraging indexing) to qualify vertices on a remotely connected graph database. The test was run against in-memory data.

Figure 5: Vertex Query Rate Results

The results show an improvement of over 160% in retrieving insights from the InfiniteGraph database in cases where the data resides in the database server's memory.
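The shape of such an ingest measurement is simple to sketch: time a batch of inserts and report a rate. The Java loop below reuses the SimpleGraph sketch from earlier and is illustrative only; the actual test drives a remote, distributed InfiniteGraph database over the network, which a local loop does not model.

// Shape of a vertex-ingest measurement: time N inserts, report a rate.
// Reuses the SimpleGraph sketch above; the real benchmark runs against a
// remote, distributed database, which this local loop does not capture.
public class IngestBenchmark {
    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        int n = 1_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            g.addVertex("customer-" + i); // one vertex per customer record
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d vertices in %.2f s (%.0f vertices/s)%n",
                n, seconds, n / seconds);
    }
}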

Distributed Graph Navigation Processing

The third test measures the ability to find connecting paths between vertices in the InfiniteGraph database. A higher path-processing rate enables data scientists to explore new connections and better understand the relationships between data points in the database.

Figure 6: Navigation Path Processing Rate Results

The results show an improvement of over 60% in retrieving insights from the InfiniteGraph database when using the 40GbE fabric. Note that the unique client-side caching inherent in the InfiniteGraph architecture explains the smaller performance gain here than in the other tests. The gain would likely be greater when navigating a much larger graph, which would be less likely to be cached locally and would require extra transfers across the network.

Edge Ingest Rate

The last test conducted is the Edge Ingest Rate test.

Figure 7: Edge Ingest Test Results

The results show an improvement of over 250% in ingesting edges into the InfiniteGraph database when using the 40GbE fabric.
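The navigation operation can be illustrated in miniature with a breadth-first search for a connecting path between two vertices. The Java sketch below operates on a plain adjacency map for brevity and is an assumption-laden illustration: InfiniteGraph's distributed navigator additionally coordinates traversal across multiple storage locations, which a single-process search does not show.

import java.util.*;

// Breadth-first search for a shortest connecting path between two vertices:
// the operation the navigation test exercises, in miniature. A distributed
// navigator must also coordinate traversal across storage locations.
public class PathFinder {
    static List<String> shortestPath(Map<String, List<String>> adj, String from, String to) {
        Map<String, String> parent = new HashMap<>(); // visited set + back-pointers
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        parent.put(from, null);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(to)) { // rebuild the path by walking parents backwards
                LinkedList<String> path = new LinkedList<>();
                for (String p = v; p != null; p = parent.get(p)) path.addFirst(p);
                return path;
            }
            for (String next : adj.getOrDefault(v, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, v);
                    queue.add(next);
                }
            }
        }
        return List.of(); // no connecting path found
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = Map.of(
                "Alice", List.of("Bob"),
                "Bob", List.of("Carol"),
                "Carol", List.of());
        System.out.println(shortestPath(adj, "Alice", "Carol")); // [Alice, Bob, Carol]
    }
}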

Conclusion

The ability to harness the value of Big Data in enterprise applications is invaluable. To compete effectively in today's fast-moving markets, businesses must stay a step ahead of the competition. InfiniteGraph, the only enterprise-proven, distributed, and scalable graph database, used in conjunction with Mellanox 40GbE interconnect solutions, enables organizations to efficiently identify and understand complex relationships between distributed data with many degrees of separation.

The benchmark findings illustrate the reasons to choose Mellanox 40GbE and InfiniteGraph for large-scale graph processing and analytics in organizations seeking valuable connections in their data. The results are consistent across both ingest-rate requirements and analytics targets, delivering better data value to customers deploying the InfiniteGraph database over Mellanox 40GbE.

350 Oakmead Parkway, Suite 100, Sunnyvale, CA 94085  Tel: 408-970-3400  Fax: 408-970-3403  www.mellanox.com
3099 North First Street, Suite 200, San Jose, CA 95134  Tel: 408-992-7186  Fax: 408-992-7171  www.objectivity.com

Mellanox, ConnectX, SwitchX, and Virtual Protocol Interconnect (VPI) are registered trademarks of Mellanox Technologies, Ltd. All other trademarks are property of their respective owners. 4026WP Rev 1.2