Streaming Big Data Performance Benchmark for Real-time Log Analytics in an Industry Environment

SQLstream s-server The Streaming Big Data Engine for Machine Data Intelligence

SQLstream proves 15x faster with lower Total Cost of Ownership in streaming Big Data performance test.

"The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand." (Gartner)

Static Big Data is a necessary but insufficient component of a proactive and responsive business. What is required is a way not only to understand what was happening and what is happening, but also to predict what will happen and to take action, by harnessing the power of real-time, streaming Big Data management.

This paper documents a streaming, high-velocity solution for an industry problem in the Telecom sector, addressing a significant business issue impacting QoS for 4G/LTE subscribers in all geographies. The requirement was to detect time-based patterns in network performance data that were predictors of potential service failures. The throughput performance requirement was 10 million records per second.

The customer performed a comparison benchmark of SQLstream's real-time operational intelligence platform against an alternative solution based on the Storm open source project. The results demonstrated that SQLstream s-server, our core Streaming Big Data Engine, performed 15x faster with significantly lower Total Cost of Ownership as projected over the lifetime of the solution.

Customer needs

As is the case with many modern connected businesses, Telecom data flows exhibit high data rates with large data payload packets. The customer was concerned with the increasingly complex business logic required for low-latency aggregation and analysis across multiple sources of network element and radio tower data streams. The objective was to increase the quality of service (QoS) in a dynamic and immediate manner, making cellular transmissions more robust and eliminating, as far as possible, negative events such as dropped calls and low-quality voice paths.

The traditional data management approach of collecting and storing the data streams before processing could not deliver the low-latency analytics required from the high-volume, high-velocity data streams. A call or data transmission would have failed by the time the event was identified.

In addition, management network architectures for modern 4G/LTE radio towers require multiple regional data centers. The massive data volumes for this particular use case would have required systems to be implemented at each tower site, with each tower's pre-processed data then aggregated in the relevant regional data center. Deploying potentially numerous systems in diverse and often remote geographic locations was cost prohibitive.

Any solution must be able to handle the constant high-volume data payload traffic, and also scale up and scale out during periods of large spikes in traffic volumes. The solution must also be able to handle different data structures and formats, as well as operational differences such as legacy equipment and differences in device firmware or software versions. Addressing the large number of system platform permutations and delivering a normalized flow of data at high volumes with low latency was also a prime consideration. Flexibility in the field would be paramount.

The customer decided the most appropriate data management architecture was to deploy a streaming solution. The high-level architecture would require remote data collection agents to capture and stream performance data to a single central platform. The central platform must be able to scale dynamically up to the peak forecast load of 10 million records per second. Data must also be filtered, parsed and enhanced dynamically as part of the real-time pipeline flow. Aggregated and streaming intelligence feeds must be delivered continuously to existing non-real-time data warehouses and JDBC applications.
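The parse, filter and enhance steps of such a pipeline can be illustrated with a minimal Python sketch. This is not SQLstream or agent code: the JSON-lines record format, the field names `tower` and `dropped_calls`, and the `TOWER_REGION` lookup table are all hypothetical, chosen only to show records flowing through the stages one at a time without being stored first.

```python
import json
from typing import Iterable, Iterator

# Hypothetical reference data used for enrichment: tower id -> region.
TOWER_REGION = {"T1": "north", "T2": "south"}


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw JSON lines, skipping anything that is not a JSON object."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is dropped, not stored for later
        if isinstance(record, dict):
            yield record


def keep_valid(records: Iterable[dict]) -> Iterator[dict]:
    """Filter: keep only records that carry the fields later stages need."""
    for record in records:
        if "tower" in record and "dropped_calls" in record:
            yield record


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Enhance each record with its region from the lookup table."""
    for record in records:
        region = TOWER_REGION.get(record["tower"], "unknown")
        yield {**record, "region": region}


def pipeline(lines: Iterable[str]) -> Iterator[dict]:
    """Chain the stages; each record is processed as soon as it arrives."""
    return enrich(keep_valid(parse(lines)))


# Illustrative input only; not data from the benchmark.
SAMPLE = [
    '{"tower": "T1", "dropped_calls": 3}',
    "not json",
    '{"tower": "T9", "dropped_calls": 0}',
    '{"tower": "T2"}',
]
results = list(pipeline(SAMPLE))
```

Because every stage is a generator, nothing is buffered between stages, which mirrors the process-before-store idea in the architecture described above. A production system would of course distribute these stages across agents and servers.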

A Benchmark Comparison Solution

Many engineering organizations begin with an evaluation of the most appropriate open source or freeware project. In this benchmark comparison, the alternative selected by the customer was the Storm real-time distributed processing framework, with additional open source, Java-based stream processing libraries to address the streaming data analytics and data aggregation. Producing an operationally viable solution on the latest release of the project software required a number of additional coding steps, including:

• Integration of the Storm messaging middleware technology with the Java-based stream-processing library.
• Development of the data aggregation and analytics use cases as Java extensions to the core project framework.

Delivering an operational solution therefore required a considerable amount of bespoke coding. Three further considerations also contributed to the higher overall TCO:

• Lower performance per server required a significantly higher number of servers to reach the target throughput. This increased TCO components such as server costs, power, and solution administration and management.
• As with other low-level procedural frameworks, new or updated analytics required the core engine to be stopped and restarted. This disruption impacts operational service level agreements and drives higher maintenance overheads.
• Ongoing support and maintenance of custom code over the lifetime of the project (typically measured over a five-year period).

SQLstream s-server: High Throughput Scalability with Lower TCO

The code to handle all streaming pipelines consisted of only 350 lines of commented SQL, which also keeps down the cost of ongoing maintenance and support of complex applications once deployed in the field.

The customer's evaluation team approached SQLstream based on our Google BigQuery relationship, Hadoop support, time-series credentials such as the UCSD Seismology deployment, plus other commercial references for real-time operational intelligence. SQLstream provided the customer with the SQLstream s-Server 3.0 platform with supporting developer and user documentation. The customer team was able to develop prototypes quickly for several different business use cases. SQLstream's Technical Support Team provided support when requested and suggestions for solution optimization, in particular guidance on how implementing a streaming data solution differs from the traditional "store first, query second" paradigm.

SQLstream's flexible real-time data collection agents enabled lightweight Java agents to reside outside of the central server, to perform initial data filtering tasks, and to optimize the transport of valid data flows using SQLstream's Streaming Data Protocol (SDP). SDP is optimized for the transport of high-velocity, high-volume data, based on efficient data compression.

Results: Best Throughput

• SQLstream s-server sustained a throughput of 1,350,000 records per second per 4-core Intel Xeon server platform, based on a record payload size of 1 KByte.
• Throughput per server was 15x that of the equivalent Storm-based solution.
• The customer's target of 10 million records per second required only 10 servers with the SQLstream solution. The equivalent Storm-based solution would require more than 110 servers.
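The server counts follow from the per-server figures, as the short calculation below shows. It is a sketch using only the numbers quoted in this paper; the function name `servers_needed` is illustrative, and the Storm per-server rate is derived from the 15x ratio rather than stated directly in the paper.

```python
import math

TARGET_RPS = 10_000_000                # customer's peak requirement
SQLSTREAM_RPS_PER_SERVER = 1_350_000   # reported per 4-core server
SPEEDUP_OVER_STORM = 15                # reported per-server ratio


def servers_needed(target_rps: float, per_server_rps: float) -> int:
    """Smallest whole number of servers whose combined rate meets the target."""
    return math.ceil(target_rps / per_server_rps)


# Derived, not quoted: Storm per-server rate implied by the 15x ratio.
storm_rps_per_server = SQLSTREAM_RPS_PER_SERVER / SPEEDUP_OVER_STORM

sqlstream_minimum = servers_needed(TARGET_RPS, SQLSTREAM_RPS_PER_SERVER)
storm_minimum = servers_needed(TARGET_RPS, storm_rps_per_server)

print(f"Storm rate per server: {storm_rps_per_server:,.0f} records/s")
print(f"SQLstream minimum servers: {sqlstream_minimum}")
print(f"Storm minimum servers: {storm_minimum}")
```

The arithmetic floor for SQLstream is 8 servers, so the 10 servers quoted above sit above the minimum; the paper does not say how that margin was chosen. The Storm floor of 112 servers agrees with "more than 110".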
Results: Lower Total Cost of Ownership

SQLstream s-server demonstrated significant cost savings, with a projected TCO one third that of the alternative solution. The savings came partly from reduced hardware and power consumption, and partly from the power and simplicity of SQL compared with low-level Java development. The code to support the required use cases consisted of only 350 lines of commented SQL code, in contrast to the significant volume of Java code development required to deliver a viable operational solution on the Storm framework.

Summary

SQLstream is the Streaming Big Data Engine using machine data to generate operational intelligence. Our s-streaming products unlock the value of high-velocity unstructured log file, sensor and other machine data, giving new levels of visibility and insight and driving both manual and automated actions in real time. Businesses are moving on from simple monitoring and search-based tools, and trying to understand the meaning and causes of business and system problems. This requires the ability to process high-velocity data on a massive scale.

The results of this benchmark demonstrate that SQLstream s-server scales for the most extreme high-velocity Big Data use cases while being the lowest TCO option, even when compared with open source or freeware projects. Advantages of SQLstream's s-server, the core element of the s-streaming Big Data Engine, as demonstrated in the performance benchmark project include:

• Scaling to a throughput of 1.35 million 1 KByte records per second per four-core server, each fed by twenty remote streaming agents.
• Expressiveness of the standards-based streaming SQL language, with support for enhanced streaming User Defined Functions and User Defined Extensions (UDF/UDX).
• Deploying new analytics on the fly without having to stop and recompile or rebuild applications.
• Advanced pipeline operations including data enrichment, sliding time windows, external data storage platform read and write, and other advanced time-series analytics.
• Advanced memory management, with query optimization and execution environments to utilize and recover memory efficiently.
• Higher throughput and performance per server, for lower hardware requirements, lower costs and installations that are simple to maintain.
• Proven, mature enterprise-grade product with a validated roadmap and controlled release schedule.
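To make the sliding time window operation listed above concrete, here is a minimal Python sketch. It is not SQLstream's streaming SQL: the class name `SlidingWindowSum` and its methods are hypothetical, and it assumes events for each key arrive in non-decreasing timestamp order.

```python
from collections import deque


class SlidingWindowSum:
    """Running sum per key over the trailing `window_s` seconds.

    The window covers timestamps in (now - window_s, now].
    Assumes timestamps are non-decreasing for each key.
    """

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events = {}   # key -> deque of (timestamp, value)
        self._totals = {}   # key -> current sum inside the window

    def add(self, key: str, timestamp: float, value: float) -> float:
        """Record one event and return the updated window sum for its key."""
        queue = self._events.setdefault(key, deque())
        queue.append((timestamp, value))
        self._totals[key] = self._totals.get(key, 0) + value

        # Evict events that have slid out of the window.
        cutoff = timestamp - self.window_s
        while queue and queue[0][0] <= cutoff:
            _, old_value = queue.popleft()
            self._totals[key] -= old_value
        return self._totals[key]


# Illustrative use: dropped calls per tower over a 60-second window.
window = SlidingWindowSum(window_s=60)
first = window.add("T1", timestamp=0, value=2)
second = window.add("T1", timestamp=30, value=1)
third = window.add("T1", timestamp=61, value=4)   # the event at t=0 expires
other = window.add("T2", timestamp=61, value=1)
```

Each update touches only the events entering or leaving the window, so the result is available as soon as a record arrives. A rule such as "alert when the sum exceeds a threshold" could then be applied to the returned value.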
In summary, SQLstream excelled through a combination of a mature, industrial-strength streaming Big Data platform, support for standard SQL (SQL:2008) for streaming analysis and integration, plus a flexible adapter and agent architecture. The result was class-leading performance with low TCO. With 20 remote agents feeding each s-server instance running on a 4-core Intel Xeon server platform, SQLstream sustained a throughput of 1,350,000 records per second per 4-core server, with each event having an initial payload of 1 KByte.
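These figures imply the following per-agent load. The calculation is a sketch that assumes records are spread evenly across the 20 agents and that 1 KByte means 1,000 bytes; neither assumption is stated in the benchmark. The byte rates are for uncompressed payloads, so the actual rate on the wire, where SDP applies compression, may be lower.

```python
RECORDS_PER_SECOND_PER_SERVER = 1_350_000  # reported throughput
AGENTS_PER_SERVER = 20                     # reported agent count
PAYLOAD_BYTES = 1_000                      # assumption: 1 KByte = 1,000 bytes

# Assumes an even split of records across agents.
records_per_agent = RECORDS_PER_SECOND_PER_SERVER // AGENTS_PER_SERVER

# Uncompressed payload volume, before any SDP compression.
server_bytes_per_second = RECORDS_PER_SECOND_PER_SERVER * PAYLOAD_BYTES
agent_bytes_per_second = records_per_agent * PAYLOAD_BYTES

print(f"Records per agent: {records_per_agent:,} per second")
print(f"Payload per server: {server_bytes_per_second / 1e9:.2f} GB/s")
print(f"Payload per agent: {agent_bytes_per_second / 1e6:.1f} MB/s")
```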

SQLstream, Inc.
1540 Market Street
San Francisco, CA, 94102
www.sqlstream.com

SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream's s-streaming products put Big Data on Tap, enabling businesses to harness action-oriented and predictive analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream's core V5 streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using standards-based SQL, with support for streaming SQL query execution over Hadoop/HBase, Oracle, IBM, and other enterprise database, data warehouse and data management systems. SQLstream is headquartered in San Francisco, CA.