The V of Big Data

"Velocity means both how fast data is being produced and how fast the data must be processed to meet demand." (Gartner)

Static Big Data is a necessary but insufficient component of a proactive and responsive business. What is required is a way not only to understand what was happening and what is happening, but also to predict what will happen and to take action, by harnessing the power of real-time, streaming Big Data management. This paper documents a streaming, high-velocity solution to an industry problem in the telecom sector, addressing a significant business issue impacting quality of service (QoS) for 4G/LTE subscribers in all geographies. The requirement was to detect time-based patterns in network performance data that were predictors of potential service failures, with a throughput target of 10 million records per second. The customer performed a comparison benchmark of SQLstream's real-time operational intelligence platform against an alternative solution based on the Storm open source project. The results demonstrated that SQLstream s-Server, our core Streaming Big Data Engine, performed 15x faster with significantly lower Total Cost of Ownership (TCO) as projected over the lifetime of the solution.
Customer Needs

As is the case with many modern connected businesses, telecom data flows exhibit high data rates with large data payload packets. The customer was concerned with the increasingly complex business logic required for low-latency aggregation and analysis across multiple sources of network element and radio tower data streams. The objective was to increase quality of service (QoS) in a dynamic and immediate manner, making cellular transmissions more robust and eliminating, as far as possible, negative events such as dropped calls and low-quality voice paths. The traditional data management approach of collecting and storing the data streams before processing could not deliver the low-latency analytics required from these high-volume, high-velocity data streams: a call or data transmission would already have failed by the time the event was identified.
In addition, management network architectures for modern 4G/LTE radio towers require multiple regional data centers. The massive data volumes for this particular use case would have required systems to be implemented at each tower site, with each tower's pre-processed data then aggregated in the relevant regional data center. Deploying potentially numerous systems in diverse and often remote geographic locations was cost prohibitive. Any solution must not only handle the constant high-volume data payload traffic, but also scale up and scale out during large spikes in traffic volume. The solution must also handle different data structures and formats, as well as operational differences such as legacy equipment and variations in device firmware or software versions. Addressing the large number of system platform permutations and delivering a normalized flow of data at high volume with low latency was also a prime consideration. Flexibility in the field would be paramount.

The customer decided the most appropriate data management architecture was a streaming solution. The high-level architecture would require remote data collection agents to capture and stream performance data to a single central platform. The central platform must be able to scale dynamically up to the peak forecast load of 10 million records per second. Data must also be filtered, parsed and enhanced dynamically as part of the real-time pipeline flow. Aggregated and streaming intelligence feeds must be delivered continuously to existing non-real-time data warehouses and JDBC applications.
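A pipeline of this shape can be sketched in streaming SQL. The following is an illustrative sketch only, not the customer's code: the stream, table and column names (PerfRecords, TowerRegions, tower_id and so on) are invented for this example.

```sql
-- Illustrative sketch only: stream, table and column names are
-- hypothetical, not taken from the customer deployment.

-- Raw performance records streamed in by the remote collection agents
CREATE STREAM "PerfRecords" (
    "tower_id"  VARCHAR(16),
    "metric"    VARCHAR(32),
    "value"     DOUBLE
);

-- Filter, parse and enrich as part of the pipeline: discard invalid
-- readings and tag each record with its region via a static lookup
CREATE VIEW "CleanPerf" AS
    SELECT STREAM p.ROWTIME, p."tower_id", r."region",
                  p."metric", p."value"
    FROM "PerfRecords" AS p
    JOIN "TowerRegions" AS r
      ON p."tower_id" = r."tower_id"
    WHERE p."value" IS NOT NULL;
```

Because the enrichment is expressed declaratively, the same view could feed both downstream streaming queries and the continuous loads into existing data warehouses and JDBC applications.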
A Benchmark Comparison Solution

Many engineering organizations begin with an evaluation of the most appropriate open source or freeware project. In this benchmark comparison, the alternative selected by the customer was the Storm real-time distributed processing framework, with additional open source, Java-based stream processing libraries to address the streaming data analytics and data aggregation. Producing an operationally viable solution based on the latest release of the project software required a number of additional coding steps, including:

- Integration of the Storm messaging middleware technology with the Java-based stream-processing library.
- Development of the data aggregation and analytics use cases as Java extensions to the core project framework.

The resulting development effort required a considerable amount of bespoke coding to deliver an operational solution. Three further considerations also contributed to the higher overall TCO:

- Lower performance per server required a significantly higher number of servers to achieve the target throughput, driving up server costs, power consumption, and solution administration and management.
- As with other low-level procedural frameworks, new or updated analytics required the core engine to be stopped and restarted. This disruption impacts operational service level agreements and drives higher maintenance overheads.
- Ongoing support and maintenance of custom code over the lifetime of the project (typically measured over a five-year period).
SQLstream s-Server: High Throughput Scalability with Lower TCO

The customer's evaluation team approached SQLstream based on our Google BigQuery relationship, Hadoop support, time-series credentials such as the UCSD seismology deployment, plus other commercial references for real-time operational intelligence. SQLstream provided the customer with the SQLstream s-Server 3.0 stream processor with supporting developer and user documentation. The code to handle all streaming pipelines consisted of only 350 lines of commented SQL, driving the lowest TCO and further easing the ongoing as-deployed maintenance and support of complex applications in the field. The customer team was able to develop prototypes quickly for several different business use cases. SQLstream's Technical Support Team provided support when requested, along with suggestions for solution optimization, in particular guidance on how implementing a streaming data solution differs from the traditional store first, query second paradigm.

SQLstream's flexible real-time data collection architecture enabled lightweight Java agents to reside outside the central server, performing initial data filtering and optimizing the transport of valid data flows using SQLstream's Streaming Data Protocol (SDP). SDP is optimized for high-velocity, high-volume data transport based on efficient data compression.

Results: Best Throughput

SQLstream s-Server performed at a truly immense level of throughput: 1,350,000 records per second per 4-core Intel Xeon server platform, based on a record payload size of 1 Kbyte. Throughput per server was 15x faster than the equivalent Storm-based solution. The customer's target of 10 million records per second required only 10 servers with the SQLstream solution; the equivalent Storm-based solution would require more than 110 servers.
Results: Lower Total Cost of Ownership

SQLstream s-Server demonstrated significant cost savings, with a dramatically lower projected TCO of one third that of the alternative solution. The TCO savings came from a combination of reduced hardware and power consumption, but also from the power and simplicity of SQL over low-level Java development. The code to support the required use cases consisted of only 350 lines of commented SQL, in contrast to the significant volume of Java code required to deliver a viable operational solution on the Storm framework.
Summary

SQLstream is the Streaming Big Data Engine using machine data to generate operational intelligence. Our s-Streaming products unlock the value of high-velocity unstructured log file, sensor and other machine data, giving new levels of visibility and insight and driving both manual and automated actions in real time. Businesses are moving on from simple monitoring and search-based tools, and trying to understand the meaning and causes of business and system problems. This requires the ability to process high-velocity data on a massive scale. The results of this benchmark demonstrate that SQLstream s-Server scales for the most extreme high-velocity Big Data use cases while remaining the lowest TCO option, even when compared with open source or freeware projects. Advantages of SQLstream's s-Server, the core element of the s-Streaming Big Data Engine, as demonstrated in the performance benchmark project, include:

- Scaling to a throughput of 1.35 million 1-Kbyte records per second per four-core server, each fed by twenty remote streaming agents.
- Expressiveness of the standards-based streaming SQL language, with support for enhanced streaming User Defined Functions and User Defined Extensions (UDF/UDX).
- Deploying new analytics on the fly, without having to stop and recompile or rebuild applications.
- Advanced pipeline operations including data enrichment, sliding time windows, external data storage platform read and write, and other advanced time-series analytics.
- Advanced memory management, with query optimization and execution environments that utilize and recover memory efficiently.
- Higher throughput and performance per server, for lower hardware requirements, lower costs and simple-to-maintain installations.
- Proven, mature enterprise-grade product with a validated roadmap and controlled release schedule.
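As an illustration of the sliding-time-window analytics mentioned above, a continuous aggregation in streaming SQL might look like the following sketch. The stream and column names (PerfStream, tower_id, dropped_calls) are invented for this example and do not come from the customer deployment.

```sql
-- Hypothetical example: continuously compute each tower's average
-- dropped-call count over a five-minute sliding window. Results are
-- emitted as new records arrive, rather than by repeatedly querying
-- stored data.
SELECT STREAM
    s.ROWTIME,
    s."tower_id",
    AVG(s."dropped_calls") OVER w AS "avg_drops_5min"
FROM "PerfStream" AS s
WINDOW w AS (PARTITION BY s."tower_id"
             RANGE INTERVAL '5' MINUTE PRECEDING);
```

A downstream query could compare the windowed average against a threshold to flag towers showing failure-predictive patterns, and such queries can be added or changed without stopping and restarting the engine.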
In summary, SQLstream excelled through a combination of a mature, industry-strength streaming Big Data platform, support for standard SQL (SQL:2008) for streaming analysis and integration, and a flexible adapter and agent architecture. The result was class-leading performance with impressively low TCO. Using 20 remote agents feeding a single s-Server instance running on a 4-core Intel Xeon server platform, SQLstream performed at a truly massive level of throughput: 1,350,000 records per second per 4-core server, with each event having an initial payload of 1 Kbyte.
SQLstream, Inc., Market Street, San Francisco, CA. SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream's s-Streaming products put Big Data on tap, enabling businesses to harness action-oriented and predictive analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream's core V5 streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using standards-based SQL, with support for streaming SQL query execution over Hadoop/HBase, Oracle, IBM, and other enterprise database, data warehouse and data management systems. SQLstream is headquartered in San Francisco, CA.