ROCANA WHITEPAPER
Improving Event Data Management and Legacy Systems
INTRODUCTION

STATE OF AFFAIRS

WHAT IS EVENT DATA?

There are myriad terms and definitions for the data produced as a by-product of operational systems. Curt Monash is credited with coining the term "machine data" [2]. "Log data" is a commonly used term, but it refers to only a subset of this data. ExtraHop [3] has defined four classes of data: wire, machine, metric, and synthetic. In this paper we use the term "event data" to mean the union of all of these.

As IT infrastructure becomes more dynamic and elastic, IT operators are encountering a new set of challenges in controlling their environments. The most obvious is managing the massive amount of event data generated in global-scale enterprises; estimates put the growth of machine data at 40% annually [1]. Many businesses already generate terabytes of event data per day, a volume expected to grow to tens if not hundreds of terabytes per day within the next few years.

Managing event data is the first step toward making it useful; analyzing it is the second big challenge. Event data has many uses, from monitoring and managing IT systems to security and business intelligence. There have been many attempts at software for monitoring and analyzing event data, the best known being Splunk. As with many other legacy solutions, the industry has begun to recognize serious liabilities, including:
- cost
- licensing policies
- complexity
- closed architecture

COST

Increasingly, businesses are looking for alternatives to ever-growing Splunk spend because of its cost structure alone. A recent study by Blue Hill Research [4] determined that the one-year TCO for a 1 TB/day implementation of Splunk Enterprise was nearly $1,000,000, with the three-year TCO exceeding $2M. Because of Splunk's cost structure, most businesses are very selective about which data they ingest in order to save money.
This decision to sample data often causes problems later, when the data needed is not available.

Another reason businesses are looking for alternatives to Splunk is the "hostage clause" in the Splunk Enterprise license agreement. The agreement sets limits on daily ingest volume. When these limits are exceeded, access to data is blocked until the license violation is rectified, which often means purchasing an unanticipated, unbudgeted license upgrade. Worse yet, event data volumes typically surge right before or while problems occur, so this licensing policy often makes Splunk unavailable just when it is needed most.

COMPLEXITY

Managing and using Splunk is so complicated that the community of Splunk experts has become known as "ninjas" for their advanced skill set. The complexity is twofold: (1) configuring the hardware infrastructure and planning capacity, and (2) working within the brute-force user model to find problems and build solutions.

© 2015 Rocana, Inc.
Splunk provides a reference architecture that involves indexers, search heads, and load balancers. The ratio of search heads to indexers is highly dependent on the queries users execute, which makes capacity planning very challenging. Worse, Splunk throughput is strongly affected by both disk utilization and the class of queries being run.

The brute-force approach employed by Splunk is a second axis of complexity. Users must navigate mountains of data to find records of real value. This is an artifact of Splunk's user model, in which everything is a search query. Even integrations with other systems are query-based: save a query, execute the saved query, and have an external tool pick up the results.

CLOSED ARCHITECTURE

Like a roach motel, Splunk is a one-way street. Splunk is a completely proprietary product; there is no way to access data other than through Splunk-provided tools. The data and indexes are closed, the only APIs are those provided by Splunk, and all data access is query-based, either directly or through the API. This hurts businesses in several ways.

The biggest limitation of the closed architecture is in sharing data with other systems. Splunk provides methods for sharing data, but they are query-based, which makes Splunk effectively a batch-oriented system: real-time or streaming analysis with external tools is not possible. Rapidly re-issuing queries can work around this problem to a large degree, but at the cost of excessive system overhead, which in turn increases the required system resources and raises TCO.

Additionally, the proprietary nature of Splunk has limited the pool of so-called ninjas. The scarcity of Splunk professionals makes them expensive to hire, and because they are in great demand, they are frequently targeted by recruiters, resulting in a high risk of turnover.
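The overhead of the re-query workaround described under CLOSED ARCHITECTURE can be estimated with simple arithmetic. If each of N consumers re-runs a saved query every t seconds, the system executes N × 86,400 / t queries per day, and a newly arrived event can still wait up to roughly t seconds before a consumer sees it. The following is a minimal Python sketch of that estimate, assuming a fixed polling interval and ignoring query run time; the function name `polling_cost` and the count of 20 consumers are hypothetical.

```python
"""Back-of-the-envelope load of emulating a stream by re-issuing a saved query.

Assumptions (illustrative only): each consumer re-runs its query on a fixed
interval whether or not new events arrived, and query run time is ignored.
"""
SECONDS_PER_DAY = 86_400


def polling_cost(consumers: int, interval_s: float) -> tuple[int, float]:
    """Return (queries issued per day, worst-case seconds before a new event is seen)."""
    queries_per_day = round(consumers * SECONDS_PER_DAY / interval_s)
    # An event that lands just after a poll waits roughly one full interval.
    worst_case_delay_s = interval_s
    return queries_per_day, worst_case_delay_s


for interval in (60, 10, 1):
    queries, delay = polling_cost(consumers=20, interval_s=interval)
    print(f"poll every {interval:>2}s: {queries:>9,} queries/day, up to ~{delay}s stale")
```

With 20 consumers, polling every 60 seconds costs 28,800 queries per day; polling every second to approach real time costs 1,728,000, a sixtyfold increase in load that buys only a shorter worst-case delay.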
A BETTER APPROACH

Rocana Ops was built from the ground up with a modern approach to event data management, relying on proven Big Data software, open source software and open standards, machine learning, and purpose-built visualizations to bring a superior solution to market. Rocana Ops was designed to:
- encourage data collection and retention;
- provide an open, standards-based approach;
- support multiple integration & analysis techniques; and
- augment IT operations.
ENCOURAGE DATA COLLECTION AND RETENTION

Rocana's licensing is tied not to the amount of data being indexed but to the number of users accessing the system. Licensing is easy, predictable, and cost-efficient, with no threat of being held hostage. These licensing policies were developed so that businesses can collect data from all of their systems and integrate it into an event data warehouse. This addresses critical business needs such as:
- eliminating data silos and the resulting miscommunication about the state of systems;
- providing a single source of truth for all IT monitoring and management activities; and
- ensuring that all necessary data is available when it is needed.

Having a single source of all necessary data is especially important in security use cases, where issues are often discovered months after the fact and retroactive analysis is commonplace.

USE OPEN STANDARDS

By building Rocana Ops on open source software and following open standards, Rocana greatly reduces TCO and risk for businesses. Data in Rocana Ops is managed with popular open source products and can be accessed with standard software that businesses already use. This means businesses have a large pool of candidates to choose from when building out their teams, and those team members have skills that transfer to other projects. Such staff are not only easier to find than ninjas but also less expensive, and community resources are available for training and troubleshooting.

SUPPORT MULTIPLE INTEGRATION & ANALYSIS TECHNIQUES

Unlike legacy systems, which employ a brute-force search interface, Rocana Ops offers much more flexible options for integration and analysis. First and foremost is its publish-subscribe architecture.
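In a publish-subscribe design, each downstream application subscribes to a channel and receives only the events that channel's filter admits. The sketch below is a minimal, in-memory illustration of that idea under stated assumptions: the names `Channel`, `EventBus`, and `fake_geoip` are hypothetical and are not Rocana Ops APIs, and a real deployment would sit on a durable message log rather than a Python list. It shows per-channel filtering, enrichment applied to a copy so the archived original stays unaltered, and replay of archived events through a channel.

```python
"""Minimal in-memory sketch of filtered publish-subscribe channels.

Hypothetical illustration only: Channel, EventBus and fake_geoip are not Rocana Ops APIs.
"""
import copy
from dataclasses import dataclass, field
from typing import Callable

Event = dict  # e.g. {"host": "web01", "ip": "203.0.113.7", "msg": "login failed"}


@dataclass
class Channel:
    name: str
    accept: Callable[[Event], bool]                   # per-channel filter
    enrich: Callable[[Event], Event] = lambda e: e    # optional augmentation step
    subscribers: list = field(default_factory=list)   # downstream callbacks


class EventBus:
    def __init__(self) -> None:
        self.archive: list[Event] = []   # original, unaltered events, kept for replay
        self.channels: dict[str, Channel] = {}

    def add_channel(self, channel: Channel) -> None:
        self.channels[channel.name] = channel

    def subscribe(self, channel_name: str, callback: Callable[[Event], None]) -> None:
        self.channels[channel_name].subscribers.append(callback)

    def _deliver(self, channel: Channel, event: Event) -> None:
        if channel.accept(event):
            # Enrich a deep copy so the archived original is never modified.
            out = channel.enrich(copy.deepcopy(event))
            for callback in channel.subscribers:
                callback(out)

    def publish(self, event: Event) -> None:
        self.archive.append(event)       # retain the original first
        for channel in self.channels.values():
            self._deliver(channel, event)

    def replay(self, channel_name: str) -> None:
        """Re-send every archived event through one channel, e.g. to re-tune a model."""
        channel = self.channels[channel_name]
        for event in self.archive:
            self._deliver(channel, event)


def fake_geoip(event: Event) -> Event:
    """Stand-in for a real GeoIP lookup (hypothetical one-entry table)."""
    event["country"] = {"203.0.113.7": "ZZ"}.get(event.get("ip", ""), "unknown")
    return event


bus = EventBus()
bus.add_channel(Channel("security", accept=lambda e: "ip" in e, enrich=fake_geoip))
received: list[Event] = []
bus.subscribe("security", received.append)

bus.publish({"host": "web01", "ip": "203.0.113.7", "msg": "login failed"})
bus.publish({"host": "db01", "msg": "checkpoint complete"})  # no "ip": filtered out of "security"
```

Because each channel enriches a deep copy, the archived original never gains the `country` field, and `bus.replay("security")` can push the same history through the channel again, for example after its filter or enrichment has been changed.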
The Rocana Ops architecture provides real-time streaming of events to external applications. Rocana Ops can establish many different channels, each carrying its own filtered set of data for downstream applications. Data can be modified and augmented (for example, with a GeoIP lookup or a database merge) before it is published to a channel, and Rocana Ops always allows users to retain the original, unaltered event. This design gives users the critical ability to replay data multiple times without affecting the source data. It allows Rocana Ops to serve as a deep data store for third-party tools, and it aids in testing and tuning machine learning models until they produce exactly the desired outputs.

Rocana Ops also supports query-based interfaces to data, so developers can choose the best option for their application: streaming or query-based. Because Rocana Ops is built on open source software, application developers can use their favorite SQL tools or other Hadoop-compatible solutions to query the data it stores, avoiding the challenges and liabilities that come with proprietary solutions.

AUGMENT IT OPERATIONS

Rocana Ops does much more than collect and manage event data. It includes out-of-the-box functionality that makes it easier for IT administrators to monitor and manage systems. Unlike brute-force solutions, Rocana Ops automatically presents multiple visualizations of data, organized by location, service, and host. Users can drill down into system metrics, utilization, and detailed event data. Out-of-the-box views include changes in event volume, annotated event data, and custom metrics and dashboards. Rocana Ops also employs cutting-edge machine learning features like anomaly detection, which can find aberrant activity such as CPU thrashing.
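A fixed threshold cannot tell an expected daytime peak from an unusual reading at 3 a.m.: any limit low enough to catch a 60% CPU reading overnight also fires every business day. One simple way to account for periodicity, shown here as an illustrative technique rather than Rocana's actual algorithm, is to compare each sample with the samples taken at the same phase (for example, the same hour of the day) in earlier periods and to flag values that deviate by more than k standard deviations. The Python sketch below assumes hourly samples, a 24-sample period, and a hypothetical synthetic CPU series; `seasonal_anomalies` and `threshold_anomalies` are illustrative names.

```python
"""Sketch: periodicity-aware anomaly detection versus a fixed threshold.

Illustrative technique only (a same-phase baseline), not Rocana's actual algorithm.
"""
from statistics import mean, pstdev


def seasonal_anomalies(series, period, k=3.0, min_history=3):
    """Flag indices whose value deviates by more than k standard deviations from
    the values observed at the same phase (e.g. same hour of day) in earlier periods."""
    flagged = []
    for i, x in enumerate(series):
        # Earlier points at the same phase: i - period, i - 2*period, ...
        history = [series[j] for j in range(i % period, i, period)]
        if len(history) < min_history:
            continue  # not enough past cycles to judge this point yet
        mu = mean(history)
        sigma = max(pstdev(history), 1e-9)  # guard against a perfectly flat history
        if abs(x - mu) > k * sigma:
            flagged.append(i)
    return flagged


def threshold_anomalies(series, limit):
    """Naive alternative: flag every value above a fixed limit."""
    return [i for i, x in enumerate(series) if x > limit]


# Hypothetical hourly CPU % over 4 days: ~80% from 08:00 to 17:59, ~20% otherwise,
# with small deterministic jitter, plus one injected 60% reading at 03:00 on day 4.
pattern = [20] * 8 + [80] * 10 + [20] * 6
series = [pattern[h] + (h + d) % 3 - 1 for d in range(4) for h in range(24)]
series[3 * 24 + 3] = 60

print(threshold_anomalies(series, limit=50))  # all 40 business-hour samples plus index 75
print(seasonal_anomalies(series, period=24))  # [75]
```

Any fixed limit low enough to flag the overnight reading also flags every business-hour sample, while the same-phase baseline flags only index 75.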
Rocana's anomaly detection is not a simple threshold evaluation; it accounts for naturally occurring differences that may be caused by periodicity.

CONCLUSION

Legacy event data management and analysis solutions such as Splunk are greatly limited by implementation and design choices made a decade ago, when proprietary solutions were still in vogue and the volume, variety, and velocity of machine-generated data were very different. Rocana provides a modern alternative that is better in several regards:
- simplicity and greater scalability
- open data access and formats
- out-of-the-box functionality for augmented IT ops
- open integration and rich analytics
- significantly lower TCO
Rocana combines the assurance of predictable pricing and the scalability of a modern Big Data architecture with the collaborative nature and reliability of open source components, all to provide the next generation of event data management for controlling modern IT infrastructure.

ABOUT ROCANA

Rocana is creating the next generation of IT operations analytics software for a world in which IT complexity is growing exponentially as a result of virtualization, containerization, and shared services. Rocana's mission is to provide guided root cause analysis of event-oriented machine data in order to streamline IT operations and boost profitability. Founded by veterans from Cloudera, Vertica, and Experian, the Rocana team has directly experienced the challenges of today's IT infrastructures and has set out to address them using modern technology that leverages the Hadoop ecosystem.

[1] http://www.dbms2.com/2015/01/30/growth-in-machine-generated-data/
[2] https://en.wikipedia.org/wiki/machine-generated_data
[3] http://www.extrahop.com/post/blog/the-four-data-sets-essential-for-it-operations-analytics-itoa/
[4] http://www.tibco.com/assets/blt6d15d0ef383138d2/research-report-estimating-the-cost-of-machine-data-managementsplunk-and-tibco-loglogic.pdf

Rocana, Inc.
548 Market St #22538, San Francisco, CA 94104
+1 (877) ROCANA1
info@rocana.com
www.rocana.com

© 2015 Rocana, Inc. All rights reserved. Rocana and the Rocana logo are trademarks or registered trademarks of Rocana, Inc. in the United States and/or other countries.
WP-EDM-0715