ESG Brief

IBM: An Early Leader across the Big Data Security Analytics Continuum

Date: June 2013
Author: Jon Oltsik, Senior Principal Analyst

Abstract: Many enterprise organizations claim that they already consider security data collection and analysis to be big data, but they don't have security analytics solutions capable of addressing their scalability, performance, or operational needs. ESG believes that tactical security analytics solutions and compliance-centric SIEM tools are no match for today's big data security analytics needs. Leading vendors are addressing this gap with real-time and asymmetric big data security analytics systems built for scale and intelligence. IBM is one of the few vendors offering an integrated approach that spans the entire continuum of enterprise security analytics needs.

Overview

In many respects, enterprise organizations have been moving toward big data security analytics for a number of years, long before the industry was talking about technologies like Hadoop, MapReduce, and NoSQL. Security analytics is now seen as a big data problem because of:

• The growing volume of security data. In the early 2000s, security data collection and analysis focused on network perimeter devices like firewalls and IDS/IPS. Over time, security analysts expanded data collection to include internal network devices, servers, applications, and databases. New IT initiatives like BYOD, cloud computing, and server virtualization exacerbated security data collection needs, as did the increasing volume of machine-generated data. Little wonder then that, according to ESG research, 86% of enterprise organizations collect substantially more or somewhat more security data today than they did two years ago (see Figure 1).[1]

Figure 1. Growth in Amount of Data Collected for Information Security Activities

How has the amount of data your organization collects to support its information security activities changed in the last 2 years? (Percent of respondents, N=257)

  43%  We collect substantially more data to support our information security activities today than we did 2 years ago
  43%  We collect somewhat more data to support our information security activities today than we did 2 years ago
  14%  We collect about the same amount of data to support our information security activities today as we did 2 years ago

Source: Enterprise Strategy Group, 2013.

[1] Source: ESG Research Report, The Emerging Intersection Between Big Data and Security Analytics, November 2012.
• Security data retention. Driven by a combination of compliance requirements, lower storage costs, and the frequency of security investigations, large organizations are keeping security data online for longer periods of time. In fact, 21% of enterprise organizations surveyed by ESG keep the security data they collect online for a substantially longer period of time than they did two years ago, while 46% keep it online for a somewhat longer period of time.[2]

• A multitude of security analytics use cases. Security data is used to analyze activities and metrics associated with risk management, incident detection/response, regulatory compliance, and investigations/forensics. It is not unusual for a security analyst to engage in more than a dozen security investigations simultaneously. Many of these investigations now include analysis of nontraditional data sources such as social media, customer browsing history, and business transactions. Security professionals are being asked to cross-correlate this data alongside security analytics for fraud detection and long-term historical investigations.

To support business requirements, manage risk, and respond to security events, CISOs collect, retain, and analyze a larger repository of data than they did in the past. Security data growth and utilization will only increase in the future.

Big Data Security Analytics Defined

At a high level, big data security analytics deals with a collection of security data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional security data processing applications. This is the exact situation occurring at large organizations where tactical security analytics and compliance-centric legacy SIEM tools can no longer keep up with the growing volume of security data.
To address this volume of data, big data security analytics solutions distinguish themselves based upon three basic characteristics:

• Scale. Big data security analytics solutions must have the ability to collect, process, and store hundreds of terabytes (if not petabytes) of data for an assortment of security analytics activities.

• Analytical flexibility. Big data security analytics solutions must provide users with the ability to interact with, query, and visualize this volume of data.

• Performance. Big data security analytics must be built on top of an appropriate compute architecture in order to collect data and run analytic algorithms and complex queries in an acceptable timeframe.

The Big Data Security Analytics Continuum

Aside from the general characteristics described previously, ESG believes it is useful to think of big data security analytics solutions along a continuum (see Figure 2). Two poles make up this scale:

1. Real-time big data security analytics
2. Asymmetric big data security analytics

Big data security analytics solutions will tend to lean toward one end of the continuum or the other, although individual solutions may offer some features and functionality in both areas.

[2] Source: Ibid.
Figure 2. The Big Data Security Analytics Continuum

Source: Enterprise Strategy Group, 2013.

Real-time Big Data Security Analytics

These solutions may be quite familiar to CISOs because they are basic evolutionary iterations of existing SIEM, log management, network flow analysis, and IP packet capture tools. This new breed of real-time big data security analytics solutions is distinguished from legacy SIEM platforms by the solutions' scalability, analytics intelligence, and performance characteristics. Real-time big data security analytics solutions generally feature:

• A highly distributed architecture. Real-time big data security analytics solutions are typically built upon multiple distributed data collection appliances. Individual collectors are responsible for collecting, processing, storing, and enriching local network data (i.e., adding metadata to enhance raw data with security context).

• High-speed stream processing engines. Aside from collecting data, distributed appliances are also responsible for high-speed stream processing of local data sets. Stream processing is used to accommodate the high I/O rate needed to process massive amounts of security data (e.g., logs, flows, and packet capture).

• A proprietary data management repository. To address volume, performance, scale, and analytics requirements, real-time big data security analytics solutions tend to be built on top of proprietary distributed data management repositories rather than traditional SQL databases or big data platforms. In fact, the only SIEM platforms that truly qualify as big data security analytics are those designed with proprietary, highly scalable data repositories.

• Specific types of data feeds. Real-time security analytics solutions are finely tuned to understand and interpret activities associated with specific types of data, typically logs, network flows, and/or IP packets. Real-time big data analytics may also accept security intelligence data feeds, providing further insight for incident detection/response based upon input like IP address reputation, command and control (C&C) communications, or malware profiles.
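As an illustration only, the enrichment and threat intelligence matching described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a depiction of how any vendor's product works: the Event record, the IP_REPUTATION and ASSET_CONTEXT tables, and the enrich and alerts functions are all hypothetical names invented for this example, and the addresses come from reserved documentation ranges.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator

# Hypothetical IP reputation feed (address -> label). Real feeds are
# vendor-specific and far larger; these entries are made up.
IP_REPUTATION: Dict[str, str] = {
    "203.0.113.7": "known C&C server",
    "198.51.100.23": "malware distribution",
}

# Hypothetical asset inventory used to add local security context.
ASSET_CONTEXT: Dict[str, str] = {
    "10.0.0.5": "finance database server",
}


@dataclass
class Event:
    """A simplified network flow record (assumed shape)."""
    src_ip: str
    dst_ip: str
    action: str
    metadata: Dict[str, str] = field(default_factory=dict)


def enrich(events: Iterable[Event]) -> Iterator[Event]:
    """Attach security-centric metadata to each event as it streams past."""
    for event in events:
        if event.src_ip in ASSET_CONTEXT:
            event.metadata["src_asset"] = ASSET_CONTEXT[event.src_ip]
        if event.dst_ip in IP_REPUTATION:
            event.metadata["dst_reputation"] = IP_REPUTATION[event.dst_ip]
        yield event


def alerts(events: Iterable[Event]) -> Iterator[Event]:
    """Keep only enriched events whose destination is in the reputation feed."""
    return (e for e in enrich(events) if "dst_reputation" in e.metadata)


if __name__ == "__main__":
    stream = [
        Event("10.0.0.5", "203.0.113.7", "allow"),
        Event("10.0.0.9", "192.0.2.10", "allow"),
    ]
    for alert in alerts(stream):
        print(alert.src_ip, "->", alert.dst_ip, alert.metadata)
```

The generators process one event at a time rather than loading a full data set, which loosely mirrors the stream processing role of the distributed collectors. A production system would add parsing, persistence, and far richer rule logic.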
Real-time big data security analytics solutions often enrich raw data feeds with security-centric metadata. This data enrichment can help big data security analytics make sense of disparate security events while also providing some security context to link individual security events together in order to detect anomalous activities spanning multiple technologies. Additionally, incident detection is based upon a combination of programmed rule sets and machine learning.

Real-time security analytics solutions may simply be more modern SIEM platforms designed for emerging high-volume, high-scale, and high-performance incident detection/response. While some real-time security analytics solutions offer reporting capabilities to support regulatory compliance requirements, this functionality exists purely to support more comprehensive regulatory compliance and GRC activities/tools.

Asymmetric Big Data Security Analytics

Asymmetric big data security analytics solutions are designed to supplement real-time big data security analytics by providing high-performance platforms for the analysis of massive volumes of structured and unstructured data. In this way, asymmetric big data security analytics can look at data across long periods of time to establish baseline behavior and detect anomalies. Asymmetric big data security analytics solutions are also designed with the assumption that analysts may have no idea what they are looking for, where to start, or how to proceed. Because of this, analysts need the flexibility to analyze the data in a multitude of ways and easily pivot from one query to the next. To provide flexible analytics on massive volumes of security data, asymmetric big data security analytics solutions tend to include:

• A multitude of data feed types. Asymmetric big data security analytics solutions take in standard security data like logs, flow data, and IP packet capture, but enterprises will enhance these with a wide variety of additional data feeds like transactions, e-mails, user click streams, botnet harvesting, attacker data, web logs, etc. It is not unusual for leading-edge organizations to collect, store, and analyze hundreds of different structured and unstructured data types. Support for diverse types of data is critical to enabling the wide-ranging investigations typically conducted by security analysts.

• A centralized architecture. While real-time big data analytics solutions depend upon distributed appliances for data collection and stream processing, asymmetric big data analytics solutions tend to be centrally located in data centers or security operations centers (SOCs). Real-time data feeds are likely captured by log management or SIEM solutions and then shared with asymmetric big data security analytics systems. Other more esoteric data feeds can arrive through APIs or batch ETL operations.

• Emerging big data technologies. While several vendors offer proprietary solutions built for their own parallel processing HPC environments, many asymmetric big data security analytics solutions utilize promising big data technologies such as Hadoop, MapReduce, Mahout, and Pig. Given the innovation and open source community around these technologies, ESG believes that all real-time and asymmetric big data security analytics solutions will include support for a number of these technologies in the future. Some organizations will run big data security analytics on generic big data platforms, but most will look for integrated big data security analytics solutions that offer scale, capacity, and baked-in security analytics.

• Server clusters. Hadoop, a key technical foundation for asymmetric big data analytics, is based upon a distributed file system (HDFS) and MapReduce (a patented software framework introduced by Google to support distributed computing of large data sets on clusters of standard Intel servers). These technologies provide horizontal scalability for storage and processing. When CISOs want to analyze more data, they simply add more servers to Hadoop clusters for parallel processing performance, load balancing, and high availability.

IBM Security Is Bridging the Big Data Security Analytics Continuum

As the big data security analytics market evolves, all products and solutions must provide enterprise functionality like scalability, high performance, out-of-box intelligence, and strong integration. This will weed out some SMB solutions, leaving vendors with lots of cybersecurity and enterprise software experience.
For the most part, big data security analytics solutions will fall into the real-time or asymmetric camp, but there are a few exceptions to this rule. For example, IBM already stands out within the big data security analytics market as it:

• Offers a leading real-time big data security analytics solution. IBM demonstrated real moxie when it abandoned its legacy SIEM and acquired market leader Q1 Labs in 2011. QRadar is actually one of the few SIEM platforms that qualify as a real-time big data security analytics solution, as it offers a distributed architecture for stream/parallel processing combined with deep security algorithms and analytics. This gives QRadar the ability to collect, process, and analyze logs, network flows, IP packets, and X-Force threat intelligence feeds for effective/efficient incident detection and response. Given the current threat landscape featuring hacktivism, cybercrime, and APTs, many enterprise organizations are replacing compliance-centric legacy SIEM tools with real-time big data security analytics delivered by the QRadar Security Intelligence Platform.

• Utilizes IBM analytics resources for asymmetric big data security analytics. In January 2013, IBM entered the asymmetric big data security analytics market with the announcement of IBM Security Intelligence with Big Data. This solution combines IBM's Hadoop-based analytics engine (i.e., InfoSphere BigInsights) with specific algorithms and enhancements designed for cybersecurity analysts. For example, IBM includes InfoSphere BigSheets (likely based upon Datameer), a prepackaged tooling and visualization technology for emulating spreadsheets, to provide Excel pivot-table-like functionality when working with a Hadoop back-end repository.
In addition to addressing both sides of the big data security analytics continuum, IBM Security understands that large enterprises will need both types of solutions within the next few years and will likely demand that all big data security analytics solutions combine into a common architecture. IBM is well ahead in this area: it tightly integrates QRadar with its asymmetric big data security analytics in a combined solution called Security Intelligence with Big Data. For example, QRadar can enrich, and always has enriched, security events and logs with metadata to add context for investigations and forensics. IBM's asymmetric big data security analytics can also share data with QRadar when an investigation uncovers a specific network traffic pattern used in a sophisticated attack. This data can then be used to generate new rules, improving real-time security event detection. Clearly, CISOs will appreciate the benefits associated with this two-way integration.

The Bigger Truth

Big data security analytics isn't some distant vision anymore. ESG research reveals that 44% of respondent enterprise organizations believe that their current levels of security data collection, processing, and analysis qualify as big data today, while another 44% believe that their security data collection, processing, and analysis will qualify as big data within the next two years.[3] Yes, CISOs may be collecting terabytes or petabytes of data, but today's tactical security analytics tools and legacy compliance-centric SIEM systems aren't delivering value.

What's needed? Big data security analytics solutions built for scale, intelligence, automation, and complex queries. Leading vendors are already bringing these kinds of solutions to market to meet real-time and asymmetric security analysis needs.
Ultimately, large enterprises will need both real-time and asymmetric big data security analytics capabilities for incident detection as well as historical analysis of large volumes of structured and unstructured data. Smart CISOs will plan for these diverse needs by selecting best-of-breed real-time and asymmetric big data security analytics solutions built for two-way data sharing and integration. IBM Security is one of the few vendors already delivering solutions that address these enterprise requirements.

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.

[3] Source: Ibid.