White Paper

Solving Agencies' Big Data Challenges: PED for On-the-Fly Decisions

Carina Veksler, NetApp
March 2012 | WP-7158

ABSTRACT
With the growing volumes of rich sensor data and imagery used today to derive meaningful intelligence, government agencies need to address the challenges posed by these big datasets. NetApp provides a scalable, unified single pool of storage to better handle your processing and analysis of data to drive actionable intelligence in the most demanding environments on earth.
TABLE OF CONTENTS
1 Processing, Exploitation, and Dissemination (PED)
  1.1 PED Requirements
2 PED Impact on Storage
  2.1 Data Growth Trends
3 NetApp Storage for PED
  3.1 Solutions Approach
  3.2 Unique Differentiation
4 Summary

LIST OF FIGURES
Figure 1) Information requirements
Figure 2) Linear scaling of the E5460
1 Processing, Exploitation, and Dissemination (PED)

Accurate intelligence data forms the foundation for sound decision-making across government agencies. Increasingly, this data comes in raw form from multiple large data sensors, providing the necessary information for both manual and automated analysis. Imagery and other sensor data are of little use, however, unless information of the right quality and quantity can be extracted from them and then managed, exploited, analyzed, interpreted, and disseminated for faster and more effective action.

By processing and exploitation, we mean converting the immense volume of data collected into a form that can be used by analysts. This is done through decryption, language translation, and data reduction. Dissemination refers to quickly routing relevant, accurate, mission-critical information to the right people at the right time.

Much of this sensor data has already started to stress today's storage architectures. Government agencies face big data challenges related to the ingest of very large amounts of data, and they have begun asking the following questions:

- Are there opportunities for me to take better advantage of my data?
- How can I make smarter, more meaningful decisions to support my organization's mission?
- What are the insights that could enable mission success?
- Do I have the ability to identify the hot spots that are likely to fail before they fail?

NetApp can help you answer these questions and meet these challenges.

"We're going to find ourselves in the not too distant future swimming in sensors and drowning in data. The answer isn't throwing more manpower at it because in DoD, we don't have it; we are going to have to use technology and smarter systems."
Lt. Gen. David A. Deptula, First Deputy Chief of Staff, ISR

1.1 PED Requirements

As government agencies deploy new, more sophisticated information-gathering systems, users are confronted with a range of collection, integration, and management issues. These systems must support:

- Large-bandwidth ingest. Sensor workloads vary, but they require extremely high bandwidth for the large sequential writes generally associated with streaming data from a variety of sensors: motion imagery, video, radar, and satellite imagery.
- Long-term archival. A dense form factor for the storage platform is mandatory to support the growing volumes of data.
- Analysis. Broad support is required for many operating systems and export formats to accommodate both processing and exploitation tools across multiple agencies.
- Distribution. Support for a variety of transport mechanisms and end points is needed so that clients can handle dissemination across geographically distributed PED cells effectively.
2 PED Impact on Storage

Modern warfare has changed in many ways. One of the most revolutionary and powerful developments in the field of battle has been the use of unmanned aerial vehicles (UAVs) and satellites to boost intelligence, surveillance, and reconnaissance (ISR) capabilities. These resources allow analysts, operators, and decision makers to monitor much larger areas and provide greater situational awareness with far less risk to service personnel. In 2009, drone aircraft flying over Iraq and Afghanistan returned roughly 24 years' worth of video footage for processing. Updated models deployed in 2010 produced 10 times as many data streams as their predecessors, and those in 2011 will triple that workload. [1]

Fast transfer and storage of rich video, motion imagery, and other large sensor data form the basis for PED workflows. Sensor data and intelligence must be available for analysis and interpretation as quickly as possible to enable teams to make split-second decisions. The less time spent ingesting images, the more time is available for logistics decisions, threat detection, and intelligence gathering. High-frame-rate video streams from multiple simultaneous sources are becoming more common, making the requirements for video transfer and storage even more daunting.

Figure 1) ISR architecture requirements.

2.1 Data Growth Trends

The growth of big data generated by large data sensors and ISR systems is putting enormous pressure on existing compute infrastructures, especially the storage platform. These larger datasets contain a wealth of useful information that, if analyzed in a timely fashion, can provide valuable intelligence for mission success. Without the necessary analytical tools, however, these valuable sources of data become useless.

[1] The Data Deluge, http://www.economist.com/node/15579717 (Feb. 25, 2010).
Infrastructure Breaking Points

Big data is breaking today's storage infrastructure along three major axes:

1. Complexity. Data by itself is just text and numbers; big data is about finding the information hidden in huge volumes of it. Once found, information must be rapidly linked across a wide variety of sources, leading to high-fidelity decision support that spans multiple sources and data types, each improving decision confidence. Conventional algorithms for search, storage, and categorization are becoming increasingly complex and inefficient.
2. Speed. How fast is the data coming in? How fast can it be processed? Is there relevant information buried in the data? Full-motion video (FMV) and wide-area motion imagery (WAMI) for surveillance have very high ingestion rates and require automated information extraction to improve the time to information. Time to information is critical if agencies want to derive maximum value from this data. Taking weeks or months to run an analysis is no longer a viable option, because the results will not arrive in time to detect patterns that may affect the success of the mission.
3. Volume. All collected data must be stored in a place that is secure and always available. With such high volumes of data, IT teams now have to decide how much data is too much. This abundance of data can quickly break the infrastructure on the axis of volume. Once the information is found, it becomes much easier to identify what needs to be kept and for how long.
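To make the volume axis concrete, the raw capacity a set of sensor feeds consumes is roughly the ingest rate multiplied by the retention period. The following is a minimal sketch of that arithmetic, not a NetApp sizing tool. The function name and all input values are hypothetical and do not come from this paper, and the estimate ignores RAID overhead, compression, and file system metadata.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 1,000,000 MB


def retained_capacity_tb(ingest_mb_per_sec: float, feeds: int, retention_days: int) -> float:
    """Estimate the raw capacity, in TB, needed to keep every feed for the retention period.

    All three inputs are hypothetical planning values, not figures from this paper.
    """
    if ingest_mb_per_sec < 0 or feeds < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    # Total data written = rate per feed x number of feeds x seconds retained.
    total_mb = ingest_mb_per_sec * feeds * SECONDS_PER_DAY * retention_days
    return total_mb / MB_PER_TB


if __name__ == "__main__":
    # Hypothetical example: 4 feeds at 50 MB/sec each, kept for 30 days.
    print(f"{retained_capacity_tb(50, 4, 30):.1f} TB")
```

With these assumed inputs the estimate comes to about 518 TB, which shows how quickly even a handful of continuous feeds forces a decision about what to keep and for how long.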
Best Practice

Effective PED environments require a multisensor datastore that delivers high bandwidth to support:

- Large sequential writes during ingest
- Frequent random reads during processing and exploitation
- NFS-based access for easy integration
- Efficient local and remote access during dissemination
- Independent capacity and performance scaling
- Extreme density to support the increasing data volumes and retention times

3 NetApp Storage for PED

The volume of streamed video that government agencies now generate each month equals what they generated in a full year as recently as 2007. This growth rate requires a scalable storage solution that allows multiple agencies to share satellite feeds for collaboration, analysis, and dissemination in almost real time. Currently, intelligence operations frequently create multiple local copies from a central master when time and bandwidth management are of the essence in making timely decisions. High-speed local and remote file sharing and scalable file system performance are critical to support rapid information extraction from the data generated by wide-area motion imagery (WAMI), full-motion video (FMV), radar, and satellite imagery.

The NetApp Full-Motion Video solution provides quick reference to critical information, enabling agencies to drive operational efficiencies and reduce time and energy. NetApp storage solutions help government agencies take advantage of the tens of thousands of hours of live video captured each year. With better insight, that data can be turned into quality information that ultimately helps users make better decisions within the necessary time frames.
3.1 Solutions Approach

NetApp delivers preconfigured, pretested solutions that are designed to capture multiple high-speed feeds, such as video and satellite. Faster data exploitation gives agencies the information they need to make better-informed decisions. Based on the E-Series platform, the NetApp storage solution is optimized for capturing and examining rich video for improved decision making. Our solutions enhance situational awareness and command decision-making processes across both strategic and tactical agencies.

Optimized for Performance

NetApp solutions are designed to handle the extreme bandwidth requirements for large sequential writes generated by streaming data from a variety of sensors, including motion imagery, video, radar, and satellite imagery. The NetApp E5424 delivers both high bandwidth and high IOPS with leading price performance. The E5424 saves money by consuming 50% less power, using up to 24 2.5" SAS drives in a 2U form factor. A fully loaded rack delivers performance of up to 35GB/sec sustained disk read throughput, 30GB/sec sustained disk write throughput, and 350,000 sustained IOPS.

Maximum Density for Longer Retention

NetApp storage solutions are designed to handle the growing volume of data generated by large data sensors, streamlining the footprint for long-term archiving. The NetApp E5460 delivers optimized storage density for maximum capacity with excellent performance, supporting up to 60 drives in each 4U enclosure. The E5460 supports high-capacity near-line SAS disk options that are superior to SATA drives. SAS disks are becoming the drive technology of choice for high capacity and lower cost per MB/sec, and they are an excellent choice for throughput-intensive applications. A single 4U enclosure holds 60 disk drives in 5 drawers and delivers roughly 4.4GB/sec of read throughput and 2.9GB/sec of write throughput.
3.2 Unique Differentiation

NetApp delivers high-performance storage systems that meet the demanding performance and capacity requirements of PED environments without sacrificing simplicity and efficiency. Designed to meet wide-ranging requirements, these systems offer balanced performance that is equally suited to high-performance file systems and bandwidth-intensive streaming applications.

The Full-Motion Video Solution for Processing, Exploitation, and Dissemination provides the extreme bandwidth required to efficiently ingest the tens of thousands of hours of sensor data collected every year. This allows you to ingest and stream higher-resolution video with enhanced performance. By using a shared file system that allows multiple nodes to access large datasets in parallel, you can dramatically reduce the time it takes to input and output large or streaming files, so you can focus on analyzing and understanding the data to deliver meaningful intelligence on the fly.

The NetApp solution allows you to keep all sensor data in a single repository, using fewer storage arrays and a single namespace for huge libraries, even those exceeding 1PB. NetApp also helps to improve the efficiency of the overall ISR workflow for active data: policy-based automatic archiving allows storage tiers to be collapsed while maintaining the bandwidth required to support ISR workflows. NetApp allows you to deploy active archives using cost-effective online storage for fast access to historical content or seamless integration with industry-leading archive management software.

Linear Scaling

To accommodate projected data growth, NetApp has designed the NetApp Full-Motion Video solution to scale linearly in both capacity and bandwidth. Our solution also allows you to expand capacity to support expected data growth independent of performance.
This allows the storage system to expand not only to the size required but also in the dimension the environment requires (bandwidth versus density).
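As an illustration of linear bandwidth scaling, the sketch below multiplies the 2.9GB/sec sustained write figure that this paper quotes for a single E5460 by the number of systems, and inverts that relationship to estimate how many systems a given ingest rate would need. The function names are hypothetical, the per-system figure depends on the file system and workload, and real deployments should be sized against measured throughput rather than this arithmetic.

```python
# 2.9GB/sec sustained write per E5460, as quoted in this paper, expressed in MB/sec
# so that all arithmetic stays in integers.
E5460_WRITE_MB_PER_SEC = 2900


def aggregate_write_mb_per_sec(systems: int) -> int:
    """Aggregate write bandwidth, assuming it scales linearly with system count."""
    if systems < 0:
        raise ValueError("systems must be non-negative")
    return systems * E5460_WRITE_MB_PER_SEC


def systems_for_ingest(required_mb_per_sec: int) -> int:
    """Smallest number of systems whose combined write bandwidth covers the ingest rate."""
    if required_mb_per_sec <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding at exact multiples.
    return -(-required_mb_per_sec // E5460_WRITE_MB_PER_SEC)


if __name__ == "__main__":
    # Hypothetical requirement: 10,000 MB/sec (10GB/sec) of sustained ingest.
    needed = systems_for_ingest(10_000)
    print(needed, "systems,", aggregate_write_mb_per_sec(needed), "MB/sec aggregate")
```

Under the linear-scaling assumption, six systems give 17,400 MB/sec, which agrees with the 17.4 GB/s the paper lists for six E5460 systems.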
The modular design of the E5400 storage arrays simplifies scaling and increases flexibility. The multiple drive shelf options enable custom configurations that can be tailored for any environment. You can mix drive types in a single enclosure so you can address different requirements with the same system. By combining elements from each solution, you can create a storage deployment tailored to your specific big-bandwidth requirements that will grow with your needs and protect your investment.

Figure 2) Linear scaling of the E5460.

                                                 1 x E5460  2 x E5460  3 x E5460  4 x E5460  5 x E5460  6 x E5460
Drives (n)                                       30-60      90-120     150-180    210-240    270-300    330-360
Capacity (TB)                                    60-180     180-360    300-540    420-720    540-900    660-1080
Bandwidth when scaling systems (GB/s, writes)*   2.9        5.8        8.7        11.6       14.5       17.4

* With StorNext File System and representative workload

4 Summary

Government agencies need an effective storage strategy to manage and maintain the growing volume of data generated by motion imagery, video, radar, and satellites. The ability to use technology in a smarter way is now a necessity. In addition, it is critical to have access to tools that automate processes and make it easy to manage, retain, and retrieve time-sensitive information.

NetApp solutions deliver optimized storage for capturing and examining sensor feeds for better decision making. With the solution's ultradense form factor, data can be stored for longer periods of time, helping to improve decision support.

- Bandwidth. The solution handles the rigors of heavy computational workloads and bandwidth-sensitive streaming environments.
- Reliability. Advanced thermal and power features provide fast and confident deployment with preconfigured, pretested options.
- Availability. Standard redundant components provide the utmost availability of critical data.
- Linear scalability. The solutions can accommodate growing data streams and access requirements.
Effective storage strategies must address the network bandwidth issues presented by the size of the files captured by motion imagery, radar, satellites, and UAVs. Agencies that rely on PED solutions now need to evaluate whether their current strategy can both meet current needs and support projected workloads.
NetApp provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© 2012 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. WP-7158-0312