Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com

Tackling Big Data with High-Performance Computing

WHITE PAPER
Sponsored by: DataDirect Networks
Shawn P. McCarthy
December 2011

IDC GOVERNMENT INSIGHTS OPINION

Big Data is a term used to describe a new generation of technologies and architectures designed to extract value (including intelligence information, changes, and trends) from very large volumes of data. Big Data systems usually support high-velocity capture of data, movement of that data across systems, and tools for navigation, information discovery, and analysis.

For governments, Big Data can sometimes mean big challenges. This is particularly true for intelligence agencies, which gather more data than nearly any other organization. On any given day, intelligence operations may obtain terabytes (TB) or even petabytes (PB) of new information, including satellite imagery, recorded digital video, data from sensor arrays, and much more.

The ongoing challenge for governments occurs when they have data volumes that exceed the capacity of their current IT infrastructure. This not only can have a negative impact on an organization's IT system but also can greatly reduce a group's ability to conduct important daily business. In intelligence operations, where timeliness and accuracy are of life-and-death importance, failure cannot be tolerated. Highly reliable systems capable of high-speed data transfer and flawless archiving and retrieval are of great importance.

Big Data systems enable the collection and analysis of very large amounts of unstructured data, sometimes in an ad hoc manner with no immediate formal schema.
The Web applications used by intelligence groups may require that newly acquired data be served to hundreds or even thousands of concurrent users, and the files involved can be quite large. These operations also need to be able to search for data in a way that visualizes changes in videos or that represents extensive relationship networks of people or other entities.

IDC Government Insights is tracking a new breed of systems and tools that help consumers of Big Data not only make sense of the vast data they collect but also move the data quickly, making it available to end users around the globe and setting rules for data access and archiving.

December 2011, IDC Government Insights #GI232292
IN THIS WHITE PAPER

IDC Government Insights believes the Big Data needs of governments will grow rapidly in the coming years, especially for intelligence-related operations. This paper highlights the new needs related to this growth and how they are being answered. This paper is based on interviews with operational organizations that are helping build functional next-generation Big Data solutions for government intelligence communities.

SITUATION OVERVIEW

Today, intelligence agencies around the globe help enforce national and international security by collecting a wide range of data related to intelligence, surveillance, and reconnaissance (ISR). In fact, ISR functions have become one of the main elements of global defense capabilities. Important data is collected from a wide variety of systems, ranging in size from ground sensors and handheld devices to airborne reconnaissance platforms and orbiting satellites. These devices acquire a vast amount of information that needs to be processed to give national security decision makers and military commanders critical, real-time situational awareness, which is often the difference between operational success and failure.

In general, national systems collect information that needs to be tracked by special national intelligence agencies, while "tactical" systems gather data to support military commanders on the battlefield. Video data is becoming increasingly valuable because intelligence operatives need to be able to detect, predict, and react to ongoing threats. But as more digital data pours in, the ability to deal with that data becomes a challenge from a storage and throughput standpoint.

Intelligence communities rely on high-performance computing (HPC) to "crunch the numbers" and help them make sense of the vast quantities of data they collect.
In fact, today's HPC systems, often built with massively parallel arrays of very powerful processors, are able to handle so much data that other bottlenecks have emerged. Computer systems also have to be able to match the data throughput required by HPC systems. This is one of the ongoing challenges when architects design HPC systems for intelligence operations.

The Need for Speed, the Need for Cloud

Of paramount importance when architecting HPC systems is the ability to develop efficient and highly evolved solutions that provide advanced levels of speed, replication, data integrity, and reliability. This technology is built into systems that are set up to import, store, manage, and provide nearly instant access to surveillance video and other data formats.
Meanwhile, this is all happening at a time when cloud computing is evolving as one of the fastest-growing parts of the computer industry. Cloud solutions offer end users a way to tap into many types of highly efficient computing services very rapidly. This intersection presents a new opportunity for defense and security organizations that need to deal with Big Data sets associated with ISR. They can find the functionality they need in a set of systems that are hosted and managed by third-party providers that are experts in these types of technologies.

Commercial cloud-based storage, as an online solution, has been around for a few years. But a new type of high-end cloud storage is now available that also lets clients streamline their high-end data flow, including video surveillance systems. In fact, the most advanced versions of these solutions can even pull data in directly from IP cameras as they record. And because the solutions are cloud based, they allow multiple facilities, including computing centers and monitoring facilities, to have access to both live and archived video resources. Also important to the ISR community is that cloud-based solutions of this type provide storage redundancy, failover capability, and built-in disaster recovery functions.

Cloud-based solutions also are able to meet the unique needs of the ISR community. Intelligence operations have unpredictable data needs, which can change rapidly. The groups need systems that can quickly scale, add storage, and be made available to new computing resources as needed. This type of rapid and reliable availability makes a big difference for ISR. Another big issue for systems dedicated to national security is the need for upgrades and investments in system improvements. Again, cloud-based systems capable of supporting rapid expansion and upgrades help meet this demand.
Leveraging the DDN Web Object Scaler

DataDirect Networks (DDN) is one of the world's largest providers of advanced information storage solutions, along with high-end processing capabilities and special IT services connected to high-performance computing. It supports "high growth" IT environments capable of full-system scalability, including cloud and grid computing. The company's specialty is scalable datacenters for mission-critical environments, including high-end intelligence operations. DDN offers an integrated cloud solution called the Web Object Scaler (WOS), which delivers large (global) capacity for content distribution and the capability to grow to billions of files and petabytes of storage.
WOS is a new technology designed to provide a content-addressable storage solution (information that can be retrieved based on its content). It provides very large (dozens to hundreds of PB) storage clouds and offers the following differentiators:

- Internet-scale performance. WOS clouds can deliver millions of random files per second with extremely low latency, helping organizations reduce or eliminate the need for content delivery networks.
- Single global namespace. All objects reside within a single namespace spanning all storage nodes and zones. It's not necessary to manage multiple file systems, RAIDs, LUNs, or SANs.
- Policy-based content distribution. Defined policies govern which geographic locations and/or tiers of the WOS cloud should store each file.
- Large file counts. The solution is capable of storing hundreds of billions of files and does not require multiple file systems across multiple storage systems.
- Integrated management. The solution is managed as a single Web-based entity, regardless of size or geography. It can be set up in minutes.
- Full resilience. The solution is fully distributed with no single points of failure or bottlenecks.

Functions and features that are of particular interest to the ISR community include the following:

- The ability to import new objects into the global namespace while maintaining the integrity and coherency of the namespace, even if a specific storage node is not connected to the network
- Providing access to objects through a RESTful API, eliminating the need for applications to have any object location awareness (Representational state transfer [REST] is a style of software architecture for distributed hypermedia systems.)
- The ability to deliver millions of file reads per second

From a storage perspective, the geographically dispersed cloud creates a common storage pool for content ingest, storage, and delivery.
Regardless of the number of copies of an object that exist across the cloud, a single common object identifier (OID) will retrieve it from the location with the lowest latency. New objects created at any location will automatically be copied to other locations according to the replication policy rules. Data can be replicated across up to four sites (a limit to be extended in upcoming releases) based on policy.
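To make the OID behavior described above concrete, the following Python sketch models a small in-memory object cloud. It is only an illustration under stated assumptions, not DDN's implementation or API: the `Site` and `ObjectCloud` names, the latency figures, and the choice to derive the OID from a SHA-256 hash of the content are all hypothetical. It shows a write that copies an object to the sites named by a policy (capped at four, as cited above) and a read that resolves one OID to the lowest-latency copy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Site:
    """One storage location in a hypothetical geographically dispersed cloud."""
    name: str
    latency_ms: float  # assumed latency as seen by the reading client
    objects: dict = field(default_factory=dict)


class ObjectCloud:
    """Toy single-namespace object store (illustrative only, not WOS itself)."""

    def __init__(self, sites, max_copies=4):
        self.sites = {site.name: site for site in sites}
        self.max_copies = max_copies  # the paper cites replication across up to four sites

    def put(self, data: bytes, policy: list) -> str:
        """Store data at every site named in the policy and return its OID."""
        if not 1 <= len(policy) <= self.max_copies:
            raise ValueError("policy must name between 1 and max_copies sites")
        # Assumption for this sketch: the OID is a hash of the content.
        oid = hashlib.sha256(data).hexdigest()
        for name in policy:
            self.sites[name].objects[oid] = data
        return oid

    def get(self, oid: str):
        """Return (site name, data) from the lowest-latency site holding the OID."""
        holders = [site for site in self.sites.values() if oid in site.objects]
        if not holders:
            raise KeyError(oid)
        best = min(holders, key=lambda site: site.latency_ms)
        return best.name, best.objects[oid]


cloud = ObjectCloud([Site("edge", 5.0), Site("regional", 40.0), Site("archive", 120.0)])
oid = cloud.put(b"frame-0001", policy=["regional", "archive"])
site_name, payload = cloud.get(oid)  # served from "regional", the nearer of the two copies
```

The caller never names a site when reading. It presents the OID and the store decides where to fetch from, which is the location independence the RESTful access model described earlier is meant to provide.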
A Multivendor System

DDN worked with YottaStor and Pixia Corporation to develop an integrated storage cloud solution specifically for the ISR community. Features of this solution include the following:

- Global storage cloud presence
- Supports data ingest rates of 15-20GBps at a network's operational edge, such as a war zone (This is enough to support most types of next-generation sensors.)
- Capable of supporting thousands of concurrent analysts accessing data at full-motion video speeds (32 frames per second)
- Provides data life-cycle management and data migration tools

Today's ISR platforms are capturing terabytes and even petabytes of data. But it's not possible to move that level of data efficiently over gigabit-speed networks. The integrated system addresses this challenge by deploying the WOS storage cloud in specialized containers, collecting the data, and then moving the containers, filled with data, to secure locations that have the bandwidth analysts need to access the collected information, including access to the WOS distributed global namespace capability.

Possible Cost Advantages

In recent years, organizations that used large storage networks have been able to tune their architectures to achieve small but measurable cost savings, often through automated orchestration and improved software licensing terms and by negotiating hourly rate reductions from systems integrators. But as the amount of content being acquired continues to grow, such incremental cost savings can be displaced by the growing complexity of storage environments. Long-term savings often can be accomplished only by migrating to a new architecture designed to meet the requirements of extremely large and globally based data sets. Significant savings also can be achieved by eliminating third-party products often used to support very large storage pools and by reducing overall system complexity.
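As a back-of-the-envelope illustration of the earlier observation that petabyte-scale collections cannot be moved efficiently over gigabit-speed networks, the following Python sketch estimates transfer times. It rests on simplifying assumptions that are not from the paper: decimal units (1PB = 10^15 bytes), a fully sustained link rate with no protocol overhead, and a reading of "GBps" as gigabytes per second. The function and constant names are hypothetical.

```python
PB = 10**15     # bytes in a decimal petabyte
GBIT = 10**9    # bits per second on a 1 gigabit link
GBYTE = 10**9   # bytes in a decimal gigabyte


def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move data_bytes over a link at a fully sustained rate."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


# Moving 1PB over a single 1 gigabit-per-second network link.
days_over_gigabit = transfer_days(1 * PB, 1 * GBIT)

# Writing the same 1PB at the low end of the cited 15-20GBps ingest rate.
hours_at_edge_ingest = transfer_days(1 * PB, 15 * GBYTE * 8) * 24

print(f"1PB over 1Gbps: {days_over_gigabit:.1f} days")
print(f"1PB at 15GBps:  {hours_at_edge_ingest:.1f} hours")
```

Under these assumptions, a single petabyte needs roughly 93 days on a gigabit link but under a day at the edge ingest rate, which is consistent with the choice to fill containers locally and move them physically.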
FUTURE OUTLOOK

The recently released WOS 2.0 includes the following key enhancements:

- Expanded support for industry-standard interfaces and protocols including Network File System (NFS), Amazon S3, Web-based Distributed Authoring and Versioning (WebDAV), and iOS (Apple's mobile operating system for smartphone and tablet clients)
- Asynchronous replication for enhanced performance
- Erasure-code-based declustered data protection to lower storage costs
- A complete cloud storage platform including multitenancy and billing support for service providers and iPhone and iPad clients for drop-in file access

Future versions of WOS will introduce additional accessibility and data management features designed to make the platform usable by an expanding collection of service providers, Big Data application environments, and rich content organizations. Application portability will be enhanced with support for new file and cloud storage protocols, giving integrators and customers a broad array of choices when deploying and consolidating applications onto a hyperscalable cloud storage infrastructure. Scalability and performance envelopes will be expanded, and future applications will further exploit WOS object metadata to search for patterns and linkages within the massive amounts of stored data, assisting analysts in "connecting the dots."

With WOS, intelligence agencies and defense organizations will continue to cost-effectively store multiple petabytes of satellite imagery, remote sensing data, telecommunications, Web traffic, and video surveillance so that geographically distributed analysts can access and exploit that data from anywhere in the world.

Copyright Notice

Copyright 2011 IDC Government Insights. Reproduction without written permission is completely forbidden.
External Publication of IDC Government Insights Information and Data: Any IDC Government Insights information that is to be used in advertising, press releases, or promotional materials requires prior written approval from the appropriate IDC Government Insights Vice President. A draft of the proposed document should accompany any such request. IDC Government Insights reserves the right to deny approval of external usage for any reason.