Make the Most of Big Data to Drive Innovation Through Research


White Paper

Make the Most of Big Data to Drive Innovation Through Research
Bob Burwell, NetApp
November 2012 | WP-7172

Abstract

Monumental data growth is a fact of life in research universities. The ability to rapidly access and process large datasets has produced new breakthroughs in research projects across the sciences, but it is taxing the resources of IT organizations. A new infrastructure is needed to accommodate the increasing demands of research projects that generate big data, and it requires a solution that simplifies the IT complexity of high-performance computing in today's data-driven world.

TABLE OF CONTENTS

1 The Importance of University Research
1.1 Funding and Budgets
1.2 People
1.3 Support for New Projects
2 Technology Requirements for Today's Research
2.1 Research Creates Big Data
2.2 Capitalize on Research Grants to Better Utilize Existing Assets
2.3 Information Sharing with Broad Access to Data
3 Time to Think Differently, Not Historically
3.1 Cost Management
3.2 Staff Productivity
3.3 Operational Productivity
4 NetApp High-Performance Storage Rack
5 NetApp and University Research
6 Summary

LIST OF FIGURES

Figure 1) Research universities rely on funding from a variety of sources.
Figure 2) Where is your infrastructure breaking?
Figure 3) Internet2 extends shared services and collaboration across research institutions.
Figure 4) High-performance computing benefits.
Figure 5) Data growth impact on IT.
Figure 6) Custom deployment versus NetApp HPS Rack.

1 The Importance of University Research

When was the last time you thought about how innovation in medicine, technology, energy, and science is developed? Although not all research takes place at universities, higher education contributes a growing share of the basic research that drives innovation, passing results to the private sector to enable long-term economic growth, create jobs, and improve living standards. Universities offer a broad spectrum of basic and applied research activities across the sciences, including medical, engineering, agricultural, natural, and humanities and social sciences. Without research universities, many innovations and breakthroughs in science might never happen. And today's technology advancements play a key role in the outcome of successful research projects.

1.1 Funding and Budgets

A university's reputation is often a critical factor in attracting faculty and students, as well as in securing funding for research projects. Faced with continued pressure from a fragile economy, research universities rely on a variety of sources, including government, private, and industry sponsors, to fund new research initiatives. However, in light of ongoing proposed budget cuts, research departments must continually evaluate alternative technology approaches to drive increased efficiency from reduced IT budgets while maintaining a high quality of research.

Figure 1) Research universities rely on funding from a variety of sources. (Sources shown: endowments, annual giving, government grants, and business/industry grants.)

A growing source of research funding now comes from private industry. Early-stage research performed at universities is key to innovation that the private sector can then leverage to develop practical solutions, such as new medical procedures, drugs, technology, green initiatives, and other innovations that improve how people live and work.
1.2 People

Research is often a critical factor in ranking the top educational institutions, and the best research universities are able to recruit top students worldwide. In addition, faculty distinction (published research, faculty awards), the number of doctorates awarded, the number of postdoctoral appointments supported, and the number of patents held all contribute to how a university's research program is regarded. However, to maintain and grow successful research programs, faculty and students require access to state-of-the-art research laboratories that provide technology for efficient access, collaboration, and analysis of data.

1.3 Support for New Projects

Advancements in technology now make it possible to accelerate the pace of research. However, to maximize the funding provided by new research grants, university IT organizations must be able to quickly allocate resources to support new projects while managing the explosion of data with even fewer resources. In short, IT can make or break the opportunity to capitalize on new grants.

2 Technology Requirements for Today's Research

The right IT infrastructure provides the foundation for effective research programs. With access to the many advancements in technology, university research has entered a new era of scale in which the amount of data collected, processed, and stored is taxing today's IT architectures. To manage these ever-increasing datasets and meet key research objectives, an underlying infrastructure must be in place that enables IT to store and manage the growing volumes of data while allowing researchers to quickly retrieve and easily share data.

2.1 Research Creates Big Data

Not only is the volume of data increasing, but the data objects themselves are getting bigger. In addition, the processing of analytic data has become a time-consuming, compute-intensive exercise. Many departments are now reaching multiple terabytes of data or even billions of files, putting enormous scale pressure on existing infrastructures, especially the storage platform.

Figure 2) Where is your infrastructure breaking?

Big data is breaking today's storage infrastructure along three major axes, as illustrated in Figure 2:

Complexity. Data is no longer just text and numbers; it is real-time events on shared infrastructure. The information is linked, it is high fidelity, and it consists of multiple data types. Applying conventional algorithms for search, storage, and categorization is becoming much more complex and inefficient.

Speed. How fast is the data coming in? High-definition video, streaming media delivered over the Internet to player devices, and slow-motion video for surveillance all have very high ingestion rates. Researchers have to keep up with the data flow to make the information useful, and they have to keep pace with ingestion rates to drive faster analysis.

Volume. All collected data must be stored in a location that is secure and always available.
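To make the speed and volume pressures concrete, here is a back-of-the-envelope sketch of how continuous streams accumulate into storage volume. The bitrate and camera count are assumptions chosen for illustration, not figures from this paper:

```python
# Daily ingest from continuous streams: bitrate (bits/s) * seconds/day,
# converted to terabytes. The example bitrate and stream count below are
# illustrative assumptions, not vendor figures.

SECONDS_PER_DAY = 24 * 60 * 60

def daily_ingest_tb(bitrate_mbps, stream_count=1):
    """Terabytes ingested per day by stream_count continuous streams."""
    bytes_per_day = bitrate_mbps * 1e6 / 8 * SECONDS_PER_DAY * stream_count
    return bytes_per_day / 1e12

# For example, 100 surveillance cameras at an assumed 50 Mbit/s each
tb_per_day = daily_ingest_tb(50, stream_count=100)
```

Under these assumptions the cameras alone produce 54TB per day, which is why "store everything forever" quickly stops being an option without an infrastructure built for it.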
With such high volumes of data, IT teams have to make decisions about how much data is too much. For example, they might flush all data each week and start over the following week. But for many applications this is not an option, so more data must be stored longer without increasing operational complexity. This can cause the infrastructure to quickly break on the axis of volume.

2.2 Capitalize on Research Grants to Better Utilize Existing Assets

Research grants are a fundamental part of university life, with individual grants generally running for two to three years. Legacy architectures can have as much as 10 times the storage actually needed, even after factoring in the extra capacity required to support anticipated growth across the university. Moving away from silo-based architectures enables universities to be more flexible, with the ability to quickly reallocate compute resources as new

grants are awarded. Combining campus and research IT in the same data center enables IT consolidation and the potential for massive cost savings by running multiple workloads on the same hardware. A shared services model increases utilization by quickly repurposing assets to new projects, allowing higher-education institutions to reap long-term benefits by redirecting grant-funded equipment to new projects.

2.3 Information Sharing with Broad Access to Data

Ready access to big data, as well as collaboration with remote team members, requires a high-speed network with low latency and high throughput. Internet2 provides a unique set of global capabilities to member organizations, and it is specifically designed to meet the needs of researchers and educators. This includes a 100-gigabit-per-second network that not only delivers reliable production services for high-performance needs but also creates a powerful experimental platform for the development of new applications. Unconstrained bandwidth enables widespread application development and delivers:

A deeply programmable environment in which compute, storage, visualization, and transport capabilities can all be driven by applications

Solutions that overcome traditional bottlenecks, passing high-bandwidth traffic and allowing performance monitoring

Figure 3) Internet2 extends shared services and collaboration across research institutions.

3 Time to Think Differently, Not Historically

The remainder of this paper focuses on the advances in technology that have made high-performance computing (HPC) more affordable and extended its benefits beyond traditional scientific applications to a broad set of university and commercial HPC applications. To capitalize on the information that can be derived from the tremendous volume of data generated by research, more institutions are turning to high-performance storage solutions to effectively analyze this data and solve complex research problems.
However, the resource demands of high-performance computing often exceed the expertise, staff availability, and process knowledge of many users, resulting in:

Shortage of time, talent, and resources to create HPC storage configurations (months to architect, design, provide proof of concept, and install)

Lack of single-system management (monitor, manage, analyze)

Overhead of ongoing maintenance and support

Figure 4) High-performance computing benefits. (Performance: get bigger jobs done in less time and more jobs done in the same amount of time. Reliability: minimize system downtime, and ensure data integrity and availability. Efficiencies: density in performance, capacity, and power, and scale of data management.)

Deploying storage systems to support high-performance computing environments can be challenging. Many in the HPC world depend on parallel file systems such as Lustre to deliver the necessary bandwidth and capacity; however, designing, testing, and deploying such a solution is complex and time consuming, and so is ongoing management. Many more HPC users would likely employ parallel file systems if the barriers to deployment were lower.

Figure 5) Data growth impact on IT. (If you were storing 100TB of online data in 2010, you will store 1.1PB in 2016 (11x), 2.5PB in 2018 (25x), and 5.8PB in 2020 (58x), based on an industry-average 50% annual growth rate.)

3.1 Cost Management

Operating Expense

While the upfront cost of building your own high-performance computing storage solution may seem attractive, when you add together the increased time to results; the operating expenses due to complicated management, maintenance, and support; and the potential productivity lost to poor availability, operating expenses overshadow the initial capital costs.
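The projection in Figure 5 is straightforward compound growth. A minimal sketch (the function name is illustrative, not from the paper) reproduces the arithmetic behind the figure's rounded values:

```python
# Compound-growth projection behind Figure 5: online data grows 50% per
# year (the cited industry average), starting from 100TB in 2010.

def projected_capacity_tb(base_tb, start_year, target_year, annual_growth=0.50):
    """Compound base_tb forward by annual_growth for each elapsed year."""
    return base_tb * (1 + annual_growth) ** (target_year - start_year)

base_tb = 100.0
for year in (2016, 2018, 2020):
    tb = projected_capacity_tb(base_tb, 2010, year)
    print(f"{year}: {tb / 1000:.2f} PB ({tb / base_tb:.1f}x)")
```

This yields 1.14PB (11.4x), 2.56PB (25.6x), and 5.77PB (57.7x), which the figure presents rounded as 1.1PB, 2.5PB, and 5.8PB.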

Scalability

When it comes time to add performance and/or capacity, a custom design may not deliver the results you want. Depending on the building blocks you choose, it can be difficult to scale in small increments, making scaling an expensive proposition. It may also be impossible to scale performance and capacity independently.

3.2 Staff Productivity

Time to Deploy

Although a custom parallel file system deployment may seem like the best option, the time and resources it takes to deploy the file system and achieve measurable results may simply be too long in many cases. The typical commercial HPC user may not have the time, talent, or resources it takes to create a balanced HPC storage configuration. It can take months to architect, design, and perform proof-of-concept testing. Then it may take several more months to procure the necessary hardware and weeks to install, configure, test, and deploy. That's a long time to wait to start getting results.

Complex Management

A parallel file system deployment has a significant number of physical components, including metadata and object servers, storage systems, disks, interconnects, and networks, not to mention the file system software running on servers and clients. When you build your own solution, there are many separate components to monitor and manage, and you will likely need different tools for each infrastructure element. This makes analyzing performance and troubleshooting problems much more difficult. As a result, you spend more time, and it costs more, to manage your storage.

3.3 Operational Productivity

Reliability, Availability, and Data Integrity

When you architect your own solution, you have to be careful to eliminate possible points of failure and choose the right components; otherwise, reliability, availability, and data integrity may suffer. Poor availability slows results and can impact time to market.
Maintenance and Support

With a custom solution, maintenance and support mean dealing with multiple vendors, which adds to the cost and complexity of your storage system. Some vendors may not be able to provide round-the-clock support, potentially affecting availability. In addition, working with multiple vendors to resolve complex problems can prolong problem resolution.

4 NetApp High-Performance Storage Rack

Because university research is conducted by scientists, not computer science engineers, students and faculty need access to technology that lets them get their jobs done without having to learn new tools. NetApp has introduced the High-Performance Storage Rack (HPS Rack) to address the storage challenges of today's research universities. The HPS Rack is a fully integrated HPC storage solution designed to deliver performance, scalability, reliability, and ease of management. Its file system is purpose-built for high-performance computing workflows. The solution leverages data collected over time to enable accurate planning, understand usage by user, and optimize overall system performance.

Built using proven NetApp E-Series storage, the HPS Rack integrates all the components of a successful Lustre file system deployment so you can have the HPC storage you need up and running in less than a day. Scaling is highly granular, delivering predictable increases in performance and/or capacity. A worldwide service and support network enables you to get the help you need when you need it. IT benefits from the ability to provide high-performance computing without the complexities of traditional HPC deployments. The HPS Rack protects your investment in research applications with a storage system tuned for HPC workloads, and your existing IT staff can easily maintain the solution without expertise in high-performance computing.

Figure 6) Custom deployment versus NetApp HPS Rack. (Building your own file system with a fast block array takes 6-9 months to architect, design, prove the concept, procure, install, configure, test, and deploy, followed by continuous manual tuning and tool iteration, with unpredictable drops and high overhead. The HPS Rack takes 1-3 months from proof of concept through procurement, deployment, provisioning, and application optimization, with tools to manage, optimize, and automate, delivering faster time to results.)

5 NetApp and University Research

As part of its innovation strategy, NetApp supports innovative research in the academic community. Within the office of the CTO, the NetApp Advanced Technology Group (ATG) is responsible for maintaining many academic research relationships through sponsorships, consortium memberships, and direct collaborations.

NetApp Faculty Fellowship Program

The ATG has established the NetApp Faculty Fellowship (NFF) Program to fund innovative research on data storage and related topics. The goals of this program are to encourage leading-edge research in storage and data management and to foster relationships between academic researchers and NetApp engineers and researchers.
NetApp Faculty Fellowships are one-time grants, typically covering a year of funding for a graduate student working with the principal investigator on the proposed research. Grants are not restricted to this format, however, and the NetApp ATG will consider proposals for different situations and durations.

NetApp Academic Alliance Program

To help prepare college graduates for the fast-changing IT landscape, NetApp has launched the NetApp Academic Alliances Program. Through this program, NetApp collaborates with some of the nation's leading colleges and universities to provide a rich library of teaching materials and resources that faculty can use to help students develop highly marketable storage-related IT skills.

NetApp Education Donation Program

NetApp offers an Education Donation Program that provides higher-education and K-12 schools with the opportunity to receive a new NetApp FAS3100 storage system. This program enables NetApp to partner with schools across the country to help them achieve the efficiency and flexibility required to maximize their IT infrastructure and deliver on their educational objectives. The donations are part of a long-term investment in improving education IT.

6 Summary

NetApp helps research universities streamline high-performance computing deployments with a preconfigured, pretested storage solution. The NetApp HPS Rack eliminates the barriers to entry, allowing universities to harness the power of high-performance computing to solve complex research problems.

Maximize budget. The NetApp solution leverages real-time data collection to enable application-driven tuning. You benefit from an infrastructure that delivers maximum storage capacity with excellent performance, enabling you to achieve faster results, improve efficiency, and reduce operational costs.

Maximize people. With its data collection capabilities, the HPS Rack simplifies management by combining views that let administrators track metrics of the fast data storage services, including bandwidth usage, capacity consumption, client node access, and specific job resource usage.
With this data, IT can better understand and optimize the storage infrastructure and perform data-driven capacity, throughput, and utilization analysis, scheduling optimization, and complete system management.

Fast-track support for new research projects. The HPS Rack delivers an integrated storage solution for high-performance computing that is easy to deploy and manage and that makes it easy to analyze performance and capacity. The prepackaged, preconfigured solution scales as storage requirements grow, resulting in faster time to results with lower TCO.
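As a hedged illustration of the per-user reporting described above (the data model and every name here are hypothetical, not a NetApp API), aggregating recorded per-job storage metrics by user could look like this:

```python
# Illustrative sketch only: summarize hypothetical per-job storage metrics
# of the kind an HPC storage system might collect, reporting total capacity
# and peak bandwidth per user.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class JobMetrics:
    user: str
    bandwidth_gbps: float  # average bandwidth the job consumed
    capacity_tb: float     # capacity the job's output occupies

def usage_by_user(jobs):
    """Sum capacity and track peak bandwidth per user across recorded jobs."""
    totals = defaultdict(lambda: {"capacity_tb": 0.0, "peak_gbps": 0.0})
    for job in jobs:
        t = totals[job.user]
        t["capacity_tb"] += job.capacity_tb
        t["peak_gbps"] = max(t["peak_gbps"], job.bandwidth_gbps)
    return dict(totals)

jobs = [
    JobMetrics("alice", bandwidth_gbps=4.0, capacity_tb=12.5),
    JobMetrics("alice", bandwidth_gbps=6.5, capacity_tb=3.0),
    JobMetrics("bob", bandwidth_gbps=2.0, capacity_tb=40.0),
]
report = usage_by_user(jobs)
```

A report of this shape is what lets administrators answer questions such as which users drive capacity growth and which jobs dominate bandwidth.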

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© 2012 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. WP-7172-1112