White Paper
Version 1.2, May 2015
RAID Incorporated

Introduction

The abundance of Big Data, the structured, semi-structured and unstructured massive datasets too large to be processed effectively by conventional data processing methods, holds the potential for high-value analytical insight never before available. Mining Big Data is an immediate, critical challenge for many businesses and research organizations. Analysis of those data creates new datasets and metadata that must in turn be stored and acted upon. A new generation of high-performance approaches to modeling, optimization, text mining and statistical analysis is required to turn these terabytes, petabytes and even exabytes of information into actionable analyses.

Big Data may be generated by scientific and engineering applications, genomic research, biometrics, weather data, financial and consumer information collection, security logs, rich media recognition devices and sources yet to be explored. Streams from social media and internet feeds account for another almost limitless source of data. The life sciences and healthcare industries are at the forefront of finding value in the storage and manipulation of Big Data; biometric data collected over large populations are often key to predicting, identifying, preventing and treating disease. More and more, executive decision makers in diverse industries are recognizing the potential within the data they have, or could collect.

The need for data visualization is also increasing in importance. Charts and graphics can make large datasets understandable in a small graphical space and enable comparisons and conclusions; they can also help bridge the gap between data scientists and business leaders (a brief sketch of this idea follows at the end of this section). Innovation and competitive positioning demand real-time analysis and reporting that give decision makers trending and predictive data. As datasets grow, legacy software, hardware and transmission techniques can no longer meet these demands rapidly enough. The term technical computing has been coined to describe approaches that use new mathematical and scientific computing principles to manipulate and analyze huge datasets and provide useful answers to users outside of those disciplines.

Big Data Analytics Demands Faster, Secure Processing

From DNA research to biometrics, technical computing has contributed to the wealth of knowledge about health, disease and heredity. In just the recent past, DNA sequencing was time consuming and costly; advances in technical computing have made it a viable tool for patient diagnosis and treatment:

"It took 13 years to map the first genome, at a cost of several billion dollars. In less than 10 years, the time and cost of DNA sequencing of another genome was reduced by a factor of 1 million; today, your or my personal genome could be mapped in just a few days for a few thousand dollars." [Forbes]
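
Returning to the visualization point above, the short Python sketch below is illustrative only: the synthetic "response time" dataset, the histogram form and the percentile marker are assumptions made for this example, not material from the paper. It shows how a million raw measurements can be condensed into one compact chart a business leader can read at a glance.

```python
# Illustrative sketch: summarizing a large dataset in a small graphical
# space. The synthetic "response time" data stands in for any large
# measurement set; only NumPy and Matplotlib are assumed.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)
response_ms = rng.lognormal(mean=3.0, sigma=0.6, size=1_000_000)  # 1M samples

fig, ax = plt.subplots(figsize=(6, 3))
ax.hist(response_ms, bins=100, color="steelblue")
ax.axvline(np.percentile(response_ms, 95), color="firebrick",
           linestyle="--", label="95th percentile")
ax.set_xlabel("response time (ms)")
ax.set_ylabel("count")
ax.legend()
fig.tight_layout()
fig.savefig("summary.png")  # one compact chart summarizing 1M points
```
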

Businesses are using financial and consumer information collected across their customer base, from both terminals and websites, to better understand the trends and cycles of their markets and to better align their offerings and services. The ability to identify trends and forecast accurately is critical to businesses positioning themselves both tactically and strategically, and the analysis must be cost-effective, meaningful and available within a time frame in which it is still useful. In some industries, such as financial market trading, microseconds can make the difference between success and failure.

Meteorology, a scientific field that consumes massive amounts of data in real time, is another excellent example of the need for processing speed. Scientists use meteorological data collected over wide areas to predict the weather, and those predictions are critical to the development and deployment of response plans for the general public, as well as for governmental agencies and businesses that depend on such warnings. Predictions must be developed and delivered quickly enough to permit a proactive response; a weather forecast for yesterday is not useful. Nearly all areas of research need analytics that are faster, more powerful and cheaper in order to address massive datasets that are growing geometrically.

Data scientists who use public or general-purpose cloud resources for capacity and scalability quickly discover that data transfer rates pose a major limitation. Many have moved back to private HPC, whether cloud-based or purpose-built infrastructure, to surpass these limits: data must move from primary sources to multiple researchers quickly for any time-sensitive analysis. New approaches in computing architecture, both hardware and software, are constantly being developed to address these mushrooming data demands while providing the scalability and performance required.

Simultaneously, high-value data must be protected. Whether the requirement is high availability in the face of hardware or infrastructure failure, or reliable and immediate retrieval of archived data, systems must be designed to accommodate it. Data scientists must also protect data from intrusion, theft and malicious corruption. Because of the sensitivity of the subject matter in many areas of research, privacy, security and regulatory compliance are factors that drive decisions away from public and shared cloud environments and toward private clouds and protected infrastructure.

Big Data analytics can sometimes reduce a greater mass of data by extracting only the relevant information to be analyzed, producing high-value metadata, when the original information pool is too large to manipulate in a timely fashion or at a reasonable cost.
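
As a minimal sketch of that kind of reduction (the CSV layout, field names and relevance threshold below are assumed for illustration, not taken from this paper), the following Python stream filter keeps only the relevant records from a file too large to load whole, and emits compact summary metadata:

```python
# Illustrative sketch: stream a huge CSV of readings, keep only the
# relevant rows, and produce compact summary metadata. Processing the
# file line by line avoids loading the full dataset into memory.
import csv

THRESHOLD = 100.0   # assumed relevance cutoff for this example
kept, total, running_sum = 0, 0, 0.0

with open("readings.csv", newline="") as src, \
     open("relevant.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        total += 1
        value = float(row["value"])
        if value >= THRESHOLD:       # extract only the relevant records
            writer.writerow(row)
            kept += 1
            running_sum += value

# High-value metadata about the reduced set, cheap to store and query.
metadata = {
    "rows_scanned": total,
    "rows_kept": kept,
    "mean_kept_value": running_sum / kept if kept else None,
}
print(metadata)
```
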
Big Data Requires Parallel Applications and File Systems

Legacy systems tended to be centralized and to process data serially; a new approach is required for Big Data. To scale beyond what a single system can accomplish, parallel processing must be decentralized for optimal results and performance. Huge improvements in performance have been achieved across networked processors and storage disks by pairing parallel applications with parallel file systems such as IBM's General Parallel File System (GPFS), which offer almost unlimited scalability.
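
The scatter/compute/gather pattern at the heart of such parallel processing can be sketched in a few lines of Python using mpi4py, an illustrative choice of MPI binding rather than a tool named in this paper. Each process works only on its own slice of the data, so the dataset never has to fit on, or flow through, a single machine:

```python
# Minimal sketch of the scatter/compute/gather pattern used on HPC
# clusters, written with mpi4py (an illustrative assumption).
# Launch with e.g.: mpiexec -n 4 python parallel_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id within the parallel job
size = comm.Get_size()   # total number of cooperating processes

if rank == 0:
    # The root process splits a toy dataset into one chunk per process.
    data = list(range(1_000_000))
    chunks = [data[i::size] for i in range(size)]
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)   # each process gets its own slice
partial = sum(chunk)                   # independent local work

# Only small partial results travel over the network to be combined.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print(f"sum across {size} processes: {total}")
```

The same idea, with a parallel file system standing in for the explicit scatter step, is what lets every node in a cluster read and write shared data concurrently.
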
When a Big Data analytics infrastructure is needed, usually termed high performance computing (HPC), advanced systems use clusters of computers to handle the complex operations of technical computing. These clusters can contain thousands of individual computers, whatever number is required to complete the analytical processing in the required timeframe. HPC also requires the fastest low-latency, high-bandwidth networks, and the infrastructure demands fast, high-bandwidth shared storage accessible to every compute node in the cluster. Private cloud architectures abstract some of the physical infrastructure, allowing more flexible workloads and the bursting of computing to public cloud architectures when required.

Conclusion

Whether your current infrastructure no longer meets your needs, or public cloud bandwidth limitations are unrealistic for your research, you may need a responsive provider who can draw upon significant resources and security expertise to help design and implement a solution: a provider large enough to offer economies of scale, yet still able to provide a dedicated, customer-centric focus. Most in-house IT personnel do not have the skills or experience needed to architect, build and operate an infrastructure that will scale to your Big Data analytics needs and provide for future expansion. Every component of your high performance computing infrastructure must align with the overall needs of the system and work seamlessly with your software ecosystem: clustered computers, specialty computing resources, shared storage systems, high-bandwidth networking and inter-process communications, switching and security. HPC ecosystems require critical and specialized skills that can often be provided or supplemented through a partnership with an experienced provider.

RAID, Inc. brings two key components to the table: a vendor who speaks the language of researchers and understands how to translate a scientific problem into a computational process, and broad experience with best-of-breed products that can serve as building blocks for a custom, optimized infrastructure supporting that computational process. Whether designing for a limited budget, a performance-sensitive application, or a large-scale, flexible computing platform supporting a diverse faculty or audience, RAID has both directly applicable experience and extensive current industry knowledge, and will work with each customer to address their needs.

References

http://www.hrgresearch.com/high%20performance%20computing.html

http://www.raidinc.com/high-performance-computing-for-big-data/

Vanacek, Jacqueline. "How Cloud and Big Data are Impacting the Human Genome - Touching 7 Billion Lives." Forbes, April 16, 2012. http://www.forbes.com/sites/sap/2012/04/16/how-cloud-and-big-data-are-impacting-the-human-genome-touching-7-billion-lives/