1 IT and Storage for Big Data Analytics
Randy Kerns, Senior Strategist, Evaluator Group
2 Overview
Big data can mean two different things
- Storage for large amounts of data
- Analytics against very large amounts of data
Usually from machine-to-machine data
- Called pervasive computing
So, what does this mean for storage?
3 What It Means for IT
4 The Storage Way to Say Big Data
Defined by architectural platform, big data storage is:
- Scale-out NAS
- Global Namespace File System
- NAS gateway to SAN and Scale-out SAN
Defined by application, big data storage is:
- Storage for applications that handle large files and require performance
- Storage for an extremely large number of files
Examples: Media & entertainment, oil & gas exploration, life sciences, etc.
5 The Analytics Way to Say Big Data
Big data analytics is:
- A term for business intelligence (BI) processes that are different from traditional data warehousing
- The ability to tap unstructured data as a source for BI processes
- Information delivered to users in real or near real-time (but not an absolute requirement)
- Convergence of multiple data sources
Latency introduced by storage, including networked storage, is often assiduously avoided
Cost is minimized
6 Data Analytics Model
[Diagram labels: Customer Profiles (NoSQL DB); HDFS (Logs, Tweets, Location); High Scale Data Reductions; Predictions on Buying Behavior; BI and Analytics; PS; Batch; Low Latency]
1) Identify User
2a) Lookup User Profile
2b) Lookup Location (NoSQL DB)
3) Input Into Expert System
4) Real-time: Determine Best Offer For This User
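The numbered low-latency steps on the slide above (identify the user, look up profile and location, feed an expert system, determine the best offer) can be sketched roughly as follows. This is only an illustration: the dictionaries stand in for the NoSQL lookups and for the output of the batch tier, and every name and sample value here (`PROFILES`, `LOCATIONS`, `BATCH_PREDICTIONS`, `best_offer`, the user `u42`) is hypothetical rather than taken from the presentation.

```python
# Hypothetical in-memory stand-ins for the two NoSQL lookups on the slide.
PROFILES = {"u42": {"segment": "outdoor"}}
LOCATIONS = {"u42": "Denver"}

# Hypothetical result of the batch tier: buying-behavior predictions
# precomputed offline and keyed by (customer segment, city).
BATCH_PREDICTIONS = {("outdoor", "Denver"): "sleeping bag discount"}
DEFAULT_OFFER = "free shipping"


def identify_user(request: dict) -> str:
    """Step 1: identify the user from the incoming request."""
    return request["user_id"]


def lookup_profile(user_id: str) -> dict:
    """Step 2a: look up the user profile (empty if unknown)."""
    return PROFILES.get(user_id, {})


def lookup_location(user_id: str):
    """Step 2b: look up the user's location (None if unknown)."""
    return LOCATIONS.get(user_id)


def best_offer(request: dict) -> str:
    """Steps 3 and 4: combine the lookups and pick an offer in real time."""
    user_id = identify_user(request)
    profile = lookup_profile(user_id)
    city = lookup_location(user_id)
    # A real expert system would apply rules or a model; here the "expert
    # system" is just a table lookup against the batch predictions.
    key = (profile.get("segment"), city)
    return BATCH_PREDICTIONS.get(key, DEFAULT_OFFER)
```

Calling `best_offer({"user_id": "u42"})` follows the low-latency path end to end; an unknown user falls through to the default offer. The point of the split is that the expensive work happens in batch, so the real-time path is reduced to a few key lookups.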
7 Why Should Storage Professionals Care?
Distributed computing for analytics (Hadoop, for example) is moving from science experiment to mission-critical
As this happens, data encompassed by these applications becomes the responsibility of people who worry about:
- Security
- Data protection/disaster recovery/business continuance
- Data governance and compliance
- Digital records management and archiving
8 Shared Storage for the Traditional Data Warehouse
[Diagram]
Sources: Archive, OLTP, Files / XML data, Log Files, Operational
Extract, Transform, Load (ETL)
Data Warehouse
Outputs: Schedules, Ad hoc Queries, Reports, Dashboards, Notifications
9 Distributed, Shared-Nothing Architectures for Big Data Analytics
[Diagram: Network Layer; Compute Layer; Storage Layer (DAS on each node)]
10 CAP Theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
A distributed system can satisfy any two of these guarantees at the same time, but not all three
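The trade-off on the slide above can be made concrete with a toy two-replica register. This is a deliberately simplified sketch, not a model of any real system: the class `TwoNodeRegister`, its `mode` values, and the `partitioned` flag are all invented for illustration. Under a network partition, the "CP" variant refuses writes it cannot replicate (it stays consistent but stops serving writes), while the "AP" variant accepts them locally (it keeps serving but the replicas diverge).

```python
class TwoNodeRegister:
    """Toy register replicated on two nodes, 'a' and 'b'."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False  # True when the nodes cannot talk to each other

    def write(self, node: str, value) -> bool:
        """Try to write via `node`; return True if the write was accepted."""
        other = "b" if node == "a" else "a"
        if self.partitioned:
            if self.mode == "CP":
                # Cannot reach the other replica: refuse rather than diverge.
                return False
            # AP: accept locally even though the other replica is unreachable.
            self.values[node] = value
            return True
        # Healthy network: update both replicas.
        self.values[node] = value
        self.values[other] = value
        return True

    def consistent(self) -> bool:
        """True when both replicas hold the same value."""
        return self.values["a"] == self.values["b"]


cp = TwoNodeRegister("CP")
cp.write("a", 1)
cp.partitioned = True
cp_accepted = cp.write("a", 2)   # refused during the partition

ap = TwoNodeRegister("AP")
ap.write("a", 1)
ap.partitioned = True
ap_accepted = ap.write("a", 2)   # accepted, but node 'b' still holds 1
```

After this runs, `cp` is still consistent but rejected a write, and `ap` accepted the write but its replicas disagree. Real systems sit at many points between these two extremes (for example by reconciling replicas after the partition heals).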
11 Issue for IT
How to store information for big data
- How much data is there?
- Where did this idea come from?
What are the requirements?
Is it from analytics operations?
- Store original data captured in flight as part of the analytics operation?
- Store as a secondary process?
- Don't save anything, except results?
What about Rental Data?
12 Shared Storage as Secondary Storage
Is there a place for shared storage in shared-nothing? If so, what does it look like?
[Diagram: Network Layer; Compute Layer; Storage Layer (SAN/NAS)]
13 Shared Storage as Primary Storage
[Diagram: Network Layer; Compute Layer; Storage Layer (SAN or NAS, but more commonly Scale-out NAS)]
14 Shared Primary/Secondary Storage
Advantages
- Can reduce latency for queries that span nodes
- Enhances system availability
- Addresses the enterprise storage requirements:
  Security
  Data protection/disaster recovery/business continuance
  Data governance and compliance
  Digital records management and archiving
Disadvantages
- Additional cost
- Crosses a cultural boundary
15 Why Not Shared Storage?
16 Big Data Storage for Big Data Analytics
Shared storage as secondary storage for big data analytics
- Data Protection, Database of Record, Archive
- Examples: NetApp and ParAccel, EMC Data Domain/VMAX and Greenplum, RainStor
Shared storage as primary storage for big data analytics
- Examples: Calpont, Red Hat Gluster, IBM GPFS, Nexenta ZFS, Hadoop nodes in Virtual Machines
17 Is Hadoop a Storage Device?
NO
- It's a distributed computing platform
YES
- 1K node cluster w/ 1TB RAM per node = 1PB of very high performance storage
- Data protection built-in (multiple data copies but not RAID)
- HDFS - Embedded, distributed file system (like scale-out NAS)
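The capacity figure on the slide above is simple multiplication, shown here as a quick check. The helper name `aggregate_ram_pb` is hypothetical, and the conversion assumes decimal units (1 PB = 1000 TB).

```python
TB_PER_PB = 1000  # decimal units assumed: 1 PB = 1000 TB


def aggregate_ram_pb(nodes: int, ram_tb_per_node: float) -> float:
    """Total RAM across the cluster, in petabytes."""
    return nodes * ram_tb_per_node / TB_PER_PB


# The slide's example: a 1K-node cluster with 1 TB of RAM per node.
print(aggregate_ram_pb(1000, 1))  # 1.0
```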
18 HDFS Hadoop File System
Very large Distributed File System (DFS)
- 10K nodes, 100 million files, 10 PB
Uses standard servers with direct attached storage
Files are replicated to handle hardware failure
- 3 copies
- Detects failures and recovers from them
Optimized for batch processing
- Data locations exposed so that computations can move to where data resides
- Provides very high aggregate bandwidth
Runs in user space
- Heterogeneous OS
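Two ideas from the slide above, keeping 3 copies of each block and moving computation to where the data resides, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how HDFS itself is implemented: placement here is plain round-robin (real HDFS placement also takes racks into account), and the function names `place_replicas`, `pick_node_for_task`, and `usable_capacity` are hypothetical.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def pick_node_for_task(block, placement, busy):
    """Data locality: prefer an idle node that already holds the block.

    Returns None when every replica holder is busy, in which case a
    scheduler would have to wait or read the block over the network.
    """
    for node in placement[block]:
        if node not in busy:
            return node
    return None


def usable_capacity(raw_capacity, replication=3):
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_capacity / replication


nodes = ["n1", "n2", "n3", "n4"]
placement = place_replicas(["b0", "b1"], nodes)

# Block b0 lives on n1, n2, n3; if n1 is busy (or has failed), the task
# for b0 can still run next to a local copy on n2.
chosen = pick_node_for_task("b0", placement, busy={"n1"})
```

The same replica list serves both purposes: it is what lets the file system survive a failed server, and it is what the scheduler consults to run work next to the data. The cost is capacity, since with three copies only a third of the raw disk space holds unique data.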
19 Hadoop File System on Standard Servers Source: Matt Foley
20 Typical Hadoop Configuration
[Diagram: Network Layer; Compute Layer; Storage Layer (DAS on each node)]
21 Hadoop Key Milestones
- Dec 2004: Google GFS paper published
- July 2005: MapReduce first used
- Feb 2006: Becomes Lucene subproject
- Apr 2007: Yahoo! on 1000-node cluster
- Jan 2008: Apache Top Level Project
- May 2009: Hadoop sorts a Petabyte in 17 hours
- Aug 2010: World's largest Hadoop cluster at Facebook ([…] nodes, […] Petabytes)
22 Evaluating Hadoop as a Storage Device
- Snapshots?
- Scale capacity and performance concurrently?
- SSD and automated tiering?
- Dedupe?
- Insert your hot-button storage feature here:
23 Evaluating Hadoop as a Storage Device
24 IT and Big Data Analytics
There will be big data
Circumstances may vary and change
Participate early
- Data scientists may not have the same concerns or requirements
- Decisions can limit choices
Understand options
- Products / software