SCALE-OUT SAS GRID WITH XTREMIO & ISILON: REDEFINE WORKLOAD AGILITY WITH ARCHITECTURAL FLEXIBILITY
Ted Basile & John Mallory
Session Objectives
- Challenges of application & data silos
- Big Data defined (Data Lake foundation)
- Why EMC scale-out storage for SAS Grid?
- Storage architecture flexibility improves SAS workflow
- Impacts of XtremIO & Isilon in SAS environments
Every Big Data Journey Is Unique
Big Data 1.0: Decentralized Data Warehousing (Reporting: What Happened)
- Siloed approach is inefficient & complex
- Lack of cross-LOB collaboration
- Focused on the rearview mirror of the business
Big Data 2.0: Analytics for Mixed Data Sets (Understand Why It Happened)
- Complementary to the EDW
- Integration of new data types (unstructured, dark data)
- Mainly LOB-oriented
Big Data 3.0: Federated Big Data Lake (Determine What Will Happen)
- Collect and store data
- Bring the analytical tools to the data
- Agile service-oriented (as-a-service) model & architecture
What Exactly Is Big Data?
Any data set that cannot be processed with traditional systems.
- Traditional: structured data, unstructured data, dark data
- Emerging: public records, social networks & user-generated content (UGC), Internet of Things, location data
Big Data Is a Big Problem: Increasing Silos, Management and IT Complexity
- Large volume
- Many sources
- Rapid growth
- IT complexity
[Figure: traditional and emerging data sources feeding separate silos, including Hadoop]
Data Lake Foundations: Scale-out Storage for Traditional and Emerging Workloads
- No copying
- Simplification
- No data triplication (Hadoop's HDFS stores three replicas of each block by default)
- Faster insights
[Figure: traditional and emerging data sources and Hadoop sharing a single Data Lake storage pool]
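To put "no data triplication" in numbers, the sketch below compares the raw capacity needed for the same logical data under HDFS-style 3x replication and under an N+M parity layout on a shared pool. The 8+2 layout and the 100 TB data set are assumptions chosen for illustration; they are not a specific Isilon protection level or customer figure.

```python
def raw_capacity_replicated(logical_tb: float, copies: int = 3) -> float:
    """Raw capacity when every block is stored `copies` times (HDFS default: 3)."""
    return logical_tb * copies


def raw_capacity_parity(logical_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity for an N+M parity layout: M parity units per N data units."""
    return logical_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    logical = 100.0  # hypothetical data set size in TB
    print(f"3x replication: {raw_capacity_replicated(logical):.0f} TB raw")
    print(f"8+2 parity:     {raw_capacity_parity(logical, 8, 2):.0f} TB raw")
```

Under these assumptions the parity layout needs well under half the raw capacity of triplicated storage, before counting the extra copy step that a separate HDFS silo would require.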
Why EMC Scale-out for SAS Grid
Grid Compute
- Scale seamlessly as users, data and work grow
- Utilize resources more efficiently
- Consolidate silos of redundant SAS HW/SW into a shared services model
- Improve job scheduling & performance via parallelization
- Non-disruptively maintain, upgrade & reconfigure the SAS environment without downtime
Scale-Out Storage
- Scale compute, SAS WORK (XtremIO) and SAS DATA (Isilon) performance & capacity independently
- Highly efficient data protection (Isilon) and inline data services (XtremIO) for maximum efficiency and streamlined workflow
- Optimized for high-concurrency throughput (sequential & random)
- Non-disruptive operations, dynamic scale-out, enterprise data protection & disaster recovery
Your Data Lake Foundation
SAS Grid: Storage Requirements
SAS DATA
- Repository for all incoming structured/unstructured data
- Requires a shared file system
- Read/write ratio depends on the SAS job
- Generally requires 50 MB/s per stream
- All processed results and data cubes are archived here
SAS WORK
- Temporary/intermediate files created while processing SAS DATA
- Highly throughput- and I/O-dependent (~3-6 GB/s)
- Write-intensive; large-block sequential (or random) I/O
- Large reporting queue of jobs; results fed back to SAS DATA
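As a rough sizing aid, the per-stream SAS DATA figure can be turned into an aggregate bandwidth target, or a bandwidth budget (such as the ~3-6 GB/s SAS WORK range) can be expressed as a number of full-rate streams. This sketch assumes 50 MB/s per stream and decimal units (1 GB/s = 1,000 MB/s); the function names and stream counts are hypothetical.

```python
def aggregate_gb_s(streams: int, per_stream_mb_s: float = 50.0) -> float:
    """Aggregate bandwidth (GB/s, decimal) for `streams` concurrent full-rate streams."""
    return streams * per_stream_mb_s / 1000.0


def streams_supported(bandwidth_gb_s: float, per_stream_mb_s: float = 50.0) -> int:
    """Number of full-rate streams that fit into a bandwidth budget."""
    return int(bandwidth_gb_s * 1000.0 // per_stream_mb_s)


if __name__ == "__main__":
    print(f"100 streams need {aggregate_gb_s(100):.1f} GB/s")
    for budget in (3.0, 6.0):
        print(f"{budget:.0f} GB/s carries {streams_supported(budget)} streams at 50 MB/s")
```

Real SAS jobs rarely run every stream at full rate simultaneously, so treat these figures as upper bounds for planning rather than measured demand.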
XtremIO's Unique Architecture: Consistent, Predictable Performance + Efficiency
- Software-defined scale-out: linear scaling of IOPS, bandwidth & capacity
- Data center services: HA/BC, application management, converged infrastructure
- Metadata engine: consistent sub-1 ms latency
- Inline and unstoppable data services: data reduction efficiencies, in-memory metadata
Capabilities: self-service provisioning and orchestration; validated reference architectures; intelligent vSphere VAAI integration; continuous data protection and disaster recovery; storage resource management; enterprise multipathing; converged infrastructure; VSI, DB, VDI; 2-3 site continuous availability; thin provisioning; flash data protection; database-consistent snapshot management; virtualization management integration (VMware & Microsoft); deduplication; encryption; EMC Storage Analytics; VMware vRealize Operations; compression; writeable copies
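Inline data reduction combines deduplication (write each unique block once) with compression (shrink the unique blocks). The sketch below illustrates the general technique with fixed-size blocks, content fingerprints and per-block compression. XtremIO's actual block size, fingerprint function and compression algorithm are not described here; 4 KB blocks, SHA-256 and zlib are assumptions for illustration only.

```python
import hashlib
import zlib


def reduce_blocks(data: bytes, block_size: int = 4096) -> tuple[int, int]:
    """Dedupe fixed-size blocks by fingerprint, compress each unique block,
    and return (logical_bytes, physical_bytes)."""
    store: dict[str, bytes] = {}  # fingerprint -> compressed unique block
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:  # dedupe: only new content is stored
            store[fingerprint] = zlib.compress(block)  # compress unique blocks only
    physical = sum(len(c) for c in store.values())
    return len(data), physical


if __name__ == "__main__":
    # Hypothetical workload: 4 distinct blocks, each repeated 8 times
    blocks = [bytes([i]) * 4096 for i in range(4)]
    logical, physical = reduce_blocks(b"".join(blocks * 8))
    print(f"data reduction ratio: {logical / physical:.1f}:1")
```

Because fingerprinting happens before anything is written, duplicate blocks never consume flash, which is what "inline" means here as opposed to post-process reduction.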
XtremIO Copy Services for SAS Dev/Test Efficiency, Checkpoint Protection and On-demand Backups
- 100% in-memory: any topology; instant creation; instant deletion
- 100% space efficient: no space reservations; no metadata bloat
- 100% performance: identical read IOPS, write IOPS and latency
- 100% optimized: identical data services; always on, always inline
- Incredible scale: instant application clones to petabyte scale
- Unmatched: use XtremIO where all-flash arrays were never before viable
Copyright 2014 EMC Corporation. All rights reserved.
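Instant, space-efficient writeable copies are possible when a snapshot only adds metadata rather than copying data. The sketch below models that idea with a snapshot tree in which each node stores only the blocks written at that level; it is a simplified, hypothetical model of metadata-based copies, not XtremIO's actual in-memory metadata design.

```python
from __future__ import annotations

ZERO_BLOCK = b"\x00"  # placeholder content for never-written blocks


class Node:
    """One level of a snapshot tree; holds only the blocks written at this level."""

    def __init__(self, parent: Node | None = None) -> None:
        self.parent = parent
        self.blocks: dict[int, bytes] = {}

    def read(self, lba: int) -> bytes:
        node = self
        while node is not None:  # walk up until an ancestor holds the block
            if lba in node.blocks:
                return node.blocks[lba]
            node = node.parent
        return ZERO_BLOCK


class Volume:
    """A writeable volume (or writeable copy) pointing at a leaf of the tree."""

    def __init__(self, node: Node | None = None) -> None:
        self.node = node if node is not None else Node()

    def write(self, lba: int, data: bytes) -> None:
        self.node.blocks[lba] = data  # never modifies shared ancestors

    def read(self, lba: int) -> bytes:
        return self.node.read(lba)

    def snapshot(self) -> Volume:
        frozen = self.node           # current contents become a shared, read-only parent
        self.node = Node(frozen)     # the source continues in a fresh, empty child
        return Volume(Node(frozen))  # the copy gets its own empty sibling child


if __name__ == "__main__":
    prod = Volume()
    prod.write(0, b"v1")
    test_copy = prod.snapshot()  # no data copied, only two empty nodes created
    prod.write(0, b"v2")
    print(prod.read(0), test_copy.read(0))
```

Snapshot creation costs two empty nodes regardless of volume size, and both branches keep full write access; the trade-off in this naive model is that reads walk up the tree, which production arrays avoid with more elaborate metadata structures.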
On-demand Operations for SAS, Delivered via XtremIO's Unique In-Memory Copy Services
Lifecycle test bed for development & QA
- Develop with no impact to SAS jobs
- Move new code to production in real time
- Copies of the SAS file systems: TEST (writeable), QA (writeable), Training (readable)
- Remove risks with new feature updates
Array-based checkpoints for backup & recovery
- [Timeline: SAS Job #1 with checkpoints at 9:00, 10:00, 11:00, 12:00 and 18:00]
- Preserve hours/days of SAS processing
- No overhead/impact to SAS jobs
- Protect SAS jobs 24x7 vs. nights/weekends only
- Readable copies feed backups to Data Domain
Difficult to achieve with traditional storage
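The checkpoint timeline (9:00, 10:00, 11:00, 12:00, 18:00) implies that the processing lost on a failure depends on when the failure occurs. The sketch below finds the latest checkpoint at or before a failure and the time lost by rolling back to it. The function name, the date and the failure times are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def restore_point(checkpoints: list[datetime], failure: datetime) -> tuple[datetime, timedelta]:
    """Return the latest checkpoint at or before `failure` and the processing
    time lost by rolling back to it (the effective recovery point)."""
    ordered = sorted(checkpoints)
    idx = bisect_right(ordered, failure)
    if idx == 0:
        raise ValueError("no checkpoint precedes the failure")
    latest = ordered[idx - 1]
    return latest, failure - latest


if __name__ == "__main__":
    day = datetime(2015, 1, 1)  # hypothetical date
    schedule = [day.replace(hour=h) for h in (9, 10, 11, 12, 18)]
    for failure in (day.replace(hour=11, minute=40), day.replace(hour=15)):
        cp, lost = restore_point(schedule, failure)
        print(f"failure {failure:%H:%M}: restore {cp:%H:%M}, lose {lost}")
```

The second case shows why the gap between 12:00 and 18:00 matters: a failure mid-afternoon loses up to six hours of work, whereas adding checkpoints in that window bounds the loss at the checkpoint interval.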
Isilon Scale-Out Data Lake: A Single Storage Pool for File Data Consolidation
[Figure: moving from silos with inconsistent security to shared storage with enterprise security, multi-protocol access and faster time to insights, serving file, web, archive, report, mobile and analytics workloads]
Redefining SAS Agility & Scalability: XtremIO & Isilon Scale-out = Data Lake Foundation
1. On-demand ETL workflow: on-demand extraction from production and loading into the staging area; snapshots offload production
2. Run time & wall-clock savings: ~30% more jobs run per day; hours of wall-clock savings uncovered; zero storage tuning; checkpoint protection
3. Lifecycle test bed: Dev/Test/QA in parallel with SAS jobs; move code to production in real time
4. Architecture & OPEX flexibility: XtremIO + Isilon = workflow partitioning; 20% core/CPU recovery; on-demand backup & RPO; >8:1 power/cooling/space savings
SAS Grid EMC Storage Architecture: Dataflow and Architecture Flexibility
[Figure: data sources -> scale-out storage -> SAS Grid compute -> SAS users; connectivity via FC (SAN), IP (NFS) and LAN/WAN; consolidated backup/recovery]
1. Data loaded into the landing zone
2. SAS DATA shared with the Grid (via Isilon)
3. Users start jobs
4. Grid reads/writes WORK files on XtremIO
5. Final DATA sets
6. SAS users receive finished jobs
Future growth: add new users on demand; scale out Grid/Isilon/XtremIO
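The dataflow partitions storage by role: temporary WORK files on XtremIO, shared and final DATA sets on Isilon. The sketch below expresses that routing rule for SAS data set names, relying on the SAS convention that one-level names go to the WORK library by default. The mount paths are hypothetical; in practice the mapping is made with the SAS WORK system option and LIBNAME statements rather than in application code.

```python
# Hypothetical mount points; the architecture does not specify paths
MOUNTS = {
    "XTREMIO": "/mnt/xtremio/saswork",  # SAS WORK on XtremIO (FC SAN)
    "ISILON": "/mnt/isilon/sasdata",    # SAS DATA on Isilon (NFS)
}


def storage_for(dataset: str) -> str:
    """Return the mount a SAS data set lands on under a
    WORK-on-XtremIO / DATA-on-Isilon partitioning."""
    # One-level names (no libref) default to the WORK library
    libref = dataset.split(".", 1)[0].upper() if "." in dataset else "WORK"
    return MOUNTS["XTREMIO"] if libref == "WORK" else MOUNTS["ISILON"]


if __name__ == "__main__":
    for name in ("scratch", "work.sorted", "sasdata.final_cube"):
        print(f"{name:20} -> {storage_for(name)}")
```

Keeping the rule this coarse is the point of the partitioning: the write-heavy, short-lived WORK traffic and the shared, long-lived DATA sets can each be scaled on the platform suited to them without touching job code.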
Automotive Manufacturer (1 of 2): XtremIO Redefines Performance, Efficiency & Protection
Challenge
- Critical warranty & recall analytics application; extremely storage-bound at 2,000 reports/day (300 users running ad-hoc and daily reports)
- Traditional storage bottleneck due to de-staging of large-block writes
- Data sets accidentally deleted by users, forcing job reruns
Solution
- EMC XtremIO (4x10TB cluster); EMC PowerPath
Application(s)
- SAS Analytics (partitioned data sets)
XtremIO Workload Results
- 3:1 data reduction (dedupe & compression)
- ~1.08 ms latency (during peak processing hours)
- 40-146 KB I/O size
- 3.5 GB/s (32% utilized)
Results
- 30% improvement in jobs run per day, with zero storage tuning and management required
- Removed cyclical reporting barriers during month & quarter end
- Copy services provided instantaneous real-world recovery of data sets
- Identified additional I/O locking problems at the server, an opportunity for further throughput gains (bottleneck shifted to the OS and application layer)
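The case-study figures support two quick projections. The sketch below applies the 30% improvement to the 2,000 reports/day baseline and estimates the bandwidth ceiling implied by 3.5 GB/s at 32% utilization. Treating each report as one job, and assuming the 32% figure is a linear share of array bandwidth, are both assumptions; the function names are hypothetical.

```python
def projected_jobs(baseline_per_day: int, improvement: float) -> int:
    """Jobs per day after a fractional improvement (0.30 = 30%)."""
    return round(baseline_per_day * (1 + improvement))


def bandwidth_ceiling(observed_gb_s: float, utilization: float) -> float:
    """Bandwidth implied at 100% utilization, assuming utilization scales linearly."""
    return observed_gb_s / utilization


if __name__ == "__main__":
    print(projected_jobs(2000, 0.30), "jobs/day")           # 2,000/day baseline, +30%
    print(f"{bandwidth_ceiling(3.5, 0.32):.1f} GB/s ceiling")  # 3.5 GB/s at 32% utilized
```

Under these assumptions the array still had roughly two thirds of its bandwidth in reserve, which is consistent with the bottleneck moving to the OS and application layer.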
Automotive Manufacturer (2 of 2): XtremIO Transactional Analysis During Peak Processing
- IOPS: peak 85K (compute-bound)
- Response time: average latency 1.08 ms
- Bandwidth: 3.5 GB/s (32% utilized)
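Bandwidth and IOPS together imply an average I/O size (bandwidth / IOPS). The sketch below applies that to the peak figures, assuming the peak IOPS and peak bandwidth were observed at the same time, which the charts do not state. The result, about 41 KB, sits at the low end of the 40-146 KB I/O size range reported for this workload.

```python
def average_io_kb(bandwidth_gb_s: float, iops: float) -> float:
    """Average I/O size in KB (decimal units) implied by bandwidth and IOPS."""
    return bandwidth_gb_s * 1e9 / iops / 1e3


if __name__ == "__main__":
    size = average_io_kb(3.5, 85_000)  # peak bandwidth and IOPS from the charts
    print(f"average I/O size ~ {size:.0f} KB")
```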
SAS Performance Testing on XtremIO: Mixed Analytics SAS Workloads
Testing results*
- 3x better run-time savings; wall-clock time reduced by 1 hour
- Zero array tuning required (major time savings)
- Only all-flash array that kept all data services (dedupe, compression, etc.) active throughout testing
- Exceeded concurrent/mixed benchmarks for SAS DATA & SAS WORK on a configuration limited to 25% of XtremIO's capable bandwidth and IOPS
"The EMC XtremIO all-flash array can be extremely beneficial for many SAS Workloads. Testing has shown it can significantly eliminate application IO latency, providing improved performance." - SAS Performance Engineering
*Based on 2x 10TB X-Bricks (small cluster), pushing 90% of max bandwidth
In Conclusion
- SAS is key to the Federated Big Data Lake (FBDL)
- Scale-out storage platforms complement SAS Grid and enable a real-time shared resource model
- Isilon = data integration platform (SAS & HDFS)
- XtremIO delivers more than just performance: unique copy services deliver new capabilities
- A formal reference architecture (RA) with SAS on Isilon and XtremIO is being discussed
Questions?