Accelerating Enterprise Big Data Success
  Tim Stevens, VP of Business and Corporate Development, Cloudera

Big Opportunity: Extract value from data
  THINGS (50 billion) x DATA (35 ZB) = VALUE (revenue growth, cost savings, margin gain)

Big Gap: Roadblocks on the journey
  THINGS (50 billion) x DATA (35 ZB) = VALUE (revenue growth, cost savings, margin gain)
  Gaps: NO SECURITY, NO INSIGHT, NO PROOF
  Roadblocks:
    Worry about attacks
    Hold back production deployment
    Bring data to compute and fail to scale
    Delay insights with batch processing
    Waste time on misguided pilots
    Fail to show ROI
    Pay more for data management
    Store underutilized data
    Use sub-optimal hardware

Intel Confidential, NDA ONLY
Big Picture: Datacenter Inflection
  [Figure: datacenter inflection points, 2000 to 2013]
    1. UNIX to Linux, RISC to IA (Linux/x86 units vs. UNIX/RISC units)
    2. Physical to Virtual, SW-only to HW-assisted (virtualized vs. nonvirtualized)
    3. Cluster to Cloud, ASIC to IA/Fabric (public vs. private)
    4. Big Data
  "In 2000 Intel saw Linux coming and invested heavily in Red Hat; in 2005 we saw virtualization happening and invested in VMware; in 2008 we started investing heavily in hyper-scale computing. We think big data and Hadoop will dwarf all of them."
    Diane Bryant, SVP & GM, Data Center Group, Intel

Big Deal: Cloudera + Intel Alliance
  Intel invests $740M in Cloudera
    As Intel's largest datacenter venture deal, represents Intel's commitment to big data
    Supports Cloudera's ability to remain independent
  Intel & Cloudera drive innovation through open source
    Accelerate evolution of Hadoop by joining forces on foundational technologies
    Enable open source developers to innovate in and on top of the Hadoop platform
  Intel enables CDH to run best on Intel Architecture
    Enables Cloudera to make best use of Intel data center technologies
    Provides datacenter infrastructure for Cloudera development & benchmarking at scale
  Intel & Cloudera foster the broadest ecosystem of big data solutions
© 2014 Cloudera, Inc. All rights reserved.

Big Goal: Converge on one open source platform
  Cloudera
    Most stable, compatible, and mature Hadoop distribution
    Leading SQL functionality & performance (Impala)
    Deepest management and governance capabilities
    150 Hadoop developers
    100 open source committers
  Intel
    The only distribution with performance and security enhanced from the silicon up
    Leading security capabilities including encryption, access control, and auditing
    50 Hadoop developers and 12 committers
    Long-standing commitment to open source with 1000 developers working on Linux, KVM, Xen, Java, OpenStack, Hadoop

Driving innovation through open source
  Ramp the pace of innovation in the Apache Hadoop platform while reducing fragmentation
    SQL: Project Gryphon, Impala -> Impala
    Streaming: Apache Storm, Apache Spark Streaming -> Spark Streaming
    Performance: Apache Tez, Apache Spark -> Spark
    Security: Project Rhino, Apache Sentry -> Project Rhino (including Sentry)
    Storage: Apache HDFS, Apache HBase -> Accelerated investment in both

Enabling CDH to run best on Intel Architecture
  Software & silicon co-evolve to deliver dramatic gains
    1. Push compute-intensive work down to the silicon: Encryption (AES-NI), Compression (SSE 4.2), Math (MKL)
    2. Increase main memory utilization up to 20X: improve Disk:Memory from 200:1 to 10:1
    3. Design for rack-scale architecture

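One reading of the "up to 20X" memory figure above is that it follows directly from the two disk-to-memory ratios: at 10:1, a cluster carries 20 times as much main memory per unit of disk as at 200:1. A minimal sketch of that arithmetic follows; the function name and the 100 TB disk capacity are illustrative assumptions, not values from the deck.

```python
def memory_for_disk(disk_tb: float, disk_to_memory_ratio: float) -> float:
    """Return the main memory (in TB) implied by a disk:memory ratio."""
    return disk_tb / disk_to_memory_ratio


# Hypothetical cluster disk capacity, chosen only for illustration.
disk_tb = 100.0

memory_before = memory_for_disk(disk_tb, 200.0)  # Disk:Memory of 200:1
memory_after = memory_for_disk(disk_tb, 10.0)    # Disk:Memory of 10:1

# Ratio of memory at 10:1 to memory at 200:1; independent of disk_tb.
memory_gain = memory_after / memory_before

print(f"Memory at 200:1: {memory_before} TB")
print(f"Memory at 10:1:  {memory_after} TB")
print(f"Gain: {memory_gain}x")
```

The gain depends only on the two ratios (200 / 10), so the assumed disk capacity does not affect it.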
Focus of Joint Engineering
  Feature / Target: Cloudera Enterprise
  SECURITY
    HDFS encryption and extended file ACLs
    Centralized authorization via Sentry
    Simplified Kerberos
    HBase cell-level authorization
    Search: document and index security
    Auditing & data lineage
  PERFORMANCE
    Crypto acceleration with AES-NI
    MR/Shuffle optimizations
    Compression acceleration with SSE 4.2
    Optimizations using AVX and other IA
    Optimizations using MKL
    Explore Xeon Phi with Java support
  MANAGEMENT
    Service management extensions
    Simplified cloud provisioning, including AWS support
    Backup and Disaster Recovery
    Certified with Intel Enterprise Edition of Lustre
    Deeper diagnostics of various modules
    Support for Azure, VMware, OpenStack
    Extended RBAC in Cloudera Manager
  APPLICATIONS
    Impala enhancements including low-latency SQL engine, SQL-92 analytic queries, and more
    Spark support in CDH, including Spark on YARN, Spark security, and Spark streaming
    SQL on HBase
    Spark interoperability with Impala
    Wire encryption for Spark
    Pig integration with Spark
    Spark/Sentry integration

Cloudera Enterprise Data Hub, powered by Apache Hadoop
  Attributes: Open Source, Scalable, Flexible, Cost-Effective, Managed, Open Architecture, Secure, Governed
  Batch Processing, Analytic SQL, Search, Machine Learning, Stream Processing, 3rd Party Apps
  Workload Management
  Storage for Any Type of Data (Unified, Elastic, Resilient, Secure): Filesystem, Online NoSQL
  Data Management
  System Management

Improving Apache Hadoop performance with IA
  Compute: up to 50% faster (compared to previous generation)
  Storage & Memory: up to 80% faster (SSD compared to HDD)
  Network: up to 50% faster (10GbE compared to 1GbE)
  As measured by time to completion of 1TB sort on 10 node cluster.
  Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
  Source: Intel internal testing. For more information go to: intel.com/performance

Enabling ecosystem with joint leadership
  Cloudera
    Market leader in big data management systems
    Largest base of paid customers & free users
    Consistently delivering industry-leading capabilities around Apache Hadoop
  Intel
    Market leader in silicon
    Long & successful history of investment and collaboration with software platforms
    Global reach; market-leading Hadoop distribution in China

Joint customers leading the way
  Cost Savings, Revenue Growth, Margin Gain
  Opower
    Captures TBs of data from smart meters
    Analyzes usage patterns to optimize customer consumption
    $320M USD in utility savings
  "Utilities simply can't cope with the vast volumes of smart meter data, not just with storing the data, but being able to analyze it and put it to use."
    Drew Hylbert, VP Technology & Infrastructure, Opower
  Needs to be IoT oriented
  Needs to leverage Hadoop

Summary: Faster Insights, Better Security, and Less Complexity
  Accelerate innovation via open source software
    Maintain an open horizontal platform for big data
    Continue to enhance Apache Hadoop and related projects
  Enable CDH to run best on IA
    Optimize performance across compute, storage, & network
    Ensure platform security, enhanced by hardware
  Foster evolution of big data ecosystem
    Establish usage models and industry standard benchmarks
    Develop reference architectures and industry-wide solutions

More Resources
  intel.com/bigdata
  cloudera.com

Tim Stevens, tstevens@cloudera.com