The Enterprise Data Hub and The Modern Information Architecture
Dr. Amr Awadallah, CTO & Co-Founder, Cloudera
Twitter: @awadallah
2013 Cloudera, Inc. All rights reserved.

Cloudera Overview
The Leader in Open Source Data Management built on Apache Hadoop
The Leading Open Source Distribution of Apache Hadoop
Powerful Suite of Big Data Management Software
Enterprise Grade Security, Auditability, and Reliability
Founded: 2008. Employees: 500+.
Customers: Over 50% of the Fortune 50 and 65% of the Fortune 500, plus top US intelligence and defense agencies. 80% market share of Hadoop distributions.
Partners: 800+ in hardware, software, and services.
Education: 15,000+ trained, including developers, admins, analysts, and data scientists.
Community: Founders and top supporters of the Hadoop open source ecosystem.

Cloudera's Mission
Help Organizations Leverage the Power of All Their Data to Ask Bigger Questions.

Why is this Happening Now?

It Isn't All About Size
10TB to 10PB
IT'S ALL (BIG) DATA

And It Isn't Just About Web 2.0 / Social
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
CONSUMER PACKAGED GOODS: Sentiment analysis of what's hot, customer service
FINANCIAL SERVICES: Risk & portfolio analysis; new products
EDUCATION & RESEARCH: Experiment sensor analysis
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis
LIFE SCIENCES: Clinical trials; genomics
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; website optimization
HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
OIL & GAS: Drilling exploration sensor analysis
RETAIL: Consumer sentiment; optimized marketing
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
UTILITIES: Smart Meter analysis for network capacity
LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis

Legacy Information Architecture is a Mess
Thousands of Employees & Inaccessible Information
Issues: Limited Scale, Limited Agility, Limited History, Limited Visibility
Silos of Multi-Structured Data: EDWs, Marts, Servers, Document Stores, Storage, Search, Data Archives
Sources: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources

Enterprise Data Hub is the Solution
1. Active Archive: Full fidelity original data. Indefinite time, any source. Lowest cost storage.
2. Data Mgmt & Transformations: One source of data for all analytics. Persisted state of transformed data. Significantly faster & cheaper.
3. Self-service Exploratory BI: Simple search + BI tools. Schema on read agility. Reduce BI user backlog requests.
4. Multi-workload Data Platform: Bring applications to data. Combine different workloads on common data (e.g., SQL + Search). True BI agility.
[Diagram: EDH with Servers, Marts, Storage, Archives, EDWs, Documents, Search]
Sources: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
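
The "schema on read" point above can be illustrated with a minimal sketch. It is not Cloudera's implementation and uses no Hadoop APIs; the event records, field names, and the helper read_with_schema are all hypothetical, chosen only to show raw data being stored untouched while each consumer applies its own schema when it reads.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (full fidelity).
# Nothing is validated or reshaped at write time.
RAW_EVENTS = [
    '{"user": "u1", "action": "click", "ts": "2014-01-05T10:00:00", "page": "/home"}',
    '{"user": "u2", "action": "search", "ts": "2014-01-05T10:01:30", "query": "hadoop"}',
    '{"user": "u1", "action": "click", "ts": "2014-01-05T10:02:10"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` (field name -> converter) at read time.

    Fields missing from a record come back as None instead of failing the read.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two consumers apply different schemas to the same stored data.
clickstream_schema = {"user": str, "page": str}
search_schema = {"user": str, "query": str}

clicks = list(read_with_schema(RAW_EVENTS, clickstream_schema))
searches = [r for r in read_with_schema(RAW_EVENTS, search_schema) if r["query"]]
```

Because the stored lines never change, a new question only needs a new schema dictionary, not a reload of the data. That is the agility the slide attributes to schema on read.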

The Enterprise Data Hub
Components: Online NoSQL DBMS; Analytic MPP DBMS; Search Engine; Batch Processing; Stream Processing; Machine Learning; SQL; Streaming; File System (NFS); Resource Management; Metadata, Security, Audit, Lineage; System Management; Data Management
Unified Scale-out Storage For Any Type of Data: Elastic, Fault-tolerant, Self-healing, In-memory capabilities
Key Attributes:
1. Secure & Compliant: Robust access controls. Data encryption options. Shared security policies.
2. Enterprise Data Governance: Metadata management. Data lineage/tethering. Audit histories.
3. Unified & Manageable: Common storage & resource management. On-prem, cloud & managed service. Highly available (including DR).
4. Open Architecture: Open source platform. APIs & engines for multiple workloads. Extensible for 3rd parties.

Data Warehouse vs. Data Hub
Enterprise Data Warehouse
Enterprise Data Hub

The Modern Information Architecture
Users: Data Architects, System Operators, Engineers, Data Scientists, Analysts, Business Users
Tools: META DATA / ETL TOOLS, CLOUDERA MANAGER, DEVELOPER TOOLS, DATA MODELING, BI / ANALYTICS, ENTERPRISE REPORTING
Systems: ENTERPRISE DATA HUB, ENTERPRISE DATA WAREHOUSE, ONLINE SERVING SYSTEM
Sources: SYS LOGS, WEB LOGS, FILES, RDBMS
WEB/MOBILE APPLICATION: Customers & End Users

Customer Journey to Achieve Full Potential
Operational Efficiency, Information Advantage
Cheap Storage, ETL Acceleration, EDW Optimization, Exploration, Data Science, Consolidation, 360 View
IT, Business

Other Starting Use Cases for the EDH
Market Basket Analysis, Fraud Detection, Log Processing, Predictive Maintenance, Risk Management
Innovation and Advantage: Ask bigger questions in the pursuit of discovering something incredible
Operational Efficiency: Perform existing workloads faster, cheaper, better
ETL Acceleration, Active Archive, EDW Optimization, Deep Exploratory BI, Historical Compliance

Conclusion: EDH Allows You To
Active Archive
Retain Option Value of Data
Accelerate ETL Transformations
Enable Exploration/Agility
Consolidate Silos
Achieve True 360 View of Customers and Products.

The Future Is Information Driven. Start Now.