EMC/Greenplum: Driving the Future of Data Warehousing and Analytics
EMC 2010 Forum Series
Greenplum Becomes the Foundation of EMC's Data Computing Division
EMC ACQUIRES GREENPLUM
"Greenplum, with expertise in the massively parallel arena, will give the storage giant a boost in big-data computing." - InformationWeek
About EMC's Data Computing Division: Driving the Future of Data Warehousing and Analytics
Core products:
Greenplum Database 4.0: true MPP architecture and features that meet the mandatory requirements of enterprise-class data warehousing.
Greenplum Database Single-Node Edition: a free version for data analysis power users.
Greenplum Data Computing Appliance: price/performance leadership, the industry's fastest data loading, and private-cloud readiness.
Greenplum Chorus: the world's first Enterprise Data Cloud Platform.
We enable global organizations to gain greater insight and value from their data than ever before possible.
Greenplum Database 4.0: Critical Mass Innovation
Version 4.0 represents industry-leading innovations in workload management, fault tolerance, and advanced analytics.
The culmination of more than 7 years of research and development.
The first vendor to achieve critical mass and maturity across all the necessary aspects of an enterprise-class DBMS platform.
A genuine floor-sweep replacement option for Teradata, Oracle, DB2, and SQL Server.
Greenplum Single Node Edition
A free, state-of-the-art, parallel analytic database.
Fully parallel execution that leverages multi-core processors.
No storage capacity cap: from GBs to tens of TBs.
Hybrid row- and column-oriented processing.
Ability to expand beyond the Single Node Edition (SNE) to the massively parallel edition of Greenplum Database.
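As a rough illustration of why hybrid row- and column-oriented processing matters, the Python sketch below stores the same small table both ways and sums one column. The schema, names, and data are hypothetical; the sketch shows the general storage-layout trade-off, not how Greenplum's storage engine is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (order_id, region, amount)
Row = Tuple[int, str, float]

ROWS: List[Row] = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 30.25),
]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented tuples into one list per column."""
    columns: Dict[str, list] = {"order_id": [], "region": [], "amount": []}
    for order_id, region, amount in rows:
        columns["order_id"].append(order_id)
        columns["region"].append(region)
        columns["amount"].append(amount)
    return columns


def total_amount_row_store(rows: List[Row]) -> float:
    # Row layout: each whole tuple is visited to read a single field.
    return sum(row[2] for row in rows)


def total_amount_column_store(columns: Dict[str, list]) -> float:
    # Column layout: only the "amount" column is read.
    return sum(columns["amount"])


COLUMNS = to_columns(ROWS)
print(total_amount_row_store(ROWS), total_amount_column_store(COLUMNS))
```

Both functions return the same total; the difference is how much data each has to touch. A row layout suits lookups that fetch whole records, while a column layout suits scans and aggregates over a few columns of wide tables.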
Data Warehousing Requirements
Fast Data Loading
Extreme Performance & Elastic Scalability
Unified Data Access
Key Technology Pillars
World's fastest data loading: Scatter/Gather Streaming technology
Fast query execution with linear scalability: shared-nothing MPP architecture
Unified data access across the enterprise: dynamic query optimization and workload management
Scatter/Gather Streaming™ for the World's Fastest Data Loading Speeds
A parallel-everywhere approach to data loading.
Avoids the need for a separate loader tier of servers.
Supports both large batch and continuous near-real-time loading patterns.
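The Python sketch below illustrates the general idea of parallel-everywhere loading under stated assumptions: several input streams are parsed concurrently, and each row is routed directly to the segment that owns its distribution key, so no intermediate loader tier is involved. Threads stand in for segment servers, and the names (`owner_segment`, `scatter`, `parallel_load`) and the CRC32 routing rule are hypothetical; this is not Greenplum's actual Scatter/Gather implementation.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

NUM_SEGMENTS = 4  # hypothetical cluster size

Row = Tuple[str, float]  # hypothetical schema: (distribution_key, value)


def owner_segment(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Deterministic hash routing: the same key always lands on the same segment."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def scatter(source: List[Row]) -> List[List[Row]]:
    """Parse one input stream and bucket each row by its owning segment."""
    buckets: List[List[Row]] = [[] for _ in range(NUM_SEGMENTS)]
    for row in source:
        buckets[owner_segment(row[0])].append(row)
    return buckets


def parallel_load(sources: List[List[Row]]) -> List[List[Row]]:
    """Scatter every source concurrently (no single loader server), then
    gather each segment's rows from all of the streams."""
    segments: List[List[Row]] = [[] for _ in range(NUM_SEGMENTS)]
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        for buckets in pool.map(scatter, sources):
            for seg_id, bucket in enumerate(buckets):
                segments[seg_id].extend(bucket)
    return segments


# Usage: three hypothetical input streams covering 100 customer rows.
sources = [[(f"cust{i}", float(i)) for i in range(s, 100, 3)] for s in range(3)]
segments = parallel_load(sources)
print([len(seg) for seg in segments])
```

Routing on a deterministic hash means any stream can send a row straight to its final segment, which is what lets every node take part in loading.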
Shared-Nothing Architecture
[Diagram: Massively Parallel Processing (MPP), with Interconnect and Loading]
The most scalable database architecture, optimized for BI and analytics.
Provides automatic parallelization: no need for manual partitioning or tuning; just load and query as with any database.
Tables are distributed across segments, and each segment holds a subset of the rows.
Extremely scalable and I/O optimized: all nodes can scan and process in parallel, with no I/O contention between segments.
Linear scalability by adding nodes: each node adds storage, query performance, and loading performance.
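To make "each segment holds a subset of the rows" and "all nodes can scan and process in parallel" concrete, the sketch below hash-distributes a small table, computes a partial aggregate on each segment independently, and merges the partial results. This two-phase aggregation is a common MPP pattern; threads stand in for segment hosts, and the schema and function names are illustrative assumptions, not Greenplum internals.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # hypothetical schema: (order_id, region, amount)


def distribute(rows: List[Row], num_segments: int) -> List[List[Row]]:
    """Hash-distribute rows on order_id so each segment holds a disjoint subset."""
    segments: List[List[Row]] = [[] for _ in range(num_segments)]
    for row in rows:
        key = str(row[0]).encode("utf-8")
        segments[zlib.crc32(key) % num_segments].append(row)
    return segments


def segment_partial_sum(segment_rows: List[Row]) -> Counter:
    """Runs independently per segment: scans only local rows, shares no state."""
    partial: Counter = Counter()
    for _order_id, region, amount in segment_rows:
        partial[region] += amount
    return partial


def parallel_sum_by_region(rows: List[Row], num_segments: int = 4) -> Dict[str, float]:
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_partial_sum, segments))
    # Final step: a coordinator merges the small per-segment results.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


rows = [(i, "east" if i % 2 else "west", 1.0) for i in range(100)]
print(parallel_sum_by_region(rows))
```

Because each segment reads only its own rows, adding segments splits the scan further, which is the intuition behind the linear-scalability claim; the coordinator's merge stays cheap as long as the number of groups is small.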
Unified Data Access Across the Enterprise
Workload Management
Connection management controls how many users can be connected and assigns each user to a queue.
User-based resource queues control the total number, or total cost, of queries allowed to run at any point in time.
Dynamic Query Prioritization
A patent-pending technique that dynamically balances resources across running queries.
Allows DBAs to change query priorities in real time, or to set default priorities per resource queue.
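The sketch below models the two workload-management ideas on this slide under simplifying assumptions: a resource queue admits a query only if both its concurrency limit and its total-cost limit allow it, and CPU is shared among busy queues in proportion to a priority weight that can be changed while queries run. The class, fields, user-to-queue mapping, and proportional-share rule are hypothetical and only loosely follow the description above; they are not Greenplum's resource-queue implementation or syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResourceQueue:
    """Hypothetical admission-control queue with count and cost limits."""
    name: str
    max_active: int          # most queries allowed to run at once
    max_cost: float          # ceiling on the summed estimated cost of running queries
    priority: int = 1        # relative weight used when sharing CPU (hypothetical)
    running: Dict[str, float] = field(default_factory=dict)
    waiting: List[Tuple[str, float]] = field(default_factory=list)

    def _fits(self, cost: float) -> bool:
        return (len(self.running) < self.max_active
                and sum(self.running.values()) + cost <= self.max_cost)

    def submit(self, query_id: str, cost: float) -> bool:
        """Run the query now if both limits allow it; otherwise make it wait."""
        if self._fits(cost):
            self.running[query_id] = cost
            return True
        self.waiting.append((query_id, cost))
        return False

    def finish(self, query_id: str) -> None:
        """Release a finished query's slot, then admit waiters in arrival order."""
        del self.running[query_id]
        while self.waiting and self._fits(self.waiting[0][1]):
            next_id, next_cost = self.waiting.pop(0)
            self.running[next_id] = next_cost


def cpu_shares(queues: List[ResourceQueue]) -> Dict[str, float]:
    """Split CPU among queues that have running queries, in proportion to priority.
    Changing a queue's priority and recomputing rebalances running work."""
    active = [q for q in queues if q.running]
    total = sum(q.priority for q in active)
    return {q.name: q.priority / total for q in active} if total else {}


# Hypothetical connection management: each user is assigned to one queue.
USER_QUEUE = {"analyst": "adhoc", "etl_job": "batch"}
```

For example, a queue with `max_active=2` and `max_cost=100.0` runs a query of cost 60 immediately, holds a second query of cost 50 until the first finishes, and then admits it.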
Greenplum Chorus: The World's First Enterprise Data Cloud Platform
The world's first Enterprise Data Cloud (EDC) Platform, enabling self-service provisioning, data virtualization services, and data collaboration.
Customers deploy Chorus together with VMware and Greenplum Database to build a new, self-service analytic infrastructure.
Chorus can significantly reduce the time and effort it takes companies to extract value and insight from their data.
Greenplum Chorus: Core Design Philosophies
Secure: provide comprehensive and granular access control over who is authorized to view and subscribe to data within Chorus.
Collaborative: facilitate the publishing, discovery, and sharing of data and insight using a familiar, easy-to-use social computing model.
Data-centric: focus on the tooling needed to manage the flow and provenance of data sets as they are created and shared within a company.
MAD Skills in action: build a platform that supports the magnetic, agile, and deep principles of MAD Skills.
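As one way to picture granular access control over who may view and who may subscribe to data, the sketch below keeps a per-data-set list of grants, lets users inherit grants from their groups, and treats view and subscribe as separate permissions. The class and permission names are hypothetical; they illustrate the design goal, not Chorus's actual permission model or API.

```python
from enum import Flag, auto
from typing import Dict, Iterable


class Perm(Flag):
    NONE = 0
    VIEW = auto()
    SUBSCRIBE = auto()


class DatasetACL:
    """Hypothetical per-data-set access list; grants go to users or groups."""

    def __init__(self) -> None:
        self.grants: Dict[str, Perm] = {}

    def grant(self, principal: str, perm: Perm) -> None:
        self.grants[principal] = self.grants.get(principal, Perm.NONE) | perm

    def revoke(self, principal: str, perm: Perm) -> None:
        self.grants[principal] = self.grants.get(principal, Perm.NONE) & ~perm

    def allowed(self, user: str, groups: Iterable[str], perm: Perm) -> bool:
        """Union the user's own grants with those of every group it belongs to."""
        effective = Perm.NONE
        for principal in [user, *groups]:
            effective |= self.grants.get(principal, Perm.NONE)
        return perm in effective
```

Keeping view and subscribe as independent flags lets a data owner publish a data set for browsing to a wide group while limiting ongoing subscriptions to a few named users.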
Our Customers Include
150+ global enterprise customers
$250+ million saved by customers choosing Greenplum over Teradata
5+ billion shares analyzed daily by financial markets using Greenplum
20+ trillion rows being mined for business value
1+ billion consumers receiving more secure and personalized services from Greenplum customers
Customer Example: Regional Bank - Teradata Bake-Off
Business problem: DW and data mart consolidation across the regional bank's operations; improved query performance for both operational and ad-hoc reporting; in-database analytics to support advanced data mining initiatives.
Existing solution: Oracle
Benefits over Teradata: open systems on commodity hardware; significantly better TCO; incremental scalability; better price/performance.
[Chart: Response Time Improvement, response time (min)]
"We turned to Greenplum because its massively parallel data warehousing approach is the only one robust and cost effective to grow with us over time." - SVP Corporate Finance
Customer Example: Investment Firm - Netezza Bake-Off
Business problem: exorbitant maintenance and support costs for the enterprise data warehouse; poor data loading and ad-hoc query performance on the existing Oracle system; need for a scalable platform capable of consolidating multiple decision support DBMSs.
Existing solution: Oracle
Benefits over Netezza: open systems on commodity hardware; a support model that fit their existing data center operations; incremental scalability; better price/performance.
[Chart: Response Time Improvement, response time (min)]
"Queries that timed-out after 8 hours now run in less than 10 minutes." - Sr. Director Data Warehousing
Customer Example: Stock Exchange
Business problem: a standard analytic database platform across global exchange operations.
Key criteria: mission-critical reliability; high-concurrency, mixed workloads; incremental scalability.
Data size: 10 TB to multi-hundred-TB systems.
Loading: 1 TB/day to 2 TB/day.
[Chart: loading volume (TB/day)]
Result: 6 production systems deployed globally.
"Greenplum offers strong scalability advantages due to its highly parallel model that enables us to simply add more servers as data volumes expand." - CIO
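The loading figures above are daily volumes. Converting them to a sustained rate makes them easier to compare with disk and network bandwidth. The helper below is a back-of-the-envelope sketch that assumes decimal units and an even load rate; the 8-hour window is a hypothetical batch schedule, not a figure from this customer.

```python
SECONDS_PER_HOUR = 60 * 60


def sustained_mb_per_s(tb_per_day: float, load_window_hours: float = 24.0) -> float:
    """Average rate needed to load `tb_per_day` within the given daily window.

    Assumes decimal units (1 TB = 10**6 MB) and a perfectly even load rate
    inside the window.
    """
    megabytes = tb_per_day * 10**6
    return megabytes / (load_window_hours * SECONDS_PER_HOUR)


for tb in (1.0, 2.0):
    print(f"{tb:.0f} TB/day: {sustained_mb_per_s(tb):.1f} MB/s over 24 h, "
          f"{sustained_mb_per_s(tb, 8):.1f} MB/s in an 8 h window")
```

Averaged over a full day, 1 TB/day and 2 TB/day work out to roughly 11.6 MB/s and 23.1 MB/s; squeezing the same volumes into an 8-hour window triples the required rate to roughly 34.7 MB/s and 69.4 MB/s.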
Customer Example: Internet Media
Business problem: a multi-hundred-TB EDW to support a $1B Internet advertising operation; a true mixed-workload environment supporting production reporting, ad-hoc data mining, and operational data services.
Competition: Teradata, HP, Oracle, Netezza, Aster Data.
[Chart: Data Size, net data size (TB)]
Results (scalability and reliability): a 1-trillion-row fact table, adding 3 TB/day; running successfully in production for about 2 years; continuous operation while moving data centers across the country.
"Greenplum will be an invaluable partner as we continue to put our data to work in new ways that will improve both the user and advertiser experience on our network of sites." - EVP of Product, Tech & Ops
Greenplum Industry Solutions
Mission: drive the adoption of Greenplum software through the creation of industry-specific analytic solutions.
Strategic objectives:
Address the business analytic requirements of specific industries.
Raise the value proposition from technology to business solutions.
Develop an ecosystem of analytic application service providers and ISVs.
Industry Sales Focus
Verticals: Financial Services; Retail; Telco; Media; Entertainment; Energy (Utilities, Oil & Gas); Healthcare; Public Sector (Federal, SLED).
[Diagram: solution stack of Database, Open Interfaces, Chorus Collaboration, BI Tools, ETL Tools, Industry-Specific Data Feeds, Greenplum Analytic Application Services and ISVs, and Industry-Specific Analytic Application Services and ISVs]
[Diagram: the Industry Sales Focus verticals and solution stack (Database, Open Interfaces, Chorus Collaboration, BI Tools, ETL Tools, Industry-Specific Data Feeds, Greenplum and industry-specific Analytic Application Services and ISVs), extended with Implementation Services Partners, Industry Sales & Strategic Partnerships, Ecosystem, and Infrastructure Partners]
G-Tick Platform: EMC Secure Tick Data Management for Real-Time and Historical Data
[Diagram: a Feed Handler feeds two tiers. Real-time data: GemFire in-memory processing database, serving Algo Trading, Price Engine, Trading Desks, Trade Strategies, the Order Mgmt System, and trade, position, and market data snapshots. Historical data: Greenplum high-performance analytic engine, serving Business Intelligence & Analytics Tools, Risk Modeling, and Compliance Surveillance. EMC components and partner components (EMC & Partners) are highlighted.]
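A minimal sketch of the real-time/historical split in the G-Tick diagram, assuming a single feed handler: each tick updates an in-memory latest-value view for real-time consumers and is buffered for bulk loading into a historical store, where analytics such as a volume-weighted average price (VWAP) run. Plain Python dicts and lists stand in for GemFire and Greenplum; the class and method names are hypothetical, and no real GemFire or Greenplum API is used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tick:
    symbol: str
    price: float
    size: int
    ts: int  # event time, e.g. epoch milliseconds


class FeedHandler:
    """Hypothetical feed handler fanning ticks out to a real-time and a historical tier."""

    def __init__(self, batch_size: int = 1000) -> None:
        self.latest: Dict[str, Tick] = {}  # stand-in for the in-memory real-time tier
        self.pending: List[Tick] = []      # ticks buffered for the next bulk load
        self.history: List[Tick] = []      # stand-in for the historical analytic store
        self.batch_size = batch_size

    def on_tick(self, tick: Tick) -> None:
        current = self.latest.get(tick.symbol)
        # Keep only the newest tick per symbol; late arrivals go to history only.
        if current is None or tick.ts >= current.ts:
            self.latest[tick.symbol] = tick
        self.pending.append(tick)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Bulk-append buffered ticks to the historical store."""
        self.history.extend(self.pending)
        self.pending.clear()


def vwap(ticks: List[Tick], symbol: str) -> Optional[float]:
    """Volume-weighted average price over historical ticks for one symbol."""
    selected = [t for t in ticks if t.symbol == symbol]
    volume = sum(t.size for t in selected)
    if volume == 0:
        return None
    return sum(t.price * t.size for t in selected) / volume
```

Batching the historical writes mirrors the bulk-load path of an analytic warehouse, while the latest-value map answers real-time lookups without touching history.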
Greenplum Value Proposition
Scalable Performance
Efficiency Improvement
Revenue Growth
Greenplum Value Proposition
Greenplum provides an agile analytics environment that addresses the full life cycle of analytics in an enterprise.
Chorus, Greenplum's Enterprise Data Cloud, provides a platform to consolidate and virtualize the various data mart silos into a private cloud environment.
Greenplum is building out industry-specific solution suites where a higher level of integration is needed to deliver faster time-to-value for individual lines of business.
Greenplum enables extreme scale, elastic expansion, self-service provisioning, and data collaboration.
Thank you