HP Oracle Database Platform / Exadata Appliance Extreme Data Warehousing Shyam Varan Nath President, Oracle BIWA SIG & Founder Exadata SIG (http://oracleexadata.org) South Florida Oracle User Group March 26, 2009
Agenda The Problem Storage Bottleneck for Large Databases Introduction to Data Warehouse Appliances Market Landscape The Solution - Oracle Database Platform and Exadata Storage Technical Details Summary Questions Training Webinar for Report Template_022307
About Myself
Word of thanks to SFOUG for this talk today
❿ A Certified DBA (OCP) on 4 different database versions since 1998
❿ Former member of Oracle Corporation - BI Consulting Practice
❿ Experience in Oracle Data Warehousing, Business Intelligence (OBIEE) and Data Mining
❿ Founder and President of Oracle BIWA SIG (http://oraclebiwa.org), Exadata SIG
❿ Received IOUG Oracle Contribution Award in 2007
❿ Frequent speaker at Oracle OpenWorld (2003, 06, 07, 08), NYOUG (June 06, Sep 06, Sep 08, Mar 09), IOUG/Collaborate (2005, 06, 08), NOUG (2006), SFOUG (2007), ODTUG (2008) on topics ranging from Database to BI
❿ Bachelor's degree from Indian Institute of Technology (IIT), MBA and MS from Florida Atlantic University
❿ Based in South FL since 1995
Business imperative
The Problem: Storage Bottleneck for Large Databases
What is the choking point for large databases: the database engine, the storage, or the interconnect?
Today most databases run on computers with one or many powerful CPUs
Most large databases are I/O bound rather than CPU bound
Large storage systems are not able to feed data to the database server at a fast enough rate
How can we make the storage more intelligent?
Exadata Storage: The Next Step in VLDW Technology
Over the past 12+ years, Oracle has steadily introduced major architectural advances for large database support. Data warehouses have grown exponentially with these new technologies.
Timeline axis: 1995, 1997, 1999, 2001, 2003, 2005, 2008
Releases and features:
Oracle Release 7.3: Parallel Execution
Oracle8: Range Partitioning
Oracle8i: Composite Partitioning
Oracle9i: Real Application Clusters
Oracle9iR2: Compression
Oracle10g: Automatic Storage Management
Oracle11g
Exadata (the newest step on the timeline)
Customer milestones:
First 1TB database built in lab
First 1TB customer: Acxiom
First 10TB customer: Amazon.com
First 30TB customer: France Telecom
First 100TB customer: Yahoo!
Over 100 Terabyte customers
- 5 -
How Big is the Data Warehouse Storage Problem?
ABC Inc.'s Data Warehouse is approaching 12 terabytes in size and growing by 100% every year! Storage and backup of data alone cost 24% of the IT budget.
Today: How much are we spending on storage? Annual storage cost is $1.2m out of a total IT budget of $5m, and the cost is expected to double next year at the given rate.
Tomorrow: What are the other impacts of huge storage needs? Information retrieval is slow. Not only is the Data Warehouse growing unmanageable in size, but queries are also slowing down, leading to lost orders.
- 6 -
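The figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the $1.2m storage cost, the $5m IT budget and the 100% yearly growth from the slide, and assumes (this is not stated in the talk) that storage cost scales linearly with data volume. The function names are hypothetical.

```python
# Figures taken from the slide.
IT_BUDGET_M = 5.0        # total IT budget, in $ millions
STORAGE_COST_M = 1.2     # annual storage cost, in $ millions
GROWTH_RATE = 1.0        # data grows by 100% every year


def storage_share(storage_cost_m: float, budget_m: float) -> float:
    """Return storage cost as a fraction of the IT budget."""
    return storage_cost_m / budget_m


def projected_cost(storage_cost_m: float, growth_rate: float, years: int) -> float:
    """Project storage cost after `years`.

    Assumption: cost grows at the same rate as the data volume.
    """
    return storage_cost_m * (1.0 + growth_rate) ** years


if __name__ == "__main__":
    share = storage_share(STORAGE_COST_M, IT_BUDGET_M)
    print(f"Storage share of IT budget today: {share:.0%}")
    for year in range(1, 4):
        cost = projected_cost(STORAGE_COST_M, GROWTH_RATE, year)
        print(f"Year +{year}: ${cost:.1f}m")
```

Under that linear-cost assumption, the storage bill alone would approach the whole of today's IT budget in about two years, which is the point the slide is making.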
What is causing the explosion of data in most enterprises?
Regulatory Compliance Landscape: Government regulations like SOX and HIPAA mandate storing historical data for a certain number of years
Web 2.0: A new kind of data source, such as social networks and blogs, is leading to various forms of semi-structured and unstructured data. Some of this data is stored in the database, some in ECM
Multimedia content: Bandwidth has become cheap, and increasing amounts of multimedia content are being generated and stored
Migration of Legacy Applications: As legacy applications from mainframes and other file-based databases are migrated to RDBMS, increasing volumes of data are stored inside the database
Click-stream: Click-stream and personalization data continues to explode for online sites
- 7 -
Some Large Databases in Use Today
Yahoo's data needs are substantial. According to Hasan, VP of Data, the travel industry's Sabre system handles 50m events/day, credit card company Visa handles 120m events/day, and the New York Stock Exchange has handled over 225m events/day. Yahoo, he said, handles 24 billion events/day, fully two orders of magnitude more than these non-internet companies.
- 8 -
Building on Oracle's Leading Position: Number 1 in Data Warehousing!
Vendor shares:
Teradata 11.7%
Other 12.5%
Microsoft 14.8%
Oracle 39.3%
IBM 21.7%
Market size is $6.7 billion with 14.6% growth YoY
Source: IDC, Aug 2008, Worldwide Data Warehouse Management Tools 2007 Vendor Shares
- 9 -
Market Landscape
What does the market landscape of data warehouse appliances look like?
DW appliances shown: ORACLE DATABASE PLATFORM, NETEZZA, TERADATA, EXADATA STORAGE
Dimensions: User Experience, Cost Benefit, Data Storage, Data Process & Organization, Competitive Advantage
Users are able to retrieve information faster, with query response time improved by up to 3 times
The cost of an additional license for Data Compression is $1 million. Total expected cost benefit is about $2 million per year
Use of Data Compression reduces storage needs by up to 5 times, reducing storage cost by up to 60%
The ability to get results 3 times faster from the Data Warehouse will enhance the Decision Support process and result in 20% more customer orders, adding $4 million to annual revenue
- 10 -
HP Oracle Database Machine: The Next Step in DW Hardware Solutions
Custom: Complete flexibility. Any OS, any platform. Easy fit into a company's IT standards
Reference Configurations: Documented best-practice configurations for data warehousing
Optimized Warehouse: Scalable systems pre-installed and pre-configured, ready to run out-of-the-box
HP Oracle Database Machine: Highest performance. Pre-installed and pre-configured. Sold by Oracle
- 11 -
Quote from TDWI: "In any BI application, it's always disk I/O that slows performance."
Data warehouses are mainly I/O bound rather than CPU bound
Other VLDB techniques, such as partitioning and compression, work with Exadata
Exadata is good for index scans as well, improving index read efficiency
- 12 -
Three-Pronged Approach to Solve the Problem
Faster pipe: Infiniband
More pipes
More efficient use of the data pipe, by division of work between the DB Grid and the Exadata Storage Server
- 13 -
HP Oracle Database Machine: Extreme Performance 10-100X faster than conventional DW systems High bandwidth: 14GB/sec of raw I/O throughput >50GB/sec of raw business data can be processed with compression High-bandwidth Infiniband network between Database Servers and Storage Servers Efficient block access in Storage Servers Smart scan processing Data-intensive processing in the storage server Compute-intensive processing in the database server Less data transfer over the network - 14 -
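One way to read the two bandwidth figures on this slide is that the effective rate of business data equals the raw I/O rate multiplied by the compression ratio. The sketch below works through that reading. The multiplication model and the function names are assumptions for illustration; the slide gives only the 14GB/sec and >50GB/sec figures.

```python
RAW_IO_GB_PER_SEC = 14.0      # raw I/O throughput from the slide
TARGET_GB_PER_SEC = 50.0      # business-data rate quoted with compression


def effective_bandwidth(raw_gb_per_sec: float, compression_ratio: float) -> float:
    """Uncompressed data processed per second, assuming every block read
    from disk expands by `compression_ratio`."""
    return raw_gb_per_sec * compression_ratio


def ratio_needed(raw_gb_per_sec: float, target_gb_per_sec: float) -> float:
    """Compression ratio required to reach a target effective bandwidth."""
    return target_gb_per_sec / raw_gb_per_sec


if __name__ == "__main__":
    ratio = ratio_needed(RAW_IO_GB_PER_SEC, TARGET_GB_PER_SEC)
    print(f"Compression ratio needed for 50 GB/sec: {ratio:.2f}x")
    for r in (2.0, 3.0, 4.0):
        print(f"{r:.0f}x compression -> {effective_bandwidth(RAW_IO_GB_PER_SEC, r):.0f} GB/sec")
```

Under this model the >50GB/sec figure corresponds to a compression ratio of a little over 3.5x.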
HP Oracle Database Machine: Key Components
Database Server Grid: 8 servers, each consisting of:
One HP DL360-G5 with 2 Intel Quad-core processors
32 GB RAM
4 x 146GB SAS disks
Dual-port Infiniband Host Channel Adapter (HCA)
Oracle Enterprise Linux
Oracle Database 11g Enterprise Edition with Real Application Clusters and Partitioning
4 Infiniband switches, each with 24 ports
Exadata Storage Server Grid: 14 servers, each consisting of:
One HP DL180-G5 with 2 Intel Quad-core processors
8GB RAM
12 x 450GB SAS or 12 x 1TB SATA disks
Dual-port Infiniband Host Channel Adapter (HCA)
Oracle Enterprise Linux
Oracle Exadata Storage Server Software
- 15 -
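The per-server figures on this slide can be rolled up into rack totals. The sketch below does that with a small, hypothetical `ServerSpec` helper (not part of any Oracle or HP tooling). The resulting 64 database cores, 112 storage cores, 368 GB of memory and 168 storage disks agree with the figures quoted on the technology comparison slide later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Per-server hardware figures for one grid, as listed on the slide."""
    count: int             # number of servers in the grid
    sockets: int           # processors per server
    cores_per_socket: int  # quad-core = 4
    ram_gb: int            # RAM per server
    disks: int             # disks per server

    @property
    def total_cores(self) -> int:
        return self.count * self.sockets * self.cores_per_socket

    @property
    def total_ram_gb(self) -> int:
        return self.count * self.ram_gb

    @property
    def total_disks(self) -> int:
        return self.count * self.disks


DB_GRID = ServerSpec(count=8, sockets=2, cores_per_socket=4, ram_gb=32, disks=4)
STORAGE_GRID = ServerSpec(count=14, sockets=2, cores_per_socket=4, ram_gb=8, disks=12)

if __name__ == "__main__":
    print("DB cores:     ", DB_GRID.total_cores)
    print("Storage cores:", STORAGE_GRID.total_cores)
    print("Total cores:  ", DB_GRID.total_cores + STORAGE_GRID.total_cores)
    print("Total RAM GB: ", DB_GRID.total_ram_gb + STORAGE_GRID.total_ram_gb)
    print("Storage disks:", STORAGE_GRID.total_disks)
```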
Division of Work: Data Intensive Processing vs. Compute Intensive Processing
Exadata Storage Server implements data intensive processing directly in storage
Scans tables and indexes, filtering out data that is not relevant to a query
Compute intensive data processing remains in database servers
Joins, aggregation, statistics, data conversions, etc.
Exadata cell is smart storage, not a database node
- 16 -
How Does Query Processing Change with Exadata? - 17 -
Smart Scans
Exadata cells implement smart scans to greatly reduce the data that needs to be processed by the database
Only return relevant rows and columns to the database
Offload predicate evaluation
Data reduction is usually very large
Column and row reduction often decrease the data returned to the database by 10x
- 18 -
Traditional Scan Processing
Smart Scan Example: Telco wants to identify customers that spend more than $200 on a single phone call
❶ SELECT customer_id FROM calls WHERE amount > 200;
❷ Table Extents Identified
❸ I/Os Issued
❹ I/Os Executed: 1 terabyte of data returned to hosts
❺ DB Host reduces terabyte of data to 1000 customer names that are returned to client
❻ Rows Returned
With traditional storage, all database intelligence resides in the database hosts
Most data returned from storage is discarded by the database
Discarded data consumes valuable resources and impacts the performance of other workloads
- 19 -
Exadata Smart Scan Processing
❶ SELECT customer_id FROM calls WHERE amount > 200;
❷ Smart Scan Constructed And Sent To Cells
❸ Smart Scan identifies rows and columns within terabyte table that match request
❹ 2MB of data returned to server
❺ Consolidated Result Set Built From All Cells
❻ Rows Returned
Only the relevant column (customer_id) and the required rows (where amount > 200) are returned to the database
CPU consumed by predicate evaluation is offloaded
Moving scan processing off the database frees CPU cycles and eliminates lots of unproductive messaging
Returns the needle, not the entire haystack
- 20 -
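The contrast between the two scan paths can be sketched with a toy model. This is not how Exadata is implemented: the "cells" are plain Python lists, the `calls` rows are randomly generated, and all function names are hypothetical. It only illustrates where the predicate (`amount > 200`) and the column projection (`customer_id`) are applied, and how many values cross from storage to the database host in each case.

```python
import random


def make_calls(n, seed=0):
    """Generate n synthetic rows for a hypothetical `calls` table."""
    rng = random.Random(seed)
    return [
        {
            "customer_id": i,
            "call_date": "2009-03-26",
            "duration_sec": rng.randint(1, 3600),
            "amount": round(rng.uniform(0, 250), 2),
        }
        for i in range(n)
    ]


def split_across_cells(rows, n_cells):
    """Stripe rows across storage cells round-robin."""
    return [rows[i::n_cells] for i in range(n_cells)]


def traditional_scan(cells):
    """Storage ships every row and every column; the host filters."""
    values_shipped = 0
    result = []
    for cell_rows in cells:
        for row in cell_rows:
            values_shipped += len(row)          # whole row crosses the wire
            if row["amount"] > 200:             # predicate evaluated on the host
                result.append(row["customer_id"])
    return result, values_shipped


def smart_scan(cells):
    """Each cell applies predicate and projection before shipping."""
    values_shipped = 0
    result = []
    for cell_rows in cells:
        matched = [r["customer_id"] for r in cell_rows if r["amount"] > 200]
        values_shipped += len(matched)          # only matching customer_ids move
        result.extend(matched)                  # host consolidates per-cell results
    return result, values_shipped


if __name__ == "__main__":
    cells = split_across_cells(make_calls(100_000), n_cells=14)
    trad, trad_shipped = traditional_scan(cells)
    smart, smart_shipped = smart_scan(cells)
    assert sorted(trad) == sorted(smart)
    print("values shipped, traditional:", trad_shipped)
    print("values shipped, smart scan: ", smart_shipped)
```

Both paths return the same customer IDs. The difference is only in how much data leaves storage, which is the point of the "needle, not the haystack" line.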
Smart Scan Transparency
Smart Scans correctly handle complex cases including:
Uncommitted data and locked rows
Chained rows
Compressed tables
National Language Processing
Date arithmetic
Regular expression searches
Partitioned tables
Smart scans are transparent to the application
No application or SQL changes required
Returned data is fully consistent and transactional
If a cell dies during a smart scan, the uncompleted portions of the smart scan are transparently routed to another cell
- 21 -
Data Flow Concepts
❿ Concept of data flow and producer-consumer relationships
❿ Three kinds of data exchanges take place: Exchange 1, Exchange 2, Exchange 3
Exchange 1 is the flow of data within an Exadata cell using the idb protocol; throughput is 60-80MB/sec per disk
Exchange 2 is between a single cell and the Database grid (1Gb/sec)
Exchange 3 is between the Database grid and the Storage grid (1.6 GB/sec)
- 22 -
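The per-disk figure for Exchange 1 can be scaled up to a cell and to a full rack. The sketch below assumes 12 disks per cell and 14 cells per rack (both from the Key Components slide) and assumes that disk throughputs simply add up, which ignores controller and network limits. The function names are hypothetical.

```python
DISKS_PER_CELL = 12          # from the Key Components slide
CELLS_PER_RACK = 14          # from the Key Components slide
PER_DISK_MB_S = (60, 80)     # Exchange 1 range from this slide


def cell_throughput_mb_s(per_disk_mb_s: float) -> float:
    """Aggregate disk throughput of one cell, assuming linear scaling."""
    return per_disk_mb_s * DISKS_PER_CELL


def rack_throughput_gb_s(per_disk_mb_s: float) -> float:
    """Aggregate disk throughput of all cells in a rack (1 GB = 1000 MB)."""
    return cell_throughput_mb_s(per_disk_mb_s) * CELLS_PER_RACK / 1000.0


if __name__ == "__main__":
    for rate in PER_DISK_MB_S:
        print(
            f"{rate} MB/sec per disk -> "
            f"{cell_throughput_mb_s(rate):.0f} MB/sec per cell, "
            f"{rack_throughput_gb_s(rate):.2f} GB/sec per rack"
        )
```

Under these assumptions a cell delivers roughly 0.7 to 1 GB/sec from its disks, in the same range as the per-server bandwidth listed on the capacity slide.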
Visual of Data Flow Exchanges - 23 -
Targeted Messages: to DW Managers / Architects vs. to DBAs / System Admins
Key Messages for DW Managers / Architects:
10x to 100x performance gains for end-user queries
Zero changes to existing BIDW tools and applications
Supports large numbers of Decision Support users and applications
Fast deployment: no configuration needed
Key Messages for DBAs / Sys Admins:
Built on Oracle Database 11g (11.1.0.6 and higher), consistent with corporate standards
Based on standard hardware components from HP: no proprietary hardware
Oracle provides a single point of purchase and support
Hardware repair is provided by HP worldwide
- 24 -
HP Oracle Database Machine Data Capacity
HP Oracle Database Machine Hardware, SAS: raw storage 97 TB, user data 30 TB, data bandwidth 14 GB/s
HP Oracle Database Machine Hardware, SATA: raw storage 168 TB, user data 46 TB, data bandwidth 10.5 GB/s
HP Exadata Storage Server Hardware, SAS: raw storage 5.4 TB, user data 1.5 TB, data bandwidth 1 GB/s
HP Exadata Storage Server Hardware, SATA: raw storage 12 TB, user data 3.3 TB, data bandwidth 0.75 GB/s
Raw Storage: Total raw disk capacity, computed as (# disks x disk capacity)
User Data: Space for end-user data, computed after mirroring and after allowing space for database structures such as temp, logs, undo, and indexes. User data capacity is uncompressed; with compression, 2x to 4x more data can often be stored. Actual user data capacity varies by application
- 25 -
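The raw-storage rule on this slide (# disks x disk capacity) can be applied to the disk counts from the Key Components slide: 12 disks per storage server and 14 storage servers per rack. The sketch below does that; the function names are hypothetical and 1 TB is taken as 1000 GB. The single-server figures (5.4 TB and 12 TB) and the SATA rack figure (168 TB) come out as listed. The SAS rack figure computed this way is 75.6 TB, which does not match the 97 TB shown on the slide; the slide may reflect a different configuration or a transcription error, and the sketch does not try to resolve that.

```python
DISKS_PER_SERVER = 12    # from the Key Components slide
SERVERS_PER_RACK = 14    # from the Key Components slide


def raw_capacity_tb(servers: int, disks_per_server: int, disk_gb: float) -> float:
    """Raw storage = number of disks x disk capacity (1 TB = 1000 GB)."""
    return servers * disks_per_server * disk_gb / 1000.0


def compressed_capacity_tb(user_data_tb: float, ratio: float) -> float:
    """User data that fits if compression stores `ratio` times more data."""
    return user_data_tb * ratio


if __name__ == "__main__":
    for label, disk_gb in (("SAS 450GB", 450), ("SATA 1TB", 1000)):
        per_server = raw_capacity_tb(1, DISKS_PER_SERVER, disk_gb)
        per_rack = raw_capacity_tb(SERVERS_PER_RACK, DISKS_PER_SERVER, disk_gb)
        print(f"{label}: {per_server:.1f} TB per server, {per_rack:.1f} TB per rack")
    # Slide states 2x to 4x more data with compression; 46 TB is the SATA user-data figure.
    for ratio in (2, 4):
        print(f"46 TB user data at {ratio}x: {compressed_capacity_tb(46, ratio):.0f} TB")
```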
HP Oracle Database Machine: High Availability
Problem: Database Machine Solution
Power failure: Redundant power supplies for all servers
Switch failure: Redundant switches; dual-port HCAs in all servers
Disk failure: Oracle Automatic Storage Management (all disks are mirrored)
Database Server failure: Oracle Real Application Clusters
Storage Server failure: Oracle Exadata Storage Servers
- 26 -
HP Oracle Database Machine: Installation Goal: Deliver to the customer a completely functioning database system All servers properly configured and networked All software configured (CRS, RAC, DB, Exadata) Default database created Performance and functionality validated Installation is included in the price of HP Oracle Database Machine Onsite HP Installation Services Onsite Oracle ACS Services - 27 -
HP Oracle Database Machine: Support
Single point of contact for support (Oracle) for the entire HP Oracle Database Machine:
Hardware
Software: Oracle Enterprise Linux, Database, Exadata Storage Software
Software issues are resolved by Oracle support
Hardware support:
Hardware issues are passed to HP
HP contacts the customer to resolve the issues
HP Support is available 24x7
For on-site support, HP has to respond (not repair) within defined times
Customer can buy additional support (HP Care Packs)
- 28 -
DB Machine Technology Comparison
Item | HP Oracle Database Machine | Netezza 10100 | Teradata 2550
Footprint | 1 rack | 1 rack | 1 rack
User data | 21 TB | 12.5 TB | 12.6 TB
Disks | 168 x 450GB disks | 108 x 400GB disks | 144 x 300GB disks
Database cores | 64 DB Cores | 4 DB Cores (?) | 32 DB Cores
Storage cores | 112 Storage Cores | 108 Storage Cores* | 0 Storage Cores
Total cores | 176 Cores | 112 Cores* | 32 Cores
Interconnect | 20Gb/sec Infiniband | 1Gb/sec Ethernet | 1Gb/sec BYNET
Memory | 368 GB | 108 GB | 128 GB
HW Architecture | Open | Proprietary | Proprietary**
* Netezza 10100 uses PowerPC CPUs (less powerful than Intel Xeon cores)
** Teradata BYNET Interconnect is proprietary
- 29 -
Retailer: Exadata Speedup 3x to 50x
[Bar chart: speedup (x) per query, axis from 0 to 50; 16x average speedup]
Queries shown:
Merchandising Level 1 Detail: Period Ago
Merchandising Level 1 Detail: Current - 52 weeks
Supply Chain Vendor - Year - Item Movement
Merchandising Level 1 Detail by Week
Materialized Views Rebuild
Date to Date Movement Comparison - 53 weeks
Prompt04 Clone for ACL audit
Sales and Customer Counts
Gift Card Activations
Recall Query
- 30 -
Exadata's Value Proposition
Ability to stay on Oracle Database for extreme BIDW performance
Compatibility with existing DB features like Partitioning and DB Compression
Horizontal scalability for Database Grid and Storage Grid
Pre-built solutions from Oracle for BIDW, like BI-Apps using OBIEE, and industry extensions like Oracle Data Warehouse for Retail (Accelerators)
Single point of support for hardware and software
Diagram labels: Ready Configuration; Reference Customers; Scalable DB; Scalable Storage; Industry Vertical Solutions; Pre-built BI Accelerators; BI/DW Technical Infrastructure; Single Point of Contact; Oracle HP Database Machine
- 31 -
Exadata Benefits
Extreme Performance: 10X to 100X speedup for data warehousing
Database Aware Storage: Smart Scans
Massively Parallel Architecture: Dynamically scalable to hundreds of cells; linear scaling of data bandwidth; transaction/job level Quality of Service
Mission Critical Availability and Protection: Disaster recovery, backup, point-in-time recovery, data validation, encryption
- 32 -
What can Oracle Exadata Platform do for you?
Let us look at why Oracle Exadata needs to be in the BIDW roadmap of companies to address common issues
Issue: Opportunity with Oracle Exadata
Explosion of data volumes: High performance even with exponential growth of data
Cost of licensing new H/W and S/W: Total cost of ownership is reduced in the long run
Reduced query performance due to large database size: Tremendous business productivity boost
DB is on Exadata, what about backup?: Standby DB does not have to be Exadata
Compatibility with other 11g features like compression or partitioning: Compression/Partitioning can be used with Exadata storage
Fear of adoption and learning curve of data compression: No impact to app developers/end users, minimal impact for DBAs
- 33 -
Questions
Reminder: join the IOUG Exadata SIG for more info
Contact Info: ShyamVaran@Gmail.com, (954) 609 2402 (cell), http://oracleexadata.org
- 34 -
Other Resources
http://oracleexadata.org
http://www.oracle.com/exadata
www.oracle.com/technology/products/bi
www.oracle.com/solutions/business_intelligence
OTN:
http://www.oracle.com/technology/products/bi/db/dbmachine
http://www.oracle.com/technology/products/bi/db/exadata
Forums:
http://structureddata.org/
http://kevinclosson.wordpress.com/
http://techspectator.blogspot.com/
Subject: Oracle Exadata Setup/Configuration Best Practices, Doc ID: 757553.1, Type: BULLETIN, Modified Date: 18-MAR-2009, Status: PUBLISHED
Subject: Oracle Exadata Best Practices, Doc ID: 757552.1, Type: BULLETIN, Modified Date: 02-MAR-2009
- 35 -