Oracle Aware Flash: Maximizing Performance and Availability for Your Database

Gurmeet Goindi, Principal Product Manager, Oracle
Kirby McCord, Database Architect, US Cellular
Kodi Umamageswaran, Vice President, Exadata Software, Oracle

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Program Agenda
- Flash and Databases
- Oracle's Innovations in Flash
- Customer Case Study: US Cellular
- Questions & Answers

The Promise of Flash
- Replace expensive 15K RPM disks with fewer solid-state (flash) devices
  - Reduce failures and replacement costs
  - Reduce cost of the storage subsystem
  - Reduce energy costs
  - Lower transaction and replication latencies by eliminating seek and rotational delays
- Replace power-hungry, hard-to-scale DRAM with denser, cheaper devices
  - Reduce cost of the memory subsystem
  - Reduce energy costs

The Complexity of Flash
- Data loss: caused by defects, erase cycles (endurance), and interference
- Density vs. endurance
  - SLC: single bit per cell; less density, greater endurance
  - MLC: multiple bits per cell; greater density, less endurance
- Write-in-place is prohibitive
  - Read, erase, rewrite: must be avoided
- Low per-chip bandwidth
- Compensating for all of these requires complex firmware

Applications of Flash in Databases
- As an additional storage layer (caching)
  - Stage active database objects in flash
  - Accelerate reads and writes to these objects
- For data files
  - Improves user transaction response time
  - Increases overall throughput for I/O-intensive workloads

To Cache or Not to Cache
- Random reads against tables and indexes
  - Cached: more likely to have subsequent reads
- Sequentially read tables, or scans
  - Not cached: sequentially accessed data is unlikely to be followed by reads of the same data
- Backups, mirrored copies of the block
  - Not cached: Why?
- But most general-purpose flash solutions are database agnostic and cache all of the above workloads

Flash and Database Logs
- Flash has very good average write latency
  - Greatly improves user transaction response time
- Flash has occasional outliers, one or two orders of magnitude slower
  - Garbage collection and similar background work contribute to that delay
- OLTP workloads dislike such large variations

Where to Introduce a Flash Device
[Diagram labels: SSD, NAS/SAN, HDD, PCI-E Flash]
- Direct attached: mount flash inside your server (PCI-E or SSD)
- Networked storage: share the device on the network (FC or 10GE)
- Popular implementations:
  - Tiered storage: multiple tiers of disk drives (SSD, FC, SAS) with various performance characteristics; data moves between these tiers
  - Hybrid storage: combination of direct attached flash on the storage controller and HDD in expansion shelves
  - All-flash arrays: all storage is some form of flash device, either SSD or custom flash modules

Server Attached Flash
- PCI Express attached flash I/O cards
  - Extremely fast, improves performance for currently active data
  - Eliminates the round trip between server and storage => fastest I/O response
  - Easy to install and configure, but usually not hot swappable
- SSD attached via SATA or SAS
  - Slower than PCI Express
  - Cheaper than PCI Express attached flash

Server Attached Flash - Limitations
- Database limited to the size of a single server
  - Storage limited by the number of PCIe or disk slots in the server
- ANY component failure = loss of database access
  - Firmware, card, driver, etc.
  - Local second server with identical flash recommended ($$$ x 2)
  - MTBF of the server decreases as the number of flash cards increases
- No data loss requires synchronous log shipping to a standby server
  - Negates the performance benefit of flash writes
- Poor resource utilization
  - Local storage results in islands of storage
  - No I/O resource management for prioritizing multiple databases

Tiered or Hybrid Storage
- Hierarchy of storage tiers based on performance
- The storage controller moves the data between these layers based on usage
- More resilient and scalable architecture compared to server attached flash
- Better storage resource utilization, as storage is shared among multiple servers
- Server attached flash is usually faster than network attached flash

Tiered or Hybrid Storage - Limitations
- The storage controller software determines which data is hot
  - Has limited visibility into the content of the data
  - Driven by usage patterns and not application needs
- Usage patterns change as the database workload changes (OLTP vs. reporting), which makes the achieved performance less predictable
  - Usually caches data based on yesterday's workload
- Storage controllers are I/O bound
  - Flash can exacerbate that limitation

All Flash Arrays
- No spinning media: all SSD, or proprietary flash modules, or a combination
- Best-in-class performance among storage-based solutions
- Expensive, but innovations in flash and storage management are driving the cost down
- Limited throughput, usually in the low single-digit GB/s
- Most implementations don't scale very well
- Availability is not at the same maturity level as traditional storage arrays
  - They are missing functionality that traditional arrays have added over the last two decades

Putting It All Together
- Flash devices in the application tier (server attached flash) lack enterprise-class scalability and high availability
- Flash devices in traditional storage arrays are not efficient, as storage controllers don't respond quickly enough to workload changes and are I/O bound
- All-flash arrays lack the features and stability of traditional arrays

Program Agenda
- Flash and Databases
- Oracle's Flash Architecture and Innovations in Flash
- Customer Case Study: US Cellular
- Questions & Answers

Oracle's Flash Architecture
- Scale-out architecture
  - Adds flash capacity and performance by adding storage servers
  - Adds the networking and CPU needed to process flash in one unit
- Database-aware storage
  - Metadata about I/O is present on the cell
- Flash on the storage server enables sharing
  - A block on disk is stored in only one flash cache

Oracle's Innovations in Flash
- Exadata Smart Flash Cache
- Exadata Smart Flash Log
- Exadata Smart Flash Cache Compression
- Exadata Smart Flash Cache Scan Awareness
- Exadata Smart File Initialization

Exadata Smart Flash Cache
- Understands different types of I/Os from the database
  - Skips caching I/Os to backups, Data Pump I/O, archive logs, tablespace formatting
  - Caches control file reads and writes, file headers, data and index blocks
- Write-back flash cache
  - Caches writes from the database, not just reads
- RAC-aware from day one

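The caching decision on this slide can be pictured as a small policy keyed on I/O type. The sketch below is illustrative only: `IOType`, `SKIPPED`, `CACHED`, and `should_cache` are hypothetical names, not part of any Exadata interface, and the two sets simply restate the I/O types the slide lists.

```python
from enum import Enum, auto


class IOType(Enum):
    """Hypothetical labels for the I/O types named on the slide."""
    BACKUP = auto()
    DATA_PUMP = auto()
    ARCHIVE_LOG = auto()
    TABLESPACE_FORMAT = auto()
    CONTROL_FILE = auto()
    FILE_HEADER = auto()
    DATA_BLOCK = auto()
    INDEX_BLOCK = auto()


# I/O types the slide says the cache skips.
SKIPPED = {IOType.BACKUP, IOType.DATA_PUMP,
           IOType.ARCHIVE_LOG, IOType.TABLESPACE_FORMAT}

# I/O types the slide says the cache keeps.
CACHED = {IOType.CONTROL_FILE, IOType.FILE_HEADER,
          IOType.DATA_BLOCK, IOType.INDEX_BLOCK}


def should_cache(io_type: IOType) -> bool:
    """Return True when the slide lists this I/O type as cached."""
    return io_type in CACHED


# Map every I/O type name to its caching decision.
decisions = {t.name: should_cache(t) for t in IOType}
```

A database-agnostic cache has no equivalent of `io_type` to consult, which is why it ends up caching backups and scans alongside hot blocks.
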
Exadata Smart Flash Log
- Outliers in log I/O slow down lots of clients
- Outliers from any one copy of the mirror affect response time
- Performance-critical algorithms like space management and index splits are sensitive to log write latency
- Legacy storage I/O cannot differentiate redo log I/O from other I/O
[Diagram: clients, foreground processes, log buffer, log writer]

Exadata Smart Flash Log
- Smart Flash Log uses flash as a parallel write cache to the disk controller cache
- Whichever write completes first wins (disk or flash)
- Reduces response time and outliers
  - "log file parallel write" histogram improves
  - Greatly improves "log file sync"
- Uses almost no flash capacity (< 0.1%)
- Completely automatic and transparent

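The "whichever write completes first wins" rule can be modeled as taking the minimum of the two latencies for each redo write. The sketch below is a toy model under that assumption; the latency samples are made-up numbers for illustration, not measurements, and `ack_latency` is a hypothetical helper.

```python
def ack_latency(disk_ms: float, flash_ms: float) -> float:
    """A redo write is acknowledged as soon as either copy completes."""
    return min(disk_ms, flash_ms)


# Made-up latency samples (ms): each device has one outlier,
# but the outliers do not land on the same write.
disk_samples = [0.5, 0.6, 12.0, 0.5]
flash_samples = [0.3, 9.0, 0.4, 0.3]

# Latency seen by the log writer for each write.
acked = [ack_latency(d, f) for d, f in zip(disk_samples, flash_samples)]

worst_disk_only = max(disk_samples)
worst_flash_only = max(flash_samples)
worst_parallel = max(acked)
```

In this model an outlier reaches the log writer only when both devices are slow on the same write, which is why the worst acknowledged latency is far below the worst case of either device alone.
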
Exadata Smart Flash Cache Compression
- Exadata automatically compresses all data in Smart Flash Cache
  - Compression engine built into the flash card
  - Zero performance overhead on reads and writes
- Logical size of flash cache increases up to 2x

Data Format                      Compression
Uncompressed Tables              1.3X to 4X
Indexes                          1.3X to 4X
Oracle E-biz uncompressed DB     4X
OLTP Compressed Tables           1.2X to 2X
HCC Compressed Tables            Minimal

Great for OLTP

Exadata Smart Flash Cache Compression
[Diagram: user data, flash device, flash media data]
- As the user adds more data, data is compressed and written
- The flash device has no logical space left at the end for user data, but has a lot of physical space

Exadata Smart Flash Cache Compression
[Diagram: user data, flash device, flash media data]
- Extend the logical address space to store more data

Exadata Smart Flash Cache Compression
[Diagram: user data, flash device, flash media data]
- In reality, data doesn't compress with a uniform compression ratio
- If an update to a block (green) compresses better than the previous version of the block (red), there is nothing to do: there is more free space

Exadata Smart Flash Cache Compression
[Diagram: user data, flash device, flash media data]
- What if the new block (green) does not compress as well as the previous block (red)?
- Run out of space? That should never happen!

Exadata Smart Flash Cache Compression
[Diagram: user data, TRIM block, flash device, flash media data]
- Exadata periodically monitors free space
- Performs TRIMs on blocks to ensure enough free space for all I/Os
- Space is reclaimed from the LRU tail, not from the next block
- TRIMs show up as writes (watch for them in iostat)
- The device size shown in Linux is much larger than 2x

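The free-space behavior above can be sketched as a cache that trims its least recently used blocks whenever a write would eat into a reserve of physical space. This is a simplified model and not Exadata's implementation: the class name, the `reserve` parameter, and the sizes are assumptions for illustration, and recency is tracked on writes only.

```python
from collections import OrderedDict


class CompressedFlashCache:
    """Toy model: physical space is finite, block sizes vary with compression."""

    def __init__(self, physical_capacity: int, reserve: int):
        self.physical_capacity = physical_capacity
        self.reserve = reserve        # headroom kept free for incoming I/O
        self.blocks = OrderedDict()   # block id -> compressed size, oldest first
        self.trimmed = []             # block ids reclaimed via TRIM

    def free(self) -> int:
        return self.physical_capacity - sum(self.blocks.values())

    def _trim_until_free(self, needed: int) -> None:
        # Reclaim from the LRU tail (oldest entry), not from a neighbor block.
        while self.free() < needed and self.blocks:
            victim, _size = self.blocks.popitem(last=False)
            self.trimmed.append(victim)

    def write(self, block_id: str, compressed_size: int) -> None:
        # Drop the old copy first; a rewrite may compress better or worse.
        self.blocks.pop(block_id, None)
        self._trim_until_free(compressed_size + self.reserve)
        self.blocks[block_id] = compressed_size


cache = CompressedFlashCache(physical_capacity=100, reserve=10)
cache.write("a", 30)
cache.write("b", 30)
cache.write("c", 20)
# The update to "c" compresses worse (40 instead of 20), so the
# oldest block "a" is trimmed to keep the reserve intact.
cache.write("c", 40)
```

After the last write, `cache.trimmed` holds `"a"` rather than `"b"`, which mirrors the slide's point that space comes from the LRU tail and not from whichever block happens to sit next to the one being rewritten.
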
Exadata Smart Flash Cache Compression
- Exadata automatically compresses all data in Smart Flash Cache
  - Compression engine built into the flash card
  - Zero performance overhead on reads and writes
- Logical size of flash cache increases up to 2x
  - User gets a larger amount of data in flash for the same media size
- Enabled via: cellcli -e alter cell flashcachecompression=true
- Elasticity of flash cache is completely automatic and transparent

Exadata Smart Flash Cache Scan Awareness
- On a traditional cache, if you scan a dataset larger than the cache size:
  - Blocks 0, 1, 2, 3 are brought into cache; the cache is full
  - Blocks 20, 21, 22, 23, say, replace 0, 1, 2, 3
- Repeat the same scan:
  - Blocks 0, 1, 2, 3 will replace blocks 20, 21, 22, 23
  - Blocks 20, 21, 22, 23 will again replace blocks 0, 1, 2, 3
- Traditional caches churn with no actual benefit
- Some implementations call the insertion of a new block in the middle of the LRU "scan resistant"
[Diagram labels: LRU, insert new block, churn]

Exadata Smart Flash Cache Scan Awareness
- Exadata Smart Flash Cache is scan resistant
  - Ability to bring a subset of the data into cache and not churn
  - OLTP and DW scan blocks can co-exist
- Nested scans bring in repeated accesses
  - Repeat: for each item in the large table, scan the small table
  - Smart enough to pull the small table into flash, since it is accessed repeatedly, even though the large table alone is larger than the flash cache
- No need to set the KEEP attribute in data warehouses
- Happens automatically, no tuning or configuration needed

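The churn described for traditional caches is easy to reproduce with a small simulation. The sketch below compares a plain LRU cache against a deliberately simple "keep a subset" policy on repeated scans of a dataset twice the cache size. The subset policy is only a stand-in to show the effect of not churning; it is not Exadata's algorithm, and the function names and sizes are assumptions.

```python
from collections import OrderedDict


def lru_hits(accesses, capacity):
    """Count hits for a plain LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return hits


def subset_hits(accesses, capacity):
    """Count hits when the cache keeps the first blocks it sees and never churns."""
    cache = set()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
        elif len(cache) < capacity:
            cache.add(block)               # admit only while there is room
    return hits


# Three passes over 8 blocks with room for only 4.
scan = list(range(8)) * 3

lru_result = lru_hits(scan, capacity=4)
subset_result = subset_hits(scan, capacity=4)
```

With these inputs the LRU cache never hits, because every block is evicted before it is read again, while the subset policy serves blocks 0 to 3 from cache on the second and third passes.
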
Exadata Smart File Initialization
[Diagram: three panels labeled File Initialization, Write Back Flash Cache (11.2.3.2), and Smart File Initialization (11.2.3.3), each showing the database sending metadata or I/O to CELLSRV; captions: I/Os in parallel, I/O optimized, I/O optimized]

Exadata Smart File Initialization
- Combines the benefits of the previous Smart File Initialization and write-back flash cache
  - Writes file creation metadata to the write-back flash cache
  - A tiny amount of flash space is used to cache large portions of initialized data on disk
- File creation sped up by an order of magnitude
  - Initialization I/Os to disk are deferred, or not performed at all if data is loaded
- Create tablespace, file extensions, and autoextend all show benefit
- Redo log initialization included in Exadata 12.1.1.1.0
- Happens automatically, no tuning needed
[Diagram: database, metadata, CELLSRV, I/O optimized]

Exadata Smart Flash Benefits
- Smart Flash Cache is database aware
- Smart Flash Logging avoids redo log outliers
- Smart Flash Compression doubles flash media capacity
- Smart Flash Cache Scan Awareness provides subset scanning and is table scan resistant
- Smart File Initialization creates a file by just writing metadata to flash cache
- Happens automatically, no tuning needed

Program Agenda
- Flash and Databases
- Oracle's Innovations in Flash
- Customer Case Study: US Cellular
- Questions & Answers

Oracle Database
Kirby McCord, Lead Integration Infrastructure Architect

USCellular
United States Cellular Corporation provides a comprehensive range of wireless products and services, excellent customer support, and a high-quality network to 5.0 million customers in 23 states. The Chicago-based company had 7,000 full- and part-time associates as of June 30, 2013.

Who am I?
- Lead Infrastructure Architect at USCellular
- Oracle OCP since Oracle 7
- Member of the IOUG Conference Committee
  - Infrastructure Track Member 2011
  - Infrastructure Track Lead 2012
  - Engineered System Manager 2013
- IOUG Collaborate Speaker 2012 & 2013
- Contributor to IOUG's Exadata Tip Booklet

Exadata System
- Hardware: X2-2 Full Rack (HP disks), X2-2 Storage Expansion (HC disks), ZFSSA 7410 Storage Array
- 4 databases:
  - ODS: primarily loaded via GoldenGate, some batch loading. Mixed workloads on access. Majority of workload on server.
  - EDW: warehouse with atomic layer in 3NF with some star schema built in
  - DW: traditional warehouse, star schema
  - Misc: small ad hoc workload
- All databases:
  - Are multi-instance
  - Have physical standbys
  - Are backed up to the ZFSSA

Previous Solution
- ODS only
- 10gR2 database built around Oracle's reference architecture
- Monolithic Unix server
  - 24 cores, 384 GB RAM
  - 10GE networking
  - 8 x 8G FC SAN connectivity
- Enterprise-grade storage array
  - 15K SAS drives (short armed disks)
  - Lots of cache
  - Dedicated to the database

Note: The two solutions had similar architectures: both were loaded via GoldenGate and Informatica and accessed by mixed-workload applications. The Exadata-based system is fed from a newer billing system, which produces 2-3x the volume of changes to load via GoldenGate. We also have many more applications accessing the ODS. The new workload is larger than what the previous hardware could handle.

What Did We See - Exadata ODS
- 1.49 ms single block reads, while doing 42K read IOPS and 11K write IOPS over an hour period
- What? Writes are supposed to be fast! Wait until later slides.

Note: The other databases were active on the Exadata system during this time.

Comparison to Old System

Metric                 Exadata ODS   Monolithic Hardware ODS   Comparison
Single Block Reads     1.5 ms        3.8 ms                    > 2x
Log File Sync Waits    0.85 ms       5.7 ms                    > 6x

Note: The Exadata ODS handles more than twice the workload of the previous version. In addition, the Exadata system is shared with several databases, while the monolithic hardware was dedicated.

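The "Comparison" column can be checked directly from the two latency columns. The short calculation below uses only the figures on the slide; the dictionary keys are hypothetical names chosen for the example.

```python
# Latencies from the slide, in milliseconds.
exadata_ms = {"single_block_read": 1.5, "log_file_sync": 0.85}
monolithic_ms = {"single_block_read": 3.8, "log_file_sync": 5.7}

# Ratio of old latency to new latency for each metric.
speedup = {metric: monolithic_ms[metric] / exadata_ms[metric]
           for metric in exadata_ms}

for metric, ratio in speedup.items():
    print(f"{metric}: {ratio:.1f}x")
```

The ratios come out to roughly 2.5x for single block reads and 6.7x for log file sync waits, consistent with the "> 2x" and "> 6x" entries.
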
Write Back Flash Enablement
- Designed to accelerate write-intensive workloads
- From the previous slide: we had lots of free buffer waits
- Enabled this feature on the X2-2
- Result: no more free buffer waits
[Diagram labels: writes, I/Os]

Maintenance Activities
- Preventative maintenance operations
  - Battery replacement
  - Proactive disk replacement
- Unplanned operations
  - Disk replacement
  - Flash drive replacement
  - Motherboard replacements
- Patching
  - All patches for over a year

All have been done in a rolling fashion!

What This Means to Us
- More flexibility in system use
  - We are less concerned about unplanned activities on the system
  - Users can go after the system when they need to, not only during certain windows
  - Maintenance activities have less impact on system availability
- More use of the data
  - Exadata's flash reduces the I/O contention of the mixed workloads within the database and between competing databases
  - More concurrent users means more business questions being answered
- Faster access to the data
  - Faster I/O means less time waiting for queries to return and more time to analyze the results

Program Agenda
- Flash and Databases
- Oracle's Innovations in Flash
- Customer Case Study: US Cellular
- Questions & Answers
