Flash for Databases
September 22, 2015
Peter Zaitsev, Percona
In this Presentation
  - Flash technology overview
  - Review some of the available technology
  - What does this mean for databases?
  - Specific opportunities for MySQL
Before SSDs
There were HDDs
  - Good at Sequential Reads/Writes
  - RT = Seek Time + Rotational Latency
  - Reads and Writes have Similar Latency
  - No Specific Write Limits
  - Retain data for a long time
  - One IO Request at a time (no parallelism)
  - Low cost per GB
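The response time formula above can be sketched in a few lines of Python. The drive figures (7200 RPM, 4 ms average seek) are assumed for illustration and are not from the slides; the function names are hypothetical. Average rotational latency is taken as half a revolution.

```python
def hdd_random_io_ms(avg_seek_ms: float, rpm: int) -> float:
    """Average response time of one random IO: seek time plus rotational latency."""
    # On average the platter turns half a revolution before the sector arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return avg_seek_ms + rotational_latency_ms


def hdd_random_iops(avg_seek_ms: float, rpm: int) -> float:
    """An HDD serves one request at a time, so IOPS is 1 / response time."""
    return 1000.0 / hdd_random_io_ms(avg_seek_ms, rpm)


# Hypothetical drive: 7200 RPM with a 4 ms average seek (assumed figures).
rt_ms = hdd_random_io_ms(avg_seek_ms=4.0, rpm=7200)
iops = hdd_random_iops(avg_seek_ms=4.0, rpm=7200)
print(f"response time: {rt_ms:.2f} ms, about {iops:.0f} random IO/s")
```

With these assumed figures a single drive lands in the low hundreds of random IOs per second, which is the gap the later flash slides (100K+ IOs per device) are contrasted against.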
RAID and SAN
Using Many HDDs together
  - Caching Reads
  - Buffering Writes (Writeback Cache)
  - Better Sequential Read/Write speed
  - Better throughput at high concurrency
  - Higher IO latencies for uncached IO
Flash Revolution
  - Use Flash chips instead of platters
  - No moving parts
  - No seeks
NAND Flash
  - Cell
  - Page/Read Block
  - Erase Block
  - Write but no overwrite
  - Wears with writes (erases)
Writing to the Flash
  - Erase: set all bits to 1 (1111111)
  - Write: set some of the bits to 0 (0100111..)
  - Change zero to one: impossible. Do Erase, then Write
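A toy model of the erase/write rule above, as a minimal sketch. The FlashPage class is hypothetical and collapses the erase block to a single 8-bit page; on real NAND an erase covers a whole erase block, not one page.

```python
class FlashPage:
    """Toy NAND page: erase sets every bit to 1, programming can only clear bits."""

    def __init__(self, bits: int = 8):
        self.mask = (1 << bits) - 1
        self.value = self.mask  # a freshly erased page reads as all ones

    def erase(self) -> None:
        self.value = self.mask

    def program(self, data: int) -> None:
        # Programming may only flip 1 -> 0. Any bit that would have to go
        # 0 -> 1 needs an erase first.
        if data & ~self.value & self.mask:
            raise ValueError("cannot change 0 to 1 without an erase")
        self.value &= data


page = FlashPage()
page.program(0b01001110)       # allowed: only clears bits
try:
    page.program(0b11001110)   # would need the top bit to go 0 -> 1
except ValueError as exc:
    print("rejected:", exc)
page.erase()                   # erase, then write
page.program(0b11001110)
print(format(page.value, "08b"))
```

This is why in-place overwrite does not exist on flash: the controller has to write the new data elsewhere and erase the old block later.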
Types of NAND Flash (figure, source: AnandTech)
Flash Storage Design
  - Cache
  - Battery/Super Capacitor
  - Controller + Complex Firmware
  - Built-in Parallelism
Flash Controller and Firmware Tasks
  - Write wear leveling
  - Garbage collection
  - Error correction
  - Bad block mapping
  - Read scrubbing
  - Read disturb management
  - Encryption
Flash Properties
  - Lots of IOs per device! (100K+)
  - Less random IO penalty
  - Writes more expensive than reads (but can be faster)
  - Limited by amount of writes
  - Limited retention
  - Concurrent execution on single device
  - Fast write acknowledgement (safe or not)
  - Can burst writes
Flash Interface Designs
  - DIMM
  - PCI-E
  - SFF-8639
  - SATA/SAS
  - FC and Network
Transitioning
  - AHCI to NVMe
AHCI vs NVMe (figure, source: AnandTech.com)
Some Product Examples
  - Products and Leaders are changing quickly
SanDisk ULLtraDIMM
HGST Virident
SanDisk FusionIO
Intel P3x00
Intel 750
Intel 730 (SATA)
mSATA
M.2 Interface
Violin Memory
Consumer vs Enterprise
  - Performance
  - Endurance
  - Durability
  - Retention
  - Encryption
Not your HDD
  - All HDDs are the same; all SSDs are different
Evaluation
  - Performance changes over time
  - Empty Space Matters
  - Complex internals
  - Watch stability carefully
How Flash Fails
  - Clearly defined end of life (EOL) by amount written (but often can handle a lot more)
  - One day it's gone
  - Power Loss Protection
  - Internal ECC and redundancy
To RAID or not to RAID?
  - More valuable for consumer grade
  - Watch for good Flash support
  - RAID controller logic may slow things down
  - Use a redundant array of inexpensive servers instead?
Redundancy
  - Device internal redundancy
  - Hardware RAID
  - Software RAID
  - Filesystem RAID
OS Support
  - Flash support is actively being improved
  - TRIM
  - Sparse Files
Flash And Databases www.percona.com
Database History
  - Most were designed in the HDD era
  - Optimize for sequential IO
  - Count on cheap sequential writes
  - RAID, BBU to improve performance
It's time for Flash
  - Your OLTP Database should live on Flash
But What Flash?
  - Pick a flash type that is right for your application
IO vs Memory
Warmup
  - Much faster warmup times
  - Even if the database fits in memory, SSD might be justified
Tolerate more IO bound load
  - HDD: 5ms per IO
    - Can do 20 IOs within a 100ms response time (non-parallel)
  - Flash: 0.1ms per IO
    - Can do 1000 IOs within a 100ms response time (non-parallel)
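The arithmetic behind these figures, as a small sketch. The function name is hypothetical, and latencies are kept in integer microseconds to avoid floating point rounding. It assumes each IO has to finish before the next one starts, which is the non-parallel case on the slide.

```python
def serial_ios_within(budget_us: int, io_latency_us: int) -> int:
    """How many dependent (non-parallel) IOs fit into a response time budget."""
    return budget_us // io_latency_us


BUDGET_US = 100_000  # 100 ms response time target from the slide

# Per-IO latencies from the slide: 5 ms for HDD, 0.1 ms for flash.
for name, latency_us in (("HDD", 5_000), ("Flash", 100)):
    count = serial_ios_within(BUDGET_US, latency_us)
    print(f"{name}: {count} serial IOs within 100 ms")
```

A query that needs a few hundred uncached page reads therefore blows the budget on an HDD but fits comfortably on flash.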
Endurance
  - Might be a top consideration
Endurance Math
  - HGST FlashMax III 2200GB
    - 4400GB/day over 5 years
    - 1400MB/sec peak writes
    - 66 days at peak write throughput
  - Crucial M500 960GB
    - 72TB total lifetime writes
    - 400MB/sec write
    - 52 hours at peak write throughput
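The endurance figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: decimal units (1 GB = 1000 MB) and 365-day years, with a hypothetical helper name. With decimal units the M500 comes out at about 50 hours; the slide's 52 hours matches binary multiples (1 TB = 1024 * 1024 MB).

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def seconds_to_wear_out(rated_writes_mb: float, peak_write_mb_s: float) -> float:
    """Time to consume the rated write endurance when writing at peak speed."""
    return rated_writes_mb / peak_write_mb_s


# HGST FlashMax III: rated for 4400 GB/day over 5 years, 1400 MB/s peak writes.
hgst_rated_mb = 4400 * 1000 * 365 * 5
hgst_days = seconds_to_wear_out(hgst_rated_mb, 1400) / SECONDS_PER_DAY

# Crucial M500: rated for 72 TB total writes, 400 MB/s writes.
m500_rated_mb = 72 * 1000 * 1000
m500_hours = seconds_to_wear_out(m500_rated_mb, 400) / SECONDS_PER_HOUR

print(f"HGST FlashMax III: about {hgst_days:.0f} days at peak writes")
print(f"Crucial M500: about {m500_hours:.0f} hours at peak writes")
```

Real workloads rarely sustain peak write throughput, so these are lower bounds on lifetime, but the gap of days versus hours shows why endurance differs so much between enterprise and consumer devices.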
Databases and Flash
  - How do we optimize databases to use Flash best?
Storage Engines
  - InnoDB
  - TokuDB
Torn Page problem
  - Flash can avoid this with little cost due to internal design
  - FusionIO NVMFS (Atomic Writes)
  - Copy-on-Write File Systems
    - ZFS
    - BTRFS
  - Filesystem level data journaling (less preferred)
    - data=journal for EXT4
  - skip-innodb-doublewrite
Fast IO Path
  - Bypass Caching: O_DIRECT
  - Native Asynchronous IO
  - Efficient Checksumming
  - innodb_checksum_algorithm=crc32
  - innodb_flush_method=O_DIRECT
IO Cost Accounting
  - Sequential vs Random IO balance
  - IO vs CPU Balance
  - Smaller page sizes might make sense
  - innodb_page_size=4k
Less Pre-fetching
  - Most pre-fetched data must actually be used for pre-fetching to pay off
  - Often best to try it out
Less merging on flushing
  - Do not assume flushing multiple sequential dirty pages costs the same as flushing one
  - innodb_flush_neighbors=0
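The InnoDB settings named on the preceding slides can be collected into one [mysqld] section. The sketch below renders them with Python's configparser; the helper name is hypothetical and the values are the slides' suggestions, not universal recommendations. Disabling the doublewrite buffer is left out on purpose, since it is only safe when the storage stack guarantees atomic page writes.

```python
import configparser
import io

# Settings taken from the slides; test them against your own workload.
flash_settings = {
    "innodb_flush_method": "O_DIRECT",     # bypass the OS page cache
    "innodb_checksum_algorithm": "crc32",  # cheaper checksums
    "innodb_page_size": "4k",              # must be set before the instance is initialized
    "innodb_flush_neighbors": "0",         # do not flush adjacent dirty pages
}


def render_mysqld_section(settings: dict) -> str:
    """Render the settings as a [mysqld] section in my.cnf syntax."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["mysqld"] = settings
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_mysqld_section(flash_settings))
```

The printed text can be pasted into a configuration file; innodb_page_size in particular cannot be changed on an existing instance.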
Less Space on Disk
  - InnoDB Compression (2x typical)
  - TokuDB Compression (5-10x typical)
  - Archiving data off OLTP System
Less Writes on Flash
  - Hybrid Flash/HDD System
  - Transactional Logs, Other logs on the HDD with RAID and BBU
  - Small Temporary objects on tmpfs
  - innodb_log_file_size=<LARGE>
Logs on RAID can be fast
Single Intel 730 Sysbench
IOPS
Consistency
Is Flash Too Fast?
  - Multiple instances might scale better
Other Thoughts
  - Host hardware and OS matter, especially with high end flash
  - Virtualization has higher relative overhead
  - Network has higher relative overhead
Thank you!
pz@percona.com
https://www.linkedin.com/in/peterzaitsev
https://twitter.com/peterzaitsev