Everything a DBA Needs to Know About Storage
Alexey Saltovski, DBA Group Leader, Matrix
Didi Atzmony, Director, PM, Kaminario

Agenda
- Enterprise storage systems
- SQL Server configuration
- Storage & server best practices
- Monitoring performance
- Benchmarking
- Storage features in the near future
Enterprise Storage Systems - Agenda
- History / definition / types / services
- Logical & physical components
- Performance challenge (what are IOPS, latency, and throughput?)
- New trends in storage systems: solving the performance I/O challenge
- High availability (HA) & disaster recovery (DR)

What is Storage?
- Input
- Output
- Non-volatile
Types of Storage
- Tape
- HDD (SAS, SATA)
- Flash
- DRAM

Enterprise Storage System - Definition
Major characteristics:
1. Capacity
2. Availability
3. Performance
4. Security
5. Scalability
6. Data integrity
7. Manageability
Enterprise Storage - Basic Types
- DAS (Direct Attached Storage): internal drives, memory cards, dedicated storage system
- NAS (Network Attached Storage): file-based protocol, file server
- SAN (Storage Area Network): block-level device, disks

Enterprise Storage System - Architecture
SAN: Storage Area Network
[Diagram: SAN, front end, cache, back end, hard disk drives]
The LU
- LU = Logical Unit
- The way the storage is exposed
- Looks like any storage drive
- Designated by a number: the LUN
- A LUN may be shared by many hosts
- Many physical disks are exposed as one logical disk

Security - LUN Masking
- LUNs are grouped into consistency groups
  - All the LUNs used by Finance
  - All the LUNs used for Development
  - Imperative for backup
  - Useful for security
- LUN masking: what a host can see
  - According to WWN
  - Access to a LUN or a consistency group
RAID
- RAID = Redundant Array of Inexpensive Disks
- RAID 0: striping
- RAID 1: mirroring
- RAID 10 = RAID 1 + RAID 0

Parity RAID
- Can we save space?
- RAID 4
- RAID 5
- RAID 6
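The "can we save space?" question on the Parity RAID slide comes down to XOR: instead of mirroring every block, the array stores one parity block per stripe and rebuilds a lost block from the survivors. The following is a minimal sketch of that idea only. The block contents and the helper name `xor_parity` are hypothetical, and real arrays add striping layout, rotation of the parity disk (RAID 5), and a second parity (RAID 6).

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data disk in the stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block that would be written to the parity disk.
parity = xor_parity(data)

# Simulate losing the second disk and rebuild it from the survivors plus parity.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])
```

With three data disks and one parity disk the capacity overhead is one disk in four, compared with one disk in two for mirroring.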
Enterprise Storage Availability
- RAID scheme: RAID 1, 4, 5, 6, 10
- All hardware is redundant
  - Controllers
  - I/O paths
  - UPSs
- At least N+1

Snaps
- A snapshot of data at a point in time
- Usage: testing, protection, replication
[Diagram: production volume and snapshot versions V1 and V2 sharing blocks A to F; at t10, write Snap V1(3) to V2 = F]
Replication / Disaster Recovery
- Having a remote copy outside of the storage system
- For backup or disaster recovery
- RPO (Recovery Point Objective): how much data is lost during the failover
- RTO (Recovery Time Objective): how long it will take to recover
- Synchronous or asynchronous
[Diagram: DR site at 100 km]

Synchronous Replication
- RPO = 0
- Data is always the same on both storage systems
- Slower as distance grows
- Two options:
[Diagram: two layouts, each showing host, local storage, and remote storage]
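"Slower as distance grows" can be put into rough numbers. In synchronous replication a write is not acknowledged until the remote system has it, so every write pays at least one round trip between sites. The sketch below estimates that floor. The propagation speed (about 200,000 km/s in optical fiber) is an assumption that does not come from the slides, and the function name is hypothetical. Switches, protocol handshakes, and the remote array's own service time all add to this minimum.

```python
# Assumption (not from the slides): light travels through optical fiber at
# roughly 200,000 km/s, i.e. about 5 microseconds per kilometer one way.
FIBER_KM_PER_SECOND = 200_000


def sync_replication_delay_ms(distance_km, round_trips=1):
    """Minimum extra latency per write caused by distance alone."""
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return round_trips * 2 * one_way_seconds * 1000


for km in (10, 100, 1000):
    delay = sync_replication_delay_ms(km)
    print(f"{km:>5} km -> at least {delay:.2f} ms added to every write")
```

At the 100 km shown on the slide this floor is about 1 ms per write, which is already far above the sub-0.1 ms latencies quoted for flash later in the deck.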
Asynchronous Replication
- Remote DR site
- Sync: low performance
- Solution: asynchronous replication
- RPO > 0
[Diagram: host, local storage, remote storage]

Snap and Replication
- Snap aside
- Incremental snap
- Then async replication
- Update remote site
[Diagram: snap taken at the local site and replicated to the remote site]
The Performance Challenge

                      1990           2012           Improvement
CPU                   0.05 MIPS/$    147 MIPS/$     2940x
Memory                0.02 MB/$      25 MB/$        1250x
Addressable memory    2^16           2^64           2^48x
Network speed         100 Mbps       100 Gbps       1000x
Disk data transfer    5 MBps         130 MBps       26x

The Performance Challenge
Three basic metrics are applied to describe the performance aspects of any storage system:
- IOPS: the number of input or output requests per second.
- Bandwidth (throughput): the number of bytes transferred per second.
- Response time (latency): the amount of time each I/O request takes to complete.
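The three metrics are tied together by simple arithmetic: throughput is IOPS multiplied by the I/O size, and the average number of I/Os in flight is IOPS multiplied by latency (Little's law). The sketch below illustrates both relations. The function names and the workload figures are hypothetical, chosen only to show why a small-block workload can post high IOPS with modest bandwidth.

```python
def throughput_mb_per_s(iops, io_size_kb):
    """Bandwidth implied by a given I/O rate and I/O size."""
    return iops * io_size_kb / 1024


def outstanding_ios(iops, latency_ms):
    """Little's law: average number of I/Os in flight at a given latency."""
    return iops * latency_ms / 1000


# Hypothetical small-block random workload: 10,000 IOPS of 8 KB at 5 ms.
print(throughput_mb_per_s(10_000, 8))    # 78.125 MB/s
print(outstanding_ios(10_000, 5))        # 50.0 I/Os in flight

# Hypothetical large-block scan: 1,000 IOPS of 256 KB.
print(throughput_mb_per_s(1_000, 256))   # 250.0 MB/s
```

The scan moves more than three times the data with a tenth of the I/O requests, which is why IOPS and bandwidth have to be quoted together with the I/O size.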
Performance Challenge - HDD
- The HDD is mechanical, which causes many performance issues
  - Mechanical actuator
  - Rotational latency: 7,200, 10K, 15K RPM
  - Seek time: the actuator getting to the right location
- Ways to mitigate:
  - Parallelism
    - Does not handle random or other badly formed I/O patterns
  - Short stroking
    - Writing in such a way that the actuator barely moves
    - Still slow
    - Very expensive, low utilization

All I Want Is Bandwidth
- If you really didn't care about latency, how would you get the best bandwidth?
- Fibre Channel? InfiniBand? iSCSI?
Maximum Bandwidth - Truck
- Place 10,000 1 TB disks in a truck and move it all to a remote site in Eilat in 8 hours.
- Bandwidth of the transfer is about 350 GB/s (10,000 TB in 28,800 seconds).
- Latency of the transfer is ~8 hours.

Storage bottlenecks are rarely bandwidth! Why?
- Disk RAID
- Disk bandwidth increased 30x between 1987 and 2005
- Disk latency improved only 3x

Solid State Drive - DRAM SSD
- Uses standard DRAM
- Volatile media
- Requires backup power
  - Battery
  - UPS
- Pros
  - Fastest
  - Consistent performance for ANY workload
- Cons
  - Backup power
  - Power consumption
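The truck example is easy to check by hand. The sketch below works through the slide's inputs (10,000 disks of 1 TB each, 8 hours on the road) using decimal units, which is an assumption, and the function name is hypothetical. Under those inputs the effective bandwidth comes out at roughly 347 GB/s, while the time to the first byte is the full 8 hours.

```python
def truck_bandwidth_gb_per_s(disks, tb_per_disk, hours):
    """Effective bandwidth of physically moving disks, in GB/s (1 TB = 1000 GB)."""
    total_gb = disks * tb_per_disk * 1000
    return total_gb / (hours * 3600)


bandwidth = truck_bandwidth_gb_per_s(disks=10_000, tb_per_disk=1, hours=8)
latency_ms = 8 * 3600 * 1000  # nothing arrives until the truck does

print(f"bandwidth: {bandwidth:.0f} GB/s")
print(f"latency:   {latency_ms:,} ms")
```

The point of the slide survives the arithmetic: enormous bandwidth is easy to buy if latency does not matter, and database workloads are usually limited by latency.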
Solid State Drive - Flash
- What is flash?
  - Solid-state media
  - The media is divided into blocks
  - The blocks are divided into pages
  - Write: a page
  - Erase: a block
- Flash types
  - SLC: Single-Level Cell
  - MLC: Multi-Level Cell (2 bits)
- Wear-out
  - Limited number of writes
  - SLC: 100,000
  - MLC: 30,000

Solid State Drive - Flash Form Factor
- PCIe
- SAS

Writing to Flash - LSA
- Writing is done in a logging manner
[Diagram: flash blocks filled in log order]

Write Amplification
- Writing sometimes requires clearing a block
- Moving pages from one block to another
- Write amplification: the additional writes needed to complete the operation
- Performance impact
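Write amplification can be expressed as a ratio: the pages the flash actually writes divided by the pages the host asked to write. Because erasure works on whole blocks, pages that are still valid in a block must be copied elsewhere before that block can be erased. The sketch below is a simplified model under assumed numbers. The 64-page block and the function name are hypothetical, and real controllers reduce the ratio with over-provisioning and garbage-collection policies.

```python
def write_amplification(host_pages, relocated_pages):
    """Physical page writes divided by the page writes the host requested."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + relocated_pages) / host_pages


# Hypothetical 64-page block: the host rewrites 16 pages, and the 48 pages that
# are still valid must be moved to another block before the erase.
wa = write_amplification(host_pages=16, relocated_pages=48)
print(f"write amplification: {wa:.1f}x")

# Sequential, log-style writing that fills whole blocks relocates nothing.
print(f"best case: {write_amplification(64, 0):.1f}x")
```

A ratio of 4x means the media absorbs four times the writes the host issued, which costs both performance and a share of the limited write cycles listed for SLC and MLC.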
Flash I/O
[Diagram: the flash layout from the previous slide (blocks divided into pages; write a page, erase a block) with pages A to Z being written into blocks]

Solid State Drive - Flash Performance
- Read: 70 µs
- Write: 25 µs (???)
- NOT good with random writes
- Pros
  - Non-volatile
  - Lowest power consumption
  - Fast
  - Dense
- Cons
  - Limited writes
  - Unstable performance
  - Less reliable
Performance Challenge

Form factor             Latency (ms)
RAM rack-mount SSD      0.015
PCIe SSD                0.05
Flash rack-mount SSD    0.07
HDD                     4 to 7

Automatic Tiering
- The storage controller shifts data around for better performance
- According to some heuristic
  - Hot data
  - Predetermined data
  - Prediction algorithms
- Pros
  - Better utilization of performance resources
- Cons
  - Not always reliable
  - Doesn't work for random workloads
The SSD Storage Revolution is Coming
- SSD will replace FC HDD
- The FC HDD storage market is an $18B market in 2011
[Chart: media cost per GB, SSD $/GB vs. FC HDD $/GB. HDD source: IDC. SSD source: Gartner]

The SSD Storage Revolution is Coming
[Chart: enterprise SSD market size in $B. Source: Gartner]
Flash and New Trends in Storage Systems
- All-SSD flash solutions
  - Many new startups
  - Many flavors of the flash solution
- Higher performance. Expecting:
  - Lower latency
  - Higher throughput
  - Millions of IOPS
- GREEN, GREEN, GREEN
  - Lower power consumption
  - Smaller footprint

Summary - Enterprise Storage Systems
- Capacity
- Availability: redundancy, software stack
- Performance: latency, IOPS, throughput
- Flash: performance, density, green
Agenda
- Enterprise storage systems
- SQL Server configuration
- Storage & server best practices
- Monitoring performance
- Benchmarking

SQL Server Configuration - Agenda
- SQL Server architecture (memory structure, processes & files)
- I/O pattern basics: terms and key concepts
- I/O pattern advanced: I/O size of SQL operations
- Wait types: user waits vs. background waits
- DMVs (Dynamic Management Views)
SQL Server Architecture
[Diagram: SQL Server memory (code area, SQL Server kernel, stack space, buffer pool with plan cache, buffer cache, and log cache, Net-Lib DLL, Open Data Services); SQLOS with Lazy Writer, Checkpoint, MS DTC, Log Writer, and worker threads; the storage system holding TempDB, user DBs, Master DB, and Model DB]
[Diagram: clients and user processes connect through Open Data Services to worker threads; background processes (Database Cleanup, Log Writer, Lazy Writer, Database Shrinking) work against the memory pool (database buffer cache, log cache), the log files, and the data files]
[Diagram: users, relational engine, OLE DB, storage engine]
Database Files
- Database files configuration
  - Log file (*.ldf) configuration
  - Data files (*.mdf, *.ndf) configuration
  - Filegroups
- TempDB
  - Sorting, ordering, rebuilding
  - Only one DB per instance (2000-2008)

Types of Systems
- DSS / DWH: data reading oriented
  - Usually fewer user connections, issuing complex aggregate queries that generate sequential accesses to the disk
  - Periodic write activity must also be taken into account
- OLTP (online transaction processing): data manipulation oriented
  - Usually a large number of user connections, using simple queries that generate random accesses to the disk
  - Examples: ERP, CRM
I/O Patterns
- I/O size
  - Small I/O: 8 KB
  - Bigger I/O: 64 to 1024 KB
- Random vs. sequential
  - Random: accessing data that is scattered across the disk
  - Sequential: accessing data in the order it is physically stored
- Read vs. write
  - Read: reading the data from the drive and writing it to the buffer pool
  - Write: writing the data from the buffer pool onto the hard drive
[Diagrams: random read I/O, sequential read I/O]

Advanced I/O Patterns

Operation | Random / Sequential | Read / Write | I/O size range
OLTP log | Sequential | Write | Sector-aligned, up to 60 KB
OLTP log | Sequential | Read | Sector-aligned, up to 120 KB
OLTP data (index seeks) | Random | Read | 8 KB
OLTP lazy writer (scatter-gather) | Random | Write | Any multiple of 8 KB up to 256 KB
OLTP checkpoint (scatter-gather) | Random | Write | Any multiple of 8 KB up to 256 KB
Read-ahead (DSS, index/table scans) | Sequential | Read | Any multiple of 8 KB up to 256 KB (1024 KB in Enterprise Edition)
Bulk insert | Sequential | Write | Any multiple of 8 KB up to 128 KB
BACKUP | Sequential | Read/Write | Multiple of 64 KB (up to 4 MB)
RESTORE | Sequential | Read/Write | Multiple of 64 KB (up to 4 MB)
DBCC CHECKDB | Sequential | Read | 8 KB to 64 KB
ALTER INDEX REBUILD, replaces DBREINDEX (read phase) | Sequential | Read | Any multiple of 8 KB up to 256 KB
ALTER INDEX REBUILD, replaces DBREINDEX (write phase) | Sequential | Write | Any multiple of 8 KB up to 128 KB
Waits
- What is a wait?
[Diagram: total transaction time split into app time (application work time) and DB time (wait time)]
- Wait types
  - Network
  - Storage: I/O system, configuration
  - Locks, latches, concurrency
- User vs. background waits

Useful DMVs
- sys.dm_os_wait_stats
- sys.dm_io_virtual_file_stats
- sys.dm_exec_query_stats
- sys.dm_exec_sessions
- sys.dm_os_performance_counters
Remember that DMV data is reset when the SQL Server service restarts!
sys.dm_os_wait_stats

Column name | Data type | Description
wait_type | nvarchar(60) | Name of the wait type.
waiting_tasks_count | bigint | Number of waits on this wait type. This counter is incremented at the start of each wait.
wait_time_ms | bigint | Total wait time for this wait type in milliseconds. This time is inclusive of signal_wait_time_ms.
max_wait_time_ms | bigint | Maximum wait time on this wait type.
signal_wait_time_ms | bigint | Difference between the time that the waiting thread was signaled and when it started running.

Wait Types

Wait_Type | Area | Usage | Description
ASYNC_IO_COMPLETION | I/O | Resource | Used to indicate a worker is waiting on an asynchronous I/O operation to complete that is not associated with database pages.
CHECKPOINT_QUEUE | Buffer | Background | Used by the background worker that waits on events on a queue to process checkpoint requests. This is an "optional" wait type; see the Important Notes section in the blog.
CHKPT | Buffer | Background | Used to coordinate the checkpoint background worker thread with recovery of master so checkpoint won't start accepting queue requests until master is online.
CXPACKET | Query | Sync | Used to synchronize threads involved in a parallel query. This wait type only means a parallel query is executing.
DISKIO_SUSPEND | BACKUP | Sync | Used to indicate a worker is waiting to process I/O for a database or log file associated with a SNAPSHOT BACKUP.
FT_IFTS_SCHEDULER_IDLE_WAIT | Full-Text | Background | Used by a background task processing full-text search requests, indicating it is waiting for work to do.
IO_COMPLETION | I/O | Resource | Used to indicate a wait for I/O for an operation (typically synchronous) like sorts and various situations where the engine needs to do a synchronous I/O.
KSOURCE_WAKEUP | Shutdown | Background | Used by the background worker "signal handler", which waits for a signal to shut down SQL Server.
LAZYWRITER_SLEEP | Buffer | Background | Used by the Lazywriter background worker to indicate it is sleeping, waiting to wake up and check for work to do.
LOGBUFFER | Transaction Log | Resource | Used to indicate a worker thread is waiting for a log buffer to write log blocks for a transaction.

Wait Types (cont'd)

Wait_Type | Area | Usage | Description
LOGMGR_QUEUE | Transaction Log | Background | Used by the background worker "Log Writer" to wait on a queue for requests to flush log blocks to the transaction log. This is an "optional" wait type; see the Important Notes section in the blog.
MISCELLANEOUS | Ignore | Ignore | This really should be called "Not Waiting".
PREEMPTIVE_XXX | Varies | External | Used to indicate a worker is running code that is not under the SQLOS scheduling systems.
REQUEST_FOR_DEADLOCK_SEARCH | Lock | Background | Used by the background worker "Lock Monitor" to search for deadlocks. This is an "optional" wait type; see the Important Notes section in the blog.
RESOURCE_QUERY_SEMAPHORE_COMPILE | Query | Resource | Used to indicate a worker is waiting to compile a query due to too many other concurrent query compilations that require "not small" amounts of memory.
RESOURCE_SEMAPHORE | Query | Resource | Used to indicate a worker is waiting to be allowed to perform an operation requiring "query memory", such as hashes and sorts.
SOS_SCHEDULER_YIELD | SQLOS | Forced | Used to indicate a worker has yielded to let other workers run on a scheduler.
SQLTRACE_BUFFER_FLUSH | Trace | Background | Used by background worker
THREADPOOL | SQLOS | Resource | Indicates a wait for a task to be assigned to a worker thread.
WRITELOG | I/O | Sync | Indicates a worker thread is waiting for LogWriter to flush log blocks.
XE_DISPATCHER_WAIT | XEvent | Background | Used by a background worker to handle queue requests to write out buffers for async targets.
XE_TIMER_EVENT | XEvent | Background | Used to indicate a background task is waiting for "expired" timers for internal Xevent engi

Script Examples
- SQL Server IO Statistics example: http://gallery.technet.microsoft.com/scriptcenter/various-sql-server-io-3f9002f7#content
- Determining SQL Disk IO Workload: http://henkvandervalk.com/sql-under-the-hood-part-2-estimate-the-sql-disk-io-workload-based-onthe-virtual-file-stats-dmv
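Two points from the wait slides combine naturally when reading sys.dm_os_wait_stats output: background waits should be set aside before ranking, and because wait_time_ms includes signal_wait_time_ms, subtracting the two isolates the time spent waiting on the resource itself. The sketch below applies both to rows that have already been fetched. The sample rows and the function name are hypothetical, and the background list only covers the wait types marked "Background" in the tables above, not the full set that exists in SQL Server.

```python
# Wait types that the tables above mark with "Background" usage.
BACKGROUND_WAITS = {
    "CHECKPOINT_QUEUE", "CHKPT", "FT_IFTS_SCHEDULER_IDLE_WAIT", "KSOURCE_WAKEUP",
    "LAZYWRITER_SLEEP", "LOGMGR_QUEUE", "REQUEST_FOR_DEADLOCK_SEARCH",
    "SQLTRACE_BUFFER_FLUSH", "XE_DISPATCHER_WAIT", "XE_TIMER_EVENT",
}


def top_user_waits(rows, limit=3):
    """Drop background waits, add resource wait time, rank by total wait time."""
    ranked = []
    for row in rows:
        if row["wait_type"] in BACKGROUND_WAITS:
            continue
        # wait_time_ms is inclusive of signal_wait_time_ms (time spent runnable,
        # waiting for CPU), so the difference is the wait on the resource itself.
        resource_ms = row["wait_time_ms"] - row["signal_wait_time_ms"]
        ranked.append({**row, "resource_wait_ms": resource_ms})
    ranked.sort(key=lambda r: r["wait_time_ms"], reverse=True)
    return ranked[:limit]


# Hypothetical rows shaped like the sys.dm_os_wait_stats columns.
sample = [
    {"wait_type": "LAZYWRITER_SLEEP", "wait_time_ms": 9_000_000, "signal_wait_time_ms": 500},
    {"wait_type": "WRITELOG", "wait_time_ms": 120_000, "signal_wait_time_ms": 4_000},
    {"wait_type": "CXPACKET", "wait_time_ms": 80_000, "signal_wait_time_ms": 10_000},
    {"wait_type": "IO_COMPLETION", "wait_time_ms": 30_000, "signal_wait_time_ms": 1_000},
]

top = top_user_waits(sample, limit=2)
for row in top:
    print(row["wait_type"], row["wait_time_ms"], row["resource_wait_ms"])
```

Without the filter, LAZYWRITER_SLEEP would top the list even though it only records an idle background worker.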
Agenda
- Enterprise storage systems
- SQL Server configuration
- Storage & server best practices
- Monitoring performance
- Benchmarking

SQL Server I/O Performance: Know Your I/O Pattern
- Storage
  - Disk vs. SSD drives
  - RAID levels
- Server
  - HBAs
  - Multipath configuration
- Storage network
  - Fabric considerations
- SQL Server
  - Number of data files and filegroups
  - TempDB considerations
Performance
- No single right way to do it.
- All environments are different.
- Latency is crucial.
- Ensure storage engineers have knowledge of SQL best practices.
- Validate your configuration before deployment.

Storage
- How many / what size LUNs? It depends :-)
- Results vary; not all storage implementations perform the same. Test it.
- Use faster media units
- Use a 64 KB allocation unit size
- Array and driver firmware is important
- Discuss HBA/controller settings with your storage vendor
- Consider a multipath solution
- Tune HBA queue depth
Separation of Workloads
- Separate random from sequential operations
- Separate indexes from tables
- Isolate log from data at the physical level
- Isolate TempDB

Database Files
- Do not mix SQL Server files with other data
- Multiple LUNs are usually better than 1 large LUN
- More data files, better performance
  - Determined mainly by hardware capacity
  - The number of data files may impact scalability
  - Number of processors on the host (mainly a concern for >= 4 CPUs)
Database Files (cont'd)
- For tempdb, configure multiple files.
- Consider read-only filegroups for certain scenarios
- Autoshrink: do not use it
- Improve read-ahead operation: verify object continuity

Compression
- Helps reduce the size of the database.
- Improves the performance of I/O-intensive workloads.
- Extra CPU resources are required to compress and decompress the data.
- It is important to understand the workload characteristics when deciding which tables to compress.
Compression (cont'd)
- U: the percentage of update operations on a specific table, index, or partition, relative to total operations on that object. The lower the value of U (that is, the table, index, or partition is infrequently updated), the better a candidate it is for page compression.
- S: the percentage of scan operations on a table, index, or partition, relative to total operations on that object. The higher the value of S (that is, the table, index, or partition is mostly scanned), the better a candidate it is for page compression.

Agenda
- Enterprise storage systems
- SQL Server configuration
- Storage & server best practices
- Monitoring performance
- Benchmarking
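The U and S percentages from the compression slide are plain ratios over an object's operation counts, so they can be sketched in a few lines. In the example below the operation counts, the function names, and in particular the two cut-off values are hypothetical placeholders and not figures from the slides. The slides say only that lower U and higher S make a better candidate for page compression.

```python
def scan_and_update_pct(scans, updates, other_ops):
    """Return (S, U): scans and updates as a percentage of all operations."""
    total = scans + updates + other_ops
    if total == 0:
        return 0.0, 0.0
    return 100 * scans / total, 100 * updates / total


def is_page_compression_candidate(s_pct, u_pct, min_scan_pct=75.0, max_update_pct=20.0):
    """Apply illustrative thresholds; both defaults are assumptions to be tuned."""
    return s_pct >= min_scan_pct and u_pct <= max_update_pct


# Hypothetical history table: mostly scanned, rarely updated.
s, u = scan_and_update_pct(scans=800, updates=50, other_ops=150)
print(s, u, is_page_compression_candidate(s, u))

# Hypothetical order-status table: update-heavy.
s2, u2 = scan_and_update_pct(scans=100, updates=600, other_ops=300)
print(s2, u2, is_page_compression_candidate(s2, u2))
```

The thresholds are kept as parameters because the right cut-offs depend on how much CPU headroom the server has for compressing and decompressing pages.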
Monitoring Performance - OS Level

# | Parameter | Counters
1 | IOPS | Disk Reads/sec, Disk Writes/sec
2 | Latency | Avg. Disk sec/Read, Avg. Disk sec/Write
3 | Disk throughput | Disk Read Bytes/sec, Disk Write Bytes/sec
4 | I/O queue | Avg. Disk Queue Length (<= 2), Current Disk Queue Length
5 | I/O size | Avg. Disk Bytes/Read, Avg. Disk Bytes/Write

Monitoring Performance - SQL Level
- Management Studio
- DMVs
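The OS-level counters map directly onto the metrics defined earlier in the deck: the two rate counters add up to IOPS, the two byte counters give throughput, and dividing one by the other gives the average I/O size. The sketch below condenses one sample into those figures. The dictionary keys follow the Windows Performance Monitor counter names, while the sample values and the function name are hypothetical. The queue check reuses the "<= 2" guideline from the slide.

```python
def summarize_disk_counters(sample):
    """Condense one PhysicalDisk counter sample into the metrics from the slide."""
    iops = sample["Disk Reads/sec"] + sample["Disk Writes/sec"]
    bytes_per_sec = sample["Disk Read Bytes/sec"] + sample["Disk Write Bytes/sec"]
    return {
        "iops": iops,
        "throughput_mb_s": bytes_per_sec / (1024 * 1024),
        "avg_io_size_kb": bytes_per_sec / iops / 1024 if iops else 0.0,
        # The latency counters are reported in seconds; convert to milliseconds.
        "read_latency_ms": sample["Avg. Disk sec/Read"] * 1000,
        "write_latency_ms": sample["Avg. Disk sec/Write"] * 1000,
        "queue_ok": sample["Avg. Disk Queue Length"] <= 2,
    }


# Hypothetical sample: 1,500 reads/s of 8 KB and 500 writes/s of 64 KB.
sample = {
    "Disk Reads/sec": 1500,
    "Disk Writes/sec": 500,
    "Disk Read Bytes/sec": 1500 * 8192,
    "Disk Write Bytes/sec": 500 * 65536,
    "Avg. Disk sec/Read": 0.004,
    "Avg. Disk sec/Write": 0.002,
    "Avg. Disk Queue Length": 1.5,
}

summary = summarize_disk_counters(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Collected over a representative period, figures like these describe the production I/O pattern that a benchmark should then try to reproduce.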
Monitoring Performance - Storage Level

Agenda
- Enterprise storage systems
- SQL Server configuration
- Storage & server best practices
- Monitoring performance
- Benchmarking
Benchmarking
- Do not believe anyone
- Use benchmarking tools.
- Benchmark and test before server / storage migration
- Tools
  - Iometer
  - SQLIOSim
  - SQLQueryStress

Iometer
- Ask your storage admin, or test by yourself
- Testing must be done after working hours!!!
- Iometer (http://www.iometer.org/) consists of two programs:
  - Iometer is the controlling program, with a graphical user interface.
  - Dynamo is the workload generator. It has no user interface.
Iometer

SQLIOSim Utility
- Starting with SQL Server 2008, SQLIOSim is included with the SQL Server installation.
- When you install SQL Server, you find the SQLIOSim tool in the BINN folder.
- http://support.microsoft.com/kb/231619
SQLQueryStress
- A lightweight performance testing tool, designed to load test individual queries.
- Includes support for randomization of input parameters in order to test cache repeatability
- Includes basic capabilities for reporting on consumed server resources

SQLQueryStress (cont'd)
Benchmarking Summary
- It is all about sizing!
- Monitor your production I/O pattern
  - What are my business's I/O characteristics?
  - IOPS / latency / throughput / I/O size / disk queue / read-ahead
- Benchmark your storage ecosystem (server, HBA, fabric, storage) to discover the top service time (latency under a specific load)
  - What are my system's I/O limits?

Take-Home Message
- Know the limits of your storage
- Know the I/O pattern of your system
- Remember that latency is a key parameter