Windows Server Infrastructure and Storage for SQL Server & Hyper-V Michael Frandsen michaelf@mentalnote.dk
Bio - Michael Frandsen
I have worked in the IT industry for just over 23 years, 19 of which have been spent as a consultant.
I have a close relationship with Microsoft R&D in Redmond: with the Windows team for 21 years, ever since the first beta of Windows NT 3.1, and with the SQL Server team for 20 years, since v4.21a, the first version Microsoft did by themselves.
I hold various advisory positions in Redmond and am involved in vNext versions of Windows, Hyper-V, SQL Server and Office/SharePoint.
Just finished Windows Blue (Windows 8.1 & Windows Server 2012 R2) and SQL14 (SQL Server 2014). Now working on SQL15.
Specialty areas:
  Architecture & design
  High performance
  Storage
  Low Latency
  Kerberos
  Scalability (scale-up & scale-out)
  Consolidation (especially SQL Server)
  High-Availability
  VLDB
  Data Warehouse platforms
  BI platforms
  High Performance Computing (HPC) clusters
  Big Data platforms & architecture
MentalNote
Independent
  One-man company with no affiliations
  Only selling consultancy deliverables
  No hardware sales
  No software sales
Mission
  Excellence: Delivering knowledge and empowering clients through extensive and deep knowledge, always based on facts and objectivity. This is achieved through constant development of skills, by both internal research and partnerships with leading companies in the IT industry, such as Microsoft, HP and others, not only locally but mainly with these companies' Research and Development departments at their respective headquarters.
  Responsibility: Giving back to people, animals and places in need of support, with both manpower and funding of various causes, such as:
    Cancer research, heart illness
    Animal protection
    Terminally sick children, orphans
    Developing countries, areas hit by natural disaster
SQL Server storage challenges
  Capacity
  Fast
  Shared
  Reliable
The SAN legacy
"Because it's expensive it must be fast"
[Chart: Performance (vertical axis) against Price (horizontal axis)]
The SAN legacy
Shared storage or Direct Attached
[Diagram: File Server (2 x 8Gb/s), Database Server (2 x 8Gb/s) and Mail Server (2 x 8Gb/s), all connected to one SAN (2 x 8Gb/s)]
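The port counts on this slide can be turned into a quick oversubscription estimate. The sketch below is only an illustration under the assumption that all three hosts drive their links at the same time and that nominal port speed is the only limit; the variable and function names are made up for the example.

```python
# Figures from the slide: every host and the SAN front end has 2 x 8 Gb/s FC ports.
PORT_GBPS = 8
PORTS_PER_DEVICE = 2

hosts = ["File Server", "Database Server", "Mail Server"]


def link_gbps(ports: int = PORTS_PER_DEVICE, port_gbps: int = PORT_GBPS) -> int:
    """Nominal bandwidth of one device's FC links in Gb/s."""
    return ports * port_gbps


host_demand = len(hosts) * link_gbps()       # what the hosts could ask for together
san_supply = link_gbps()                     # what the SAN front end can deliver
oversubscription = host_demand / san_supply  # demand : supply
fair_share = san_supply / len(hosts)         # per host if all are busy at once

print(f"Host links: {host_demand} Gb/s, SAN links: {san_supply} Gb/s")
print(f"Oversubscription {oversubscription:.0f}:1, fair share {fair_share:.2f} Gb/s per host")
```

With these nominal figures the shared SAN leaves each host roughly a third of what its own two ports could carry.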
The SAN legacy Widespread misconception
The SAN legacy
Complex stack
[Diagram: I/O path from SQL Server down to disk: SQL Server on Windows (cache, CPU cores, MPIO algorithm, MPIO DSM), two FC HBAs, FC switches (WWN zoning, port logic), storage controllers with A/B ports (cache, XOR engine, SCSI controller), LUNs and disks]
Rates along the path:
  SQL Server Read Ahead Rate
  CPU Feed Rate
  HBA Port Rate
  Switch Port Rate
  SP Port Rate
  LUN Read Rate
  Disk Feed Rate
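One way to read the chain of rates on this slide is that end-to-end throughput can be no higher than the slowest stage. The sketch below illustrates that reading; the MB/s figures are hypothetical placeholders, not measurements from the talk, and the function name is invented for the example.

```python
# Stage names come from the slide; every MB/s value is a HYPOTHETICAL placeholder.
stage_rates_mb_s = {
    "SQL Server read-ahead rate": 4000,
    "CPU feed rate": 3200,
    "HBA port rate": 1600,
    "Switch port rate": 1600,
    "SP port rate": 800,
    "LUN read rate": 1200,
    "Disk feed rate": 2400,
}


def bottleneck(rates: dict[str, int]) -> tuple[str, int]:
    """Return the slowest stage and its rate; it caps the whole I/O path."""
    stage = min(rates, key=rates.get)
    return stage, rates[stage]


stage, rate = bottleneck(stage_rates_mb_s)
print(f"Path is limited to {rate} MB/s by: {stage}")
```

Replacing the placeholders with measured rates for each layer shows where upgrading a component would actually help.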
SAN Bottleneck
Typical Data Warehouse / BI / VLDB SAN load:
  High I/O processor load, maxed out (top, slim rectangles)
  High cache load (middle, big rectangles)
  Low disk spindle load (lower half, squares)
SAN Bottleneck
Ideal Data Warehouse / BI / VLDB SAN load:
  Low to medium I/O processor load (top, slim rectangles)
  Low to medium cache load (middle, big rectangles)
  High disk spindle load (lower half, squares)
Traditional interconnects
Fibre Channel
  Stalled at 8Gb/s for many years
  16Gb/s FC still very exotic
  Strong movement towards FCoE (Fibre Channel over Ethernet)
iSCSI
  Started in low-end storage arrays
  Many still 1Gb/s
  10GbE storage arrays typically have few ports compared to FC
NAS
  NFS, SMB, etc.
File Share reliability Is this mission critical technology?
SMB 1.0 - 100+ Commands
Protocol negotiation, user authentication and share access (NEGOTIATE, SESSION_SETUP_ANDX, TRANS2_SESSION_SETUP, LOGOFF_ANDX, PROCESS_EXIT, TREE_CONNECT, TREE_CONNECT_ANDX, TREE_DISCONNECT)
File, directory and volume access (CHECK_DIRECTORY, CLOSE, CLOSE_PRINT_FILE, COPY, CREATE, CREATE_DIRECTORY, CREATE_NEW, CREATE_TEMPORARY, DELETE, DELETE_DIRECTORY, FIND_CLOSE, FIND_CLOSE2, FIND_UNIQUE, FLUSH, GET_PRINT_QUEUE, IOCTL, IOCTL_SECONDARY, LOCK_AND_READ, LOCK_BYTE_RANGE, LOCKING_ANDX, MOVE, NT_CANCEL, NT_CREATE_ANDX, NT_RENAME, NT_TRANSACT, NT_TRANSACT_CREATE, NT_TRANSACT_IOCTL, NT_TRANSACT_NOTIFY_CHANGE, NT_TRANSACT_QUERY_QUOTA, NT_TRANSACT_QUERY_SECURITY_DESC, NT_TRANSACT_RENAME, NT_TRANSACT_SECONDARY, NT_TRANSACT_SET_QUOTA, NT_TRANSACT_SET_SECURITY_DESC, OPEN, OPEN_ANDX, OPEN_PRINT_FILE, QUERY_INFORMATION, QUERY_INFORMATION_DISK, QUERY_INFORMATION2, READ, READ_ANDX, READ_BULK, READ_MPX, READ_RAW, RENAME, SEARCH, SEEK, SET_INFORMATION, SET_INFORMATION2, TRANS2_CREATE_DIRECTORY, TRANS2_FIND_FIRST2, TRANS2_FIND_NEXT2, TRANS2_FIND_NOTIFY_FIRST, TRANS2_FIND_NOTIFY_NEXT, TRANS2_FSCTL, TRANS2_GET_DFS_REFERRAL, TRANS2_IOCTL2, TRANS2_OPEN2, TRANS2_QUERY_FILE_INFORMATION, TRANS2_QUERY_FS_INFORMATION, TRANS2_QUERY_PATH_INFORMATION, TRANS2_REPORT_DFS_INCONSISTENCY, TRANS2_SET_FILE_INFORMATION, TRANS2_SET_FS_INFORMATION, TRANS2_SET_PATH_INFORMATION, TRANSACTION, TRANSACTION_SECONDARY, TRANSACTION2, TRANSACTION2_SECONDARY, UNLOCK_BYTE_RANGE, WRITE, WRITE_AND_CLOSE, WRITE_AND_UNLOCK, WRITE_ANDX, WRITE_BULK, WRITE_BULK_DATA, WRITE_COMPLETE, WRITE_MPX, WRITE_MPX_SECONDARY, WRITE_PRINT_FILE, WRITE_RAW)
Other (ECHO, TRANS_CALL_NMPIPE, TRANS_MAILSLOT_WRITE, TRANS_PEEK_NMPIPE, TRANS_QUERY_NMPIPE_INFO, TRANS_QUERY_NMPIPE_STATE, TRANS_RAW_READ_NMPIPE, TRANS_RAW_WRITE_NMPIPE, TRANS_READ_NMPIPE, TRANS_SET_NMPIPE_STATE, TRANS_TRANSACT_NMPIPE, TRANS_WAIT_NMPIPE, TRANS_WRITE_NMPIPE)
Callout: 14 distinct WRITE operations?!??
SMB 2.0 - 19 Commands
Protocol negotiation, user authentication and share access (NEGOTIATE, SESSION_SETUP, LOGOFF, TREE_CONNECT, TREE_DISCONNECT)
File, directory and volume access (CANCEL, CHANGE_NOTIFY, CLOSE, CREATE, FLUSH, IOCTL, LOCK, QUERY_DIRECTORY, QUERY_INFO, READ, SET_INFO, WRITE)
Other (ECHO, OPLOCK_BREAK)
TCP is a required transport
  SMB2 no longer supports NetBIOS over IPX, NetBIOS over UDP or NetBEUI
SMB 2.1
Performance improvement
  Up to 1MB MTU to better utilize 10GbE!
    Disabled by default!
  Real benefit requires app support
    Ex. Robocopy in W7 / 2K8R2 is multi-threaded
    Defaults to 8 threads, range 1-128
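Robocopy's multi-threaded mode is selected with its /MT:n switch, which matches the default of 8 and the 1-128 range quoted on the slide. The helper below is a small sketch that only builds and validates the argument list; the function name and the paths in the usage line are hypothetical, and nothing is executed.

```python
def robocopy_args(source: str, destination: str, threads: int = 8) -> list[str]:
    """Build a Robocopy argument list for a multi-threaded copy.

    /E copies subdirectories (including empty ones) and /MT:n sets the
    number of copy threads; the slide quotes a default of 8 and a range of 1-128.
    """
    if not 1 <= threads <= 128:
        raise ValueError("threads must be between 1 and 128")
    return ["robocopy", source, destination, "/E", f"/MT:{threads}"]


# Hypothetical paths, for illustration only.
print(" ".join(robocopy_args(r"D:\data", r"\\fs1\share\data", threads=16)))
```

The list form can be handed to `subprocess.run` on a Windows host if the copy should actually be started.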
SQL Server SMB support
< 2008
  Using UNC path could be enabled with trace flag
  Not officially supported scenario
  No support for system databases
  No support for failover clustering
2008 R2
  UNC path fully supported by default
  No support for system databases
  No support for failover clustering
Two things happened SQL Server 2012 Windows Server 2012
SQL Server 2012
UNC support expanded
  System databases supported on SMB
  Failover Clustering supports SMB as shared storage
  ... and TempDB can now reside on NON-shared storage
    Mark Souza commented: Great Suggestion!
Windows Server 2012
  InfiniBand
  NIC Teaming
  SMB 3.0
  RDMA
  Multichannel
  SMB Direct
New old interconnects
InfiniBand characteristics
  Been around since 2001
  Used mainly for HPC clusters and Super Computing
  High throughput
  RDMA capable
  Low latency
  Quality of service
  Failover
  Scalable
InfiniBand throughput
InfiniBand throughput Trends in I/O Interfaces with Servers
InfiniBand throughput Low-level Uni-directional Bandwidth Measurements
Transparent Failover
[Diagram: SQL Server or Hyper-V Server connected to \\fs1\share on a File Server Cluster (File Server Node A, File Server Node B)]
1. Normal operation
2. Failover to Node B
3. Connections and handles auto-recovered; application I/O continues with no errors
SMB Multichannel
Full throughput
  Bandwidth aggregation with multiple NICs
  Multiple CPU cores engaged when using Receive Side Scaling (RSS)
Failure detection
  SMB Multichannel implements end-to-end failure detection
  Leverages NIC teaming if present, but does not require it
Automatic configuration
  SMB detects and uses multiple network paths
[Diagram: four sample configurations, each an SMB client connected through switches to an SMB server: a single RSS-capable NIC; multiple 1GbE NICs; multiple NICs in a team; multiple RDMA NICs]
SMB Multichannel
[Diagram: SMB client and SMB server with RSS-capable NICs connected through a switch; CPU utilization shown per core (Core 1 to Core 4)]
SMB Multichannel
[Diagram: SMB Client 1 and SMB Client 2 with RSS-capable NICs, connected through switches to SMB Server 1 and SMB Server 2]
SMB Multichannel Performance
[Chart: SMB Client Interface Scaling - Throughput; MB/sec (0 to 5000) against I/O size, one series each for 1, 2, 3 and 4 interfaces]
RDMA in SMB 3.0
SMB over TCP and RDMA
1. Application (Hyper-V, SQL Server) does not need to change; the API is unchanged.
2. SMB client makes the decision to use SMB Direct at run time.
3. NDKPI provides a much thinner layer than TCP/IP; nothing flows via regular TCP/IP any longer.
4. Remote Direct Memory Access is performed by the network interfaces (Ethernet and/or InfiniBand).
[Diagram: client and file server, each with application memory in user mode and the SMB Client / SMB Server in kernel mode; each side has a TCP/IP path and an SMB Direct path over NDKPI; RDMA moves data memory to memory between the two machines]
SMB Direct and SMB Multichannel
[Diagram: SMB Client 1 and SMB Client 2, each with multiple R-NICs on 54Gb InfiniBand, connected through switches to the R-NICs of SMB Server 1 and SMB Server 2]
The I/O picture to keep in mind
  CPU Register
  CPU Cache
  Memory
  Storage & network
NAND Flash Removes The Waste
Impact of Increased CPU Utilization (Consolidation Effect)
Hyper-V v3.0
Only two goals:
  Adopt new technologies in the Win8 kernel
  Be the best hypervisor for SQL Server
Hyper-V v3.0
Microsoft's initial idea up to November 2010
[Chart: Servers by number of CPU sockets; share of servers in percent for 1 to 24 sockets]
Hyper-V v3.0
New insight for the Hyper-V Team
[Chart: SQL instances by number of CPU sockets; share of instances in percent for 1 to 24 sockets]
Hyper-V Team idea of Physical to Virtual
Before:
  750 Servers with SQL Server
  920 SQL Server Instances
  200TB Storage
After:
  780-790 Servers with Hypervisor and SQL Server
  920 SQL Server Instances
  200TB Storage
Real life consolidation on Physical servers
Before:
  750 Servers with SQL Server
  920 SQL Server Instances
  200TB Storage
After:
  6 Servers with SQL Server
  12 SQL Server Instances
  140TB Storage
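The before and after figures translate directly into consolidation ratios. The short calculation below uses only the numbers on the slide; the dictionary keys and variable names are illustrative.

```python
# Before/after figures as shown on the slide.
before = {"servers": 750, "instances": 920, "storage_tb": 200}
after = {"servers": 6, "instances": 12, "storage_tb": 140}

server_ratio = before["servers"] / after["servers"]        # old servers per new server
instance_ratio = before["instances"] / after["instances"]  # old instances per new instance
storage_saved_tb = before["storage_tb"] - after["storage_tb"]
storage_saved_pct = 100 * storage_saved_tb / before["storage_tb"]

print(f"Servers:   {server_ratio:.0f}:1")
print(f"Instances: {instance_ratio:.1f}:1")
print(f"Storage:   {storage_saved_tb} TB saved ({storage_saved_pct:.0f}%)")
```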
Final specs of Hyper-V v3.0

Capability                                                            Hyper-V Server 2008 R2   Hyper-V Server 2012
Number of logical processors on host                                  64                       320
Maximum supported RAM on host                                         1 TB                     4 TB
Virtual CPUs supported per host                                       512                      2048
Maximum virtual CPUs supported per virtual machine                    4                        64
Maximum RAM supported per virtual machine                             64 GB                    1 TB
Maximum running virtual machines supported per host                   384                      1024
Guest NUMA                                                            No                       Yes
Maximum failover cluster nodes supported                              16                       64
Maximum number of virtual machines supported in failover clustering   1000                     8000
Final specs of Hyper-V v3.0
So what about Storage?
  VMware tops out at 300,000 IOPS per VM
    A really good number
  A single Windows Server 2012 Hyper-V VM does: 985,000 IOPS
New things happened SQL Server 2014 (SQL14) Windows Server 2012 R2 (Windows Blue)
Windows Server 2012 R2
RTM September 5th 2013
  Both Server and Client (Win8.1)
Hyper-V v4.0
  985,000 IOPS -> 1,300,000 IOPS
Improved network performance
  300,000 IOPS/NIC -> 450,000 IOPS/NIC
Improved Storage Spaces
  Caching
  Tiered Storage
Ex. 8K I/O from 950 MB/s -> 1,300 MB/s
SQL Server 2014
RTM April 1st 2014 (no April Fools' joke)
Project Hekaton
  In-Memory OLTP
Columnstore Index
  Clustered & Updateable
Updated AlwaysOn
  Improved reliability and scalability
  8 replicas
Completely New Query Engine
For the first time, control of IOPS with resource policies
Buffer Pool Extension
In-Memory OLTP Tech Pillars: Memory Optimized Tables Considerations
Tech pillar: Main-Memory Optimized
  Optimized for in-memory data
  Indexes (hash and range) exist only in memory
  No buffer pool
  Stream-based storage for durability
Benefit: High performance data operations
Driver: Hardware trends
  Steadily declining memory price, NVRAM
Table constructs
  Fixed schema: no ALTER TABLE, must drop/recreate/reload
  No LOB datatypes; row size limited to 8060 bytes
  No constraints support (PK only)
  No Identity or Calculated columns, CLR etc.
Data and table size considerations
  Size of tables = (row size * # of rows)
  Size of hash index = (bucket_count * 8 bytes)
  Max size SCHEMA_AND_DATA = 512 GB
IO for durability
  SCHEMA_ONLY vs. SCHEMA_AND_DATA
  Memory Optimized Filegroup
  Data and Delta files
  Transaction Log
  Database Recovery
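The two sizing formulas on this slide are easy to apply before creating a memory-optimized table. The sketch below is a rough estimate only: it uses the slide's formulas, its 8060-byte row limit and its 512 GB SCHEMA_AND_DATA figure, ignores any per-row overhead, and the function names and example inputs are hypothetical.

```python
HASH_BUCKET_BYTES = 8                 # slide: size of hash index = bucket_count * 8 bytes
MAX_ROW_BYTES = 8060                  # slide: row size limit
MAX_DURABLE_BYTES = 512 * 1024**3     # slide: max size for SCHEMA_AND_DATA


def table_bytes(row_size_bytes: int, row_count: int) -> int:
    """Slide formula: size of table = row size * number of rows."""
    if row_size_bytes > MAX_ROW_BYTES:
        raise ValueError("row size exceeds the 8060-byte limit")
    return row_size_bytes * row_count


def hash_index_bytes(bucket_count: int) -> int:
    """Slide formula: size of hash index = bucket_count * 8 bytes."""
    return bucket_count * HASH_BUCKET_BYTES


def fits_durable_limit(total_bytes: int) -> bool:
    """Check an estimate against the slide's SCHEMA_AND_DATA maximum."""
    return total_bytes <= MAX_DURABLE_BYTES


# Hypothetical example: 100 million rows of 200 bytes, one hash index with 2**27 buckets.
total = table_bytes(200, 100_000_000) + hash_index_bytes(2**27)
print(f"Estimated memory: {total / 1024**3:.1f} GiB, fits: {fits_durable_limit(total)}")
```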
Comparing Space Savings
101 Million Row Table + Index Space

Configuration                                      Space
Table with customary indexing                      19.7 GB
Table with customary indexing (page compression)   10.9 GB
Table with no indexing                             5.0 GB
Table with no indexing (page compression)          4.0 GB
Table with columnstore index                       6.9 GB
Clustered columnstore                              1.8 GB
Structure of In-Memory ColumnStore
How It Works
  CREATE CLUSTERED COLUMNSTORE
    Organizes and compresses data into ColumnStore
  BULK INSERT
    Creates new ColumnStore Row Groups
  INSERT
    Rows are placed in the Delta Store (heap)
    When the Delta Store is big enough, a new ColumnStore Row Group is created
  DELETE
    Rows are marked in the Deleted Bitmap
  UPDATE
    Delete plus insert
  Most data is in ColumnStore format
[Diagram: a partition made up of ColumnStore row groups plus a Delta Store]
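The INSERT, DELETE and UPDATE rules on this slide can be mimicked with a small toy model. This is an illustration of the bookkeeping only, not how SQL Server implements it: row groups are plain lists with no compression, the row group size is a tiny hypothetical threshold, and removing a row that still sits in the delta store directly is an assumption of the sketch.

```python
class ToyColumnStore:
    """Toy model of a clustered columnstore: row groups, delta store, deleted bitmap."""

    def __init__(self, row_group_size: int = 3):   # hypothetical, tiny threshold
        self.row_group_size = row_group_size
        self.row_groups: list[list[str]] = []      # stand-in for compressed row groups
        self.delta_store: list[str] = []           # heap for trickle inserts
        self.deleted: set[tuple[int, int]] = set() # (row group, position) marks

    def bulk_insert(self, rows):
        """BULK INSERT creates a new row group directly."""
        self.row_groups.append(list(rows))

    def insert(self, row):
        """INSERT lands in the delta store; a full delta store becomes a row group."""
        self.delta_store.append(row)
        if len(self.delta_store) >= self.row_group_size:
            self.row_groups.append(self.delta_store)
            self.delta_store = []

    def delete(self, row) -> bool:
        """DELETE marks columnstore rows in the deleted bitmap."""
        if row in self.delta_store:                # assumption: delta rows are removed
            self.delta_store.remove(row)
            return True
        for g, group in enumerate(self.row_groups):
            for p, value in enumerate(group):
                if value == row and (g, p) not in self.deleted:
                    self.deleted.add((g, p))
                    return True
        return False

    def update(self, old, new):
        """UPDATE is a delete plus an insert."""
        if self.delete(old):
            self.insert(new)

    def scan(self) -> list[str]:
        """Return live rows: unmarked columnstore rows plus the delta store."""
        live = [v for g, group in enumerate(self.row_groups)
                for p, v in enumerate(group) if (g, p) not in self.deleted]
        return live + list(self.delta_store)


store = ToyColumnStore(row_group_size=3)
store.bulk_insert(["a", "b", "c"])
store.insert("d")
store.insert("e")
store.delete("b")
store.update("d", "D")
print(store.scan())
```

Note that the deleted row stays physically inside its row group; only the bitmap hides it from the scan.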
Re-think hardware usage Storage Memory
Re-think hardware usage Storage Memory L5 Cache L4 Cache
Re-think hardware usage Storage Memory L2 RAM L1 RAM
[Diagram: SQL Server query and data flow
  Protocol Layer: SNI, TDS (Command in, Results out)
  Relational Engine: Cmd Parser (Query Tree), Optimizer (Query Plan), Query Executor (Result Sets)
  Storage Engine: Access Methods (GetPage), Buffer Manager (Read I/O), Transaction Manager (Write I/O)
  Buffer Pool: Plan Cache, Data Cache (Cached Pages)
  Data Files and Transaction Log]
IOPS Offload to Storage Class Memory (SCM)
[Diagram: SQL Server query and data flow
  Protocol Layer: SNI, TDS (Command in, Results out)
  Relational Engine: Cmd Parser (Query Tree), Optimizer (Query Plan), Query Executor (Result Sets)
  Storage Engine: Access Methods (GetPage), Buffer Manager (Read I/O), Transaction Manager (Write I/O)
  L1 Buffer Pool: Plan Cache, Data Cache (Cached Pages)
  L2 Buffer Pool: PCIe Flash (SCM), shown as three Fusion IO cards
  Data Files and Transaction Log]
DIY Shared Storage
New paradigm for SQL Server storage design
  Direct Attached Storage (DAS), now with flexibility
  Converting DAS to shared storage
    Fast RAID controllers will be shared storage
    NAND Flash PCIe cards will be shared storage
New Paradigm designs
[Diagram: PCIe Flash, shown as three Fusion IO cards]
New Paradigm designs
[Diagram: two File Servers in front of NAND Flash shared storage and traditional SAN shared storage]
Demo
Real life consolidation on Physical servers
Before:
  750 Servers with SQL Server
  920 SQL Server Instances
  200TB Storage
After:
  6 Servers with SQL Server
  12 SQL Server Instances
  140TB Storage
Real life consolidation on Physical servers
How did we achieve the Storage savings?
[Chart: Databases by type; System 63%, User 37%]
Digging deeper
Further storage re-claims could easily be done in databases
[Chart: Disk capacity waste in SQL environments (350TB); share of disk space (0% to 100%) for allocated disk space, allocated database space, used database space and total free space; values shown: 83%, 57%, 17%]
Typical large scale SQL Server deployments
  What you thought you needed
  What you decided to buy
  What your coders imposed on you
  What you would have run best on
  How you configured the hardware
Violin Memory WFA actual customer deployment
Customer is a logistics company operating in the Nordics
Legacy platform:
  Development & Test: 455,000 rows/sec
  Production: 621,229 rows/sec
With WFA: 13.24 million rows/sec
  30x improvement in Dev & Test
  21x improvement in Production
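The improvement factors follow from dividing the new rate by each legacy rate. The check below uses only the rows/sec figures on the slide; the variable names are illustrative.

```python
# Rows/sec figures as shown on the slide.
legacy_rows_per_sec = {
    "Development & Test": 455_000,
    "Production": 621_229,
}
wfa_rows_per_sec = 13_240_000  # 13.24 million rows/sec with WFA

speedups = {
    env: wfa_rows_per_sec / rate
    for env, rate in legacy_rows_per_sec.items()
}

for env, factor in speedups.items():
    print(f"{env}: {factor:.1f}x")
```

The computed factors come out at about 29.1x and 21.3x, which the slide rounds to 30x and 21x.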
Violin Memory WFA actual customer deployment
Using Updateable Clustered Column Store Index in SQL Server 2014
  472x improvement in Dev & Test
  330x improvement in Production
  1.6x savings in storage
New things are happening again Win9 SQL15
Thank you Business Card: michaelf@mentalnote.dk on LinkedIn