Microsoft Storage Spaces Direct Deployment Guide
Last update: 16 December 2015

Microsoft Software Defined Storage solution based on Windows Server 2016
Microsoft Software Defined Storage using Lenovo rack-based servers
Designed for Enterprise, MSPs/CSPs, and HPC
Private Cloud Storage or Hyper-converged
High performing, highly available and scale-out solution with growth potential

Daniel Lu
David Ye
Michael Miller
Table of Contents

1 Storage Spaces Direct Solution Overview
2 Solution Configuration
3 Understanding the Tasks
4 Preparing the Hardware & Storage Subsystem
4.1 Firmware & Drivers
4.2 Physical Storage Subsystem
5 Physical Network Switch Configuration
6 Installation of the Windows Operating System
7 Install Hyper-V Role and Create Virtual Switch
8 Configuration of the Windows Operating System
9 Applying Server Roles
10 Failover Clustering Setup
11 Create a Storage Pool
12 Create a Virtual Disk
13 Verify Storage Spaces Fault Tolerance
14 Appendix: Parts List for Disaggregated Solution
15 Appendix: Parts List for Hyper-converged Solution
1 Storage Spaces Direct Solution Overview

Following the initial offering of software defined storage in Windows Server 2012 R2, the next iteration of this solution will soon debut in Windows Server 2016 Technical Preview 4 under the name Storage Spaces Direct (S2D). It continues the same concept of collecting a pool of affordable disks to form a large, usable and shareable storage repository. In this round of advancement, the solution expands to support both SATA and SAS drives that reside internally in the server.

In the past, any discussion of high performance, shareable storage pools would naturally fall upon an expensive SAN infrastructure. Thanks to the evolution of disk technology and the ongoing advancements in network throughput, an economical, highly redundant and high performance storage subsystem is now within reach.

What about capacity and storage growth in S2D? Leveraging the 16 hard drive bays of the x3650 M5 and high capacity drives (4TB in this solution), each server node is in itself a JBOD (just a bunch of disks) repository. As demand for storage grows, expansion becomes a simple task of adding additional x3650 M5 systems into the environment.
What about performance for S2D? Using a combination of solid state drives and regular hard disks as the building blocks of the storage volume, the speed divide between traditional and solid state disks provides the ability for storage tiering. The faster solid state devices act as a cache repository in front of the higher capacity data tier in this solution. At the data level, information is spread, or striped, across multiple disks, allowing very fast retrieval from multiple read points. Lastly, at the physical network layer, this particular solution employs 10GbE links, but future throughput needs can be satisfied by using Mellanox 40GbE adapters. For now, the dual 10GbE pipes that carry both Windows operating system and storage replication traffic are more than sufficient to support the workloads and show no sign of bandwidth saturation.

What about resilience in S2D? Whereas traditional disk subsystem protection relies upon a RAID storage controller, this iteration of Windows software defined storage achieves high availability of the data with non-RAID adapters and the redundancy measures provided by Windows Server 2016 itself. The storage can be configured as simple spaces, mirror spaces or parity spaces. The first category, simple, offers no data protection. The latter two settings, mirror and parity, do provide resilience. In mirror spaces, the data can be mirrored, or copied, to two or even three different locations across multiple nodes. Like any fault tolerance measure there is a trade-off: the higher the resilience level, the lower the total available free disk space in the environment, due to the increasing number of data copies.

What are my use cases for S2D? As the opening of this document alluded to, the SAN is no longer the only provider of a high performance and highly resilient storage platform in the enterprise space, and the S2D solution is a direct replacement in this role. Whether the environment's primary function is Windows applications or a Hyper-V virtual machine farm, S2D can be the principal storage provider. Another use for S2D is as a repository for backup or archive of virtual machine files. Wherever a shared volume is applicable, S2D can be the solution that supports it.
The two major divisions of S2D usage are a disaggregated solution and a hyper-converged solution. In the first approach, disaggregated, the environment is separated into compute and storage components. An independent pool of servers running Hyper-V provides the CPU and memory resources for running the virtual machines, whose files reside on the storage environment. The S2D solution is employed in the storage environment to provide the storage repository for these virtual machines. This separation, as illustrated in figure 1, allows for the independent scaling and expansion of the compute farm (Hyper-V) and the storage farm (S2D).

Figure 1: Disaggregated Configuration

For the hyper-converged concept, there are no longer two separate resource pools for compute and storage. Instead, each server node provides hardware resources to support the running of virtual machines under Hyper-V and also allocates its internal storage to the S2D environment. Figure 2 demonstrates this all-in-one configuration for a four node hyper-converged solution. When it comes to growth, each additional node added to the environment increases both compute and storage resources together. This may be seen as less flexible if workload metrics dictate that increasing a specific resource, e.g. CPU, is sufficient to cure a bottleneck; regardless, any scaling will still mean the addition of both compute and storage resources. This is a fundamental shortfall of all hyper-converged solutions and S2D is no exception.
2 Solution Configuration

The following components and information are relevant to the test environment used to develop this guide. This solution consists of two key components: the high throughput network infrastructure and the storage dense 2U servers, the Lenovo x3650 M5. In this particular configuration, the networking component is the Lenovo G8264 switch, but this element is interchangeable going forward. The solution calls for four server nodes as the minimum configuration to harness the failover capability of losing one of the four systems.

Figure 3: x3650 M5 Configuration

The illustration in figure 4 shows the layout of the hard disks. Looking at the front of the system, twelve 3.5" hard disks are present, where three of them are 800GB SSD devices and the remaining nine are 4TB SATA hard disks. These twelve storage devices are managed by the N2215 SAS HBA. At the rear of the server, two 2.5" 300GB SAS disks in a RAID-1 configuration on the ServeRAID M1215 are utilized for the Windows operating system. Remember, one of the requirements for this solution is that a non-RAID storage controller must be used for the data volume, but to elevate the availability of the operating system, the ServeRAID adapter is employed as well. Finally, the three SSD devices in each system, in conjunction with the nine SATA hard disks, form the tiered data volume.
Figure 4: x3650 M5 Storage Subsystem

Network wiring of this solution is very straightforward, with network links running from the two G8264 switches down to the four x3650 M5 servers. Each system contains a dual port 10GbE Mellanox CX3 adapter, as displayed in figure 5, for both operating system traffic and storage communications. To allow for redundant network links in the event of a network port or external switch failure, the recommendation is to connect port 1 on the Mellanox adapter to a port on the first G8264 switch and port 2 on the Mellanox adapter to an available port on the second G8264 switch. This ensures failover capability across the switches. The last construction on the network subsystem is to leverage the virtual network capabilities of Hyper-V to team both 10GbE ports on the Mellanox adapter and apply on top of them a virtual switch and logical network adapters to facilitate the operating system and storage traffic.

Figure 5: x3650 M5 Networking

With the integration details explained in the previous paragraphs, the full list of relevant parts that contribute to this solution is provided in the appendix sections of this document. All the parts listed in the previous figures represent the disaggregated solution for S2D.
For the hyper-converged solution, the major differences are in the server configuration. In the hyper-converged model, the servers are given 256GB of memory as compared to 64GB, the CPU model has more cores (the 12-core E5-2680 v3 at 2.5GHz, compared to the 10-core E5-2660 v3 at 2.6GHz), and the two disks utilized for the operating system are doubled in size, 600GB as compared to 300GB in the disaggregated model. These enhancements in the hyper-converged solution account for the dual functions of compute and storage that each server node takes on, whereas in the disaggregated solution there is a separation of duties, with one server farm dedicated to Hyper-V and a second devoted only to S2D.
3 Understanding the Tasks
4 Preparing the Hardware & Storage Subsystem

4.1 Firmware & Drivers

Best practices dictate that with a new server deployment, the first task is to review the system firmware and drivers relevant to the incoming operating system. Having the latest firmware and driver sets will also expedite hardware support calls in the future. Lenovo has a user friendly tool for this important task called UpdateXpress.

https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=lnvo-xpress

This tool can be utilized in two fashions. In the first option, the system administrator downloads and installs the tool on the target server, performs a verification to identify any firmware and drivers that need attention, downloads the update packages from the Lenovo web site, and proceeds with the updates. In the second method, the server owner downloads the new packages to a local network share or repository and, during a maintenance window, proceeds with the updates at that time. This flexibility grants full control to the server owner and ensures these important updates are performed within an appropriate timeframe.

4.2 Physical Storage Subsystem

Power on the server node to review the disk subsystem in preparation for the installation of the operating system.

Activity: During system boot up, press the F1 key to enter the UEFI menu. Traverse to System Settings, Storage, and then access the applicable disk controller (M1215).

Goal #1: Create a RAID-1 pool of two 300GB hard disks on the ServeRAID M1215 SAS/SATA controller.
The remaining 12 disks (three 800GB SSDs, nine 4TB HDDs) connected to the N2215 SAS HBA inside each x3650 M5 server should be left unconfigured. They will be handled by the Windows operating system when the time comes to create the storage pool.
5 Physical Network Switch Configuration

Windows Server 2016 supports the Remote Direct Memory Access (RDMA) feature in network adapters to achieve high throughput and low latency storage traffic. With all Mellanox ports from each server node connecting up to the two G8264 switches, a few configuration tasks are needed on these switches to enable support for RDMA as well. Upon logging into each of the two G8264 switches, execute the following commands:

S2D-SW1> enable
S2D-SW1# configure terminal
S2D-SW1(config)# cee enable
S2D-SW1(config)# cee global pfc enable
S2D-SW1(config)# cee global pfc priority 3 enable
S2D-SW1(config)# cee global ets priority-group pgid 1 bandwidth 60
S2D-SW1(config)# write

If the solution is leveraging another switch model or another vendor's equipment instead of the Lenovo G8264, it is still essential to apply the equivalent command set on those switches. The commands themselves may differ from what is shown above, but it is imperative that the same functions are configured on the switches to ensure proper operation of this solution.
6 Installation of the Windows Operating System

Utilize the Integrated Management Module (IMM), the out-of-band management tool, to perform the installation of the Windows operating system. There are several options for installing the operating system, ranging from i) remote ISO media mount, ii) bootable USB media with the installation content, to iii) installation DVD. Select the option that is appropriate for the situation.

With the method of Windows deployment picked out, power the server back on to begin the installation process. Select the appropriate language pack, input device and geography, then proceed to selecting the desired OS edition (GUI or Core components only), before settling on the suitable drive for the OS installation. Select the 300GB/600GB RAID-1 partition (depending on whether this is a disaggregated or hyper-converged solution) as the installation location for the Windows OS.
7 Install Hyper-V Role and Create Virtual Switch

Install the Hyper-V role via PowerShell; the command below also performs a system reboot.

Activity:
Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart

Create a virtual network switch backed by the uplinks provided by the Mellanox adapter.

Activity:
New-VMSwitch -Name S2DSwitch -NetAdapterName "NIC 1","NIC 2"

Create virtual network adapters bound to the newly created virtual switch, and enable RDMA on them.

Activity:
Add-VMNetworkAdapter -SwitchName S2DSwitch -Name SMB1 -ManagementOS
Add-VMNetworkAdapter -SwitchName S2DSwitch -Name SMB2 -ManagementOS
Enable-NetAdapterRDMA -Name "vEthernet (SMB1)","vEthernet (SMB2)"

The last command enables the RDMA feature on the virtual network adapters.
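To confirm the virtual switch, the management OS adapters and their RDMA state before moving on, the following commands can be run (a minimal sketch; the switch and adapter names assume the S2DSwitch, SMB1 and SMB2 objects created above):

Get-VMSwitch -Name S2DSwitch
Get-VMNetworkAdapter -ManagementOS
Get-NetAdapterRdma -Name "vEthernet (SMB1)","vEthernet (SMB2)"

The Enabled column of the last command should report True for both virtual adapters.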
8 Configuration of the Windows Operating System

If virtual switches and virtual network adapters are not in use, configure the physical network adapters directly. For the networking components, the Mellanox ConnectX-3 adapter on each server node, use a static IP address for the Windows operating system (public facing) interface. Then configure the storage heartbeat interface with another static IP assignment. The final task is to perform a ping from each interface (public and heartbeat) to the corresponding interface on each of the other server nodes in this environment to confirm connectivity.

To ensure the latest fixes and patches are applied to the operating system, perform an update of the Windows components.

Activity: Server Manager, Local Server, Windows Update (Not configured), Let me choose my settings

Upon completing the Windows update, join the server node to the Windows Active Directory Domain.

Activity: Server Manager, Local Server, Domain, Change, Member of Domain

Reboot the system to complete the new Domain membership.

Verify the internal disks and bring them Online if any of them are labelled Offline.

Activity: Server Manager, Tools, Computer Management, Disk Management, select the Disk, right click to activate Online
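As a hedged example only, the static IP assignment and domain join described above can also be scripted. The interface aliases, addresses and domain name below are placeholders and must be replaced with values appropriate to the environment:

New-NetIPAddress -InterfaceAlias "vEthernet (SMB1)" -IPAddress 192.168.10.11 -PrefixLength 24 -DefaultGateway 192.168.10.1
Set-DnsClientServerAddress -InterfaceAlias "vEthernet (SMB1)" -ServerAddresses 192.168.10.2
New-NetIPAddress -InterfaceAlias "vEthernet (SMB2)" -IPAddress 192.168.20.11 -PrefixLength 24
Add-Computer -DomainName contoso.local -Restart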
9 Applying Server Roles

Apply the following server role and feature to each of the nodes in this solution:

File Server
Failover Clustering

Activity: Server Manager, Dashboard, Add roles and features, Role-based or feature-based installation, Select a server from the server pool, File Server under Select server roles, Failover Clustering under Select features

Upon completion, ensure the following features are installed:

.NET Framework 4.6
Failover Clustering
Ink and Handwriting Services
Media Foundation
Remote Server Administration Tools
SMB 1.0/CIFS File Sharing Support
User Interfaces and Infrastructure
Windows PowerShell
Windows Server Antimalware Features
Wireless LAN Service
WoW64 Support

A PowerShell alternative for installing the role and feature is sketched after this list.
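As an alternative to the Server Manager wizard, the same role and feature can be installed with a single PowerShell command run on each node (a sketch; the feature names are the standard Windows Server identifiers):

Install-WindowsFeature -Name FS-FileServer, Failover-Clustering -IncludeManagementTools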
From this point onward, be sure to log onto the systems with a Domain account, not the local Administrator account, to perform any manipulation of the cluster services. Ensure the common Domain account is part of the local Administrators security group.
10 Failover Clustering Setup

Now that the Failover Clustering feature is installed and enabled, the next phase is to validate the environment components that are necessary to form the cluster.

Activity: Server Manager, Dashboard, Tools, Failover Cluster Manager

Activity: Validate Configuration
Select all the server nodes that will form the cluster on the Select Servers or a Cluster page. Select Run only tests I select under Testing Options. Confirm the selections and run the tests. View the report upon completion of the various tests. When the tests have completed successfully and all relevant components have passed the evaluation, the next step is the creation of the cluster; otherwise, resolve any failed subsystems first.

Activity: Create Cluster

Select all the server nodes that will form the cluster on the Select Servers page. Enter the name of the cluster under Access Point for Administering the Cluster. Confirm the selection and proceed with the creation of the new cluster.

Query the health status of the cluster storage. When there are no errors from the previous commands, it is time to enable the Storage Spaces Direct feature.

Activity:
Enable-ClusterStorageSpacesDirect
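The validation, cluster creation and S2D enablement can also be driven from PowerShell. The node and cluster names below are examples only and should be replaced with the names used in the environment:

Test-Cluster -Node S2D-Node1, S2D-Node2, S2D-Node3, S2D-Node4
New-Cluster -Name S2D-Cluster -Node S2D-Node1, S2D-Node2, S2D-Node3, S2D-Node4 -NoStorage
Enable-ClusterStorageSpacesDirect

Creating the cluster with -NoStorage leaves the internal disks unclaimed so that Storage Spaces Direct can take ownership of them in the next step.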
11 Create a Storage Pool

With the S2D feature turned on, the next step is to create the storage pool utilizing all the disks, both SSD and HDD devices, in this environment. With two disks allocated to the operating system in each node, this leaves 12 disks (three solid state drives and nine hard disks) per node for contribution to the storage pool, for a total of 48 disks (12 SSDs and 36 HDDs). In this strategy, all 12 SSD devices in the environment will be set up as Journal devices to enhance performance, since all array data is written first to the Journal devices. The 36 regular hard disks, when pooled together, form the basis of the shared data repository for this cluster.

Optional Activity:
Get-Disk | Where-Object IsOffline -eq $true
Get-Disk | Where-Object IsOffline -eq $true | Set-Disk -IsOffline $false

For those who are more familiar with the command line interface than the Windows Disk Management tool, the first PowerShell command above identifies all disks that are currently Offline. The second command brings all disks with an Offline status Online, paving the way for creating the storage pool.

Activity:
New-StoragePool -StorageSubSystemName <FQDN of the subsystem> -FriendlyName <StoragePoolName> -WriteCacheSizeDefault 0 -ProvisioningTypeDefault Fixed -ResiliencySettingNameDefault Mirror -PhysicalDisk (Get-StorageSubSystem -Name <FQDN of the subsystem> | Get-PhysicalDisk)

Activity:
Get-StoragePool <StoragePoolName> | Get-PhysicalDisk | Where-Object MediaType -eq SSD | Set-PhysicalDisk -Usage Journal

If the FQDN of the storage subsystem is not known, the first task is to obtain this information with the Get-StorageSubSystem command.
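For example (a minimal sketch), list the storage subsystems and note the Name value reported for the clustered subsystem:

Get-StorageSubSystem | Select-Object FriendlyName, Name, HealthStatus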
Running the New-StoragePool command creates the new pool, in this case under the FriendlyName VMDataPool. The second PowerShell command shown above, beginning with Get-StoragePool, sets all the SSD devices to the Journal function, so they are no longer considered part of the available free space in the pool. To confirm the creation, view the Pool object under Failover Cluster Manager.

Optimize the storage pool.

Activity:
Optimize-StoragePool <StoragePoolName>, where <StoragePoolName> is the name of the pool.

Best practice suggests that whenever additional disks are added to the storage pool, the pool should be optimized.
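To confirm that the pool was created and that the SSD devices now carry the Journal usage, a quick check (sketch only) is to group the pool's physical disks by media type and usage:

Get-StoragePool -FriendlyName <StoragePoolName> | Get-PhysicalDisk | Group-Object MediaType, Usage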
12 Create a Virtual Disk

Activity: Failover Cluster Manager, Storage, Pools, right click on the desired Pool and select New Virtual Disk.

Select the appropriate storage pool under the Select the storage pool tab. Provide a name and description under the Specify the virtual disk name tab. Because the PowerShell commands used when creating the storage pool already set all SSD devices to Journal usage, the option Create storage tiers on this virtual disk is not available and is greyed out. Proceed to the next tab.

Ensure Enable enclosure awareness is NOT checked under the Specify enclosure resiliency tab.

Under the Select the storage layout tab, choose Simple or Mirror depending upon the data striping and protection requirements. If Mirror, the resiliency types are Two-way mirror, which keeps two copies of the data, requires at least two disks and can tolerate a single disk failure, and Three-way mirror, which keeps three copies of the data, requires at least five disks and can tolerate two simultaneous disk failures.

Enter the size of the virtual disk and, prior to completion, review the settings for the new virtual disk before clicking Create.
Alternatively, a PowerShell command can be used to create the virtual disk.

Activity:
New-Volume -StoragePoolFriendlyName <StoragePoolName> -FriendlyName <VirtualDiskName> -PhysicalDiskRedundancy 2 -FileSystem CSVFS_ReFS -Size <Size>

With the virtual disk created, the next step is to give it the Cluster Shared Volume designation so that this storage space becomes available to all the nodes. From Failover Cluster Manager, select the Disks object, ensure the virtual disk is Online, and under the Assigned To column, verify it is Available Storage. From Server Manager, select Disks, and under the Volumes section, create a New Volume. In the New Volume Wizard, under the Select the server and disk tab, choose the cluster to which this volume will be provisioned.
Under the Specify the size of the volume tab, verify the size of the volume and proceed. Under the Assign to a drive letter or folder tab, choose Don't assign to a drive letter or folder. Under the Select file system settings tab, select ReFS for the file system and 64K for the allocation unit size, give it a volume name, and then review the selections before proceeding with the volume creation.

Return to Failover Cluster Manager, select the Disks object, ensure the virtual disk is Online, right click it and select Add to Cluster Shared Volumes. To verify, go to Server Manager and view the Volumes objects.
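The Cluster Shared Volume steps above can also be carried out with PowerShell (a sketch; the virtual disk resource name shown is illustrative and should be confirmed first with Get-ClusterResource):

Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"
Add-ClusterSharedVolume -Name "Cluster Virtual Disk (VMData)"
Get-ClusterSharedVolume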
13 Verify Storage Spaces Fault Tolerance

Once the pool, virtual disk and volume are set up according to the requirements, the final step is to verify that there is indeed disk fault tolerance in this storage environment. Run the following commands to verify the fault tolerance of the disk subsystem.

Activity:
Get-StoragePool -FriendlyName <PoolName> | FL FriendlyName, Size, FaultDomainAwarenessDefault

Activity:
Get-VirtualDisk -FriendlyName <VirtualDiskName> | FL FriendlyName, Size, FaultDomainAwareness, ResiliencySettingName
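Beyond the property checks above, a node can be paused and drained to observe that the virtual disks remain healthy, and storage repair activity can be watched when the node returns. This is a hedged sketch; the node name is a placeholder:

Suspend-ClusterNode -Name S2D-Node1 -Drain
Get-VirtualDisk | FL FriendlyName, HealthStatus, OperationalStatus
Resume-ClusterNode -Name S2D-Node1
Get-StorageJob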
14 Appendix: Parts List for Disaggregated Solution

For Lenovo server purchases, please contact your local sales representatives or visit http://shop.lenovo.com/us/en/systems/servers/racks/systemx/

Part number | Description | Quantity
5462AC1 | Server1 : Lenovo System x3650 M5 | 4
A5EU | System x 750W High Efficiency Platinum AC Power Supply | 8
A483 | Populate and Boot From Rear Drives | 4
A5EY | System Documentation and Software-US English | 4
A5FV | System x Enterprise Slides Kit | 4
A5FX | System x Enterprise 2U Cable Management Arm (CMA) | 4
A3W9 | 4TB 7.2K 6Gbps NL SATA 3.5" G2HS HDD | 36
A56J | S3700 800GB SATA 3.5" MLC HS Enterprise SSD | 12
A5EA | System x3650 M5 Planar | 4
A5B8 | 8GB TruDDR4 Memory (2Rx8, 1.2V) PC4-17000 CL15 2133MHz LP RDIMM | 32
ASQA | System x3650 M5 Rear 2x 2.5" HDD Label (Independent RAID-Riser1) | 4
A5FH | System x3650 M5 Agency Label GBM | 4
ASQB | System x3650 M5 Rear 2x 3.5" HDD Label | 4
A5FM | System x3650 M5 System Level Code | 4
A2HP | Configuration ID 01 | 8
A5FT | System x3650 M5 Power Paddle Card | 4
9206 | No Preload Specify | 4
A5G1 | System x3650 M5 EIA Plate | 4
A3WG | 3U Bracket for Mellanox ConnectX-3 10 GbE Adapter | 4
A5FC | System x3650 M5 WW Packaging | 4
A5V5 | System x3650 M5 Right EIA for Storage Dense Model | 4
ASDM | Addl Intel Xeon Processor E5-2660 v3 10C 2.6GHz 25MB 2133MHz 105W | 4
ASDA | Intel Xeon Processor E5-2660 v3 10C 2.6GHz 25MB Cache 2133MHz 105W | 4
5977 | Select Storage devices - no configured RAID required | 4
A5GH | System x3650 M5 Rear 2x 2.5" HDD Kit (Independent RAID) | 4
A5GE | x3650 M5 12x 3.5" HS HDD Assembly Kit | 4
A3YY | N2215 SAS/SATA HBA | 4
A45W | ServeRAID M1215 SAS/SATA Controller | 4
A5FF | System x3650 M5 12x 3.5" Base without Power Supply | 4
AT89 | 300GB 10K 12Gbps SAS 2.5" G3HS HDD | 8
A5GL | System x3650 M5 Rear 2x 3.5" HDD Kit (Cascaded) | 4
A3PM | Mellanox ConnectX-3 10 GbE Adapter | 4
5374CM1 | HIPO : Configuration Instruction | 4
A5M2 | ServeRAID M1215 SAS/SATA Controller Upgrade Placement | 4
A2HP | Configuration ID 01 | 4
A2JX | Controller 01 | 4
5374CM1 | HIPO : Configuration Instruction | 4
A2HP | Configuration ID 01 | 4
A46U | N2215 SAS/SATA HBA Placement | 4
A2JY | Controller 02 | 4
67568HG | Lenovo services1 : 3 Year Onsite Repair 24x7 4 Hour Response | 4
15 Appendix: Parts List for Hyper-converged Solution

Part number | Description | Quantity
5462AC1 | Server1 : Lenovo System x3650 M5 | 4
A5EW | System x 900W High Efficiency Platinum AC Power Supply | 8
A483 | Populate and Boot From Rear Drives | 4
A5EY | System Documentation and Software-US English | 4
A5FV | System x Enterprise Slides Kit | 4
A5FX | System x Enterprise 2U Cable Management Arm (CMA) | 4
A3W9 | 4TB 7.2K 6Gbps NL SATA 3.5" G2HS HDD | 36
A56J | S3700 800GB SATA 3.5" MLC HS Enterprise SSD | 12
A5EA | System x3650 M5 Planar | 4
A5B7 | 16GB TruDDR4 Memory (2Rx4, 1.2V) PC4-17000 CL15 2133MHz LP RDIMM | 64
ASQA | System x3650 M5 Rear 2x 2.5" HDD Label (Independent RAID-Riser1) | 4
A5FH | System x3650 M5 Agency Label GBM | 4
ASQB | System x3650 M5 Rear 2x 3.5" HDD Label | 4
A5FM | System x3650 M5 System Level Code | 4
A2HP | Configuration ID 01 | 8
A5FT | System x3650 M5 Power Paddle Card | 4
9206 | No Preload Specify | 4
A5G1 | System x3650 M5 EIA Plate | 4
A3WG | 3U Bracket for Mellanox ConnectX-3 10 GbE Adapter | 4
A5FC | System x3650 M5 WW Packaging | 4
A5V5 | System x3650 M5 Right EIA for Storage Dense Model | 4
ASFE | Notice for Advanced Format 512e Hard Disk Drives | 4
A5EP | Addl Intel Xeon Processor E5-2680 v3 12C 2.5GHz 30MB 2133MHz 120W | 4
A5GW | Intel Xeon Processor E5-2680 v3 12C 2.5GHz 30MB Cache 2133MHz 120W | 4
5977 | Select Storage devices - no configured RAID required | 4
A5GH | System x3650 M5 Rear 2x 2.5" HDD Kit (Independent RAID) | 4
A5GE | x3650 M5 12x 3.5" HS HDD Assembly Kit | 4
A3YY | N2215 SAS/SATA HBA | 4
A45W | ServeRAID M1215 SAS/SATA Controller | 4
A5FF | System x3650 M5 12x 3.5" Base without Power Supply | 4
AT8A | 600GB 10K 12Gbps SAS 2.5" G3HS HDD | 8
A5GL | System x3650 M5 Rear 2x 3.5" HDD Kit (Cascaded) | 4
A3PM | Mellanox ConnectX-3 10 GbE Adapter | 4
5374CM1 | HIPO : Configuration Instruction | 4
A5M2 | ServeRAID M1215 SAS/SATA Controller Upgrade Placement | 4
A2HP | Configuration ID 01 | 4
A2JX | Controller 01 | 4
5374CM1 | HIPO : Configuration Instruction | 4
A2HP | Configuration ID 01 | 4
A46U | N2215 SAS/SATA HBA Placement | 4
A2JY | Controller 02 | 4
67568HG | Lenovo services1 : 3 Year Onsite Repair 24x7 4 Hour Response | 4