Z-Drive 6000 PCIe NVMe SSD Series Dual Port Capability

Maulik Sompura, George Tehrani, Scott Harlin
Published September 2015
OCZ Storage Solutions, Inc. A Toshiba Group Company
Table of Contents

1 Executive Summary
2 Background
3 Purpose and Applications
4 Single Port vs Dual Port Applications & Differences
4.1 Dual Port Applications
4.2 Single Port Applications
5 Identifying & Isolating Single Points of Failure
6 Dual Port Functionality in PCIe SSDs
7 Enabling Dual Port on Z-Drive 6000 SSDs
8 Dual Port Performance & Verification
9 Z-Drive 6000 Series Configurations
9.1 Customer Impact
9.2 Use Cases
10 Summary

Disclaimer

OCZ may make changes to specifications and product descriptions at any time, without notice. The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Any performance tests and ratings are measured using systems that reflect the approximate performance of OCZ products as measured by those tests. Any differences in software or hardware configuration may affect actual performance, and OCZ does not control the design or implementation of third party benchmarks or websites referenced in this document. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to any changes in product and/or roadmap, component and hardware revision changes, new model and/or product releases, software changes, firmware changes, or the like. OCZ assumes no obligation to update or otherwise correct or revise this information. OCZ MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. OCZ SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL OCZ BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF OCZ IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION

2015 OCZ Storage Solutions, Inc. A Toshiba Group Company. All rights reserved. OCZ, the OCZ logo, OCZ XXXX, OCZ XXXXX, [Product name] and combinations thereof, are trademarks of OCZ Storage Solutions, Inc. A Toshiba Group Company. All other product names and logos are for reference only and may be trademarks of their respective owners.
1 Executive Summary

This paper focuses on the dual port aspects of a PCIe SSD. It provides a high-level description of dual port versus single port use cases and applications. It is further intended to help system architects and designers address dual port functionality, especially since this capability is now supported on new PCIe-based NVMe (Non-Volatile Memory Express) platforms. Dual port is a capability that server and storage OEMs place high on their priority lists because it enables High Availability (HA) for enterprise applications, redundancy, and sustained uptime reliability. The dual port implementation within the OCZ Storage Solutions NVMe-compliant Z-Drive 6000 SSD Series is also covered in this paper.

2 Background

Dual data-paths (or multi-paths) are not new to enterprise applications. Originally, computers were connected to a device using a single data-path. Over time, the enterprise community learned that high rates of failures were typically generated by backplanes, host bus adapters, cabling, connectors, etc., primarily due to moving parts and mechanical connections. As a result, redundancy soon became available in the form of RAID for disk drives, data duplication, and other techniques. In the single-path scenario, if the data path from the host to the drive itself is compromised, data cannot be accessed, which paved the way for the dual-path (or multi-path) concept.

Focusing on this multi-path technique from the drive to the host and vice versa, SCSI devices were the first to implement dual port capabilities using two physical connections. With the advent of Serial Attached SCSI (SAS), dual port connectivity became possible through a single physical connection. Due to its benefits, SAS dual port quickly became a popular configuration. Because dual port enables data to be streamed from either port independently, the technique provides fault tolerance against the loss of either data path.
This same dual port functionality can now be achieved on PCIe NVMe SSDs. Dual port enables two data paths within a single host so that two controllers can access the same storage device for redundancy. In another usage, two host systems could also concurrently access the same drive with dual port. If a system failure or power loss occurs where one data path is lost, the available data path continues operation as if no failure had occurred with minimal impact to Quality of Service (QoS).
3 Purpose and Applications

Enterprise customers rely on data redundancy or data replication, achieved via two different availability approaches:

1. Approach #1 - High Availability: This implementation is highly reliable and typically avoids single points of failure. One of the ways that high data availability is ensured is through two identical data paths. Applications that require this level of HA in support of mission-critical data include banking & financial, OnLine Transaction Processing (OLTP), OnLine Analytical Processing (OLAP), virtualization, High Performance Computing (HPC), and Big Data. For this class of applications, redundancy is architected at every point in the data path to avoid failures. A popular implementation is described in Sections 4 and 5.

2. Approach #2 - Moderate to High Availability: This implementation provides moderate data availability due to the cost constraints of implementing multi-point redundancy. Examples include hyperscale and cloud service providers that add redundancy through clustering and redundant copies of data at different locations. In the case of a failure, instead of relying on sophisticated dual-domain/dual-path redundancy, the failed drives are replaced with new cost-efficient drives that are rebuilt using backed-up data from an alternate location.

However, the second approach is not necessarily the cheaper solution. In fact, Total Cost of Ownership (TCO) might be higher in some cases if multiple copies of data need to be maintained, requiring additional drives and software costs. System architects need to consider the full TCO benefit, as well as the pros and cons of the two approaches.

4 Single Port vs. Dual Port Applications & Differences

4.1 Dual Port Applications

The most popular use case for dual port is when two controllers concurrently access the same device, such as an SSD, to allow for data redundancy.
In this configuration, the two flash controllers can access the same data within a device through different ports as evident in the popular JBOD/JBOF (Just a Bunch of Disks/ Just a Bunch of Flash) configuration in Figure 1. Virtualized environments also need this level of HA to protect mission-critical data from system failures. For these applications, redundancy is built-in at every point in the data path to avoid failures associated with devices, components, cables, etc.
Figure 1 is an example of a JBOF High Availability configuration (Controller A and Controller B, each with a root complex and a PCIe switch, attached to shared multi-ported NVMe devices)

4.2 Single Port Applications

There are single port use cases where data redundancy is not required. This is evident in server caching applications where caching is used for temporary data storage or as a scratch pad. The same is true in power-critical client storage applications where data is non-redundant and the objective is to reduce the power consumption while adding a fast PCIe NVMe SSD to supplement or replace a slower form of storage. Both the server caching and client storage configurations are represented in Figure 2.

Figure 2 represents two single port use cases that do not require data redundancy (server caching: temporary, non-redundant data on NVMe devices behind a root complex and PCIe switch; client storage: a power-optimized, non-redundant NVMe boot/OS drive or HDD cache alongside a SATA HDD on the PCH)
5 Identifying & Isolating Single Points of Failure

This section describes single points of failure. As represented in Figure 3, there is typically one server/controller (Server 1) that utilizes a host bus adapter (HBA 1) to connect to the SSDs via a PCIe switch (SW 1). If this connection fails, access to data would be lost. To support failover in this configuration, an extra HBA can be added to the server (as depicted by HBA 2). Though failover is supported in this configuration, there is only one switch connecting to the drives, so if the switch fails, access to data would still be lost.

Figure 3 is a configuration that addresses single point failures (Server 1 with HBA 1 and HBA 2, both connected through SW 1 to four SSDs)

These single points of failure can be avoided by using dual port drives and two switches that each connect to every drive. However, if the server is lost (Server 1), access to data will still be lost. Therefore, depending on associated costs, the addition of a second server could achieve a fully redundant system where both servers can access all of the dual ported drives via two switches as outlined in Figure 4. In this configuration, the dual ports are active/active, enabling both controllers to access any drive port at any time, which represents a fully redundant, dual ported configuration.

Figure 4 represents a dual-ported configuration (Server 1 and Server 2, each with HBA 1 and HBA 2, connected through SW 1 and SW 2 to four dual port (DP) SSDs)
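The failure reasoning in this section can be checked mechanically. The following Python sketch is an illustration only: it models each configuration as a graph and reports every non-drive component whose individual loss leaves at least one drive unreachable. The exact wiring of Figures 3 and 4 is an assumption reconstructed from the text (in particular, that each server's HBA 1 goes to SW 1 and HBA 2 goes to SW 2), and the function and node names are hypothetical.

```python
from collections import deque

def drives_reachable(links, hosts, drives, failed):
    """Return the drives that at least one surviving host can still reach."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {h for h in hosts if h not in failed}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in drives:
            continue  # a drive is an endpoint; traffic is not routed through it
        for nxt in adjacency.get(node, ()):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & set(drives)

def single_points_of_failure(links, hosts, drives):
    """List non-drive components whose loss alone cuts off at least one drive."""
    components = {n for link in links for n in link} - set(drives)
    return sorted(c for c in components
                  if drives_reachable(links, hosts, drives, {c}) != set(drives))

ssds = ["SSD 1", "SSD 2", "SSD 3", "SSD 4"]

# Figure 3 (assumed wiring): one server, two HBAs, one switch.
fig3_links = [("Server 1", "HBA 1"), ("Server 1", "HBA 2"),
              ("HBA 1", "SW 1"), ("HBA 2", "SW 1")]
fig3_links += [("SW 1", d) for d in ssds]
fig3_spof = single_points_of_failure(fig3_links, ["Server 1"], ssds)

# Figure 4 (assumed wiring): two servers, two switches, dual port drives.
fig4_links = []
for server in ("Server 1", "Server 2"):
    fig4_links += [(server, f"{server} HBA 1"), (server, f"{server} HBA 2"),
                   (f"{server} HBA 1", "SW 1"), (f"{server} HBA 2", "SW 2")]
fig4_links += [(sw, d) for sw in ("SW 1", "SW 2") for d in ssds]
fig4_spof = single_points_of_failure(fig4_links, ["Server 1", "Server 2"], ssds)

print("Figure 3 single points of failure:", fig3_spof)
print("Figure 4 single points of failure:", fig4_spof)
```

Under these assumptions the single-switch layout still depends on SW 1 and Server 1, while the two-server, two-switch layout has no single component whose loss removes access to a drive, which matches the argument in the text.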
6 Dual Port Functionality In PCIe SSDs

The Enterprise SSD Form Factor Working Group defined its latest connector, SFF-8639 (the U.2 form factor), and the definition includes dual port enablement. The mechanism for enabling dual port operation is DualPortEn# (pin E25) as depicted in Figure 5.

Enterprise PCIe (SFF-8639)
Pin    Name
E22    GND
E23    SMClk
E24    SMDat
E25    DualPortEn#
Source: Enterprise SSD Form Factor Working Group.

Figure 5: Depicts pin E25 as the mechanism for enabling dual port operations

The enablement is defined so that an enterprise PCIe SSD can be configured as either a single x4 controller or as dual x2 controllers as outlined in Figure 6.

Figure 6 defines a PCIe SSD configuration with either a single x4 controller or with dual x2 controllers (a single PCIe x4 upstream port, e.g. a CPU x4 PCIe port, or Controller A and Controller B each using a PCIe x2 link to the same SSD form factor device)

DualPortEn# is typically pulled high internally by the enterprise PCIe SSD, which means that if nothing is connected to pin E25, the state of the pin will be high and the dual port capability will not be enabled. If DualPortEn# is left open, the PCIe interface will be configured as a single x4 port. If DualPortEn# is pulled low by the system (driven low or grounded by a backplane), the dual port capability will be enabled, as explained further in the next section.
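The DualPortEn# behavior described in this section amounts to a small decision rule. The Python sketch below restates it for illustration; the `PinState` and `port_configuration` names are hypothetical and are not part of any SFF-8639 or OCZ interface.

```python
from enum import Enum

class PinState(Enum):
    OPEN = "open"   # nothing connected; the drive's internal pull-up holds the pin high
    HIGH = "high"   # held high by the system
    LOW = "low"     # driven low or grounded by the backplane

def port_configuration(dual_port_en: PinState) -> dict:
    """Map the state of DualPortEn# (pin E25) to the resulting PCIe link layout."""
    if dual_port_en is PinState.LOW:
        # Active-low signal asserted: two independent x2 ports.
        return {"ports": 2, "lanes_per_port": 2}
    # Open or high: dual port is not enabled, so the drive presents a single x4 port.
    return {"ports": 1, "lanes_per_port": 4}

for state in PinState:
    print(state.name, port_configuration(state))
```

Either way all four lanes of the connector are used; the pin only decides whether they form one x4 link or two x2 links.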
7 Enabling Dual Port On Z-Drive 6000 SSDs

OCZ's launch strategy for the Z-Drive 6000 Series is to release a single port version first, followed by a dual port version. The Z-Drive 6000 hardware supports dual port as explained in the previous section. To upgrade from single port to dual port, all that is required is a firmware update. Customers need to contact OCZ sales or product management to confirm the correct firmware version for dual port enablement. Once confirmed, the Z-Drive 6000 Series supports dual x2 PCIe Gen3 or single x4 PCIe Gen3 depending on the DualPortEn# SFF-8639 pin setting. In either case, four namespaces are available (two for each server) where each host can access either namespace. If the namespaces are shared and a collision is possible, hosts rely on the NVMe reservation/release commands. Figure 7 depicts this dual port and namespace implementation.

Figure 7 depicts NVMe namespace management and the dual port implementation (an NVM subsystem with two PCIe ports and host interfaces, one NVMe controller per port handling I/O requests and admin commands, reservations providing access exclusion/synchronization, and namespaces backed by the non-volatile flash storage)

Dual port functionality is enabled automatically on Z-Drive 6000 SSDs that carry the dual port enabling firmware. If pin E25 (DualPortEn#) is pulled low by the system (driven low or grounded by a backplane), the dual port capability will be enabled. The Z-Drive 6000 Series therefore depends on the host system to enable dual port functionality as described.
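To put the two link options in perspective, the sketch below computes the theoretical one-direction bandwidth of a PCIe Gen3 link from the standard Gen3 figures of 8 GT/s per lane and 128b/130b encoding. It is a back-of-the-envelope illustration rather than a Z-Drive 6000 specification: packet and protocol overhead are ignored, and the function name is hypothetical.

```python
GEN3_TRANSFER_RATE_GT_S = 8.0          # PCIe Gen3 raw rate per lane, in GT/s
GEN3_ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding

def link_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring packet overhead."""
    bits_per_second_g = lanes * GEN3_TRANSFER_RATE_GT_S * GEN3_ENCODING_EFFICIENCY
    return bits_per_second_g / 8       # bits to bytes

single_x4 = link_bandwidth_gb_s(4)     # one x4 port
each_x2 = link_bandwidth_gb_s(2)       # each port in dual x2 mode

print(f"single x4 port: {single_x4:.2f} GB/s")
print(f"each x2 port:   {each_x2:.2f} GB/s")
```

Each x2 port has half the link ceiling of the x4 port, so the aggregate across both ports in dual port mode equals the single port ceiling.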
8 Dual Port Performance & Verification

As mentioned earlier, a Z-Drive 6000 SSD can operate as dual x2 PCIe Gen3 or as a single x4 PCIe Gen3. Because each port then has half of the lanes, dual port operation splits bandwidth and/or IOPS performance in half on each port. Therefore, given one Z-Drive 6000 SSD (either 1.6TB or 3.2TB), the 4KB random read performance of ~700,000 IOPS on a single x4 port would be split into ~350,000 IOPS on each x2 port.

It is extremely important to test the dual port capability of SSDs correctly. The first step is to determine whether both ports are functioning properly and each delivering nearly half of the bandwidth that a single port would deliver. Secondly, if a fault is injected into one of the data paths or one of the ports is down, verification needs to determine whether the data can still be accessed through the other port. For power cycle testing, verification also needs to determine whether both ports come up correctly every time and whether there is any evident performance degradation. It is therefore important to verify different temperature conditions, workloads, fault injections into ports, power cycling, etc. to provide full confidence that, in an enterprise server, the SSD will function as specified. The Z-Drive 6000 SSD Series is fully tested and verified to perform dual port functionality as specified.

9 Z-Drive 6000 Series Configurations

9.1 Customer Impact

NVMe is a standard that meets the needs of enterprise-class storage requirements. Since PCIe does not define registers, direct memory access, command sets or feature sets for storage devices, NVMe fills the gap as a standard software interface, and also replaces proprietary software interfaces and drivers previously used by PCIe-based SSDs. For the enterprise, the Z-Drive 6000 Series offers very low latency, high performance, and selectable power envelopes.
In particular, storage latency has become fundamentally important for new applications, especially in support of virtual infrastructure and Big Data applications (including business intelligence and data analytics). Getting the most performance and efficiency out of an enterprise SSD to support the real-time I/O needs of these applications is critical, and bypassing the traditional ATA and SCSI stack greatly shortens the path for an I/O request. The Z-Drive 6000 Series represents a complete dual ported solid-state storage solution that has the potential to save enterprise customers many hundreds of thousands of dollars because it minimizes storage overprovisioning as well as the associated overhead. Deploying high-performance Z-Drive 6000 SSDs in server enclosures that natively support the NVMe U.2 interface enables companies to cost-effectively use hyper-converged systems and hybrid cloud-based applications that scale without having to build out expensive storage infrastructures.
9.2 Use Cases

This section describes use cases for integrating Z-Drive 6000 Series SSDs into an IT environment. The drives are ideal for OLTP and OnLine Analytical Processing (OLAP) applications. As OLTP applications handle many short transactions in quick, random bursts, the Z-Drive 6000 Series is well-suited to accelerate the processing of these transactions for a variety of business needs that include:

- Analytics
- Content Delivery (streaming media, video on demand)
- Critical infrastructure applications (Supervisory Control And Data Acquisition/SCADA)
- Database Query Acceleration
- Financial and Ledgers
- Gaming
- Video Surveillance
- Real-time Billing
- Real-time Monitoring
- Trading

As OLAP applications handle bigger chunks of data, the data must be retrieved from multiple sources and the application must respond quickly to complex queries. OLAP applications can also be used for a similarly diverse set of business workloads that include:

- Business Intelligence
- Batch Processing
- Data Warehousing and Report Generation
- ERP Systems
- High Transaction Processing (HTP)
- Massive Data Feeds and Reporting

As described previously, many of these applications that require high bandwidth and low latency performance also need redundancy and high availability of the data. Because dual port SSDs provide redundant paths to the data, the Z-Drive 6000 Series is an excellent fit for the business-critical applications listed above.
10 Summary

The SSD industry is in its nascent stage with regard to NVMe PCIe platforms and drives and, as such, servers with dual port capabilities are still not in production. As the industry transitions into a newer, faster era of NVMe PCIe drives in 2016 and beyond, more drives and platforms will implement dual port capabilities and will provide enterprise customers and their applications with the benefits of this functionality.

As scalability and HA are fundamental requirements for any server or storage OEM's networked storage solution, the PCIe 3.0 interface can attach a large number of high density SSDs to deliver the petabyte scalability needed for Big Data applications. The Z-Drive 6000 Series scalability is only limited by the number of U.2 drive bays. To achieve HA, the Z-Drive 6000 Series implements dual port capabilities that enable redundant data access and eliminate a single point of failure.

With the advent of NVMe PCIe SSDs, more customers will leverage this key feature and some may even replace SAS-based SSDs with dual port capable PCIe NVMe drives due to their very high throughput, low latencies, and other performance and feature-related advantages. Depending on the application, its data availability requirement (moderate to high), and infrastructure budgets, enterprise customers will soon have the option to deploy dual port versus single port solid-state storage devices.

OCZ is a pioneer in this technology and was the first to showcase dual port functionality with the Z-Drive 6000 Series in an all-NVMe members conference held in April 2015. Dual port functionality is scheduled for availability within the Z-Drive 6000 / 6300 Series in October 2015.