NVM Express Moves Into The Future. NVMe over Fabrics

NVM Express Moves Into The Future

NVM Express (NVMe) is a new and innovative method of accessing storage media that has been capturing the imagination of data center professionals worldwide. The momentum behind NVMe has been building since it was introduced in 2011. NVMe technology is expected to improve along two dimensions over the next couple of years: lower latency, and scaling up the number of NVMe devices in large solutions.

NVMe over Fabrics

NVM Express over Fabrics defines a common architecture that supports a range of storage networking fabrics for the NVMe block storage protocol. This includes enabling a front-side interface into storage systems, scaling out to large numbers of NVMe devices, and extending the distance within a data center over which NVMe devices and NVMe subsystems can be accessed. Work on the NVMe over Fabrics specification began in 2014 with the goal of extending NVMe onto fabrics such as Ethernet, Fibre Channel and InfiniBand; the architecture is designed to work with any suitable storage fabric technology. The specification was published in June 2016.

Two types of fabric transports for NVMe are currently under development:

- NVMe over Fabrics using RDMA
- NVMe over Fabrics using Fibre Channel (FC-NVMe)

NVMe over Fabrics using RDMA encompasses any of the RDMA technologies, including InfiniBand, RoCE and iWARP, and is defined by a technical sub-group of the NVM Express organization. FC-NVMe, the NVMe over Fabrics initiative for Fibre Channel-based transport, is being developed by the INCITS T11 committee, which develops all of the Fibre Channel interface standards. As part of this development work, FC-NVMe is also expected to work with Fibre Channel over Ethernet (FCoE).

The goal of NVMe over Fabrics is to provide distance connectivity to NVMe devices with no more than 10 microseconds (µs) of additional latency over a native NVMe device inside a server. NVMe over Fabrics solutions are expected to become available beginning in 2016.
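For readers who want a feel for what host-side attachment can look like, the minimal sketch below wraps the open-source nvme-cli utility from Python, assuming a Linux host with NVMe over Fabrics support and nvme-cli installed. The transport, addresses and NQN values are illustrative placeholders, not endpoints taken from this article.

# Minimal sketch: attaching a remote NVMe-oF target from a Linux host.
# Assumes nvme-cli with fabrics support is installed; the address, port,
# and NQN below are illustrative placeholders, not real endpoints.
import subprocess

def discover_targets(transport: str, traddr: str, trsvcid: str = "4420") -> str:
    """Ask a discovery controller which NVMe subsystems it exposes."""
    result = subprocess.run(
        ["nvme", "discover", "-t", transport, "-a", traddr, "-s", trsvcid],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def connect_target(transport: str, traddr: str, nqn: str, trsvcid: str = "4420") -> None:
    """Create a host-side controller for one remote NVMe subsystem."""
    subprocess.run(
        ["nvme", "connect", "-t", transport, "-a", traddr, "-s", trsvcid, "-n", nqn],
        check=True,
    )

if __name__ == "__main__":
    print(discover_targets("rdma", "192.168.1.100"))
    connect_target("rdma", "192.168.1.100",
                   "nqn.2014-08.com.example:nvme:subsystem-01")

Once a connect succeeds on a Linux host, the remote namespaces appear as ordinary NVMe block devices (for example, /dev/nvme1n1) and can be used like locally attached drives.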

Use Cases

There are several use cases for NVMe over Fabrics. One can easily imagine a storage system composed of many NVMe devices, using NVMe over Fabrics with either an RDMA or Fibre Channel interface, to form a complete end-to-end NVMe storage solution. Such a system would provide extremely high performance while maintaining the very low latency available with NVMe.

Another implementation would use NVMe over Fabrics to achieve low latency between the host and a storage subsystem that uses more traditional protocols internally to handle I/O to each of its SSDs. This approach gains the benefits of a simplified host software stack and lower latency over the wire while taking advantage of existing storage subsystem technology.

Peering Over the Horizon into the Future

As we move further into 2016, we can expect additional optimization of the host-side code path for NVMe, further reducing overall latency to approximately 100 µs as seen by the host computer. In addition, the industry will provide solutions that scale to hundreds of NVMe devices in what might be called rack-scale shared storage solutions. Looking ahead to 2017 and beyond, we expect availability of the first post-flash non-volatile memory solutions that run the NVMe protocol on a new type of storage media, resulting in latencies of approximately 20-25 µs as seen by the host computer. At the same time, NVMe over Fabrics solutions will begin to scale to thousands of NVMe devices in large shared storage solutions that serve hundreds or thousands of application processing cores.

NVMe over Fabrics Technical Characteristics

Transporting NVMe commands across a network requires special considerations over and above those for local, in-server storage. To carry the NVMe protocol over a distance, the ideal underlying network or fabric technology should have the following characteristics:

- Reliable, credit-based flow control and delivery mechanisms. This type of flow control allows the network or fabric to be self-throttling, providing a reliable connection that can guarantee delivery at the hardware level without dropping frames or packets due to congestion. Credit-based flow control is native to Fibre Channel, InfiniBand and PCI Express transports.
- An NVMe-optimized client. The client software should be able to send and receive native NVMe commands directly to and from the fabric without a translation layer such as SCSI.
- A low-latency fabric. The fabric itself should be optimized for low latency, imposing no more than 10 µs of latency end-to-end, including the switches.
- Low-latency, low-CPU-utilization adapters or interface cards. The adapter should be able to register direct memory regions for applications to use, so that the data to be transmitted can be passed directly to the hardware fabric adapter.
- Fabric scaling. The fabric should be able to scale out to tens of thousands of devices or more.
- Multi-host support. The fabric should support multiple hosts actively sending and receiving commands at the same time; the same applies to multiple storage subsystems.
- Multi-port support. The host servers and the storage systems should be able to support multiple ports simultaneously.
- Multi-path support. The fabric should support multiple simultaneous paths between any NVMe host initiator and any NVMe storage target.

The large number (64K) of separate I/O queues and the inherent parallelism of these NVMe I/O queues work well with the type of fabrics described above. Each of the 64K I/O queues can support 64K commands simultaneously, which suits implementation in very large fabrics. Furthermore, the small number of commands in the NVMe command set makes it relatively straightforward to implement in a variety of fabric environments.

Differences between local NVMe and NVMe over Fabrics

Approximately 90% of the NVMe over Fabrics protocol is the same as the local NVMe protocol. This includes the NVMe namespaces, I/O and administrative commands, registers and properties, power states, asynchronous events, reservations and more. The key differences are in four areas, listed in the table below.

Difference       PCI Express (PCIe)          NVMe over Fabrics
Identifier       Bus/Device/Function         NVMe Qualified Name (NQN)
Discovery        Bus enumeration             Discovery and Connect commands
Queueing         Memory-based                Message-based
Data transfers   PRPs or SGLs                SGLs only, with an added key

PRP: Physical Region Page (physical memory page address; PCIe transport only)
SGL: Scatter-Gather List (list of locations and lengths for read or write requests)

These differences are primarily of interest to developers of NVMe products, because their device drivers need to handle both local NVMe devices and remote NVMe devices properly. Some of these items, such as the identifier, may be exposed to end users to help identify specific NVMe devices for specific applications. The discovery mechanism is designed to work with multiple types of transports.
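To make the data-transfer row of the table more concrete, the short Python sketch below contrasts the two descriptor styles. The class and field names are illustrative inventions for this article, not the binary formats defined in the NVMe specifications.

# Conceptual sketch of the two data-transfer descriptions (illustrative
# field names only; not the binary layouts defined by the specs).
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096  # memory page size assumed for PRP entries in this sketch

@dataclass
class PrpEntry:
    """PRP: one physical memory page address (PCIe transport only)."""
    page_address: int

@dataclass
class SglDescriptor:
    """SGL: an (address, length) pair; fabrics transports use SGLs only."""
    address: int
    length: int

def prp_list_for(buffer_addr: int, length: int) -> List[PrpEntry]:
    """A PRP description names every memory page that a buffer touches."""
    first_page = buffer_addr - (buffer_addr % PAGE_SIZE)
    last_byte = buffer_addr + length - 1
    last_page = last_byte - (last_byte % PAGE_SIZE)
    return [PrpEntry(p) for p in range(first_page, last_page + PAGE_SIZE, PAGE_SIZE)]

def sgl_for(buffer_addr: int, length: int) -> List[SglDescriptor]:
    """An SGL can describe the same buffer with one (address, length) entry."""
    return [SglDescriptor(buffer_addr, length)]

if __name__ == "__main__":
    print(len(prp_list_for(0x10000200, 16384)), "PRP entries")   # several pages
    print(len(sgl_for(0x10000200, 16384)), "SGL descriptor")     # one descriptor

The contrast mirrors the table: a memory-mapped PCIe transport can hand the controller a page-by-page list, while a message-based fabric describes buffers with self-contained address-and-length descriptors.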

NVMe Transport Mapping

In a local NVMe implementation, NVMe commands and responses are mapped to shared memory in the host, accessed over the PCIe interface. Fabrics, however, are built on the concept of sending and receiving messages without shared memory between the end points. The NVMe fabric message transports therefore encapsulate NVMe commands and responses using capsules, each of which includes one or more NVMe commands or responses. The capsules, or combinations of capsules and data, are independent of the specific fabric technology and are sent and received over the chosen fabric.

For NVMe over Fabrics, the entire NVMe multi-queue model is maintained, using normal NVMe submission queues and completion queues, but encapsulated over a message-based transport. The NVMe I/O queue pair (submission and completion) is designed for multi-core CPUs, and this low-latency, efficient design is preserved in NVMe over Fabrics. When sending complex messages to an NVMe over Fabrics device, capsules allow multiple small messages to be sent as one message, which improves the efficiency of the transmission and reduces latency. A capsule is either a submission queue entry or a completion queue entry, combined with some amount of data, metadata or Scatter-Gather Lists (SGLs). The content of these elements is the same as in the local NVMe protocol; the capsule is simply a way to package them together for improved efficiency.
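The sketch below illustrates the encapsulation idea in a hedged way: a notional submission queue entry plus optional in-capsule data is packed into a single message. The layout and field names are deliberately simplified illustrations, not the capsule format defined in the NVMe over Fabrics specification.

# Conceptual sketch of command encapsulation for a message-based fabric.
# The layout below is deliberately simplified and is NOT the capsule
# format defined in the NVMe over Fabrics specification.
import struct
from dataclasses import dataclass

@dataclass
class SubmissionEntry:
    opcode: int        # NVMe command opcode
    command_id: int    # matched against the eventual completion entry
    namespace_id: int  # target namespace

def build_capsule(entry: SubmissionEntry, data: bytes = b"") -> bytes:
    """Pack a submission entry plus optional in-capsule data into one message."""
    header = struct.pack("<BHI", entry.opcode, entry.command_id, entry.namespace_id)
    return header + data

def send_over_fabric(transport_send, capsule: bytes) -> None:
    """The same capsule bytes can be handed to any message-based transport."""
    transport_send(capsule)

if __name__ == "__main__":
    capsule = build_capsule(SubmissionEntry(opcode=0x01, command_id=7, namespace_id=1),
                            data=b"payload carried in the capsule")
    send_over_fabric(lambda msg: print(f"sending {len(msg)}-byte capsule"), capsule)

The point of the sketch is only that, once built, the capsule is opaque to the transport: the same bytes could be handed to an RDMA or a Fibre Channel send routine.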

NVMe Qualified Name (NQN)

One of the key benefits of a storage fabric is the inherent intelligence used to maintain consistency across all devices. NVMe over Fabrics uses a familiar qualified-name addressing convention: the NVMe Qualified Name (NQN) identifies the remote NVMe storage target and is similar in concept to an iSCSI Qualified Name (IQN). Additional details on NVMe Qualified Names are described in section 7.9 of the NVMe Base specification, available at http://www.nvmexpress.org/specifications/.

Protocol   Type   Example
NVMe       NQN    nqn.2014-08.com.vendor:nvme:nvm-subsystem-sn-d78432
iSCSI      IQN    iqn.1991-05.com.microsoft:dmrtk-srvr-m
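As a small illustration of the naming convention, the snippet below assembles and loosely checks a name of the nqn.yyyy-mm.reverse-domain:identifier shape shown in the table. The pattern is a rough simplification for illustration only; the authoritative rules are in section 7.9 of the specification.

# Build and loosely check an NVMe Qualified Name (NQN).
# The regex is a simplification used for illustration, not the full set
# of rules defined in the NVMe specification.
import re

NQN_PATTERN = re.compile(r"^nqn\.\d{4}-\d{2}\.[a-z0-9.-]+:.+$")

def make_nqn(year_month: str, reverse_domain: str, identifier: str) -> str:
    """Compose an NQN from its date, naming-authority and subsystem parts."""
    return f"nqn.{year_month}.{reverse_domain}:{identifier}"

def looks_like_nqn(name: str) -> bool:
    """Rough shape check; the specification defines the authoritative rules."""
    return bool(NQN_PATTERN.match(name))

if __name__ == "__main__":
    nqn = make_nqn("2014-08", "com.vendor", "nvme:nvm-subsystem-sn-d78432")
    print(nqn)                  # nqn.2014-08.com.vendor:nvme:nvm-subsystem-sn-d78432
    print(looks_like_nqn(nqn))  # True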

Conclusion

NVMe over Fabrics is poised to extend the low-latency, efficient NVMe block storage protocol over fabrics to provide large-scale sharing of storage over distance. It maintains the architecture and software consistency of the NVMe protocol across different fabric types, providing the benefits of NVMe regardless of the fabric type or the type of non-volatile memory used in the storage target. The next few years will be very exciting for the industry!

NVM Express and the NVM Express logo are registered trademarks, and NVMe is a trademark, of NVM Express, Inc. All other names mentioned may be trademarks or registered trademarks of their respective holders.