1. Introduction
NVMe SSD (NVMe Specification, 2016) offers high I/O operations per second (IOPS) with very low latency, making it a good choice for storage in data centers. However, Flash devices are often over-provisioned when deployed, which leads to underutilization in data centers (Ana, Christoforos, Eno, Binu, & Sanjeev, 2016). Disk sharing through remote access is a good way to address this over-provisioning and underutilization problem.
The mainstream approach to remote data access is over the network. Disk access over the network can improve utilization on any machine that has spare capacity and bandwidth, or on servers dedicated to sharing storage.
It is a significant challenge for remote access to achieve the same performance as local access. Taking iSCSI (Satran, 2004), a network-based protocol, as an example, its performance is significantly lower than that of local access. Figure 1 depicts how an initiator (iSCSI Initiator, 2016) remotely accesses the NVMe SSD of a target (Ubuntu 16.04: Install tgt for iSCSI target, 2016) via iSCSI in the Linux kernel stack (Werner, 2015). An application on the initiator wants to read data from the NVMe SSD on the target. The read command is passed into the Linux kernel and processed in sequence by the block layer, SCSI layer, iSCSI layer, TCP/IP, and NIC driver; finally, the encapsulated read command is delivered to the target side via the network. On the target, the encapsulated read command is extracted from the NIC driver in sequence by TCP/IP, the iSCSI layer, the block layer, and the NVMe driver, and then the NVMe SSD controller fetches and executes it. The returned data is transferred backward through the same procedure. In our experiment, the IOPS of remote access to the NVMe SSD via iSCSI reaches only one fifth of that of local access. The low performance is caused by the complicated procedures in the Linux kernel stack.
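As an illustration of how such a comparison can be made, the following minimal C sketch measures queue-depth-1 random 4 KiB read IOPS on a block device, run once against the local NVMe device and once against the iSCSI-attached device; the device paths, the 1 GiB test span, and the request count are assumptions of the sketch rather than the configuration used in the experiments.

/*
 * Minimal 4 KiB random-read IOPS probe (illustrative only).
 * Example: run against /dev/nvme0n1 (local) and against the
 * iSCSI-attached block device to compare the two paths.
 * The 1 GiB span and 100,000 requests are assumed parameters.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLK    4096
#define SPAN   (1ULL << 30)      /* probe the first 1 GiB of the device */
#define NREQS  100000

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/<blockdev>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);   /* bypass the page cache */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLK, BLK)) { perror("posix_memalign"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (long i = 0; i < NREQS; i++) {
        off_t off = (off_t)(rand() % (int)(SPAN / BLK)) * BLK;  /* 4 KiB aligned */
        if (pread(fd, buf, BLK, off) != BLK) { perror("pread"); return 1; }
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%s: %.0f IOPS (queue depth 1)\n", argv[1], NREQS / sec);

    free(buf);
    close(fd);
    return 0;
}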
Figure 1.
Remote access to NVMe SSD via iSCSI in the Linux kernel stack
There have been proposals to speed up network performance for remote access. One proposal optimized iSCSI in a flash disaggregation system (Ana, Christoforos, Eno, Binu, & Sanjeev, 2016). Another proposal, ReFlex (Ana, Heiner, & Christoforos, 2017), optimized TCP/IP, Ethernet, and the NVMe driver (Intel Linux NVMe Driver, 2016). These proposals need to modify the iSCSI and TCP/IP software stacks, which may cause porting issues across different kernel versions.
The authors propose a novel design to improve the performance of remote access to NVMe SSD. The design enables remote access to NVMe SSD via a PCIe NTB (Non-Transparent Bridge) (Intel NTB specification, 2000). Instead of processing SCSI/iSCSI and TCP/IP on both servers, the authors design a remote NVMe driver that shakes hands with the NVMe driver via NTB and gains direct control of the NVMe SSD.
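Conceptually, the handshake can be pictured as server 1 mapping an NTB memory window that server 2 has pointed at one of its dedicated NVMe I/O queues, then reading the queue parameters from a small control page shared through that window. The C sketch below illustrates this idea only; the window address, the control-page layout, and the function and structure names are assumptions for illustration, not the Linux NTB API or the authors' implementation.

/*
 * Illustrative handshake between the remote NVMe driver (server 1) and
 * the NVMe driver owning the SSD (server 2), as seen from server 1.
 * NTB_WINDOW_BASE/SIZE and the control-page layout are assumptions for
 * this sketch; a real driver would obtain them from the NTB hardware
 * driver and agree on the layout with the peer.
 */
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

#define NTB_WINDOW_BASE  0xf8000000ULL   /* assumed peer-visible window */
#define NTB_WINDOW_SIZE  0x00100000ULL   /* assumed 1 MiB window        */

/* Control page the peer fills in with the dedicated I/O queue details. */
struct remote_nvme_ctrl_page {
    u64 sq_offset;       /* submission queue offset inside the window */
    u64 cq_offset;       /* completion queue offset inside the window */
    u64 db_offset;       /* SQ tail doorbell offset inside the window */
    u32 queue_depth;     /* number of SQ/CQ entries                   */
    u32 ready;           /* peer sets this when the queue is usable   */
};

struct remote_nvme_queue {
    void __iomem *window;    /* NTB memory window mapped on server 1 */
    void __iomem *sq;        /* peer submission queue                */
    void __iomem *cq;        /* peer completion queue                */
    void __iomem *db;        /* peer SQ tail doorbell                */
    u32 depth;
    u16 sq_tail;
};

static int remote_nvme_handshake(struct remote_nvme_queue *q)
{
    struct remote_nvme_ctrl_page ctrl;

    q->window = ioremap(NTB_WINDOW_BASE, NTB_WINDOW_SIZE);
    if (!q->window)
        return -ENOMEM;

    /* The control page is assumed to sit at the start of the window. */
    memcpy_fromio(&ctrl, q->window, sizeof(ctrl));
    if (!ctrl.ready) {
        iounmap(q->window);
        return -EAGAIN;      /* peer has not published its queue yet */
    }

    q->sq      = q->window + ctrl.sq_offset;
    q->cq      = q->window + ctrl.cq_offset;
    q->db      = q->window + ctrl.db_offset;
    q->depth   = ctrl.queue_depth;
    q->sq_tail = 0;
    return 0;
}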
Figure 2 depicts how to remotely access the NVMe SSD via NTB in the Linux kernel stack (Werner, 2015). An application on server 1 wants to read data from the NVMe SSD on server 2. The read command is passed into the Linux kernel and processed by the block layer, and the read request is then dispatched to the remote NVMe driver. The remote NVMe driver fills the dedicated I/O queue of the NVMe driver via NTB, and finally the NVMe controller fetches and executes the command. The returned read data is transferred back to the remote NVMe driver via NTB. The performance of remote access to NVMe SSD via NTB can approach that of local access.
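Concretely, dispatching a read in this design amounts to writing a 64-byte NVMe read submission-queue entry into the peer's dedicated I/O queue through the NTB window and ringing the submission-queue tail doorbell. The sketch below continues the previous one (it reuses struct remote_nvme_queue and the mapped pointers); the entry layout follows the NVMe read command format defined in the specification, while the function and field names are illustrative assumptions rather than code from the authors' driver.

/*
 * Illustrative submission of a read through the NTB-mapped queue from
 * the previous sketch. The 64-byte entry follows the NVMe read command
 * layout in the NVMe specification; prp1 must hold a DMA address that
 * the NVMe controller on server 2 can reach, which in this design is
 * memory exposed through the NTB window.
 */
struct nvme_read_sqe {
    u8  opcode;          /* 0x02 = read                          */
    u8  flags;
    u16 command_id;
    u32 nsid;            /* namespace identifier                  */
    u64 rsvd2;
    u64 metadata;
    u64 prp1;            /* DMA address of the data buffer        */
    u64 prp2;
    u64 slba;            /* starting logical block address        */
    u16 length;          /* number of blocks, zero-based          */
    u16 control;
    u32 dsmgmt;
    u32 reftag;          /* end-to-end protection fields          */
    u16 apptag;
    u16 appmask;
};

static void remote_nvme_submit_read(struct remote_nvme_queue *q,
                                    u16 cid, u32 nsid,
                                    u64 data_dma, u64 slba, u16 nblocks)
{
    struct nvme_read_sqe sqe = {
        .opcode     = 0x02,
        .command_id = cid,
        .nsid       = nsid,
        .prp1       = data_dma,
        .slba       = slba,
        .length     = nblocks - 1,   /* NVMe counts blocks from zero */
    };

    /* Copy the entry into the peer's submission queue slot ... */
    memcpy_toio(q->sq + q->sq_tail * sizeof(sqe), &sqe, sizeof(sqe));

    /* ... advance the tail and ring the SQ tail doorbell via NTB. */
    q->sq_tail = (q->sq_tail + 1) % q->depth;
    writel(q->sq_tail, q->db);
}

Completion handling would proceed symmetrically: server 1 polls (or is notified of) new entries in the NTB-mapped completion queue and then updates the CQ head doorbell.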
The following sections introduce the adopted technologies, design methodology, test tools, and experimental results.
Figure 2.
Remote access to NVMe SSD via NTB in the Linux kernel stack
2. Background
This research focuses on remote access to NVMe SSD via NTB. It builds on two technologies: NTB and NVMe I/O queue sharing.