Xilinx KCU116: The Cost-Effective 100 Gbps Network & Storage FPGA Development Platform

The Kintex® UltraScale+™ family, built on TSMC 16 nm FinFET technology, is positioned by Xilinx® as its best balance of price, performance, and power. Combined with the new UltraRAM and the new interconnect optimization technology (SmartConnect), this device delivers a cost-effective solution for applications that require high-end transceivers for 100 Gbps connectivity cores. The family is designed specifically for networking and storage applications such as network packet processing, wireless MIMO technology, 100 Gbps wired networking, industrial and data center network acceleration, and NVMe SSD (solid-state drive) storage acceleration. This article demonstrates a 100 Gbps TCP Offload Engine and an NVMe SSD implementation on Xilinx’s KCU116 Evaluation Kit using two Design Gateway IP cores: the TOE100G-IP Core, which delivers approximately 12 GB/s TCP transmission over a 100 GbE interface without a CPU, and the NVMeG4-IP Core, which achieves approximately 4 GB/s per SSD.

Introduction to the Kintex® UltraScale+ KCU116 Evaluation Kit

The KCU116 is ideal for evaluating key Kintex UltraScale+ features, most notably the 28 Gbps transceiver performance. The kit is well suited for rapid prototyping based on the XCKU5P-2FFVB676E FPGA device.

The board includes 1 GB of onboard 32-bit DDR4-2666 memory, FMC expansion ports for 1 x M.2 NVMe SSD, and PCIe Gen4 x8 lanes for up to 2 x M.2 NVMe SSD interfaces. The 16 x 28 Gbps GTY transceivers are available for both PCIe Gen4 and 100 GbE interface implementation, and the board also features a variety of peripheral interfaces and FPGA logic for user-customized designs.

Figure 1: KCU116 evaluation kit. (Image source: Xilinx Inc.)

Together with Design Gateway’s IP Cores, the KCU116 provides everything that is necessary to develop state-of-the-art 100 Gbps networking and storage solutions without needing MPSoC support.

Implementation of 100 Gbps networking & storage solutions

Figure 2: 100 Gbps networking & storage solution on KCU116. (Image source: Design Gateway)

Even though Kintex UltraScale+ devices don’t feature MPSoC technology like Zynq UltraScale+, networking and NVMe storage protocol processing can be implemented without processors or an OS by leveraging Design Gateway’s IP Core solutions:

  1. TOE100G-IP: 100 GbE Full TCP protocol stack IP Core without needing a CPU
  2. NVMeG4-IP: Standalone NVMe Host Controller with built-in PCIe Gen4 Soft IP 

Both TOE100G-IP and NVMeG4-IP can operate without a CPU, OS, or driver. User logic for the control and data paths of both IPs can be implemented in pure hardware logic or as bare-metal firmware on a MicroBlaze soft processor, so high-level applications and algorithms can be developed faster and more easily without dealing with complicated networking and NVMe protocols. This opens up new opportunities for advanced system-level solutions such as sensor data capture, onboard computation, and AI-based edge computing devices.

Design Gateway’s TOE100G-IP for UltraScale+ device

Figure 3: TOE100G-IP systems. (Image source: Design Gateway)

The TOE100G IP core implements the TCP/IP stack in hardwired logic and connects to Xilinx’s 100 Gb Ethernet Subsystem module for the lower-layer hardware. The user interface of the TOE100G IP consists of a register interface for control signals and a FIFO interface for data signals. The TOE100G IP is designed to connect to the 100 Gb Ethernet subsystem through the subsystem’s 512-bit AXI4-ST user interface. The Ethernet subsystem, provided by Xilinx, includes EMAC, PCS, and PMA functions. The clock frequency of the 100 Gb Ethernet subsystem’s user interface is 322.265625 MHz.
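As a quick sanity check on the figures above, the following minimal sketch computes the peak bandwidth of a 512-bit bus clocked at 322.265625 MHz and compares it with the nominal 100 Gbps line rate. It assumes one full-width beat can be transferred on every clock cycle, which is an idealization; the function and constant names are illustrative and do not come from the IP core or the Ethernet subsystem.

```python
from fractions import Fraction

BUS_WIDTH_BITS = 512                          # AXI4-ST data width stated above
CLOCK_HZ = Fraction("322.265625") * 10**6     # user-interface clock stated above
LINE_RATE_BPS = 100 * 10**9                   # nominal 100 GbE line rate


def bus_bandwidth_bps(width_bits, clock_hz):
    """Peak bits per second, assuming one full-width beat every clock cycle."""
    return width_bits * clock_hz


peak = bus_bandwidth_bps(BUS_WIDTH_BITS, CLOCK_HZ)
print(f"peak bus bandwidth: {float(peak) / 1e9:.1f} Gbps")
print(f"headroom over line rate: {float(peak / LINE_RATE_BPS):.2f}x")
```

Under that assumption the bus can carry more than the line rate, which leaves room for cycles in which the stream carries a partially filled beat or no data at all.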

TOE100G-IP’s Features

  • Full TCP/IP stack implementation
  • Supports one session per TOE100G IP (multiple sessions can be implemented by using multiple TOE100G IPs)
  • Supports both Server and Client mode (Passive/Active open and close)
  • Supports Jumbo frames
  • Simple data interface by standard FIFO interface
  • Simple control interface by single port RAM interface

FPGA resource usage on the XCKU5P-2FFVB676E FPGA device is shown in Table 1 below.

| Family | Example Device | Fmax (MHz) | CLB Regs | CLB LUTs | CLB | IOB | BRAMTile | URAM | GTY | Design Tools |
|---|---|---|---|---|---|---|---|---|---|---|
| Kintex-Ultrascale+ | XCKU5P-FFVB676-2E | 350 | 12883 | 17535 | 3208 | - | 53 | - | 4 | Vivado2019.1 |

Table 1: Example Implementation Statistics for Kintex Ultrascale+ device

More details of the TOE100G-IP are described in its datasheet which can be downloaded from Design Gateway’s website.

Design Gateway’s NVMe PCIe Gen4 Host Controller for GTY Transceivers

The Kintex UltraScale+ features GTY transceivers that are capable of supporting a PCIe Gen4 interface, but a PCIe Gen4 integrated block and an ARM processor aren’t available.

Design Gateway solved this problem by developing the NVMeG4-IP core, which runs as a standalone NVMe host controller with built-in PCIe soft IP and PCIe bridge logic in a single core. The core gives user logic access to an NVMe PCIe Gen4 SSD through a simplified user interface with standard features, so it can be used without knowledge of the NVMe protocol.

Figure 4: NVMeG4-IP block diagram. (Image source: Design Gateway)

NVMeG4-IP’s Features

  • Able to implement application layer, transaction layer, data link layer, and some parts of the physical layer to access the NVMe SSD without a CPU or external DDR memory
  • Operates with Xilinx PCIe PHY IP configured as a 4-lane PCIe Gen4 (256-bit bus interface)
  • Includes 256 Kbyte RAM data buffer
  • Supports six commands, i.e. Identify, Shutdown, Write, Read, SMART, and Flush (optional additional command support available)
  • User clock frequency must be more than or equal to PCIe clock (250 MHz for Gen4)
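The clock requirement in the last item can be motivated with a little arithmetic. The sketch below compares the raw bandwidth of a 4-lane PCIe Gen4 link with the bandwidth of a 256-bit bus at 250 MHz. The 16 GT/s per-lane rate and the 128b/130b encoding are general PCIe Gen4 parameters rather than figures from this article, the result ignores packet and protocol overhead, and all names are illustrative.

```python
GT_PER_S_PER_LANE = 16e9      # PCIe Gen4 signalling rate per lane (general PCIe figure)
LANES = 4                     # 4-lane configuration stated above
ENCODING = 128 / 130          # 128b/130b line encoding used by PCIe Gen3 and Gen4
BUS_WIDTH_BITS = 256          # bus interface width stated above
USER_CLOCK_HZ = 250e6         # minimum user clock stated above


def link_bytes_per_s(rate, lanes, efficiency):
    """Raw link bandwidth in bytes per second, before packet overhead."""
    return rate * lanes * efficiency / 8


def bus_bytes_per_s(width_bits, clock_hz):
    """Peak bus bandwidth in bytes per second, one beat per clock cycle."""
    return width_bits * clock_hz / 8


link = link_bytes_per_s(GT_PER_S_PER_LANE, LANES, ENCODING)
bus = bus_bytes_per_s(BUS_WIDTH_BITS, USER_CLOCK_HZ)
print(f"link: {link / 1e9:.2f} GB/s, bus: {bus / 1e9:.2f} GB/s")
print("bus keeps up with link:", bus >= link)
```

With these numbers the bus at 250 MHz is slightly faster than the raw link, so a user clock at or above that frequency should not be the bottleneck on the data path.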

FPGA resource usage on the XCKU5P-2FFVB676E FPGA device is shown in Table 2 below.

| Family | Example Device | Fmax (MHz) | CLB Regs | CLB LUTs | CLB | IOB | BRAMTile | URAM | GTY | Design Tools |
|---|---|---|---|---|---|---|---|---|---|---|
| Kintex-Ultrascale+ | XCKU5P-FFVB676-2E | 300 | 19214 | 21960 | 4382 | - | 12 | 8 | 4 | Vivado2019.1 |

Table 2: Example Implementation Statistics for Kintex Ultrascale+ device.
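For planning a design that combines both cores, the figures in Table 1 and Table 2 can be added up as a first-order estimate. The sketch below does that for an arbitrary number of instances and checks the GTY count against the 16 transceivers mentioned in the board description. It assumes that a "-" entry means zero and that resources scale linearly with instance count, which is only approximate because CLB packing varies between builds; user logic and the Xilinx Ethernet and PCIe PHY blocks are not included. The `Usage` type and `total` function are illustrative names.

```python
from dataclasses import dataclass

GTY_AVAILABLE = 16  # GTY transceivers on the device, per the board description


@dataclass(frozen=True)
class Usage:
    regs: int
    luts: int
    clbs: int
    bram: int
    uram: int
    gty: int


# Values copied from Table 1 and Table 2; "-" is treated as 0 (assumption).
TOE100G = Usage(regs=12883, luts=17535, clbs=3208, bram=53, uram=0, gty=4)
NVMEG4 = Usage(regs=19214, luts=21960, clbs=4382, bram=12, uram=8, gty=4)


def total(toe_count, nvme_count):
    """First-order sum of IP core resources for the given instance counts."""
    fields = ("regs", "luts", "clbs", "bram", "uram", "gty")
    return Usage(**{
        f: toe_count * getattr(TOE100G, f) + nvme_count * getattr(NVMEG4, f)
        for f in fields
    })


for nvme in (1, 2, 3):
    usage = total(1, nvme)
    fits = usage.gty <= GTY_AVAILABLE
    print(f"1 x TOE100G + {nvme} x NVMeG4: {usage} (GTY fits: {fits})")
```

Whether a given combination can actually be built on the KCU116 also depends on how the board routes its transceivers, which this estimate does not model.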

More details of the NVMeG4-IP are described in its datasheet which can be downloaded from Design Gateway’s website.

Example TOE100G-IP implementation & performance result on KCU116

Figure 5 shows an overview of the reference design based on the KCU116 that demonstrates the TOE100G-IP implementation. The demo system includes a bare-metal MicroBlaze system, user logic, and Xilinx’s 100 Gb Ethernet Subsystem.

Figure 5: TOE100G-IP demo systems block diagram. (Image source: Design Gateway)

The demo system is designed to evaluate TOE100G-IP operation in both Client and Server mode. The test logic sends and receives data with a test pattern at the highest possible data speed on the user interface side. For a 100 GbE interface with the KCU116, four SFP+ transceivers (25GBASE-R) and fiber cable are required, as shown in Figure 6.

Figure 6: TOE100G-IP demo environment set up on KCU116. (Image source: Design Gateway)

An example test result comparing 100G with other speeds (1G/10G/25G/40G) is shown in Figure 7.

Figure 7: TOE100G-IP performance comparison with 1G/10G/25G/40G on the KCU116. (Image source: Design Gateway)

The test result demonstrates that TOE100G-IP is capable of achieving a speed of approximately 12 GB/s TCP transmission.
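To put the 12 GB/s figure in context, the sketch below estimates the upper bound on TCP payload throughput over a 100 Gbps Ethernet link for a given MTU. It uses standard Ethernet framing overhead and assumes IPv4 and TCP headers without options, no VLAN tag, and traffic in one direction only. The MTU of 9000 bytes is a common jumbo-frame value chosen for illustration; the frame size used in the actual test is not stated here.

```python
LINE_RATE_BYTES_PER_S = 100e9 / 8   # 100 Gbps expressed in bytes per second
ETH_OVERHEAD = 8 + 14 + 4 + 12      # preamble+SFD, MAC header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20            # IPv4 and TCP headers without options (assumption)


def tcp_goodput(mtu):
    """Upper bound on TCP payload bytes per second for a given MTU."""
    payload = mtu - IP_TCP_HEADERS   # TCP payload bytes carried per frame
    wire = mtu + ETH_OVERHEAD        # bytes of link time consumed per frame
    return LINE_RATE_BYTES_PER_S * payload / wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput(mtu) / 1e9:.2f} GB/s")
```

Both bounds sit just below the 12.5 GB/s raw line rate, so a measured result of approximately 12 GB/s is close to what the link itself allows.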

Example of NVMeG4-IP implementation & performance result on KCU116

Figure 8 shows an overview of the reference design based on the KCU116 that demonstrates a single-channel (1CH) NVMeG4-IP implementation. Multiple instances of NVMeG4-IP can be implemented to achieve higher storage performance if the user-customized design leaves enough FPGA resources available.

For more details on the NVMeG4-IP reference design, please refer to the NVMeG4-IP reference design document provided on Design Gateway’s website.

Figure 8: NVMeG4-IP reference design overview. (Image source: Design Gateway)

The demo system is designed to write/verify data with the NVMe SSD on the KCU116. The user controls the test operation through a Serial console. For the NVMe SSD to interface with the KCU116, an AB18-PCIeX16 adapter board is required as shown in Figure 9.

Figure 9: NVMeG4-IP demo environment set up on KCU116. (Image source: Design Gateway)

An example test result from running the demo system on the KCU116 with a 512 GB Samsung 970 Pro is shown in Figure 10.

Figure 10: NVMe SSD read/write performance on KCU116 by using Samsung 970 PRO S. (Image source: Design Gateway)

Conclusion

Both the TOE100G-IP and NVMeG4-IP Cores make it possible to use the 100 Gbps connectivity of the KCU116 board for networking and NVMe storage applications. One TOE100G-IP is capable of approximately 12 GB/s TCP transmission over 100 GbE. The NVMeG4-IP provides high-performance storage over NVMe PCIe Gen4 at approximately 4 GB/s per SSD. Multiple instances of NVMeG4-IP can be combined into a RAID0 controller to raise storage performance to match the 100 GbE transmission speed.
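The RAID0 idea can be illustrated with a minimal sketch: how many SSDs are needed to match a target rate, and how a logical block address might be striped across them. This is a generic round-robin striping scheme written for illustration, not a description of Design Gateway’s RAID0 controller; the stripe size of 8 blocks and all function names are hypothetical, and ideal linear scaling is assumed.

```python
import math


def ssds_needed(target_gb_s, per_ssd_gb_s):
    """Smallest number of SSDs whose combined rate reaches the target,
    assuming ideal linear scaling."""
    return math.ceil(target_gb_s / per_ssd_gb_s)


def stripe_location(lba, stripe_blocks, num_ssds):
    """Map a logical block address to (ssd_index, lba_on_that_ssd)
    using round-robin striping."""
    stripe = lba // stripe_blocks    # which stripe the block falls in
    offset = lba % stripe_blocks     # position of the block inside that stripe
    ssd = stripe % num_ssds          # stripes rotate across the SSDs
    local_lba = (stripe // num_ssds) * stripe_blocks + offset
    return ssd, local_lba


count = ssds_needed(12, 4)           # approximate rates quoted above
print("SSDs needed:", count)
for lba in (0, 8, 16, 24, 29):
    print(lba, "->", stripe_location(lba, stripe_blocks=8, num_ssds=count))
```

With the approximate figures quoted above, three SSDs at about 4 GB/s each would be needed to keep pace with about 12 GB/s of TCP traffic, provided the striping keeps all drives busy at once.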

The KCU116 evaluation kit and Design Gateway’s network & storage IP solutions make it possible to reach high performance with low FPGA resource usage, yielding a very cost-effective solution or product based on the Xilinx® Kintex® UltraScale+™ device.