
    Introduction

     

    We are generating data faster than our ability to analyze, understand, transmit, and reconstruct it. Data-intensive applications like Big Data, Machine Learning (ML), the Internet of Things (IoT), and high-speed computing are driving demand for "accelerators" to offload work from general-purpose processors. An accelerator is a hardware device that partners with the server CPU to boost the speed and performance at which data is processed. A variety of off-the-shelf accelerator architectures, such as GPUs, ASICs, and FPGAs, are available.

     

    Field Programmable Gate Arrays (FPGAs) deliver dedicated hardware acceleration with excellent performance and flexible, software-like adaptability. Over the past decade, FPGA accelerators have emerged as a preferred option in data centers, thanks to their flexible, adaptable architecture and their ability to straddle multiple domains. FPGA accelerators need a server to function and cannot substitute for a server's CPU(s); they are accelerators, giving a boost to the CPU server engine.

     

    A Growing Demand for FPGA Accelerators

     

    The inherently flexible, fine-grained parallelism of FPGA accelerators offers data, task, and pipeline parallelism, resulting in faster execution of data processing. They also support low-latency data processing, thanks to their non-instruction architecture and dataflow design. FPGAs are reconfigurable hardware chips that can be reprogrammed to implement varied combinational and sequential logic. Their programmability imparts excellent flexibility and the opportunity to prototype a circuit quickly while keeping the same hardware. An FPGA can meet both the high-performance and the low-power operational needs of a given application, resulting in a high performance-per-watt ratio.

     

    Conventionally, FPGAs are a collection of logic blocks, I/O cells, and interconnection resources. Advancements in FPGA hardware technology have made it easier for designers to customize functions specific to the application. Standard components like clock generators, floating-point units, DRAM controllers, PCI Express (peripheral component interconnect express) controllers, and even whole multicore microprocessors are now part of the hardware. Further advancements in FPGA software technology, from low-level hardware description languages like Verilog to high-level programming options like C, C++, and the Open Computing Language (OpenCL), have made FPGAs a better alternative to integrate as accelerators in desired applications.
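    As an illustration of what this high-level development model looks like, the sketch below shows a small FIR filter written in plain C (an illustrative example, not BittWare code; the function and names are hypothetical). A high-level synthesis flow for C or OpenCL can unroll and pipeline loops of this shape into dedicated hardware that produces one output per clock cycle once the pipeline fills.

```c
#include <stddef.h>

#define TAPS 4

/* Hypothetical 4-tap FIR filter in the style an HLS flow accepts.
   An HLS compiler can unroll the inner loop (one multiplier per tap)
   and pipeline the outer loop for one result per clock cycle. */
void fir(const int *x, int *y, size_t n, const int coeff[TAPS]) {
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        for (int t = 0; t < TAPS; t++) {
            /* Guard the first few samples where x[i - t] would underflow */
            acc += (i >= (size_t)t) ? coeff[t] * x[i - t] : 0;
        }
        y[i] = acc;
    }
}
```

    On a CPU this code runs sequentially; on an FPGA the same source becomes a fixed datapath, which is where the parallelism and latency advantages described above come from.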

     

    FPGA accelerators have already begun to expand beyond the data center. Sectors such as autonomous driving, robotics, and Industry 4.0 cannot tolerate the communication delays of relaying information to and from distant data centers, so compute horsepower is increasingly deployed in edge computing racks and 5G mobile base stations. At large scale, FPGAs are an attractive accelerator choice both for battery-operated devices and for cloud services hosted on massive servers, reducing power consumption and costs.

     

    Implementing Effective Acceleration

     

    FPGAs are powerful, and proven expertise in the circuitry, devices, and interconnects that interface with them helps maximize their potential. BittWare, part of the Molex group of companies, has a 30-year track record of successfully designing and deploying advanced FPGA accelerator products. The company provides enterprise-class accelerator products featuring Achronix®, Intel®, and Xilinx® FPGA technology. It is the only FPGA-vendor-agnostic supplier with the critical mass to address the qualification, validation, lifecycle, and support requirements of enterprise customers who want to deploy FPGA accelerators in high volumes for mission-critical applications.

     

    Partitioning workloads, understanding memory and network bandwidth requirements, and managing thermals in complex data center installations are some of BittWare's key competencies. BittWare's FPGA accelerator products are designed to address four primary application areas:

     

    Compute Accelerators: Heterogeneous platforms employing accelerators are the answer to processing compute-intensive applications. FPGAs, with all their capabilities and features, offer a consolidated computation platform that delivers high-performance, flexible, and energy-efficient operation on huge volumes of data. FPGAs have proven their potential in the high-performance computing world. This transformation is mostly due to recent progress in FPGA development tools and hardware technology: variable precision, use of DSP blocks for floating-point arithmetic, high memory bandwidth, and development using OpenCL.

     

    Designed for compute-intensive data center applications, BittWare's 520N-MX is a PCIe board built around Intel's Stratix 10 MX FPGA with integrated HBM2 (high bandwidth memory) delivering up to 512 GB/s. Four QSFP28 cages supporting up to 100G per port make it ideal for clustering, and OCuLink connectors allow for expansion. The board supports both traditional HDL and higher-abstraction C, C++, and OpenCL development. Its flexible memory architecture allows a broad range of memory types to be coupled to the FPGA fabric: QDR-II+ SRAM, DDR4 SDRAM, Intel Optane 3D XPoint, and NVMe SSDs. The 520N-MX also features a Board Management Controller (BMC) for advanced system monitoring and control, which significantly simplifies platform integration and management.

     


     

    Figure 1. Hardware accelerator diagram: BittWare 520N-MX

     

    Network Accelerators: Network port speeds continue to grow at a pace that traditional server nodes simply cannot match. Network accelerators are thus used to multiply the speed of information flow between end-users and to offload networking tasks. Numerous primary networking functions now depend upon FPGA-based network processing cards. These include functions that evolve rapidly, such as network security, where algorithms change regularly, and specialized services such as real-time packet capture. FPGA networking cards are quickly taking over proprietary functionality that previously ran on CPUs, e.g., network tasks like Deep Packet Inspection (DPI) at rates above 10 Gigabit Ethernet.
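    To make the idea concrete, the first stage of any packet-processing pipeline, DPI included, is classifying each frame by its header fields. The sketch below shows that step in plain C (an illustrative example, not BittWare or SmartNIC Shell code); on an FPGA NIC the same match is implemented as fixed-latency parallel logic rather than sequential instructions.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative frame classifier: inspect the EtherType field of an
   Ethernet frame and bucket it for downstream processing. On an FPGA
   this comparison runs as parallel combinational logic at line rate. */
enum pkt_class { PKT_IPV4, PKT_IPV6, PKT_ARP, PKT_OTHER };

enum pkt_class classify_frame(const uint8_t *frame, size_t len) {
    if (len < 14)                 /* shorter than an Ethernet header */
        return PKT_OTHER;
    /* EtherType is big-endian at byte offsets 12-13 */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    switch (ethertype) {
    case 0x0800: return PKT_IPV4;
    case 0x86DD: return PKT_IPV6;
    case 0x0806: return PKT_ARP;
    default:     return PKT_OTHER;
    }
}
```

    A CPU evaluates such checks one packet at a time; an FPGA replicates and pipelines them, which is why these cards sustain classification and inspection well beyond 10 Gigabit Ethernet.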

     

    The BittWare XUP-P3R PCIe accelerator board, built with a Xilinx UltraScale+™ FPGA, is designed for high-performance, high-bandwidth, low-latency applications demanding massive data flow and packet processing. The board offers extensive memory configurations supporting up to 512 GBytes of memory, plus sophisticated clocking and timing options. The XUP-P3R provides a variety of interfaces for high-speed serial I/O as well as debug support. Four QSFP28 cages are available on the front panel, each supporting 100GbE, 40GbE, four 25GbE, or four 10GbE channels, for a total of up to 400 Gbps of bandwidth; the four QSFPs can also be combined for 400GbE. These features make the XUP-P3R ideal for a wide range of data center applications, including network processing and security, acceleration, storage, broadcast, and signal integration. BittWare offers complete software support for this card with its BittWorks II software tools. The toolkit serves as the main interface between the BittWare board and the host system, and includes drivers, libraries, utilities, and example projects for accessing, integrating, and developing applications for the board.

     


     

    Figure 2. Hardware accelerator: BittWare XUP-P3R with four QSFP28 cages for 1x 400GbE, 4x 100GbE, 4x 40GbE, 16x 25GbE, or 16x 10GbE.

     

    BittWare also provides a suite of optimizable network acceleration tools. SmartNIC Shell is a complete working Network Interface Card (NIC) implemented on a BittWare FPGA board. SmartNIC Shell allows users to quickly deploy their own virtualized network functions (NFV), network monitoring, specialized packet brokering, or anything else that manipulates packets.

     

    Storage Accelerators: Computational storage is an architecture in which data is processed in physical proximity to the storage device. The primary benefit of such an arrangement is to reduce the amount of data that must move between the storage plane and the compute plane. In traditional architectures, the CPU processes data compute tasks like compression, and data shuttles back and forth between the storage and compute planes. In contrast, a computational storage architecture shifts such tasks to a hardware accelerator (FPGA), offloading the CPU. Data remains close to where it is processed, avoiding movement across the slower CPU compute plane. This offers considerable benefits, as the device performing the computing task can be more application-specific and, consequently, more energy-efficient.
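    The data-movement saving can be sketched with a toy example. Below, a trivial run-length encoder in plain C stands in for the compression a computational storage device performs in place (an illustrative sketch only; real devices use far more capable algorithms). Only the compressed output crosses the bus to the host, so the bytes moved shrink from the raw size `n` to the returned length.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy run-length encoder: emit (run_length, value) pairs.
   Stands in for compression done next to the storage device,
   so only the compressed bytes traverse the PCIe bus. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t byte = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == byte && run < 255)
            run++;
        out[o++] = (uint8_t)run;   /* run length */
        out[o++] = byte;           /* repeated value */
        i += run;
    }
    return o;   /* bytes that actually need to move to the host */
}
```

    For repetitive data the returned length is a small fraction of `n`, which is exactly the traffic a computational storage processor keeps off the CPU compute plane.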

     


     

    Figure 3. Traditional vs. Computational Storage

     

    BittWare's 250-series FPGA products provide innovative solutions that cater to storage market needs. The 250-series products feature Xilinx UltraScale+ FPGAs and MPSoCs, which offer ASIC-class functionality in a single chip. By combining NVMe with reconfigurable FPGA and MPSoC logic, BittWare provides a new class of storage products. NVMe (Non-Volatile Memory Express), a communications standard developed specifically for SSDs, operates across the PCIe bus, which allows for faster drives. The 250 FPGA & MPSoC product line comprises three FPGA adapters: the 250-SoC, 250-U2, and 250-M2D.

     

    The recently launched BittWare 250-M2D is an FPGA-based Computational Storage Processor (CSP) designed to meet the draft M.2 Accelerator Module Hardware Specification standard. It is intended to operate in Glacier Point carrier cards for Yosemite servers. The 250-M2D product features a Xilinx Kintex UltraScale+ FPGA directly coupled to two banks of local DDR4 memory.

     

    Sensor Processing Accelerators: Sensor processing is a core market for FPGAs. FPGAs are extensively used in embedded applications such as RADAR, software-defined radio (SDR), and electronic warfare for real-time data acquisition, filtering, and signal processing. Modern FPGAs and SoCs, with their inherent flexibility, are ideal for multi-channel sensor processing applications managing ultra-high data ingress and real-time processing requirements within SWaP (Size, Weight, and Power) constrained environments. Their unique blend of logic resources, DSP blocks, and embedded memories enables FPGAs to meet sensor processing demands efficiently.

     

    BittWare's XUP-VV8 offers a large Xilinx FPGA on a 3/4-length PCIe board featuring QSFP-DD (double-density) cages for maximum port density. Using the Virtex UltraScale+ VU13P or VU9P FPGA, the board supports up to 8x 100GbE or 32x 10/25GbE. A utility header provides a USB interface for debug and programming support. The FPGA provides large logic resources of up to 3.8M logic cells, as well as 455 Mb of embedded memory. The board's flexible memory configuration includes four DIMM sites that support DDR4 SDRAM and QDR. Memory options include up to 512 GBytes of DDR4 with optional error-correcting codes (ECC) or up to 2,304 Mbits of QDR-II+ (2x 288Mbit banks x18). The board also features flash memory for FPGA images. The Board Management Controller (BMC) handles control of power and resets, monitoring of board sensors, FPGA boot loading, voltage overrides, configuration of programmable clocks, access to I2C bus components, field upgrades, and IPMI messaging. Access to the BMC is via PCIe or USB. BittWare's BittWorks II Toolkit also provides utilities and libraries for communicating with the BMC components at a higher, more abstract level, allowing developers to remotely monitor the state of the board.

     


     

    Figure 4. Hardware accelerator: XUP-VV8 is a 3/4-length PCIe board with four QSFP-DDs supporting up to 8x 100GbE or 32x 10/25GbE.

     

    Conclusion

     

    The increasing power of FPGA accelerators enables HPC centers, hyperscalers, edge computing providers, data scientists, and cloud builders to explore every possible option to accelerate application performance using FPGAs. Many users have already recognized the power of FPGAs for acceleration. For example, Microsoft's Project Brainwave uses FPGAs for high-speed AI inferencing, Amazon offers FPGAs in the cloud through its F1 instances, and the Baidu search engine uses FPGAs in its storage controllers.

     

    BittWare has followed suit, with products featuring Achronix, Intel, and Xilinx FPGA technology, and has established its market across multiple industry sectors: commercial, financial, aerospace, instrumentation, telecommunications, and bioinformatics, to name a few. CSPI, a provider of IT integration solutions and high-performance computer systems, based its next-generation Myricom network adapter on BittWare's Arria V GZ low-profile PCIe board platform, which provides highly accurate time stamping and low latency. In one deployment, BittWare's fully integrated FPGA system solution is installed inside an airport security radar system, regularly scanning for ground-based threats with high-speed analog capture and multi-channel processing. FPGAs will see broader and deeper adoption as accelerators.

     
