Recurrent neural networks (RNNs) can process sequential data of variable length and are widely used in natural language processing, speech synthesis and recognition, and financial time-series forecasting. However, RNNs are compute-intensive and must process input frame by frame because of their strict sequential dependency. Traditional hardware cannot achieve the desired latency, especially for financial data processing, where latency is one of the most important factors for customers.
The deep-learning processor unit (DPU) for RNN is a customized accelerator built on FPGA or ACAP devices to accelerate RNN workloads. It supports different types of recurrent neural networks, including vanilla RNN, gated recurrent unit (GRU), long short-term memory (LSTM), bidirectional LSTM, and their variants. The DPU for RNN has been deployed on the Alveo U25 and U50LV data center accelerator cards and the Versal® VCK5000 development card. The following table summarizes the features of these three RNN accelerators:
Feature | DPURADR16L (U25) | DPURAHR16L (U50LV) | DPURVDRML (VCK5000) |
---|---|---|---|
Precision | int16 | int16 | Mixed: int8 for GEMM on the AI Engine, int16 for others |
Operation Type | Matrix-vector multiplication; element-wise multiplication and addition; sigmoid and tanh | GEMM; element-wise multiplication and addition; sigmoid and tanh; ReLU; max; embedding (in RNN-T) | GEMM; element-wise multiplication and addition; sigmoid and tanh; ReLU; max; embedding (in RNN-T) |
Multiplication Unit | One 32x32 systolic array | Seven 16x32 systolic arrays | 40 AI Engine cores |
Frequency | Freq_DSP = Freq_PL = 310 MHz | Freq_DSP = 540 MHz, Freq_PL = 270 MHz | Freq_AIE = 1.25 GHz, Freq_PL = 300 MHz |
Resource Utilization | LUTs: 187,509 (35.9%); Regs: 303,670 (29.0%); Block RAM: 659 (67.0%); URAM: 56 (43.8%); DSPs: 1,092 (55.5%) | LUTs: 488,679 (56.1%); Regs: 1,045,016 (60.0%); Block RAM: 796 (59.2%); URAM: 512 (80.0%); DSPs: 4,148 (69.7%) | LUTs: 169,163 (18.8%); Regs: 241,657 (13.4%); Block RAM: 197 (20.4%); URAM: 332 (71.7%); DSPs: 82 (4.2%); AI Engine: 40 (10.0%) |
Example Models | IMDB Sentiment Detection, Customer Satisfaction, Open Information Extraction | RNN-T | RNN-T |
Quantization | RNN Quantizer v2.0 | RNN Quantizer v2.0 | Manually |
Compilation | RNN Compiler v2.0 | RNN Compiler v2.0 | Manually |
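The operation types listed in the table (matrix-vector multiplication, element-wise multiplication and addition, sigmoid and tanh) are exactly the primitives of one LSTM time step. A minimal NumPy sketch, purely for illustration (the gate layout and weight shapes here are conventional choices, not the DPU's internal layout):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step built from the DPU's primitive ops:
    matrix-vector multiplication, element-wise multiply/add,
    sigmoid and tanh. Gate order (i, f, g, o) is illustrative."""
    z = W @ x + U @ h + b                 # matrix-vector multiplications + add
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate activations
    g = np.tanh(g)
    c_new = f * c + i * g                 # element-wise multiply and add
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Tiny example: input size 3, hidden size 4
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
h, c = np.zeros(4), np.zeros(4)
W = rng.standard_normal((16, 3)) * 0.1    # input weights for 4 gates
U = rng.standard_normal((16, 4)) * 0.1    # recurrent weights for 4 gates
b = np.zeros(16)
h, c = lstm_step(x, h, c, W, U, b)
```

The recurrent term `U @ h` is why each frame depends on the previous one, which is the sequential dependency noted above.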
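The int16 precision in the table refers to fixed-point quantization of weights and activations. The RNN Quantizer tool selects scales automatically; the sketch below only illustrates what a symmetric int16 fixed-point representation looks like, with a hypothetical, hand-picked number of fraction bits:

```python
import numpy as np

def quantize_int16(w, frac_bits):
    """Symmetric fixed-point int16 quantization: value ≈ q * 2**-frac_bits.
    frac_bits is chosen by hand here; real tools pick it per tensor."""
    scale = 2.0 ** frac_bits
    return np.clip(np.round(w * scale), -32768, 32767).astype(np.int16)

def dequantize(q, frac_bits):
    return q.astype(np.float32) / (2.0 ** frac_bits)

w = np.array([0.5, -0.123, 0.9997], dtype=np.float32)
q = quantize_int16(w, frac_bits=12)
w_hat = dequantize(q, 12)
err = np.max(np.abs(w - w_hat))  # bounded by half an LSB, i.e. 2**-13
```

With 12 fraction bits the representable range is about ±8 and the rounding error stays below 2^-13, which is why int16 is usually accurate enough for RNN weights without retraining, while the int8 GEMM path on the AI Engine needs more careful (here, manual) scale selection.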