The RNN compilation flow uses the XMODEL format as the unified interface between the quantizer and the compiler, and between the compiler and the runtime. The RNN compiler accepts one or more quantized XMODEL files as input and, when compilation is complete, generates an output XMODEL that serves as the input to the RNN runtime. The core components of the compiler are the RNN compiler frontend and backend:
Figure 1. RNN Compilation Flow
- Frontend
- The frontend parses the XMODEL files into JSON files for backend usage.
- Backend
- The backend parses the JSON files into a tensor intermediate representation and performs multiple target-specific optimizations, including off-chip and on-chip memory planning for aggressive memory reuse and efficient instruction scheduling for better parallelism. The backend also generates the hardware instructions and emits an XMODEL as output. The generated XMODEL contains all the necessary metadata and hardware instructions, and the RNN runtime uses it for on-board inference.
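The two stages described above can be illustrated with a small, self-contained sketch. This is not the RNN compiler's actual API or file format: the JSON schema, the `TensorOp` class, the `frontend` and `backend` functions, and the `EXEC` instruction strings are all hypothetical and chosen only to show the idea of a frontend that serializes a model to JSON and a backend that builds an intermediate representation, reuses buffers whose contents are no longer needed, and emits instructions.

```python
import json
from dataclasses import dataclass


@dataclass
class TensorOp:
    """Hypothetical IR node: one operation producing one output tensor."""
    name: str      # name of the op and of its output tensor
    inputs: list   # names of the tensors this op reads
    size: int      # bytes required for the output tensor


def frontend(model: dict) -> str:
    """Toy frontend: serialize a model description to JSON for the backend."""
    return json.dumps(model)


def backend(model_json: str) -> dict:
    """Toy backend: build the IR, plan memory with buffer reuse, emit instructions."""
    ops = [TensorOp(**o) for o in json.loads(model_json)["ops"]]

    # Record the index of the last op that reads each tensor.
    last_use = {}
    for i, op in enumerate(ops):
        for src in op.inputs:
            last_use[src] = i

    free = []      # (offset, size) buffers available for reuse
    buffers = {}   # tensor name -> (offset, size) buffer assigned to it
    top = 0        # total bytes allocated so far
    instructions = []

    for i, op in enumerate(ops):
        # First-fit search for a released buffer that is large enough.
        slot = next((b for b in free if b[1] >= op.size), None)
        if slot is not None:
            free.remove(slot)
        else:
            slot = (top, op.size)
            top += op.size
        buffers[op.name] = slot
        instructions.append(f"EXEC {op.name} -> @{slot[0]}")

        # Release input buffers after their last reader has been scheduled.
        for src in set(op.inputs):
            if last_use[src] == i and src in buffers:
                free.append(buffers[src])

    offsets = {name: buf[0] for name, buf in buffers.items()}
    return {"instructions": instructions, "offsets": offsets, "memory_bytes": top}


# A three-op chain; "x" is an external input that the planner does not allocate.
model = {"ops": [
    {"name": "a", "inputs": ["x"], "size": 64},
    {"name": "b", "inputs": ["a"], "size": 64},
    {"name": "c", "inputs": ["b"], "size": 64},
]}
result = backend(frontend(model))
print(result["memory_bytes"])  # 128, compared with 192 if every tensor had its own buffer
```

In this sketch the output of `c` lands in the buffer previously held by `a`, because `a` is no longer read once `b` has run. A production compiler would additionally handle alignment, separate on-chip and off-chip memory, and instruction-level scheduling, none of which is modeled here.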