This design chooses the bfloat16 data type for both layer I/O data and for weights and biases. This simplifies the quantization of trained network parameters. Using bfloat16 requires no special tools or quantization strategies.
This design does not set a specific throughput target.
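To illustrate why bfloat16 needs no dedicated quantization tooling, the following minimal sketch converts float32 values to bfloat16 by keeping the upper 16 bits of the float32 pattern with round-to-nearest-even. The function names are hypothetical and are not part of this design. The rounding mode is an assumption; the tools used by the design may round differently.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even)."""
    # Reinterpret the value as its IEEE-754 float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: truncate, but force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return (bits >> 16) | 0x0040
    # Round to nearest even on the 16 low bits that are about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def quantize(weights):
    """Round each trained float32 weight to its nearest bfloat16 value."""
    return [bfloat16_bits_to_float(float_to_bfloat16_bits(w)) for w in weights]


if __name__ == "__main__":
    # Hypothetical weights, used only to show the round trip.
    for w in (1.0, 0.1, -2.0):
        print(w, hex(float_to_bfloat16_bits(w)), quantize([w])[0])
```

Because bfloat16 keeps the full float32 exponent range, the conversion is a per-value rounding step with no scale factors or calibration data.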
The design partitions each network layer to its own AIE-ML v2 tile where feasible. This simplifies system partitioning and enables you to build a well-defined scope for each kernel.
Memory tile pre/post zero-padding capability is leveraged for 1D convolutional layers to expand input tensor shapes for model layers that use padding="same". The model uses kernel_size=7, which requires the input samples dimension to be pre-padded and post-padded with three zeros.
Memory tile multi-dimensional addressing capabilities are leveraged to efficiently transfer I/O data for compute consumption, with minimal core cycles required for data shuffling or lane adjustments within the core.
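The padding arithmetic above can be checked with a small functional model. This is only a sketch in plain Python: in the design the zeros are inserted by the memory tile, not by code like this. The helper names and the sample values are hypothetical, and a stride of 1 is assumed.

```python
def pad_same_1d(samples, kernel_size):
    """Zero-pad a 1D sequence so a stride-1 convolution keeps its length."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    pad = (kernel_size - 1) // 2  # three zeros per side for kernel_size=7
    return [0.0] * pad + list(samples) + [0.0] * pad


def conv1d_valid(padded, kernel):
    """Sliding-window correlation over an already padded input."""
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


# Hypothetical input: eight samples and an all-ones kernel of length 7.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0] * 7

padded = pad_same_1d(samples, kernel_size=7)
output = conv1d_valid(padded, kernel)

# The output has the same number of samples as the unpadded input.
assert len(output) == len(samples)
```

With three zeros on each side, the first output window covers three padding zeros plus the first four samples, which is what keeps the output length equal to the input length.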
Compute workloads for 1D convolutional layers leverage the efficient mac_4x8_8x8() intrinsic for bfloat16 data types to achieve a maximum efficiency of 256 MAC operations per cycle where a particular layer makes this feasible.
Compute workloads leverage the less efficient mac_elem_64() intrinsic for bfloat16 data types, with a maximum efficiency of 64 MAC operations per cycle, in cases where mac_4x8_8x8() is not feasible (for example, in the conv1d_w1() layer, which only receives data from two input nodes).
The host sends weights and biases at run time as async run-time parameters (RTPs), and they are stored in local tile memory. Larger ML networks with millions or billions of weights require streaming solutions based on memory tiles or DDR; such a complex solution is excessive for the small Radio-ML Modulation Classifier problem considered here, where all weights fit easily within the array.
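The two MAC rates quoted above translate into a rough lower bound on compute cycles per layer. The sketch below assumes that the name mac_4x8_8x8() denotes a 4x8 by 8x8 matrix product, which is consistent with the 256 MACs per cycle figure but is an assumption here. The layer shape is hypothetical, and the estimate ignores load/store, pipeline, and loop overheads.

```python
import math

# Peak rates taken from the text above.
MACS_PER_CYCLE_MATRIX = 4 * 8 * 8  # assumed 4x8 by 8x8 product: 256 MACs
MACS_PER_CYCLE_ELEMENTWISE = 64    # 64 lane-wise MACs


def conv1d_macs(out_samples, in_channels, out_channels, kernel_size):
    """Total multiply-accumulates for one stride-1 conv1d layer."""
    return out_samples * in_channels * out_channels * kernel_size


def ideal_cycles(total_macs, macs_per_cycle):
    """Lower bound on cycles if the peak MAC rate were sustained."""
    return math.ceil(total_macs / macs_per_cycle)


if __name__ == "__main__":
    # Hypothetical layer shape, not taken from the design.
    macs = conv1d_macs(out_samples=1024, in_channels=2,
                       out_channels=64, kernel_size=7)
    print("total MACs:", macs)
    print("matrix intrinsic:", ideal_cycles(macs, MACS_PER_CYCLE_MATRIX))
    print("element-wise intrinsic:",
          ideal_cycles(macs, MACS_PER_CYCLE_ELEMENTWISE))
```

At peak rates the element-wise path needs four times as many cycles for the same MAC count, which is why it is reserved for layers where the matrix form does not fit.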
The design does not achieve a perfect functional bit-match against the Python model. The main contributors to this are the dense layers; the corresponding sections discuss this in more detail. A closer match can be achieved by building Python models that align with the implementation, then training those models to extract updated weights and biases.
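One way to quantify how far two sets of outputs are from a bit-match is sketched below. It compares bfloat16 bit patterns, reports the largest absolute error, and checks whether the top-scoring class agrees. The function names and example values are hypothetical, inputs are assumed to be finite, and this is not the comparison procedure used by the design.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """bfloat16 pattern of a finite value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits + 0x7FFF + ((bits >> 16) & 1)) >> 16) & 0xFFFF


def compare_outputs(reference, device):
    """Summarise the mismatch between reference and device outputs."""
    if len(reference) != len(device):
        raise ValueError("output vectors must have the same length")
    pairs = list(zip(reference, device))
    return {
        # Values whose bfloat16 bit patterns differ.
        "mismatched_values": sum(
            1 for r, d in pairs if to_bf16_bits(r) != to_bf16_bits(d)
        ),
        # Largest absolute difference across all outputs.
        "max_abs_error": max(abs(r - d) for r, d in pairs),
        # Whether both vectors pick the same top-scoring class.
        "same_top_class": (
            reference.index(max(reference)) == device.index(max(device))
        ),
    }


if __name__ == "__main__":
    # Hypothetical class scores; one value differs by a single bfloat16 step.
    reference = [0.125, 0.75, 0.25]
    device = [0.125, 0.75390625, 0.25]
    print(compare_outputs(reference, device))
```

A report like this separates harmless last-bit differences from mismatches that change the predicted class.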