The `bfloat16` data type is chosen for both layer I/O data and for weights and biases. This simplifies the quantization of trained network parameters: no special tools or quantization strategies are required. No specific throughput target is chosen.
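To illustrate why `bfloat16` quantization is trivial, the sketch below converts `float32` values by keeping only their upper 16 bits (the simplest form of the conversion, ignoring rounding modes). This is an illustrative model, not the tool flow used by the design; the function names are hypothetical.

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its upper 16 bits (bfloat16).
    bfloat16 keeps float32's sign and 8-bit exponent, so no rescaling
    or calibration of trained weights is needed."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float(b: int) -> float:
    """Expand bfloat16 bits back to float32 by zero-filling the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x
```

Because the exponent range matches `float32`, trained weights round-trip with only mantissa precision loss (about 2-3 decimal digits), which is why no dedicated quantization strategy is required.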
The design maps each network layer to its own AIE-ML v2 tile where feasible. This simplifies system partitioning and gives each kernel a well-defined scope.
The memory tile pre/post zero-padding capability is leveraged for 1D convolutional layers to expand input tensor shapes, satisfying model requirements that use `padding="same"`. The model uses `kernel_size=7`, which requires the input samples dimension to be pre-padded and post-padded with three zeros. Memory tile multi-dimensional addressing capabilities are leveraged to transfer I/O data efficiently for compute consumption, so that minimal core cycles are spent on data shuffling or lane adjustments within the core.
Compute workloads for 1D convolutional layers leverage the efficient `mac_4x8_8x8()` intrinsic for `bfloat16` data types, achieving a peak efficiency of 256 MAC operations per cycle where a layer's dimensions permit it. Where `mac_4x8_8x8()` is not feasible (for example, in the `conv1d_w1()` layer, which receives data from only two input nodes), compute workloads fall back to the less efficient `mac_elem_64()` intrinsic for `bfloat16` data types, with a peak efficiency of 64 MAC operations per cycle. Weights and biases are sent from the host at run time as async RTPs and are stored in local tile memory. Larger ML networks with millions or billions of weights require streaming solutions based on memory tiles or DDR; such a complex solution is excessive for the small Radio-ML Modulation Classifier considered here, where all weights fit easily within the array.
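To make the 256-MAC figure concrete, the sketch below is a scalar reference model of one `mac_4x8_8x8()` step, assuming its semantics are a 4x8 by 8x8 matrix multiply-accumulate (the naming convention suggests this, but the intrinsic's exact lane mapping is not shown here). The triple loop counts 4 x 8 x 8 = 256 multiply-accumulates, which the hardware performs in a single cycle.

```python
def mac_4x8_8x8_ref(acc, a, b):
    """Scalar reference for acc(4x8) += a(4x8) @ b(8x8).
    Returns the updated accumulator and the MAC count (256),
    matching the intrinsic's peak of 256 MACs per cycle."""
    macs = 0
    for i in range(4):          # accumulator rows
        for j in range(8):      # accumulator columns
            for k in range(8):  # reduction dimension
                acc[i][j] += a[i][k] * b[k][j]
                macs += 1
    return acc, macs
```

By contrast, an element-wise intrinsic such as `mac_elem_64()` performs one MAC per lane across 64 lanes, which is why layers that cannot be cast into the 4x8x8 matrix shape run at a quarter of the peak rate.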
A perfect functional bit-match against the Python model is not achieved. The main contributors are the dense layers; more details are discussed in the corresponding sections. A closer match could be achieved by building Python models that align with the AIE implementation and then retraining them to extract updated weights and biases.