- Loading the original graph
The partitioner accepts a frozen tf.Graph, a tf.GraphDef, or a path to the network file or folder. If a .pb file is provided, the graph must already be properly frozen. Models stored with tf.train.Saver or tf.saved_model are also supported.
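The following sketch illustrates the on-disk options above. It guesses which format a path holds by using TensorFlow's standard file naming:

- A SavedModel export directory contains saved_model.pb (or saved_model.pbtxt).
- tf.train.Saver writes a .meta graph file alongside its checkpoint data.

The helper name classify_model_source and its return labels are hypothetical. This is not the partitioner's own loading logic. In-memory tf.Graph or tf.GraphDef objects would be passed directly rather than through a path. A file name alone also cannot tell whether a .pb file is actually frozen.

```python
import os
import tempfile


def classify_model_source(path):
    """Guess which on-disk TensorFlow model format `path` points to.

    Hypothetical helper: relies only on TensorFlow's standard file names.
    """
    if os.path.isfile(path):
        if path.endswith(".pb"):
            return "frozen_graphdef"  # must already be frozen to be usable
        raise ValueError("expected a frozen .pb file, got %r" % path)
    if os.path.isdir(path):
        names = os.listdir(path)
        # tf.saved_model writes saved_model.pb (or .pbtxt) at the export root.
        if "saved_model.pb" in names or "saved_model.pbtxt" in names:
            return "saved_model"
        # tf.train.Saver writes a <prefix>.meta MetaGraphDef next to the checkpoint.
        if any(name.endswith(".meta") for name in names):
            return "saver_checkpoint"
    raise ValueError("unrecognized model location: %r" % path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as export_dir:
        open(os.path.join(export_dir, "saved_model.pb"), "wb").close()
        print(classify_model_source(export_dir))  # → saved_model
```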
- Partitioning
In this step, the subgraph specified by the startnode and finalnode sets is analyzed for FPGA acceleration. The analysis proceeds in the following phases:
  - All graph nodes are partitioned into FPGA-supported and unsupported sets using one of two methods. The default method (compilerFunc='SPECULATIVE') uses a rough estimate of the hardware operation tree. The second method (compilerFunc='DEFINITIVE') invokes the hardware compiler. It is more accurate and can handle complex optimization schemes based on the specified options, but it takes considerably more time to complete.
  - Adjacent nodes with the same label (both supported or both unsupported) are merged into fine-grained connected components.
  - Supported partitions are then merged into maximal connected components, provided the partitioned graph remains a directed acyclic graph (DAG).
  - Each supported partition is (re)compiled with the hardware compiler to generate runtime code, quantization information, and the relevant model parameters.
  - Each supported partition's subgraph is stored for visualization and debugging.
  - Each supported subgraph is replaced by a tf.py_func node (named fpga_func_<partition_id>) that contains all the Python function calls needed to accelerate that subgraph on the FPGA.
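The phases above can be sketched in plain Python on a toy graph. Every element of the sketch is an assumption made for illustration:

- SUPPORTED_OPS stands in for the SPECULATIVE estimate with a fixed op list.
- Graphs are represented as lists of (producer, consumer) edges.
- Phases 2 and 3 are collapsed into one greedy pass. It contracts edges between same-label nodes and rejects any contraction that would create a cycle.
- The between helper shows one way to restrict the work to the nodes lying between the startnode and finalnode sets.

Compilation and the actual tf.py_func replacement are not modeled; only the fpga_func_<partition_id> naming is.

```python
from collections import defaultdict

# Hypothetical stand-in for compilerFunc='SPECULATIVE': a fixed set of op
# types treated as FPGA-supported. The real partitioner estimates this from
# the hardware operation tree (or asks the compiler for 'DEFINITIVE').
SUPPORTED_OPS = {"Conv2D", "BiasAdd", "Relu", "MaxPool"}


def between(edges, starts, finals):
    """Nodes on some path from a start node to a final node (inclusive)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    def reach(seeds, nbrs):
        seen, stack = set(seeds), list(seeds)
        while stack:
            for m in nbrs[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    return reach(starts, succ) & reach(finals, pred)


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph (nodes, edges) has no cycle."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(nodes)


def partition(ops, edges):
    """Group nodes into FPGA partitions named fpga_func_<partition_id>.

    ops:   node name -> op type
    edges: (producer, consumer) pairs forming a DAG over the keys of ops
    """
    supported = {n: op in SUPPORTED_OPS for n, op in ops.items()}  # phase 1
    comp = {n: n for n in ops}  # node -> representative of its component
    for u, v in sorted(edges):  # phases 2-3, simplified into one pass
        cu, cv = comp[u], comp[v]
        if cu == cv or supported[u] != supported[v]:
            continue
        trial = {n: (cu if c == cv else c) for n, c in comp.items()}
        contracted = {(trial[a], trial[b]) for a, b in edges if trial[a] != trial[b]}
        if is_acyclic(set(trial.values()), contracted):
            comp = trial  # merge accepted: components still form a DAG
    groups = defaultdict(list)
    for n, c in comp.items():
        groups[c].append(n)
    fpga = [sorted(groups[c]) for c in sorted(groups) if supported[c]]
    return {"fpga_func_%d" % i: members for i, members in enumerate(fpga)}


OPS = {"x": "Placeholder", "conv": "Conv2D", "bias": "BiasAdd", "relu": "Relu",
       "shape": "Shape", "pool": "MaxPool", "out": "Softmax"}
EDGES = [("x", "conv"), ("conv", "bias"), ("bias", "relu"), ("relu", "pool"),
         ("bias", "shape"), ("shape", "pool"), ("pool", "out")]

if __name__ == "__main__":
    keep = between(EDGES, {"conv"}, {"pool"})
    sub_ops = {n: OPS[n] for n in keep}
    sub_edges = [(u, v) for u, v in EDGES if u in keep and v in keep]
    print(partition(sub_ops, sub_edges))
    # → {'fpga_func_0': ['bias', 'conv', 'relu'], 'fpga_func_1': ['pool']}
```

In this example, pool cannot join the conv/bias/relu partition. The unsupported shape node reads from bias and feeds pool, so a single merged FPGA node would both feed and consume shape, creating a cycle. pool therefore ends up in its own partition, fpga_func_1.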
- Freezing the modified graph
The modified graph is frozen and saved with a "-fpga" suffix.
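The exact output location is not specified here. The sketch below assumes the suffix is inserted before the .pb extension of the frozen output, or appended to the name when the input was a folder. Under that assumption, the output file name could be derived as follows. fpga_output_path is a hypothetical helper, not part of the partitioner.

```python
def fpga_output_path(model_path):
    """Hypothetical naming rule: insert '-fpga' before the .pb extension.

    'models/resnet.pb'  -> 'models/resnet-fpga.pb'
    'models/ckpt_dir/'  -> 'models/ckpt_dir-fpga.pb'
    """
    base = model_path.rstrip("/\\")  # treat a folder path like its name
    if base.endswith(".pb"):
        base = base[: -len(".pb")]
    return base + "-fpga.pb"


if __name__ == "__main__":
    print(fpga_output_path("models/resnet.pb"))  # → models/resnet-fpga.pb
```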
- Run natively in TensorFlow
The modified graph can be loaded with the load_partitioned_graph method of the partitioner class. It replaces the default TensorFlow graph and can be used in the same way as the original graph.