The idea behind Neural Architecture Search (NAS) is that, for any given inference task and dataset, the design space contains several network architectures that are both computationally efficient and accurate. A developer often starts with a familiar standard backbone, such as ResNet50, and trains that network for the best accuracy. In many cases, however, a network topology with a much lower computational cost could have delivered similar or better performance. Training multiple networks on the same dataset to compare them (sometimes going so far as to treat the topology as a training hyperparameter) is not an efficient way for the developer to select the best network topology.
NAS can be applied flexibly to each layer: the number of channels and the amount of sparsity in a layer are learned by minimizing the loss of the pruned network. NAS achieves a good balance between speed and accuracy but requires extended training times. The method follows a four-step process:
- Train
- Search
- Prune
- Fine-tune (optional)
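
The search and prune steps above can be illustrated on a toy model. The following is a minimal sketch, not the tool's actual implementation: the function names (`prune_layer`, `proxy_loss`, `cost`, `search`), the L1-norm channel ranking, the exhaustive search, and the proxy loss are all assumptions made for illustration. In a real flow, the loss would be measured by evaluating the pruned network on validation data after the train step.

```python
from itertools import product

def l1_norm(channel):
    """L1 norm of one channel's weights, used here as its importance score."""
    return sum(abs(w) for w in channel)

def prune_layer(channels, keep):
    """Keep the `keep` channels with the largest L1 norm, preserving their order."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: l1_norm(channels[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return [channels[i] for i in kept]

def proxy_loss(layers, pruned):
    """Hypothetical stand-in for validation loss: fraction of L1 mass removed."""
    total = sum(l1_norm(c) for layer in layers for c in layer)
    left = sum(l1_norm(c) for layer in pruned for c in layer)
    return 1.0 - left / total

def cost(pruned):
    """Hypothetical stand-in for computational cost: number of remaining weights."""
    return sum(len(c) for layer in pruned for c in layer)

def search(layers, candidates, budget):
    """Try every per-layer channel count and return the lowest-loss
    configuration whose cost fits the budget, or None if none fits."""
    best = None
    for keeps in product(*candidates):
        pruned = [prune_layer(l, k) for l, k in zip(layers, keeps)]
        if cost(pruned) > budget:
            continue
        loss = proxy_loss(layers, pruned)
        if best is None or loss < best[0]:
            best = (loss, keeps, pruned)
    return best

# Toy "trained" weights: each layer is a list of channels (made-up values).
layers = [
    [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.03, -0.01]],  # 4 channels
    [[1.0, 1.0, 1.0], [0.1, 0.0, -0.1], [0.7, -0.6, 0.5]],   # 3 channels
]
candidates = [(1, 2, 3, 4), (1, 2, 3)]  # allowed channel counts per layer

loss, keeps, pruned = search(layers, candidates, budget=10)
print("channels kept per layer:", keeps)
print("remaining weights:", cost(pruned))
```

Exhaustive enumeration only works for a toy search space like this one; it is used here to keep the sketch short, and the optional fine-tune step would follow by retraining the pruned weights.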
Unlike coarse-grained pruning, one-shot NAS implementations assemble multiple candidate "subnetworks" into a single, over-parameterized graph known as a Supernet. The training algorithm attempts to optimize all candidate networks simultaneously using supervised learning. When training completes, the candidate subnetworks are ranked by computational cost and accuracy, and the developer selects the candidate that best meets their requirements. The one-shot NAS method is effective at compressing models that use both depthwise and conventional convolutions, but it requires a long training time and a higher level of skill on the part of the developer.
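
The ranking and selection of candidate subnetworks can be sketched as a Pareto filter followed by a budget check. This is an illustrative sketch under stated assumptions: the `Candidate` type, the `pareto_front` and `select` helpers, and the candidate names and numbers are all hypothetical, not measured results or part of any tool's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    gflops: float    # computational cost of the subnetwork
    accuracy: float  # validation accuracy of the subnetwork

def pareto_front(candidates):
    """Return candidates that no other candidate beats on both cost and
    accuracy, sorted by increasing cost."""
    front = []
    for c in candidates:
        dominated = any(
            o.gflops <= c.gflops and o.accuracy >= c.accuracy
            and (o.gflops < c.gflops or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.gflops)

def select(candidates, max_gflops):
    """Pick the most accurate non-dominated candidate within the cost budget."""
    feasible = [c for c in pareto_front(candidates) if c.gflops <= max_gflops]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None

# Made-up candidates for illustration only.
subnets = [
    Candidate("sub_a", 4.1, 0.761),
    Candidate("sub_b", 2.0, 0.752),
    Candidate("sub_c", 2.3, 0.740),  # costs more than sub_b and is less accurate
    Candidate("sub_d", 1.1, 0.721),
    Candidate("sub_e", 0.6, 0.690),
]

front = pareto_front(subnets)
choice = select(subnets, max_gflops=2.5)
print([c.name for c in front])
print(choice.name if choice else "no candidate fits the budget")
```

Filtering to the Pareto front first means the developer only compares candidates that represent a genuine trade-off between cost and accuracy; the budget then narrows that set to a single choice.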