Artificial intelligence, deep learning, and neural networks are powerful machine-learning techniques used to solve many real-world problems. Neural networks, which are inspired by the human brain, are now the predominant vision processing algorithms, exceeding human accuracy in multiple applications. They can model and process nonlinear relationships between inputs and outputs in parallel, and they are characterized by adaptive weights along the paths between neurons, which are tuned during training. Once the parameters are learned, they can be used in the field to perform inference.
Quantization of neural networks
Weights, or network parameters, in neural networks are traditionally represented with 32-bit floating-point data types. Recent research shows that weights with 8-, 4-, 2-, or 1-bit fixed-point values can be sufficient. However, compared to 32-bit floating point, which represents magnitudes from roughly 10^-38 to 10^+38, quantization greatly reduces the dynamic range.
Intuitively, you might expect this to reduce the accuracy of the neural network considerably, but it has been demonstrated for numerous popular networks that, if training is already performed with these quantized weights, they can maintain a very reasonable level of accuracy.
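To make the loss of resolution concrete, here is a minimal sketch of uniform quantization. It assumes weights already lie in the range [-1, 1] and uses a simple symmetric rounding scheme. The function name `quantize` and the sample weights are hypothetical and are not taken from the overlays or from any specific training recipe.

```python
def quantize(weights, bits):
    """Map floats in [-1, 1] onto a small set of signed levels (illustrative only)."""
    if bits == 1:
        # Binarization: keep only the sign of each weight.
        return [1.0 if w >= 0 else -1.0 for w in weights]
    levels = 2 ** (bits - 1) - 1  # e.g. 127 positive levels for 8 bits
    quantized = []
    for w in weights:
        clipped = max(-1.0, min(1.0, w))  # values outside the range saturate
        quantized.append(round(clipped * levels) / levels)
    return quantized


weights = [0.73, -0.12, 0.05, -0.98]  # made-up example values
for bits in (8, 4, 2, 1):
    print(bits, quantize(weights, bits))
```

With fewer bits, more distinct weights collapse onto the same level, which is why training with the quantized representation matters for preserving accuracy.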
The advantages are significant because:
- reduced-precision fixed-point values are smaller, so storing millions of weights yields significant memory savings
- reduced-precision arithmetic is cheaper (in area and power) than floating point, and programmable logic, thanks to its flexibility, is well suited to implementing such ad-hoc reduced-precision arithmetic cores
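The memory argument can be checked with simple arithmetic. The sketch below assumes a hypothetical network with ten million weights and ignores any packing or alignment overhead; the helper name `weight_storage_bytes` is made up for illustration.

```python
def weight_storage_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights when packed densely (rounded up)."""
    return (num_weights * bits_per_weight + 7) // 8


num_weights = 10_000_000  # hypothetical network size
baseline = weight_storage_bytes(num_weights, 32)
for bits in (32, 8, 4, 2, 1):
    size = weight_storage_bytes(num_weights, bits)
    print(f"{bits:2d}-bit: {size / 1e6:6.2f} MB ({baseline // size}x smaller than 32-bit)")
```

The reduction scales linearly with the bit width, which is what makes it feasible to keep parameters in on-chip memory.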
The examples that are currently available can be split into two categories:
- bnn: stands for Binarized Neural Network. The quantization process goes down to a single bit for all the parameters. In this specific case, the multiply-accumulate (MAC) arithmetic can be simplified to XNOR and popcount operations.
- qnn: stands for Quantized Neural Network. In this case, parameters can have flexible bit widths.
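The XNOR and popcount simplification mentioned for binarized networks can be illustrated in a few lines. This sketch assumes values are restricted to +1 and -1 and encodes +1 as bit 1 and -1 as bit 0. The helper names `pack` and `binary_dot` are hypothetical, and the code is a software illustration, not the hardware implementation.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer, one bit per value."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i  # +1 -> bit set, -1 -> bit clear
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")     # popcount
    # Each match contributes +1 and each mismatch -1: matches - (n - matches)
    return 2 * matches - n


activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * w for x, w in zip(activations, weights))
result = binary_dot(pack(activations), pack(weights), len(activations))
print(result == reference)
```

Because a product of two signs is +1 exactly when the signs agree, counting agreeing bits replaces every multiplication, and the accumulation becomes a population count.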
The current release shows two examples for each category. Another distinguishing difference among the available overlays is the actual hardware architecture:
- Feed-forward Dataflow: all layers of the network are implemented in hardware. The output of one layer is the input of the following one, which starts processing as soon as data is available. The network parameters for all layers are cached in on-chip memory. For each network topology, a customized hardware implementation is generated that provides low latency and high throughput.
- Multi-layer offload: a fixed hardware architecture is implemented that can compute multiple layers in a single call. The complete network is executed in multiple calls, which are scheduled on the same hardware architecture. Changing the network topology implies changing the runtime scheduling, but not the hardware architecture. This provides a flexible implementation at the cost of slightly higher latency.
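The difference between the two architectures can be sketched with a loose software analogy. The toy "layers" below just multiply by a constant, and the names `make_layer`, `dataflow`, and `multi_layer_offload` are hypothetical. None of this reflects how the overlays are actually built; it only contrasts a dedicated stage per layer with one engine reused across several calls.

```python
def make_layer(scale):
    """Return a toy 'layer' that multiplies its input by `scale`."""
    return lambda x: x * scale


layers = [make_layer(2), make_layer(3), make_layer(5)]


def dataflow(inputs, layers):
    """One dedicated stage per layer; each stage consumes items as upstream yields them."""
    stream = iter(inputs)
    for layer in layers:
        stream = map(layer, stream)  # lazily chained, like a streaming pipeline
    return stream


def multi_layer_offload(x, layers, layers_per_call=2):
    """One shared engine: a scheduler issues several calls, each covering a few layers."""
    calls = 0
    for start in range(0, len(layers), layers_per_call):
        for layer in layers[start:start + layers_per_call]:
            x = layer(x)
        calls += 1
    return x, calls


print(list(dataflow([1, 2], layers)))
print(multi_layer_offload(1, layers))
```

In the analogy, changing the network means rebuilding the chain in `dataflow`, whereas `multi_layer_offload` only needs a different list of layers and a different number of calls.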
In the current release, the two bnn overlays are implemented in a feed-forward dataflow architecture with fixed topologies, while the two qnn overlays feature a multi-layer offload architecture with support for 2-bit and 3-bit activations.
Multiple example notebooks are provided, covering different datasets and several architectures.
The BNN-based notebooks with dataflow are:
- Cifar10: shows a convolutional neural network, composed of 6 convolutional, 3 max-pool, and 3 fully connected layers, trained on the CIFAR-10 dataset
- SVHN: shows a convolutional neural network, composed of 6 convolutional, 3 max-pool, and 3 fully connected layers, trained on the Street View House Numbers dataset
- GTSRB: shows a convolutional neural network, composed of 6 convolutional, 3 max-pool, and 3 fully connected layers, trained on the German Traffic Sign Recognition Benchmark dataset
- MNIST: shows a multi-layer perceptron with 3 fully connected layers, trained on the MNIST dataset for digit recognition
The QNN-based notebooks with multi-layer offload are:
- ImageNet Classification: shows how to classify an unlabelled image (e.g., downloaded from the web or taken with your phone) into one of the 1000 classes available in the ImageNet dataset.
- ImageNet – Dataset validation: shows how to classify a labelled image (i.e., extracted from the dataset) into one of the 1000 classes available in the ImageNet dataset.
- ImageNet – Dataset validation in a loop: shows how to classify labelled images (i.e., extracted from the dataset) into one of the 1000 classes available in the ImageNet dataset, in a loop.
- Object Detection – from image: shows object detection in an image (e.g., downloaded from the web or taken with your phone), identifying objects in a scene and drawing bounding boxes around them. The objects can belong to any of the 20 classes available in the PASCAL VOC dataset.
- Object Detection – from image in a loop: shows object detection in an image and draws bounding boxes around identified objects (20 available classes from the PASCAL VOC dataset), in a loop.
The BNN-PYNQ repository is hosted on GitHub: BNN-PYNQ GitHub Repository.
The QNN-MO-PYNQ repository is hosted on GitHub: QNN-MO-PYNQ GitHub Repository.