LinMao's Blog

Machine Learning on PYNQ


Artificial intelligence, deep learning, and neural networks are powerful machine learning techniques used to solve many real-world problems. Neural networks, which are inspired by the human brain, are now the predominant algorithms for vision processing and exceed human accuracy in multiple applications. They can model and process nonlinear relationships between inputs and outputs in parallel, and they are characterized by adaptive weights along the paths between neurons, which are tuned during training. Once the parameters are learned, the network can be deployed in the field to perform inference.

Quantization of neural networks

Weights, or network parameters, in neural networks are traditionally represented as 32-bit floating-point values. Recent research shows that fixed-point weights of 8, 4, 2, or even 1 bit can be sufficient. However, compared to 32-bit floating point, which represents magnitudes from roughly 10^-38 to 10^+38, quantization greatly reduces the dynamic range.
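To make the loss of precision concrete, here is a minimal sketch of uniform symmetric quantization in NumPy. The function name `quantize_symmetric`, the random test weights, and the choice of scaling by the largest magnitude are assumptions made for illustration; they are not taken from the PYNQ overlays, which may use a different scheme.

```python
import numpy as np

def quantize_symmetric(weights, bits):
    """Round float weights to signed `bits`-bit levels, then map them back to floats."""
    if bits < 2:
        raise ValueError("this sketch covers 2 bits and up; 1-bit weights keep only the sign")
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits, 1 for 2 bits
    scale = np.max(np.abs(weights)) / qmax      # one shared scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

# Hypothetical weights, drawn at random purely for demonstration.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=1000).astype(np.float32)

errors = {}
for bits in (8, 4, 2):
    errors[bits] = float(np.mean(np.abs(w - quantize_symmetric(w, bits))))
    print(f"{bits}-bit: mean absolute error = {errors[bits]:.5f}")
```

With fewer bits there are fewer representable levels, so the rounding error grows. This is the accuracy cost that training with quantized weights is meant to absorb.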

Intuitively, you might expect this to severely reduce the accuracy of the neural network. However, it has been demonstrated for numerous popular networks that, if training is already performed with the quantized weights, the networks can maintain a very reasonable level of accuracy.

The advantages are significant because:

  • reduced-precision fixed-point values are smaller, so storing millions of weights takes significantly less memory
  • reduced-precision arithmetic is cheaper (in area and power) than floating point, and programmable logic, thanks to its flexibility, is well suited to implementing such ad hoc reduced-precision arithmetic cores
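The memory saving can be estimated with simple arithmetic. The sketch below assumes tightly packed weights and a hypothetical network of one million weights; both the helper name `weight_storage_bytes` and the network size are illustrative only.

```python
def weight_storage_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights when they are tightly packed."""
    return (num_weights * bits_per_weight + 7) // 8   # round up to whole bytes

num_weights = 1_000_000  # hypothetical network size, for illustration only
for bits in (32, 8, 4, 2, 1):
    kib = weight_storage_bytes(num_weights, bits) / 1024
    print(f"{bits:>2}-bit weights: {kib:8.1f} KiB")
```

Under these assumptions, going from 32-bit to 1-bit weights shrinks the storage by a factor of 32, which is what makes it feasible to keep all parameters in on-chip memory.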

Available overlays

The currently available examples can be split into two categories:

  • bnn: stands for Binarized Neural Network. The quantization process goes down to a single bit for all the parameters. In this specific case, the multiply-accumulate (MAC) arithmetic can be simplified to XNOR and popcount operations.
  • qnn: stands for Quantized Neural Network. In this case, parameters can have flexible bit widths.
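The XNOR and popcount simplification mentioned for bnn can be checked in a few lines. This sketch assumes that both weights and activations are restricted to +1 and -1, encoded as the bits 1 and 0 respectively; the function names and the random vectors are illustrative and do not come from the overlay sources.

```python
import numpy as np

def mac_dot(a, w):
    """Reference: ordinary multiply-accumulate on +1/-1 values."""
    return int(np.sum(a * w))

def xnor_popcount_dot(a_bits, w_bits):
    """Same dot product computed on the bit encoding (+1 -> 1, -1 -> 0)."""
    n = a_bits.size
    xnor = np.logical_not(np.logical_xor(a_bits, w_bits))  # True where the signs agree
    matches = int(np.count_nonzero(xnor))                   # popcount of the XNOR result
    return 2 * matches - n                                  # agreements minus disagreements

rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=64)   # hypothetical binarized activations
w = rng.choice([-1, 1], size=64)   # hypothetical binarized weights
a_bits, w_bits = a > 0, w > 0

print(mac_dot(a, w), xnor_popcount_dot(a_bits, w_bits))  # the two values are equal
```

Each product of two ±1 values is +1 when the signs agree and -1 otherwise, so the sum equals the number of agreements minus the number of disagreements, which is `2 * popcount - n`. In hardware this replaces multipliers with single-gate XNORs and a bit counter.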

The current release includes two examples per category. The available overlays also differ in their hardware architecture:

  • Feed-forward dataflow: all layers of the network are implemented in hardware. The output of one layer is the input of the next, which starts processing as soon as data is available. The network parameters for all layers are cached in on-chip memory. For each network topology, a customized hardware implementation is generated that provides low latency and high throughput.
  • Multi-layer offload: a fixed hardware architecture is implemented that can compute multiple layers in a single call. The complete network is executed in multiple calls, which are scheduled on the same hardware architecture. Changing the network topology implies changing the runtime scheduling, but not the hardware architecture. This provides a flexible implementation at the cost of slightly higher latency.
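The difference between the two architectures can be illustrated with a loose software analogy. The sketch below is not how the hardware is implemented: the layers are stand-in multiplications, and the names `dataflow`, `multi_layer_offload`, and `layers_per_call` are hypothetical.

```python
def make_layer(scale):
    """Stand-in for a network layer: here just a multiplication."""
    return lambda x: x * scale

layers = [make_layer(2), make_layer(3), make_layer(5)]  # hypothetical 3-layer network

def dataflow(inputs, layers):
    """Feed-forward dataflow: one dedicated stage per layer, chained as a stream."""
    stream = iter(inputs)
    for layer in layers:
        stream = map(layer, stream)    # each stage consumes items as they arrive
    return list(stream)

def multi_layer_offload(inputs, layers, layers_per_call=2):
    """Multi-layer offload: one fixed engine, called repeatedly on groups of layers."""
    data, calls = list(inputs), 0
    for start in range(0, len(layers), layers_per_call):
        for layer in layers[start:start + layers_per_call]:
            data = [layer(x) for x in data]   # the same engine runs every layer
        calls += 1                             # one call per scheduled group
    return data, calls

print(dataflow([1, 2], layers))
print(multi_layer_offload([1, 2], layers))
```

Both functions compute the same result. In the dataflow version the chain of stages is fixed by the topology, while in the offload version only the schedule (how many calls, which layers per call) changes when the topology changes.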

In the current release, the two bnn overlays are implemented in a feed-forward dataflow architecture with fixed topologies, while the two qnn overlays feature a multi-layer offload architecture with support for 2-bit and 3-bit activations.

Available examples

Multiple example notebooks are provided, covering different datasets and several architectures.

The BNN based notebooks with dataflow are:

  • Cifar10: shows a convolutional neural network, composed of 6 convolutional, 3 max pool and 3 fully connected layers, trained on the CIFAR-10 dataset
  • SVHN: shows a convolutional neural network, composed of 6 convolutional, 3 max pool and 3 fully connected layers, trained on the Street View House Numbers dataset
  • GTSRB: shows a convolutional neural network, composed of 6 convolutional, 3 max pool and 3 fully connected layers, trained on the German Traffic Sign Recognition Benchmark dataset
  • MNIST: shows a multilayer perceptron with 3 fully connected layers, trained on the MNIST dataset for digit recognition

The QNN based notebooks with multi-layer offload are:


The PYNQ BNN-PYNQ repository is hosted on GitHub: BNN-PYNQ GitHub Repository.

The PYNQ QNN-MO-PYNQ repository is hosted on GitHub: QNN-MO-PYNQ GitHub Repository.
