The first neural network implementation that I'm going to look at is for CIFAR-10 (named for the Canadian Institute For Advanced Research).  CIFAR-10 is a computer vision dataset used to train and test neural networks for object recognition.  It consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.  There are 50,000 training images and 10,000 test images.

 

Labeled Image Classes

  • airplane
  • automobile
  • bird
  • cat
  • deer
  • dog
  • frog
  • horse
  • ship
  • truck
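The classifier output further down reports a numeric class index next to the class name (index 4 for the deer image).  Here is a minimal sketch of that mapping.  It assumes the classes are indexed from 0 in the order listed above; the list and helper names are my own for illustration and are not part of the BNN-PYNQ API.

```python
# CIFAR-10 class names, assumed to be indexed from 0 in the order listed above.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def class_name(index):
    """Return the class name for a CIFAR-10 class index (0-9)."""
    return CIFAR10_CLASSES[index]


print(class_name(4))  # deer
```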

 

The CIFAR-10 examples are from the BNN-PYNQ GitHub repository https://github.com/Xilinx/BNN-PYNQ.

 

First example: Deer image

The first example is in the Jupyter notebook CNV-BNN_Cifar10.  It uses trained parameters for the CNV network with 1-bit weights and 1-bit activations.  The CNV network is described in the FINN paper https://arxiv.org/abs/1612.07119: "CNV is a convolutional network topology that contains a succession of (3x3 convolution, 3x3 convolution, 2x2 maxpool) layers repeated three times with 64-128-256 channels, followed by two fully connected layers of 512 neurons each."
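To make the quoted topology easier to picture, here is a small sketch that just spells out the layer sequence as a list.  It transcribes the sentence from the paper and nothing more: the function name and tuple layout are my own, and the actual BNN-PYNQ implementation may differ in details that the quote doesn't cover (for example the final classification layer).

```python
def cnv_layers(channels=(64, 128, 256), fc_neurons=512, fc_count=2):
    """List the CNV layers as (kind, width) tuples, following the quoted description."""
    layers = []
    for ch in channels:
        # Each group is (3x3 convolution, 3x3 convolution, 2x2 maxpool)
        layers.append(("conv3x3", ch))
        layers.append(("conv3x3", ch))
        layers.append(("maxpool2x2", ch))
    # Followed by the fully connected layers
    layers.extend(("fc", fc_neurons) for _ in range(fc_count))
    return layers


for kind, width in cnv_layers():
    print(kind, width)
```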

 

This example instantiates both a hardware accelerated and pure software inference classifier.  It classifies an image from the CIFAR-10 test set using both hardware and software classifiers to demonstrate the speed advantage of the hardware implementation.  The image in this example is of a deer.  I added a cell to show the detailed classification ranking of the different CIFAR-10 classes for this image.
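The BNN library prints its own inference time, but if you want to compare any two classifiers yourself, a generic timing helper is enough.  This is only a sketch: time_call and speedup are hypothetical helpers of mine, and wall-clock timing from Python includes call overhead, so the numbers won't match the library's own report exactly.

```python
import time


def time_call(fn, *args, repeats=10):
    """Return the average wall-clock time of fn(*args) in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1e6


def speedup(slow_us, fast_us):
    """Return how many times faster the second measurement is than the first."""
    return slow_us / fast_us


# Stand-in workload so the sketch runs anywhere; on the board you would pass
# the classifier's classify method and the image instead.
print(time_call(sum, range(1000)))
```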

 

Video showing the step-by-step execution of the notebook.

 

Here are the classifier rankings copied from the notebook output:

You can see that for this image the classification is unambiguous.  The deer ranking is 1.7x the next nearest class.
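The "1.7x" figure is just the top ranking divided by the runner-up.  Here is a minimal sketch of that calculation.  The scores in it are made up for illustration (they are not the notebook output), and it assumes all scores are positive with larger meaning a better match.

```python
def top_two_ratio(ranks):
    """Return (best_index, ratio of the best score to the runner-up score)."""
    order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
    best, runner_up = order[0], order[1]
    return best, ranks[best] / ranks[runner_up]


# Hypothetical scores for the 10 classes, invented for this example only.
example_ranks = [100, 120, 150, 200, 340, 180, 90, 160, 80, 110]
best, ratio = top_two_ratio(example_ranks)
print(best, round(ratio, 2))  # 4 1.7
```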

 

Second example: Dog images

Next I created a notebook to classify images in the 'dog' class.  For this example I used one image from the CIFAR-10 test set and 4 images that I downloaded from the internet.  The images are shown below.

Dog Test Images

 

Videos are nice for seeing the execution sequence, but since the flow is the same as in the previous example, I've just extracted the notebook data for easier reading.

 

When I first ran the classification, the classifier had difficulty with dog2 (differentiating it from a cat), and puppy2 was misclassified as a 'ship'.  Looking at the puppy2 image you can see that the puppy is sitting on a rug that forms a distinct horizontal shape in the image.  As an experiment I edited the image to remove the rug and ran the classifier again.

 

Here is the modified image:

puppy2 image with rug removed

 

And here are the full classifier results:

You can see that removing the rug (horizontal shape) reduced the 'ship' ranking dramatically.  This demonstrates the difficulty of using a network trained on such a small set of images: it can be extremely sensitive to any artifacts in the image other than the subject you are trying to classify.

 

And here is a plot of the data:

 

Last example: Webcam images

As a final example of the BNN-CIFAR10 network, I'm going to try classifying images captured by a webcam.

 

The hardware inference (classification) will be performed with different precisions for the weights and activations.

 

Here are the 3 cases:

  1. W1A1 - 1-bit weights and 1-bit activations; this is the BNN we were using previously
  2. W1A2 - 1-bit weights and 2-bit activations
  3. W2A2 - 2-bit weights and 2-bit activations

 

As the complexity of the network increases the execution time will increase but hopefully so will the inference accuracy.
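The naming scheme is easy to decode: the number after W is the weight bit width and the number after A is the activation bit width.  Here is a small sketch that parses a name and shows how many distinct values each bit width can represent at most.  The parse_precision helper is my own and not part of the bnn package.

```python
import re


def parse_precision(name):
    """Split a name such as 'W1A2' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"W(\d+)A(\d+)", name)
    if match is None:
        raise ValueError("unexpected precision name: " + name)
    return int(match.group(1)), int(match.group(2))


for name in ("W1A1", "W1A2", "W2A2"):
    w_bits, a_bits = parse_precision(name)
    # An n-bit value can take at most 2**n distinct levels.
    print(name, 2 ** w_bits, 2 ** a_bits)
```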

 

I am using a Logitech C525 USB webcam which is HD 720P with autofocus.  It is plugged directly into the USB Host port on the PYNQ-Z2.

 

The code to capture the image from the webcam is straightforward using OpenCV-Python (cv2) and the Python Imaging Library (PIL).  The code also allows for brightness enhancement, as webcam brightness is very sensitive to ambient lighting.

import bnn
import cv2
from PIL import Image as PIL_Image
from PIL import ImageEnhance
from PIL import ImageOps


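# Capture a single frame from the first attached webcam
cap = cv2.VideoCapture(0)
ret, cv2_im = cap.read()
cap.release()  # free the camera once the frame has been read
if not ret:
    raise RuntimeError("Could not read a frame from the webcam")
# OpenCV returns BGR pixel order; PIL expects RGB
cv2_im = cv2.cvtColor(cv2_im, cv2.COLOR_BGR2RGB)
img = PIL_Image.fromarray(cv2_im)
img

# The enhancement values (contrast and brightness) depend on background, external lights, etc.
bright = ImageEnhance.Brightness(img)
img = bright.enhance(0.95)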

 

W1A1 Classifier

hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A1,"cifar10",bnn.RUNTIME_HW)
class_ranksW1A1=hw_classifier.classify_image_details(img)
inferred_class=class_ranksW1A1.argmax()
print("Inferred class: {0}".format(inferred_class))
print("Class name: {0}".format(hw_classifier.class_name(inferred_class)))

 

W1A2 Classifier

hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A2,"cifar10",bnn.RUNTIME_HW)
class_ranksW1A2=hw_classifier.classify_image_details(img)
inferred_class=class_ranksW1A2.argmax()
print("Inferred class: {0}".format(inferred_class))
print("Class name: {0}".format(hw_classifier.class_name(inferred_class)))

 

W2A2 Classifier

hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW2A2,"cifar10",bnn.RUNTIME_HW)
class_ranksW2A2=hw_classifier.classify_image_details(img)
inferred_class=class_ranksW2A2.argmax()
print("Inferred class: {0}".format(inferred_class))
print("Class name: {0}".format(hw_classifier.class_name(inferred_class)))

 

 

For my first test of the 3 different network precisions, I am going to use the deer test image to generate a comparison.  That will be my performance baseline before using the webcam captured images.

 

Case 1:

W1A1 - 1-bit weights and 1-bit activations

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Inferred class: 4

Class name: Deer

 

Case 2:

W1A2 - 1-bit weights and 2-bit activations

Inference took 1627.00 microseconds

Classification rate: 614.63 images per second

Inferred class: 4

Class name: Deer

 

Case 3:

W2A2 - 2-bit weights and 2-bit activations

Inference took 4867.00 microseconds

Classification rate: 205.47 images per second

Inferred class: 4

Class name: Deer

 

All 3 networks correctly inferred the 'deer' class.  You can see the increase in inference time with increased complexity.  So what did the complexity buy us?

 

Here is a comparison of the classification rankings:

 

W1A1:

W1A2:

W2A2:

For this image W1A2 and W2A2 are clearly better than W1A1, but there isn't much difference between W1A2 and W2A2, and W2A2 is about 3x slower than W1A2 (4867 vs. 1627 microseconds).
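The classification rates and the slowdown follow directly from the inference times reported in the three cases.  Here is a quick check of that arithmetic, using only the timings copied from the notebook output; the variable and function names are my own.

```python
# Inference times in microseconds, copied from the notebook output for the three cases.
inference_us = {"W1A1": 1582.0, "W1A2": 1627.0, "W2A2": 4867.0}


def images_per_second(microseconds):
    """Convert a per-image inference time in microseconds to images per second."""
    return 1_000_000 / microseconds


rates = {name: round(images_per_second(us), 2) for name, us in inference_us.items()}
slowdown = inference_us["W2A2"] / inference_us["W1A2"]

print(rates)
print(round(slowdown, 2))  # 2.99
```

The computed rates match the "images per second" lines in the notebook output, so the reported rate is simply the reciprocal of the single-image inference time.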

 

To be continued...

 

I had intended to put all the CIFAR-10 stuff in a single blog post, but the webcam results surprised me a bit, so I'll finish in another post.