You have probably heard a lot about deep learning and AI and their impact on every aspect of our lives. If you are curious and excited to try your hand at developing a fun application that uses deep learning and, more importantly, deploying it on a Raspberry Pi, this post is for you. Of course, there are many ways to get started with deep learning and then deploy the trained network on your desktop or in the cloud for inference.

But the path to deploying a deep learning algorithm to a Raspberry Pi is not so straightforward and usually requires some C++ know-how. So if you have already dabbled in this and got stuck, we hope this post is for you too. We’ll highlight a workflow that can quickly get you started with a deep learning network and deploy it on a Raspberry Pi, as I have done here with the pedestrian detection example or the image classification example below.

 

You often hear that GPUs are needed for deep learning. That is because training requires a lot of compute power. Inference needs far less, so our focus in this post is to demonstrate the inference piece running on the Raspberry Pi.

Training vs Inference

 

Before we begin, here is a high-level overview of the steps involved in training a deep learning algorithm and deploying it to a production environment, whether that is an enterprise system or embedded hardware.

 

High level workflow for training and deploying deep learning networks

 

I am going to walk through some of the key concepts and steps needed to go from training deep learning algorithms for common tasks, such as object detection and classification, to deploying these algorithms on a Raspberry Pi using code generation. Deep learning is not limited to image processing and computer vision, and the concepts in this post can also be applied to detecting patterns and features in audio signals, text and time-series data. For simplicity, though, we will mainly refer to vision-based applications. We’ll briefly go over topics such as labeling ground truth, data augmentation, using pretrained networks, and generating optimized C++ code for prototyping and deploying to Arm Cortex-A processors like the one in the Raspberry Pi.

 

Getting started with Deep Learning

 

One of the quickest ways to get started with deep learning is to start with a pretrained network. For image classification, you might consider popular pretrained models like VGG, ResNet or Inception. For object detection, YOLO is a popular choice. There are several examples in MATLAB to get you started with one such pretrained network. If you already have a model in another framework such as Keras, TensorFlow or Caffe, you can interoperate with MATLAB by importing and exporting these networks.
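To make this concrete, here is a minimal sketch of classifying a single image with a pretrained network. It assumes the Deep Learning Toolbox and the GoogLeNet support package are installed; the choice of GoogLeNet and of the sample image `peppers.png` is mine for illustration, not something this post prescribes.

```matlab
% Minimal sketch: classify one image with a pretrained network.
% Assumption: Deep Learning Toolbox Model for GoogLeNet Network is installed.
net = googlenet;

% The first layer records the image size the network expects.
inputSize = net.Layers(1).InputSize;

% peppers.png is a sample image that ships with MATLAB.
I = imread('peppers.png');
I = imresize(I, inputSize(1:2));

% classify returns the predicted label and the per-class scores.
[label, scores] = classify(net, I);
fprintf('Predicted: %s (%.1f%%)\n', string(label), 100*max(scores));
```

Any other pretrained classification network can be swapped in the same way, as long as the image is resized to that network's input size.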

Here is a list of pretrained models supported in MATLAB.

MATLAB supports a wide variety of network architectures from convolutional networks to LSTMs and it is interoperable with open source deep learning frameworks. So, you can import and export models with other deep learning frameworks using the ONNX model format or specific converters.
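As a rough illustration of the ONNX route, the sketch below exports a pretrained network to an ONNX file and imports it back. It assumes the ONNX converter support package for Deep Learning Toolbox is installed; the file name `googlenet.onnx` is a placeholder of my choosing, and newer MATLAB releases may recommend different import functions, so check the documentation for your version.

```matlab
% Sketch of an ONNX round trip (assumes the ONNX converter support package).
net = googlenet;

% Export the network to the ONNX model format.
% 'googlenet.onnx' is a hypothetical file name.
exportONNXNetwork(net, 'googlenet.onnx');

% Import an ONNX model as a classification network.
netImported = importONNXNetwork('googlenet.onnx', ...
    'OutputLayerType', 'classification');

% Inspect the imported layers.
disp(netImported.Layers(1:5));
```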

Explore how you can import models into MATLAB and export them again:

 

Interoperability of MATLAB with open source frameworks

 

The next key step is to retrain the pretrained network on your own data to fit your application's needs. This matters because the network's predictions depend heavily on your training data and on how closely that data resembles the actual inputs at deployment.

Using the Raspberry Pi Support from MATLAB, you can acquire training data from the sensors and imaging devices connected to the Raspberry Pi and use it to train your network. However, one of the challenges that many deep learning practitioners run into is labeling the data.
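For example, collecting still images from a camera attached to the Pi might look like the sketch below. It assumes the MATLAB support package for Raspberry Pi is installed, that a connection to the board has been set up before, and that a Pi camera module is attached. The output folder, frame count and resolution are hypothetical values.

```matlab
% Sketch: grab frames from the Raspberry Pi camera for later labeling.
% Assumption: the board has been connected before, so raspi reuses the
% saved settings. Otherwise pass the address, user name and password.
r = raspi;
cam = cameraboard(r, 'Resolution', '640x480');

% Hypothetical output folder for the unlabeled images.
outDir = fullfile(pwd, 'training_images', 'unlabeled');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

% Capture 50 frames, pausing briefly so consecutive frames differ.
for k = 1:50
    img = snapshot(cam);
    imwrite(img, fullfile(outDir, sprintf('frame_%03d.png', k)));
    pause(0.5);
end
```

If you use a USB webcam instead of the camera module, `webcam(r)` plays the role of `cameraboard` here.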

Before we can start training a network model, we need a set of labeled training data: a corpus of images annotated with the locations and labels of the objects of interest. This requires sifting through every image, video frame or time-series sample and labeling the locations of all objects of interest. This process is known as ground truth labeling, and it is often the most time-consuming part of training. Tools like the data labelers for audio, video, image and time-series data let you automate much of this laborious task.

Another challenge is the volume of data. As you may already know, you need a lot of data to train a network, and you can use the data augmentation tools in MATLAB to address this. This might be a bit advanced if you are just starting out, but you can explore automatic labeling and data augmentation further in the MATLAB documentation.
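As a small illustration of data augmentation, the sketch below wraps an image datastore so that each training image is randomly flipped, rotated and shifted on the fly. It uses the digit image set that ships with Deep Learning Toolbox only so the example is self-contained; the augmentation ranges and the 224-by-224 output size are assumptions you would tune for your own data and network.

```matlab
% Sketch: on-the-fly augmentation with an augmented image datastore.
% The digit sample images ship with Deep Learning Toolbox.
dataDir = fullfile(matlabroot, 'toolbox', 'nnet', 'nndemos', ...
    'nndatasets', 'DigitDataset');
imds = imageDatastore(dataDir, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Hold out 20% of each class for validation.
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.8, 'randomized');

% Random flips, small rotations and shifts (illustrative ranges).
augmenter = imageDataAugmenter('RandXReflection', true, ...
    'RandRotation', [-10 10], ...
    'RandXTranslation', [-5 5], 'RandYTranslation', [-5 5]);

% Resize to the assumed network input size and convert gray to RGB.
augimdsTrain = augmentedImageDatastore([224 224 3], imdsTrain, ...
    'DataAugmentation', augmenter, 'ColorPreprocessing', 'gray2rgb');
augimdsVal = augmentedImageDatastore([224 224 3], imdsVal, ...
    'ColorPreprocessing', 'gray2rgb');
```

Only the training set is augmented. The validation set is just resized so that it reflects unmodified inputs.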

 

After labeling your data, you need to pick a pretrained network for transfer learning. For image classification, popular choices again include VGG, ResNet or Inception. You can either use one of the pretrained models that are available or import one into MATLAB, and then modify the network to suit your application. If you have the expertise, you can update individual layers to customize the network, and you can visualize your network architecture using the network analyzer. A simple transfer learning approach, however, is to replace the final few layers to adapt the network to your application.
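Replacing the final layers could look roughly like this for GoogLeNet. The network choice, the number of classes and the new layer names are assumptions for illustration; the names of the layers being replaced (`loss3-classifier` and `output`) are those of MATLAB's GoogLeNet and will differ for other networks.

```matlab
% Sketch: adapt a pretrained network to a new set of classes.
net = googlenet;
lgraph = layerGraph(net);

numClasses = 5;   % hypothetical number of classes in your data set

% New fully connected layer sized for the new classes. The higher
% learn rate factors make it learn faster than the pretrained layers.
newFc = fullyConnectedLayer(numClasses, 'Name', 'new_fc', ...
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
lgraph = replaceLayer(lgraph, 'loss3-classifier', newFc);

% New output layer; its classes are set automatically during training.
newOut = classificationLayer('Name', 'new_output');
lgraph = replaceLayer(lgraph, 'output', newOut);

% Open the network analyzer to check the modified architecture.
analyzeNetwork(lgraph);
```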

 

If you are proficient in machine learning and deep learning, you have a lot of flexibility, and you can use interactive apps to build or modify your networks. If you are a beginner, keep it simple and start with one of the several examples in the MATLAB documentation that use pretrained networks. These examples also include the datasets needed to train and validate the models.

 

Once you are ready, you can train your model on your local machine, on either the CPU or the GPU, or scale to clusters, all with a few simple training options. Training is a topic unto itself that deserves a dedicated discussion covering things like setting up training parameters and hyperparameter tuning. Please refer to the detailed documentation here if you are interested in learning more.

 

% maxEpochs and miniBatchSize must be defined before this call.
options = trainingOptions('adam', 'ExecutionEnvironment','gpu', ...
    'MaxEpochs',maxEpochs, 'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, 'Verbose',false, 'Plots','training-progress');
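To show how such options feed into an actual training run, here is a small self-contained sketch that trains a tiny network on the digit data shipped with Deep Learning Toolbox and saves it to a MAT-file. The layer stack, the epoch and batch-size values, and the file name are all assumptions for illustration; a real application would train the network prepared for transfer learning instead.

```matlab
% Sketch: train a small network and save it for code generation later.
[XTrain, YTrain] = digitTrain4DArrayData;   % 28x28 grayscale digit images

% A deliberately tiny network, just enough to demonstrate training.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

maxEpochs = 4;         % hypothetical value
miniBatchSize = 128;   % hypothetical value

% 'auto' uses a GPU if one is available and the CPU otherwise.
options = trainingOptions('adam', 'ExecutionEnvironment', 'auto', ...
    'MaxEpochs', maxEpochs, 'MiniBatchSize', miniBatchSize, ...
    'GradientThreshold', 1, 'Verbose', false, 'Plots', 'none');

net = trainNetwork(XTrain, YTrain, layers, options);

% Hypothetical file name; use the name your deployed function loads.
save('trained_network_saved.mat', 'net');
```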

 

Prototyping and deploying your deep learning application to Raspberry Pi

 

Now that you have a trained network, which is the brains of your application, you can design your application logic around it. This typically includes some pre-processing logic that puts the input into the right format before you pass it to the trained network for inference, and some post-processing logic that uses the predicted output to drive an action. The listing below shows pseudocode for such a function. You can test this function using live data from the Raspberry Pi to validate that you get the expected behavior. Once you are confident in the algorithm's behavior, you can move on to generating the C++ code and deploying the stand-alone application to the Raspberry Pi.

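function output = entry_point_func(input)
%#codegen
% Pre-process the raw input into the format the network expects.
I = pre_processing_function(input);

% Load the trained network once and reuse it across calls.
persistent trainednet;
if isempty(trainednet)
    trainednet = coder.loadDeepLearningNetwork('trained_network_saved.mat');
end

% Run inference, then turn the prediction into the application output.
prediction = trainednet.predict(I);
output = post_processing_function(prediction);
end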
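The pre- and post-processing helpers in the pseudocode are placeholders. One possible shape for them in an image classification application is sketched below: resize the frame to the network input size, then report the index of the highest-scoring class. The 224-by-224 input size and the choice of returning a class index are assumptions, not part of the original example.

```matlab
function I = pre_processing_function(input)
%#codegen
% Hypothetical helper: resize the camera frame to the input size
% assumed here (224-by-224, as used by several pretrained networks).
I = imresize(input, [224 224]);
end

function output = post_processing_function(prediction)
%#codegen
% Hypothetical helper: return the index of the highest-scoring class.
[~, idx] = max(prediction);
output = idx(1);
end
```

In a real application the post-processing step might instead look up a class name, draw a bounding box, or toggle an output pin on the Pi.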

 

As noted earlier, training demands the kind of compute that GPUs provide, but inference does not. In fact, the Raspberry Pi 3 is a great example of hardware suited to inference. Its Arm Cortex-A53 is quite capable for applications like image classification or even object detection.

Arm’s Compute Library is a low-level collection of routines optimized for Arm architectures that support NEON instructions, such as the Arm Cortex-A53. Using these routines helps your computer vision and machine learning algorithms run efficiently on Arm Cortex-A processors. If you are interested in learning more about Arm’s Compute Library, check out the GitHub repository and see here for instructions on installing the Arm Compute Library on the Raspberry Pi.

 

The figure below illustrates how code generation integrates with the Arm Compute Library.

Code generation from deep learning algorithm | Code generation for Arm processors

 

Using MATLAB Coder, you can quickly generate code from the entire application. Once you have generated the code, you can integrate it with handwritten code or external libraries such as OpenCV. Alternatively, the support package provides code generation for a limited set of I/O, such as the webcam interface. Please refer to the example here for the detailed steps to generate code and deploy it to a Raspberry Pi.
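To give a feel for what the code generation step involves, here is a hedged sketch of a MATLAB Coder configuration that targets the Arm Compute Library and generates C++ source only. The architecture (`armv7`), the library version string, the device type string, the input size and the output folder are assumptions that depend on your board, your installed library and your MATLAB release. The linked example remains the reference for the full deployment steps.

```matlab
% Sketch: generate C++ code that calls the Arm Compute Library.
% Assumption: entry_point_func.m and its MAT-file are on the MATLAB path.
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmArchitecture = 'armv7';     % assumption: 32-bit OS on the Pi
dlcfg.ArmComputeVersion = '19.05';   % assumption: match your installation

cfg = coder.config('lib');           % static library target
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = dlcfg;
% Device type string may differ between MATLAB releases.
cfg.HardwareImplementation.ProdHWDeviceType = 'ARM Compatible->ARM Cortex';
cfg.GenCodeOnly = true;              % generate source only, build on the Pi

% Input is assumed to be a 224x224 RGB uint8 image.
codegen -config cfg entry_point_func -args {ones(224,224,3,'uint8')} -d arm_codegen -report
```

Generating source only keeps the host machine out of the build. The generated folder can then be copied to the Pi and compiled there against the installed Arm Compute Library.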

You now have a high-level understanding of the steps needed to prototype and deploy a deep learning algorithm on a Raspberry Pi, starting with some of the built-in examples. You also have plenty of resources for learning about the concepts in greater detail. So, what are you waiting for? Pick an example and enjoy the Pi.

 

Additional resources to get started with Deep Learning

 

  1. Deep Learning with MATLAB: Transfer Learning in 10 Lines of MATLAB Code
  2. Take the self-paced Deep Learning Onramp
  3. Classify Image Using Pretrained Network: GoogLeNet
  4. Pedestrian detection example video
  5. Image classification example video