
    Deep Neural Network Benchmark with Raspberry Pi 2, 3 and 3+

    bernhardmayer

      With this benchmark I want to show you how the different versions of the Raspberry Pi perform with deep neural networks and how much power they consume. I am running the network with OpenCV. The tested models are the Raspberry Pi 2, the Raspberry Pi 3 and the Raspberry Pi 3 Model B+.

       

      OpenCV installation

       

      Reasonable support for neural networks in OpenCV began with version 3.3.0. Unfortunately, the latest OpenCV version in the Raspbian repositories is 2.4.9.1, so you have to compile it manually from source. You can find a good installation guide here: https://www.pyimagesearch.com/2016/04/18/install-guide-raspberry-pi-3-raspbian-jessie-opencv-3/

      You can leave out the Python parts since we don't need them here. I installed the latest OpenCV version, which is currently 3.4.1. On my Raspberry Pi 3 Model B+ I ran into a lot of problems when compiling with all 4 cores, so I had to compile on a single core. That way the build took about 5 hours.

       

      Running deep neural networks

       

      There are different frameworks for neural networks, and most of them are supported by OpenCV. I am using the Caffe framework because there is already a nice example in the OpenCV documentation: https://docs.opencv.org/3.4.0/d5/de7/tutorial_dnn_googlenet.html

      It uses the pre-trained GoogLeNet for image classification. You can find information on the network on this website: Caffe | Model Zoo. So no training is necessary: just download the network and let it run. Training deep neural networks on the Raspberry Pi would be no fun anyway, because you need far more computing power to do that in reasonable time.

       

      I adjusted the example a little, since using the same image that thousands have used before is a bit lame. I will use images from the Raspberry Pi camera and find out what the network sees in them. This is a bit more realistic, because there are many use cases for this in, for example, robotics, where small computers with real-time capabilities like the Raspberry Pi are needed at the edge.

       

      Here is my code:

      #include <iostream>
      #include <fstream>
      #include <sstream>
      #include <string>
      #include <vector>
      #include <opencv2/dnn.hpp>
      #include <opencv2/core/utils/trace.hpp>
      #include <opencv2/opencv.hpp>
      #include <thread>
      
      
      // global variables for exchange between threads
      cv::VideoCapture cap;    // create camera input
      cv::Mat cameraImage;  // create opencv mat for camera
      
      void cameraThread(void)    // function for the camera thread
      {
          while(1)    // loop forever
          {
              cap >> cameraImage;    // copy camera input to opencv mat
          }
      }
      
      /* Find best class for the blob (i. e. class with maximal probability) */
      static void getMaxClass(const cv::Mat &probBlob, int *classId, double *classProb)
      {
          cv::Mat probMat = probBlob.reshape(1, 1); //reshape the blob to 1x1000 matrix
          cv::Point classNumber;
          minMaxLoc(probMat, NULL, classProb, NULL, &classNumber);
          *classId = classNumber.x;
      }
      static std::vector<std::string> readClassNames(const char *filename = "synset_words.txt")
      {
          std::vector<std::string> classNames;
          std::ifstream fp(filename);
          if (!fp.is_open())
          {
              std::cerr << "File with classes labels not found: " << filename << std::endl;
              exit(-1);
          }
          std::string name;
          while (!fp.eof())
          {
              std::getline(fp, name);
              if (name.length())
                  classNames.push_back( name.substr(name.find(' ')+1) );
          }
          fp.close();
          return classNames;
      }
      
      int main( int argc, char** argv )
      {
          int ende=0;
          std::thread tcam;    // thread object for the camera thread
      
          std::cout << "OpenCV version : " << CV_VERSION << std::endl;    // print opencv version for debug
      
          std::string model = "bvlc_googlenet.caffemodel";    // define filenames for neural network
          std::string proto = "bvlc_googlenet.prototxt";
      
          cv::dnn::Net net = cv::dnn::readNetFromCaffe(proto, model); // open net
          
          if (net.empty())
          {
              std::cerr << "Can't load network by using the following files: " << std::endl;
              std::cerr << "proto: " << proto << std::endl;
              std::cerr << "model: " << model << std::endl;
              return -1;
          }
          
          cap.open(0);        // open camera
          if(!cap.isOpened()) 
          {
              std::cout << "no camera found!" << std::endl;
              return -1;
          }
          cap >> cameraImage;    // copy camera input to opencv mat to get data to startup
          tcam=std::thread(cameraThread);    // start extra thread to get camera input
          std::cout << "starting ..." << std::endl;
      
          while(ende==0)
          {
      //GoogLeNet accepts only 224x224 BGR-images
              cv::Mat image;
              cameraImage.copyTo(image);    // copy camera image to have local copy for modifications
              cv::Mat inputBlob = cv::dnn::blobFromImage(image, 1.0f, cv::Size(224, 224),cv::Scalar(104, 117, 123), false);   //Convert Mat to batch of images
          
              cv::TickMeter t;
              net.setInput(inputBlob, "data"); //set the network input
              t.start();  // start timer
              cv::Mat prob=net.forward("prob");    // compute network output
              t.stop();  // stop timer
      
              int classId;
              double classProb;
              getMaxClass(prob, &classId, &classProb);//find the best class
              std::vector<std::string> classNames = readClassNames();
              std::cout << "Best class: #" << classId << " '" << classNames.at(classId) << "'" << std::endl;    // print best class
              std::cout << "Probability: " << classProb * 100 << "%" << std::endl;    // print probability
              std::ostringstream buf;
              buf << "Best class: #" << classId << " '" << classNames.at(classId) << "' - ";
              buf << "Probability: " << classProb * 100 << "%";
              cv::putText(image,buf.str(),cv::Point2f(10,20),0,0.5,cv::Scalar(255,255,255),1,1);  // print buf to image
      
              std::cout << "Time: " << (double)t.getTimeMilli() / t.getCounter() << " ms" << std::endl;  // print result of timer
      
      
              cv::imshow("image",image);    // show image
              char key=cv::waitKey(1);    // check if end
              if(key=='e') ende=1;
          }
          return 0;
      }

       

      The program uses a second thread to read the data of the camera so that there is no delay in the stream.

       

      Power measurements

       

      To power the Raspberry Pi I made a little PCB with a Texas Instruments TPS54335 buck regulator on it. It takes 12 V as input and generates 5 V for the Raspberry Pi. Additionally, there is a Texas Instruments INA226 power monitor on the 12 V rail to measure input voltage and current. The INA226 is also connected to the I2C bus of the Raspberry Pi, so the Raspberry Pi can read out its own current power consumption. (There are some losses in the voltage regulator, but since I use the same regulator for all tests, the results should be comparable.)

       

      Test setup

       

      I did all tests with the same camera and SD card; I only exchanged the Raspberry Pi. The system is running Raspbian Stretch with Desktop, version March 2018, but the boot option is set to console only. Ethernet is connected, WiFi is off, nothing is connected to USB and no monitor is attached.

      The system was installed and set up on the Raspberry Pi 3 Model B+, and the same SD card was then transferred to the other models.

       

      During all tests the camera sees the scene in the screenshot below and identifies it as a teddy bear.

      test scene for neural network test

       

      Results

       

      Power in idle mode

      Since I already had the test setup, I also made some measurements in idle mode. I was logged in via SSH and sitting at the bash prompt.

       

      Model                      Current (A)    Power (W)
      Raspberry Pi 2             0.128          1.53
      Raspberry Pi 3             0.147          1.76
      Raspberry Pi 3 Model B+    0.203          2.43

      Power during calculation of the neural network

       

      Model                      Current (A)    Power (W)
      Raspberry Pi 2             0.278          3.32
      Raspberry Pi 3             0.421          5.05
      Raspberry Pi 3 Model B+    0.563          6.76

      Execution time of the neural network

       

      Model                      Time (ms)
      Raspberry Pi 2             2635
      Raspberry Pi 3             1804
      Raspberry Pi 3 Model B+    1548

      Energy consumption for each cycle of the neural network

      Given the execution time and power consumption one can calculate how much energy each cycle of the neural network consumes.

       

      Model                      Energy (Ws)
      Raspberry Pi 2             8.75
      Raspberry Pi 3             9.11
      Raspberry Pi 3 Model B+    10.46

       

      Conclusion

       

      The new Raspberry Pi 3 Model B+ calculates the neural network nearly twice as fast as the Raspberry Pi 2. On the other hand, it also needs twice as much power. Comparing the Raspberry Pi 2 and 3, the newer one is a lot faster and consumes more power, but the energy efficiency of these two models is nearly the same. The new Raspberry Pi 3 Model B+ has worse efficiency than its predecessors. It remains to be clarified why it is so much worse; maybe it is because of the Gigabit Ethernet chip or the new WiFi module.

      But for now I have to state: the new Model B+ should only be chosen if you really need the processing power and want your results fast. Projects where energy efficiency is important should stick with the old Raspberry Pi 3 or even the Raspberry Pi 2.

       

      Do you have any ideas or comments on my measurements?