This will be the last in my series of Path II Programmable blogs. The Path II Programmable training courses and project have been a great learning experience. Although it hasn't been without headaches and frustration, I'm really enjoying working with the Xilinx MPSoC FPGA ecosystem and the Avnet Ultra96v2 hardware. Doing a project really pulls together everything you've learned from the training, but unfortunately it also exposes what you didn't learn or understand. I'm grateful to Element14 for the opportunity to participate in Path II Programmable. I've added a list of my other PIIP blogs at the end of this post.
Project: Embedded Vision Processor
Here is the block diagram of the project:
- Capture IP camera RTSP video stream using OpenCV
- Process video stream with neural network implemented in PL
- Process neural network response in PS
- Output video stream via Display Port to HDMI monitor
Even with a simple feature set there are many elements involved in implementing this project on the Ultra96, not the least of which is the learning curve for the tools and hardware. I estimate that it will take me 4-8 additional weeks to fully implement the system that I want. For the purposes of PIIP, I thought that I would demonstrate some of the features and indicate what I have accomplished and what I have left to do.
Hardware used:
- JTAG/UART Pod
- Power supply, 12V 4A
- Trendnet USB3 to Ethernet adapter (not used - see text)
- HDMI monitor
- Outdoor PTZ IP camera (digital zoom only) 1280x720
- Indoor PTZ IP camera (digital zoom only) 1280x720
- Mini DP to HDMI active cable
- Micro USB to USB-A cable
- 16GB FAT32 microSD card
I needed to create a hardware/software platform on the Ultra96v2 that would provide the capabilities required for the feature set. As a starting point I used the DPU TRD (Deep Learning Processor reference design) for the Ultra96 from the DPU Integration Tutorial: https://github.com/Xilinx/Edge-AI-Platform-Tutorials/tree/master/docs/DPU-Integration. The reference design targets the Ultra96v1 and the 2018.2 toolset, so some modifications were required to use the Ultra96v2 and the 2018.3 toolset.
The reference design inserts a DPU (Deep Learning Processor) in the Ultra96 PL, which is what I wanted. The neural network models that run on the DPU are built with the DNNDK (Deep Neural Network Development Kit) tools. The reference design provides the software and models for two configurations: resnet50 and face detection. The face detection example uses a webcam as the video source, places a bounding box on each detected face, and outputs the result on the Display Port. Since this is very close to what I am trying to implement, I used it as my starting point. I built and tested a bootable image and rootfs for the Ultra96v2 as documented in PIIP Project - Port DPU TRD for Ultra96v1 to v2.
Capture IP camera RTSP video stream using OpenCV
This task turned out to take a lot more effort than I expected. The face detection program was already using OpenCV and the VideoCapture object to capture the webcam stream, so I thought it would be as simple as passing the RTSP URL and the right backend to VideoCapture. It turns out that the DPU TRD PetaLinux build did not include all the OpenCV libraries and utilities that I needed. Both of my IP cameras have a maximum resolution of 1280x720, and my plan is to use their high-resolution H.264 encoded streams. I ended up needing to add libav, ffmpeg, and gstreamer to the PetaLinux build configuration.
My next issue was a total surprise. I was using a Trendnet USB3 to Ethernet adapter for networking, but the USB side became very intermittent (the device was often not detected). The adapter still works on my Windows 10 machine, so I wonder whether my Ultra96 has a problem. At that point I decided to fall back to WiFi. Another big surprise: the DPU TRD does not have WiFi configured, and fixing that was very painful. I did find some very useful GitHub repositories in the process. Avnet has repositories for all its hardware platforms, board support packages, and PetaLinux configurations: https://github.com/Avnet/petalinux, https://github.com/Avnet/hdl, https://github.com/Avnet/bdf. I ended up looking through the ultra96v2_oob PetaLinux scripts to figure out how to add WiFi to the TRD design.
After completing that, adding the RTSP source did turn out to be as simple as using the correct RTSP URL and the CAP_FFMPEG backend specifier with VideoCapture. In the future I want to see whether I can get some additional acceleration with xfOpenCV. I am treading on thin ice here, as I haven't yet investigated which functions can be accelerated. It would have been really nice if the Ultra96 had the hardware VCU (video codec unit) for H.264 decode and encode, but that's only available on the EV devices of the Zynq UltraScale+ MPSoC family.
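A minimal Python sketch of the capture step, assuming OpenCV was built with FFmpeg support. The `rtsp_url` and `open_rtsp` helpers, the example host, and the stream path are hypothetical: every camera documents its own RTSP path, and the sketch assumes the credentials contain no URL-reserved characters. The only OpenCV calls used are `cv2.VideoCapture(url, cv2.CAP_FFMPEG)`, `isOpened()`, and `read()`.

```python
def rtsp_url(host, path, user=None, password=None, port=554):
    """Build an RTSP URL such as rtsp://admin:secret@192.168.1.50:554/stream1.

    The stream path is camera-specific; values used here are placeholders.
    Assumes user/password contain no URL-reserved characters (e.g. '@', ':').
    """
    auth = ""
    if user is not None:
        auth = user if password is None else f"{user}:{password}"
        auth += "@"
    return f"rtsp://{auth}{host}:{port}/{path.lstrip('/')}"


def open_rtsp(url):
    """Open an RTSP stream with OpenCV's FFmpeg backend."""
    import cv2  # imported here so the URL helper works without OpenCV installed

    cap = cv2.VideoCapture(url, cv2.CAP_FFMPEG)
    if not cap.isOpened():
        raise RuntimeError(f"could not open {url}")
    return cap
```

In a capture loop this would look like `cap = open_rtsp(rtsp_url("192.168.1.50", "stream1", user="admin", password="secret"))` followed by repeated `ok, frame = cap.read()` calls. Keeping URL construction separate from opening the stream makes it easy to switch between a camera's high- and low-resolution stream paths.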
Process video stream with neural network implemented in PL
The design flow for the DPU is shown below. I'd like to implement and train different neural networks for specific image detection and classification tasks. The key to doing this is in the lower left of the flow: you need to use the DECENT and DNNC tools of the DNNDK to generate the model.elf for the SDK. I haven't started learning the DNNDK tools yet. I think I'll play around with the provided resnet50 classifier first, but eventually I'll need to learn the DNNDK to build the correct models.
Process neural network response in PS
I am assuming that I can get flags (responses) from the DPU that signal specific detection or classification events. I am starting to go through the DNNDK user guide, since this will again require using the DNNDK. There is a whole series of functions that I would like to build: object tracking, alerts, and image and stream saving and tagging. I have also started looking at what I need to do to interact with the cameras. IP cameras normally have a web server interface that accepts regular HTTP commands to control the camera functions. To test the control functionality, I've incorporated a lightweight web client in my programs, and I have a quick demo of panning the driveway camera. I have found that my cameras do not fully implement their documented APIs; one feature that does not work is move to absolute position. I doubt that these cameras are fully ONVIF compliant, but my plan is to try that interface next. Otherwise, I will need to implement my own absolute positioning functions to do object tracking.
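If the cameras keep ignoring the move-to-absolute command, one fallback is dead reckoning: zero the position at a known home, count every relative step command sent, and convert an absolute target into the right number of steps. The sketch below is hypothetical in every detail (class name, step units, limits, command strings) and assumes each relative step moves the camera by the same fixed, repeatable amount; `_step` only logs the command where a real version would send one HTTP GET.

```python
from dataclasses import dataclass, field


@dataclass
class DeadReckoningPTZ:
    """Track an estimated absolute pan/tilt position by counting relative steps.

    Hypothetical: assumes every relative step moves the camera by a fixed
    amount and that "home" returns it to a known (0, 0) position.
    """
    pan_limits: tuple = (-50, 50)   # allowed pan range, in step units
    tilt_limits: tuple = (-10, 10)  # allowed tilt range, in step units
    pan: int = 0
    tilt: int = 0
    sent: list = field(default_factory=list)  # commands issued so far

    def _step(self, command):
        # A real version would send one HTTP GET to the camera here.
        self.sent.append(command)

    def home(self):
        """Return to the reference position and reset the estimate."""
        self._step("home")
        self.pan, self.tilt = 0, 0

    def move_to(self, pan, tilt):
        """Move to an absolute (pan, tilt) target using only relative steps.

        The target is clamped to the limits; returns the (pan, tilt) steps taken.
        """
        pan = max(self.pan_limits[0], min(self.pan_limits[1], pan))
        tilt = max(self.tilt_limits[0], min(self.tilt_limits[1], tilt))
        d_pan, d_tilt = pan - self.pan, tilt - self.tilt
        for _ in range(abs(d_pan)):
            self._step("right" if d_pan > 0 else "left")
        for _ in range(abs(d_tilt)):
            self._step("up" if d_tilt > 0 else "down")
        self.pan, self.tilt = pan, tilt
        return d_pan, d_tilt
```

Because small mechanical errors accumulate with every step, a tracker built this way would need to re-home periodically. Logging commands instead of sending them directly also lets the positioning logic be tested without a camera.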
Output video stream via Display Port to HDMI monitor
The existing example already had the output configured, so getting an output display was not a problem. I just needed to remember to export the DISPLAY variable. What I need to work on is getting the appropriate formatting (size and mode configuration) for the monitor; you'll see in my example videos that I need to clean up the image location and size. I'll admit that I have no clue how to configure the Matchbox window manager. The example videos are iPhone recordings of my HDMI monitor screen, so the image quality isn't the best. I have an HDMI capture device that I could use, but it isn't currently set up (I'll try to use it for future videos). The videos show the default desktop on the Ultra96, and the DPU output appears in a window on the desktop when the appropriate .elf is run. I noticed that the time on the desktop did not match the time on the camera, so I updated my PetaLinux build to add ntp and ntpd for time synchronization, and I switched to using a static IP for good measure. Now the desktop time is in sync with the camera time.
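For the image size and location problem, one approach that does not depend on window manager settings is to letterbox each frame into a canvas the size of the monitor. This sketch only computes the geometry: the largest size with the source aspect ratio that fits the display, plus the offset that centers it. The function name is hypothetical, and the monitor resolutions in the usage note are assumptions.

```python
def fit_letterbox(src_w, src_h, dst_w, dst_h):
    """Return (w, h, x, y) for centering a src_w x src_h image in dst_w x dst_h.

    (w, h) is the largest size with the source aspect ratio that fits inside
    the destination; (x, y) is the top-left offset that centers it.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    w = int(src_w * scale)
    h = int(src_h * scale)
    x = (dst_w - w) // 2
    y = (dst_h - h) // 2
    return w, h, x, y
```

For a 1280x720 camera on a 1920x1080 monitor this gives a full-screen 1920x1080 image, while two 1280x720 frames side by side (2560x720) fit as 1920x540 with a 270-pixel top offset. With OpenCV the result could be applied with `cv2.resize(frame, (w, h))` (note that `dsize` is width first) and a slice copy into a black canvas of the monitor's size.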
Face detection running on indoor (Workroom) IP camera
Face detection running on outdoor (Driveway) camera
The panning is done via a series of HTTP GET commands to the camera.
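A minimal standard-library sketch of driving a camera with HTTP GET requests. The CGI path (`/cgi-bin/ptz.cgi`), the `action` and `speed` parameter names, and the command strings are placeholders, not the API of the cameras in this post; substitute whatever the camera's documentation specifies. Authentication uses urllib's Digest and Basic handlers, since IP cameras commonly use one or the other.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Placeholder endpoint: real cameras each document their own CGI path and parameters.
PTZ_PATH = "/cgi-bin/ptz.cgi"


def ptz_url(host, command, speed=5, path=PTZ_PATH):
    """Build the GET URL for one PTZ command (hypothetical parameter names)."""
    query = urlencode({"action": command, "speed": speed})
    return f"http://{host}{path}?{query}"


def send_ptz(host, command, user, password, timeout=3.0, **kwargs):
    """Send one PTZ command and return (HTTP status, response body)."""
    url = ptz_url(host, command, **kwargs)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{host}/", user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def pan_sequence(send, commands, delay_s=0.5):
    """Issue a series of commands, pausing between them so the camera can move."""
    for command in commands:
        send(command)
        time.sleep(delay_s)
```

A pan like the one in the video might then be `pan_sequence(lambda c: send_ptz("192.168.1.60", c, "admin", "secret"), ["left"] * 5 + ["stop"])`. Passing `send` as a callable keeps the sequencing logic testable without a camera on the network.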
Multiple cameras displayed using hconcat function of OpenCV
This is just a mock-up. I am not yet running detection on both cameras simultaneously, and I still need to figure out how to do that.
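One common way to feed two cameras into a single processing loop is to read each stream on its own thread and keep only the newest frame, so a slow consumer (such as the detection step) never works through a stale backlog. This is a generic producer pattern, not something from the DPU TRD; the class name is hypothetical, and `read` is assumed to be any callable returning `(ok, frame)`, such as a `cv2.VideoCapture` object's bound `read` method.

```python
import threading


class LatestFrameGrabber:
    """Read frames from one source on a background thread, keeping only the newest."""

    def __init__(self, read, name="camera"):
        self._read = read  # callable returning (ok, frame)
        self._lock = threading.Lock()
        self._frame = None
        self._count = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            ok, frame = self._read()
            if not ok:
                break  # stream ended or dropped; stop this reader
            with self._lock:
                self._frame = frame  # overwrite: older frames are discarded
                self._count += 1

    def latest(self):
        """Return (newest frame or None, number of frames read so far)."""
        with self._lock:
            return self._frame, self._count

    def wait(self, timeout=None):
        """Block until the reader thread exits (or the timeout expires)."""
        self._thread.join(timeout)

    def stop(self, timeout=2.0):
        self._stop.set()
        self._thread.join(timeout)
```

The main loop would create one grabber per camera, call `latest()` on each, run detection on each frame in turn, and combine the results for display. OpenCV's `hconcat` requires its inputs to have the same number of rows and the same type, so frames from cameras with different resolutions would need resizing first.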
So it seems that I have more stuff not done than finished, but it feels like I did a lot of work. I guess that's the nature of learning something new. I am definitely looking forward to getting this project functioning the way I want. There are so many new tools that I haven't learned (DNNDK, SDSoC, xfOpenCV, etc.). Unfortunately, these new tools all have their own OS and tool version requirements, which makes things difficult. I just saw in the tool flow that some tools require the Pro version of Windows 10, which is not on the machine that I switched to earlier. PetaLinux has also been a real struggle. I've gotten used to the convenience of all the tools available in a full-featured Linux distribution like Ubuntu, so it is really hard to debug without them. When I was trying to understand my time synchronization problem, I really missed tools like ntpq and ntpdate. I haven't figured out how to install them, but at least ntp and ntpd are now running correctly. I also still haven't figured out why I have a DNS problem when using DHCP while a static IP works. Plus, without a package manager, it seems like I am constantly rebuilding the rootfs to add features. As I said at the beginning, this stuff is both fun and a headache.
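Without ntpq or ntpdate on the target, a quick clock check can still be done with a single SNTP exchange. This is a minimal sketch, assuming Python 3 is included in the rootfs and UDP port 123 is reachable; `pool.ntp.org` is only an example server. It computes the standard NTP clock offset ((T2 - T1) + (T3 - T4)) / 2 and round-trip delay (T4 - T1) - (T3 - T2) defined in RFC 5905, where T1/T4 are the client send/receive times and T2/T3 are the server receive/transmit times.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def to_ntp(t):
    """Unix time (float seconds) -> NTP timestamp as (seconds, 2**-32 fraction)."""
    secs = int(t) + NTP_EPOCH_OFFSET
    frac = int((t - int(t)) * 2**32)
    return secs, frac


def from_ntp(secs, frac):
    """NTP timestamp (seconds, fraction) -> Unix time as a float."""
    return secs - NTP_EPOCH_OFFSET + frac / 2**32


def build_request(t1):
    """48-byte client request: LI=0, VN=4, Mode=3 -> first byte 0x23."""
    packet = bytearray(48)
    packet[0] = 0x23
    struct.pack_into("!II", packet, 40, *to_ntp(t1))  # transmit timestamp field
    return bytes(packet)


def offset_and_delay(t1, t2, t3, t4):
    """RFC 5905 clock offset and round-trip delay from the four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def query(server="pool.ntp.org", timeout=2.0):
    """Send one SNTP request and return (offset, delay) in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t1 = time.time()
        s.sendto(build_request(t1), (server, 123))
        data, _ = s.recvfrom(1024)
        t4 = time.time()
    t2 = from_ntp(*struct.unpack_from("!II", data, 32))  # server receive timestamp
    t3 = from_ntp(*struct.unpack_from("!II", data, 40))  # server transmit timestamp
    return offset_and_delay(t1, t2, t3, t4)
```

Running `offset, delay = query()` and printing the result gives roughly what `ntpdate -q` reports; a positive offset means the local clock is behind the server.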