In any computer vision project, 'Image Acquisition' is a very important stage. To many it sounds like the easiest step, but if you don't pay attention to the details, it can cause a lot of trouble later.


Some important aspects of this step are: image quality, frame rate, image size, blur, lighting conditions and changes, and storage of frames.


In my project 'Raise your Hands!', this step is crucial since I have to detect hands in real time, as soon as they are raised. The system cannot wait for the camera to grab a big image frame and write it to disk (SLOW!). Otherwise the teacher would hear the system only after a long delay, which isn't good for anyone.

So, I need to aim for medium image quality rather than very high-resolution frames, which would slow the system down.


Moreover, students don't raise their hands in a sudden motion; it happens over a couple of seconds. Most modern cameras capture at 25-30 frames per second (fps), which means a new frame is grabbed every 1/25 to 1/30 of a second. That is good for visual perception (otherwise you would see jerky video), but for a system that needs to see an event which unfolds over 2-5 seconds, it is overkill. I have set my system to yield 1 fps, which works well in these scenarios and saves a lot of disk space (if you plan to store the frames somewhere).
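
To make the frame-rate argument concrete, here is a minimal sketch of what dropping from 30 fps to 1 fps means for a single hand raise. The function name `frames_to_keep` and the simple "keep every Nth frame" rule are my own illustration, not necessarily how the camera or any particular tool does it, and it assumes the source rate is a whole multiple of the target rate.

```python
def frames_to_keep(total_frames: int, source_fps: int, target_fps: int) -> list:
    """Return the indices of the frames kept when downsampling by dropping frames.

    Hypothetical helper for illustration; assumes source_fps is a whole
    multiple of target_fps.
    """
    if target_fps <= 0 or source_fps < target_fps:
        raise ValueError("target_fps must be positive and not exceed source_fps")
    step = source_fps // target_fps  # keep one frame out of every `step`
    return list(range(0, total_frames, step))


# A 3-second hand raise filmed at 30 fps spans 90 frames.
kept = frames_to_keep(total_frames=90, source_fps=30, target_fps=1)
print(kept)       # [0, 30, 60]
print(len(kept))  # 3 frames instead of 90
```

Even at 1 fps, an event lasting 2-5 seconds still shows up in at least two frames, which is why the lower rate is enough here.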


I've kept most of the camera settings at their defaults, since the recording happens in an indoor environment, which makes the exposure and focal length settings easier to deal with. Other details of the settings are:

HP Digital Camera

Dimensions: 1920x1080

Resolution: 96 dpi

Bit depth: 24

Size: ~100-150 KB per frame
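
As a rough back-of-the-envelope check on these numbers, the sketch below estimates how much disk space the stored frames take per hour, and how large one frame would be uncompressed. The helper names are hypothetical, and it assumes 1 KB = 1000 bytes and that every frame falls in the ~100-150 KB range listed above.

```python
BYTES_PER_KB = 1000  # assumption: decimal kilobytes


def raw_frame_bytes(width: int, height: int, bit_depth: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bit_depth // 8


def storage_per_hour_mb(fps: float, frame_kb: float) -> float:
    """Disk space in MB needed to store one hour of frames."""
    return fps * 3600 * frame_kb * BYTES_PER_KB / 1e6


print(raw_frame_bytes(1920, 1080, 24))  # 6220800 bytes uncompressed
print(storage_per_hour_mb(1, 100))      # 360.0 MB per hour at 1 fps
print(storage_per_hour_mb(1, 150))      # 540.0 MB per hour at 1 fps
print(storage_per_hour_mb(30, 100))     # 10800.0 MB per hour at 30 fps
```

So under these assumptions, an hour of recording at 1 fps needs roughly 360-540 MB, against more than 10 GB if every frame of a 30 fps stream were stored.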


On a Windows system, we can use the following command to convert a video file to image frames (where -r sets the fps):

ffmpeg -i vid.mp4 -r 1 frames%05d.jpg
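
If you would rather trigger the conversion from a script, the same command can be wrapped in Python. This is only a sketch: the function names `build_ffmpeg_command` and `extract_frames` are my own, it uses just the -i and -r options from the command above, and it assumes ffmpeg is installed and on the PATH.

```python
import shutil
import subprocess


def build_ffmpeg_command(video_path: str, fps: int = 1,
                         pattern: str = "frames%05d.jpg") -> list:
    """Build the argument list for the frame-extraction command shown above."""
    return ["ffmpeg", "-i", video_path, "-r", str(fps), pattern]


def extract_frames(video_path: str, fps: int = 1,
                   pattern: str = "frames%05d.jpg") -> None:
    """Run ffmpeg to write frames to disk; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on the PATH")
    subprocess.run(build_ffmpeg_command(video_path, fps, pattern), check=True)


# Inspect the command without running it.
print(build_ffmpeg_command("vid.mp4"))
# ['ffmpeg', '-i', 'vid.mp4', '-r', '1', 'frames%05d.jpg']
```

Calling extract_frames("vid.mp4") would then write frames00001.jpg, frames00002.jpg, and so on into the current folder.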