An image frame contains a lot of information: the color of each pixel, the edges of every object, and the objects inside the field of view.
A sequence of such images (that is, a video) adds temporal information about all of these characteristics. This is very useful for detecting actions performed by humans and is widely used in the vision community.
After the image is acquired, the next step is to keep the important information and discard the rest. Storing everything is very expensive, both in time and in cost. In our scenario color carries no useful information. We need edges to detect the hands and face, and the temporal information tells us when the hand-raising event occurs.
I also halved the resolution of the image, since full HD frames take more processing time and I found experimentally that half the resolution gives approximately the same performance.
Hence, two pre-processing steps are applied:
1. Converting the color (RGB) image to grayscale, using the MATLAB function rgb2gray().
2. Reducing the resolution by half, using the MATLAB function imresize().
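
The two steps above can be sketched in MATLAB roughly as follows. This is only a minimal illustration, not the exact code used here: the synthetic 1080x1920 frame and the variable names (frame, grayFrame, smallFrame) are assumptions made for the example, and "half the resolution" is taken to mean a scale factor of 0.5 along each dimension.

```matlab
% Hypothetical input: a synthetic full HD RGB frame stands in for a real video frame.
frame = uint8(randi([0 255], 1080, 1920, 3));

% Step 1: discard color and keep only intensity.
grayFrame = rgb2gray(frame);

% Step 2: halve the resolution (assumed: scale factor 0.5 in each dimension).
smallFrame = imresize(grayFrame, 0.5);

% The result is a single-channel image of size 540 x 960.
disp(size(smallFrame));
```

With a real recording, the synthetic frame would be replaced by a frame read from the video; the order of the two steps could also be swapped, although converting to grayscale first means the resize operates on one channel instead of three.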