
Google has recently released Open Images and YouTube-8M, two datasets of annotated images and video that can be used to train a machine in a few days to a few weeks. This could give researchers and coders alike a leg up in building new software for photo and video analysis.

Computer-generated photo captions created using Google’s Open Images dataset. (via Google)

Google announced the release of two massive datasets to speed up machine learning in early October, and both are free and available to anyone. Machine learning is basically the process of letting a machine learn from information that has already been tagged and verified, so that it can build models that predict and tag things it hasn’t seen before.
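To make that idea concrete, here is a toy Python sketch of learning from tagged data: it averages the labeled examples into one "centroid" per tag, then tags a new, unseen example by picking the nearest centroid. The nearest-centroid method, the function names and the two-number feature vectors are all illustrative assumptions; real image models learn from millions of labeled images, not a handful of hand-typed numbers.

```python
# Toy supervised-learning sketch: learn from tagged examples, then
# predict the tag for an input the model has not seen before.
from statistics import mean


def train_centroids(examples):
    """Average the feature vectors for each label into one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical features per image (say, "greenness" and "blueness");
# invented for illustration, not taken from Open Images.
training_data = [
    ((0.9, 0.1), "forest"),
    ((0.8, 0.2), "forest"),
    ((0.1, 0.9), "ocean"),
    ((0.2, 0.8), "ocean"),
]

model = train_centroids(training_data)
print(predict(model, (0.7, 0.3)))  # → forest
```

The point is the workflow rather than the method: tagged data goes in, a model comes out, and the model tags new data. Swapping in a neural network changes the "train" step, not the overall shape.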

However, you need a massive amount of tagged data to begin with to create an intelligent piece of software. With the release of Open Images and YouTube-8M, two massive visual datasets, anyone (according to Google) has what they need to build their own intelligent machine from the ground up. Google’s hope is that this data will be used to create video and image analysis tools that rival existing ones.


The Open Images dataset has 9 million entries that were tagged by computers and corrected by humans, in a collaboration between Google, Carnegie Mellon and Cornell. The YouTube-8M dataset is even more impressive: 8 million videos totaling 500,000 hours of footage, already labeled and with features extracted from their frames. This lets software analyze footage frame by frame, much as it would analyze the still images in Open Images. However, Google is really hoping that a tool will emerge from this work that can analyze video in real time better than a still-image approach can. The YouTube-8M team believes this is the most comprehensive and diverse video dataset currently available for machine learning.
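As a rough illustration of the frame-by-frame ("still image") approach, the sketch below assumes some image classifier has already produced a score for each label on each frame. It averages those scores across the whole video and keeps the labels that clear a threshold. The function name, the 0.5 threshold and the example labels are hypothetical, not part of YouTube-8M's own tooling.

```python
# Sketch of a "still image" approach to video: score each frame on its own,
# then average the per-frame scores into video-level labels.
from collections import defaultdict


def video_labels(frame_scores, threshold=0.5):
    """Average each label's score over all frames; keep labels >= threshold.

    frame_scores: list of dicts, one per frame, mapping label -> score in [0, 1].
    A label missing from a frame counts as a score of 0 for that frame.
    """
    if not frame_scores:
        return {}
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    n = len(frame_scores)
    averages = {label: total / n for label, total in totals.items()}
    return {label: avg for label, avg in averages.items() if avg >= threshold}


# Hypothetical per-frame scores from some image classifier.
frames = [
    {"dog": 0.9, "ball": 0.6},
    {"dog": 0.8},
    {"dog": 0.7, "ball": 0.3},
]
print(sorted(video_labels(frames)))  # → ['dog']
```

Averaging throws away the order of the frames, so motion and events that unfold over time are invisible to it. That is the kind of limitation a true video-analysis tool would need to overcome.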


So, what to do with all this data? The Google Research team’s blog offers plenty of ideas, based on what the team has already done and what could certainly be improved upon. Uses of machine learning with the Open Images dataset include automatic image captions, computer-generated responses to shared photos, and advanced filter hierarchies for DeepDream and artistic style transfer. If you want to play around with an existing Google machine learning tool, you can check out the Google Cloud Vision API.
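For anyone who wants to poke at the Cloud Vision API directly, here is a minimal Python sketch against its v1 REST endpoint (`images:annotate`) with the `LABEL_DETECTION` feature, using only the standard library. It assumes you have an API key for a Google Cloud project with the Vision API enabled. The helper names are hypothetical, and only the request-building and response-parsing parts run without network access.

```python
# Sketch: label detection with the Cloud Vision v1 REST API (standard library only).
# Assumes an API key for a project with the Vision API enabled.
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body asking for label detection on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def send_request(body, api_key):
    """POST the request to Cloud Vision (needs network access and a valid key)."""
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_labels(response):
    """Extract (description, score) pairs from an images:annotate response."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    return [(a["description"], a["score"]) for a in annotations]


# Hand-written response in the documented shape (invented, not real output).
sample = {"responses": [{"labelAnnotations": [{"description": "Dog", "score": 0.97}]}]}
print(top_labels(sample))  # → [('Dog', 0.97)]
```

To label a real photo, read it with `open(path, "rb").read()`, pass the bytes to `build_label_request`, send the body with `send_request`, and hand the JSON reply to `top_labels`. Google's official client libraries wrap the same endpoint if you'd rather not build requests by hand.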


According to the Google Research team’s experiments with the YouTube-8M dataset, they were able to train a machine in less than a day using TensorFlow, Google’s open-source machine learning library. However, there is no word yet on anything groundbreaking coming out of this beyond attempts to annotate videos in real time. Anyone who has tried closed captioning on YouTube knows that this is a major area for improvement. Still, the tools are now out there, and it seems like some very cool stuff could come out of them.


Have a story tip? Message me at: cabe(at)element14(dot)com