Researchers used the DeepMind Lab to compare how children and AI agents learn about the world. (Image Credit: DeepMind)


Researchers from Alphabet’s DeepMind and the University of California, Berkeley, have proposed a framework for comparing how children and AI agents learn about the world through exploration. The team’s work could narrow the gap between artificial intelligence and humans when it comes to learning new skills. For example, it could lead to the development of robots that pick and pack millions of products while avoiding obstacles.


A number of questions surround exploration when it comes to developing a reinforcement learning (RL) agent. Chief among them: how should an agent gather enough experience from different environments to produce optimal behaviors? According to the researchers, exploration is one of the most fundamental problems in reinforcement learning.


The researchers conducted direct, controlled comparisons between children and AI agents to leverage insights from children’s exploratory behavior to improve the design of RL algorithms. From the day they’re born, humans are capable of exploring their surroundings efficiently to learn new skills.


According to the team, recent evidence suggests that children explore more than adults, which may mean that they learn more than grown-ups do. This exploration can enable powerful, abstract task generalization, a form of generalization that could benefit AI agents. For example, in a study conducted in 2017, preschoolers who played with a toy were able to hypothesize whether the blocks functioned based on their color or shape, and they used that hypothesis to draw conclusions about a new toy or block. AI can approximate this type of task adaptation, but it struggles without human supervision and intervention.


A child using an Arduino-based controller to navigate through the maze in the DeepMind Lab. (Image Credit: DeepMind)


The researchers used DeepMind Lab, a learning environment based on the Quake game engine that provides a series of 3D navigation and puzzle-solving tasks for learning agents. The tasks require physical or spatial navigation abilities and are modeled after games that children play. In the experimental setup, children interact with DeepMind Lab via a custom Arduino-based controller, which exposes the same four controls AI agents would use in this environment (move forward, move back, turn left, and turn right).


In their experiments, the researchers presented a method to compare the behaviors of children and RL agents in simulated exploration tasks. As a result, researchers were able to precisely test questions about how children and RL agents explore, along with how and why they differ.


In the first experiment, children were instructed to complete two mazes with the same layout, one after another. In the first maze, they explored freely (“no-goal” condition), and in the second maze, they were told to search for a “gummy.” In the first maze, the children’s search strategies closely matched those of a depth-first search (DFS) agent, which pursues an unexplored path until it arrives at a dead end, then turns around and explores the most recent path it passed but has not yet explored. The team found that the children made choices consistent with DFS 89.61% of the time in the first maze, compared with 96.04% of the time in the second maze.
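The depth-first strategy described above can be sketched in a few lines of Python. This is only an illustration on a made-up text grid, not DeepMind Lab or the researchers' agent; the maze layout, the function names, and the fixed order in which neighboring cells are tried (up, right, down, left) are all assumptions chosen for the example.

```python
# Illustrative toy maze (an assumption, not a DeepMind Lab level).
# '#' is a wall, '.' is open floor, 'S' is the start cell.
MAZE = [
    "#######",
    "#S..#.#",
    "#.#.#.#",
    "#.#...#",
    "#######",
]


def find_start(maze):
    """Return the (row, col) of the 'S' cell."""
    for r, row in enumerate(maze):
        c = row.find("S")
        if c != -1:
            return (r, c)
    raise ValueError("maze has no start cell")


def open_neighbors(maze, cell):
    """Return walkable cells next to `cell`, tried in the order up, right, down, left.

    Assumes the maze is fully enclosed by walls, so no bounds checks are needed.
    """
    r, c = cell
    steps = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    return [(r + dr, c + dc) for dr, dc in steps if maze[r + dr][c + dc] != "#"]


def dfs_explore(maze, start):
    """Walk the maze depth-first; return every cell the agent stands on, in order."""
    visited = {start}
    path = [start]        # stack of cells from the start to the current cell
    trajectory = [start]  # the full walk, including backtracking steps
    while path:
        current = path[-1]
        unvisited = [n for n in open_neighbors(maze, current) if n not in visited]
        if unvisited:
            # Keep pushing down an unexplored path.
            nxt = unvisited[0]
            visited.add(nxt)
            path.append(nxt)
            trajectory.append(nxt)
        else:
            # Dead end: turn around and step back toward the last branch point.
            path.pop()
            if path:
                trajectory.append(path[-1])
    return trajectory


if __name__ == "__main__":
    walk = dfs_explore(MAZE, find_start(MAZE))
    print(len(set(walk)), "cells explored in", len(walk) - 1, "moves")
```

The `path` stack is what makes the agent return to the most recent branch point rather than an arbitrary one, which is the behavior the children's choices were compared against.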


The top portion shows what a child sees when they start three of the different phases of the maze. The bottom shows the layout of each phase. (Image Credit: DeepMind)


In another experiment, children aged four to six were instructed to complete two mazes with three phases. In the first phase, the children explored the maze in one of three conditions: a no-goal condition, a “sparse” condition, which contains a goal but no rewards along the way, or a “dense” condition, with rewards leading up to the goal. In the second phase, children were told to find the goal, which was in the same place as during exploration. In the last phase, they were again told to find the goal, but the optimal route to it was blocked.
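One way to picture the difference between the three first-phase conditions is as a reward function over maze cells. The sketch below is an assumption made for illustration: the reward values, the cell coordinates, and the names `Condition` and `reward_for_cell` are hypothetical and are not taken from the paper.

```python
from enum import Enum


class Condition(Enum):
    NO_GOAL = "no-goal"
    SPARSE = "sparse"
    DENSE = "dense"


def reward_for_cell(cell, condition, goal, reward_path,
                    goal_reward=10.0, step_reward=1.0):
    """Return the reward for stepping on `cell` under a given condition.

    Hypothetical values: the magnitudes 10.0 and 1.0 are placeholders.
    - NO_GOAL: nothing in the maze is rewarded.
    - SPARSE: only the goal cell is rewarded.
    - DENSE: the goal is rewarded, and so are the cells leading up to it.
    """
    if condition is Condition.NO_GOAL:
        return 0.0
    if cell == goal:
        return goal_reward
    if condition is Condition.DENSE and cell in reward_path:
        return step_reward
    return 0.0


if __name__ == "__main__":
    goal = (3, 5)                  # hypothetical goal cell
    trail = [(1, 2), (1, 3)]       # hypothetical cells leading toward the goal
    for condition in Condition:
        print(condition.value, reward_for_cell((1, 2), condition, goal, trail))
```

In this framing, the dense condition gives a learner a trail to follow from the start, while the sparse condition gives no signal until the goal itself is reached.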


The team theorized that children and RL agents in the “dense” condition would follow the dense-reward path to the goal in the first phase and would therefore find the goal more quickly in the second phase. However, they would take longer to locate the goal in the final phase than those in the “sparse” condition, since the path is blocked. Some RL agents may fail to locate the goal in the final phase, because they have to switch from exploitation to exploration once they discover the path is blocked.


Initial experimental data suggests that children are less likely to explore an area in the dense-rewards condition. The researchers also noted that this lack of exploration doesn’t hinder the children’s performance in the final phase. The same is not true of RL agents: dense rewards leave them with less incentive to explore, which leads to poor generalization.


The researchers wrote in the paper, “This work only begins to touch on a number of deep questions regarding how children and agents explore … In asking [new] questions, we will be able to acquire a deeper understanding of the way that children and agents explore novel environments, and how to close the gap between them.”


Have a story tip? Message me at: cabe(at)element14(dot)com