AI bots learn to use tools while playing hide-and-seek
AI bots playing hide-and-seek discovered and taught themselves how to use tools without being instructed to do so, researchers say.
Developed by OpenAI, the research project involved teams of 2 or 3 bots playing brief games of hide-and-seek nearly 500 million times. The environments varied but were sparse, often consisting of walls and a collection of movable blocks and ramps that became increasingly important to the bots as they played.
Told only to stay out of the seeking team's line of sight, the hider bots taught themselves to move blocks to hide behind after about 22 million games. About 70 million games later, the seekers learned to move ramps and climb them to jump over the hiders' blockades.
The dynamics then became more complicated. Hiders learned to lock ramps, rendering them immovable. In turn, seekers began exploiting the game's mechanics: instead of pushing blocks, they discovered that they could move a box while standing on it, allowing them to jump over any obstacle the hiders built. After 458 million games, the hiders taught themselves their ultimate, game-ending strategy: they blockaded themselves and then rendered all tools immovable.
The researchers wrote that this shows that tool discovery, which they call "emergent tool use," can occur among AI agents in competitive environments.
To test this, the researchers also placed individual bots in similar environments, where they were left to explore and experiment with the tools without playing any game. Unlike the agents in the competitive environments, these non-competitive, experimental bots moved objects around in a seemingly undirected way, and their interactions with objects became less meaningful as the environments grew more complex.
OpenAI writes that this suggests competitive environments are better at incentivizing human-like behavior in AI, and that they could be used to train AI agents to develop human-like skills.
From OpenAI's blog post:
Agents trained in hide-and-seek qualitatively center around far more human interpretable behaviors such as shelter construction, whereas agents trained with intrinsic motivation move objects around in a seemingly undirected fashion. Furthermore, as the state space increases in complexity, we find that intrinsic motivation methods have less and less meaningful interactions with the objects in their environment. For this reason, we believe multi-agent competition will be a more scalable method for generating human-relevant skills in an unsupervised manner as environments continue to increase in size and complexity.