Investigating large-scale training of RL agents in a vast and diverse space of simulated tasks



TL;DR

Kinetix is an open-ended reinforcement learning (RL) framework for 2D physics-based control tasks that can represent a diverse range of environments, from mazes, to video games, to complex manipulation problems, and everything in between. By using JAX, Kinetix can run at millions of steps per second on a single GPU, unlocking large-scale training of general RL agents. All environments in Kinetix share the same goal: make the green shape and the blue shape touch, without the green shape touching red. The agent acts by applying torque via motors and force via thrusters. Through these simple rules, we can represent an astonishing array of tasks, all within a unified framework.
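To give a sense of where this speed comes from: the physics step is a pure JAX function, so thousands of environments can be batched with jax.vmap and compiled into a single GPU program with jax.jit. Below is a minimal, self-contained sketch of that pattern using toy point-mass dynamics; it illustrates the vectorisation idea only, and is not Kinetix's actual physics or API.

import jax
import jax.numpy as jnp

def step(state, action):
    # Toy stand-in dynamics: state = (position, velocity), action = thruster force.
    pos, vel = state
    vel = vel + 0.01 * action
    pos = pos + 0.01 * vel
    reward = -jnp.abs(pos)        # e.g. a distance-to-goal style reward
    return (pos, vel), reward

num_envs = 8192                   # environments stepped in parallel on one GPU
positions = jnp.zeros(num_envs)
velocities = jnp.zeros(num_envs)
actions = jnp.ones(num_envs)

# vmap batches the single-environment step; jit fuses the batch into one kernel.
batched_step = jax.jit(jax.vmap(step))
(positions, velocities), rewards = batched_step((positions, velocities), actions)

Because the whole rollout stays on the GPU, there is no per-environment Python overhead, which is what makes millions of steps per second attainable.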

Kinetix: The RL Framework

Kinetix is a hardware-accelerated RL environment built on 2D physics, which allows it to represent a large number of diverse tasks, from video games, to classic RL environments, to more complex locomotion and manipulation environments. For instance, below we have the classic RL environments CartPole and Acrobot, some more complex robotic locomotion tasks inspired by MuJoCo, as well as nontraditional environments where the agent controls multiple parts of a complex system.

Kinetix: A Suite of Handmade Evaluation Tasks

We provide a large set of challenging and diverse RL environments that you can start using immediately (see our main GitHub repository for more). You can use these environments to train a single multi-task RL agent, to train on individual tasks, or as a held-out evaluation set to test the generalisation of agents.
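As a concrete (and hypothetical) sketch of the held-out evaluation use case, the loop below computes a solve rate over a list of level files. The names load_level and env.reset_to_level, and the goal_reached info field, are assumptions made for illustration; the repository documents the actual loading and evaluation utilities.

import jax

def evaluate_solve_rate(env, policy_fn, policy_params, level_paths, rng, episode_len=256):
    solved_count = 0
    for path in level_paths:
        level = load_level(path)                         # hypothetical loader
        rng, key = jax.random.split(rng)
        obs, state = env.reset_to_level(key, level)      # hypothetical reset-to-level
        solved = False
        for _ in range(episode_len):
            rng, key = jax.random.split(rng)
            action = policy_fn(policy_params, obs)
            obs, state, reward, done, info = env.step(key, state, action)
            solved = solved or bool(info.get("goal_reached", False))  # assumed info field
            if done:
                break
        solved_count += int(solved)
    return solved_count / len(level_paths)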

Below we show each level in our database. For a dedicated experience (and the ability to edit and save your own levels), please see the gallery.

Kinetix: The Easy-to-use Environment Creator!

While we provide a set of 74 hand-designed levels, we also include functionality for anyone to create their own environments using our built-in editor, found here. After creating a level, you can export it and use it to evaluate or train a JAX-based RL agent. In other words, you can benefit from the speedup of JAX without needing to write any JAX code! All of this is described in more detail in the Kinetix repository.
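For illustration, here is a hypothetical snippet of what plugging an editor-exported level into a JAX training or evaluation script might look like. The names make_kinetix_env, load_level_from_json and reset_to_level are assumptions; the repository documents the actual import helpers and file format.

import jax

env = make_kinetix_env()                                    # hypothetical constructor
my_level = load_level_from_json("my_exported_level.json")   # hypothetical loader for the editor export

rng = jax.random.PRNGKey(0)
obs, state = env.reset_to_level(rng, my_level)
# From here the level behaves like any other Kinetix environment, so it can be
# dropped into an existing JAX training loop or added to an evaluation set.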

Kinetix: An Open-Ended Benchmark

Finally, we believe Kinetix serves as an ideal environment to study open-ended learning, automatic curriculum learning, and unsupervised environment design. This is because Kinetix is fast, enabling large-scale experiments, and because it is able to represent a wide range of semantically diverse tasks, as opposed to only small variations of the same task (e.g., different obstacle locations in a maze).

We provide functionality to generate random environments, as well as code to run autocurricula methods on this distribution. We use this to train a general agent on randomly sampled levels and investigate its generalisation capabilities. Beyond autocurricula and RL generalisation, we believe Kinetix serves as an excellent foundation for future study into areas including agent network capacity, plasticity loss, lifelong learning, and multi-task learning.
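To make the random-level training setup concrete, here is a hypothetical sketch of the simplest case, plain domain randomisation, where every reset draws a fresh batch of random levels. sample_random_level, make_kinetix_env and reset_to_level are assumed names for illustration; prioritised autocurricula such as PLR would replace the uniform sampling below with replay of levels scored by learning potential.

import jax

env = make_kinetix_env()              # hypothetical constructor
num_envs = 2048

def reset_to_random_levels(rng):
    level_key, reset_key = jax.random.split(rng)
    # Draw one random level per parallel environment (hypothetical generator).
    levels = jax.vmap(sample_random_level)(jax.random.split(level_key, num_envs))
    obs, state = jax.vmap(env.reset_to_level)(
        jax.random.split(reset_key, num_envs), levels
    )
    return obs, state

# Each policy update then rolls out the current agent from these states;
# resampling levels at every reset gives the agent an ever-changing task stream.
obs, state = reset_to_random_levels(jax.random.PRNGKey(0))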

What Now?

If you are interested, there are many ways to engage with Kinetix: try the interactive editor and level gallery, explore the GitHub repository, or read the paper.

Citation

      
 @misc{matthews2024Kinetix,
    title={Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks},
    author={Michael Matthews and Michael Beukman and Chris Lu and Jakob Foerster},
    year={2024},
    eprint={2410.23208},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2410.23208}
 }

Acknowledgements

This is based on the Distill Template and the ACCEL Blog. Big thanks to Thomas Foster, Alex Goldie, Matthew Jackson and Andrei Lupu for feedback, discussions and suggestions on this work.