Investigating large-scale training of RL agents in a vast and diverse space of simulated tasks
Kinetix is an open-ended reinforcement learning (RL) framework for 2D physics-based control tasks, which can represent a diverse range of environments, from mazes, to video games, to complex manipulation problems, and everything in between.
Kinetix can run at millions of steps per second on a single GPU by using JAX, unlocking large-scale training of general reinforcement learning agents.
All environments in Kinetix share the same goal: make the green shape and the blue shape touch, without green touching red. The agent acts by applying torque via motors and force via thrusters. Through these simple rules, we can represent an astonishing array of tasks, all within a unified framework.
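As a rough sketch, the goal rule can be written as a single reward function. This is an illustration, not the library's actual code: the contact-flag inputs, the ±1 magnitudes, and resolving a simultaneous green-blue and green-red contact as failure are all assumptions here.

```python
import jax.numpy as jnp

# Illustrative sketch of the Kinetix goal as a reward function (not the
# library's actual implementation). Inputs are per-timestep boolean contact
# flags; their names and the failure-over-success tie-break are assumptions.
def sketch_reward(green_touches_blue: jnp.ndarray,
                  green_touches_red: jnp.ndarray) -> jnp.ndarray:
    # -1 if green touches red (failure), +1 if green touches blue (success),
    # 0 on every other timestep.
    return jnp.where(green_touches_red, -1.0,
                     jnp.where(green_touches_blue, 1.0, 0.0))
```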
Kinetix: The RL Framework
Kinetix is a 2D-physics-based, hardware-accelerated RL environment, meaning that it can represent a large number of diverse tasks, from video games, to classic RL environments, to more complex locomotion and manipulation environments. For instance, below we have the classic RL environments CartPole and Acrobot, some more complex robotic locomotion tasks inspired by MuJoCo, as well as non-traditional environments where the agent controls multiple parts of a complex system.
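The hardware acceleration comes from implementing the physics engine in pure JAX, so whole batches of environments can be stepped in lockstep under jit and vmap. The sketch below shows the general pattern; make_env, env.reset, and env.step are a gymnax-style interface we assume for illustration, not Kinetix's confirmed API.

```python
import jax

# Assumed gymnax-style environment (illustrative names, not Kinetix's
# confirmed API): env.reset(rng) -> (obs, state),
# env.step(rng, state, action) -> (obs, state, reward, done, info).
env = make_env()   # hypothetical constructor
NUM_ENVS = 4096    # thousands of parallel simulations on one GPU

rng = jax.random.PRNGKey(0)
obs, states = jax.vmap(env.reset)(jax.random.split(rng, NUM_ENVS))

@jax.jit
def step_all(rng, states, actions):
    # One synchronous physics step across every environment; XLA compiles
    # the batched update into a small number of GPU kernels.
    step_rngs = jax.random.split(rng, NUM_ENVS)
    return jax.vmap(env.step)(step_rngs, states, actions)
```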
Kinetix: A Suite of Handmade Evaluation Tasks
We provide a large set of challenging and diverse RL environments that you can start using immediately (see our main GitHub repository for more). You can use these environments to train a single, multi-task RL agent, train on individual tasks, or use them as a held-out evaluation set to test the generalisation of agents.
Below we show each level in our database. For a dedicated experience (and the ability to edit and save your own levels), please see the gallery.
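As one illustration of the held-out evaluation workflow, a loop like the following could compute per-level solve rates. load_level and run_episodes are hypothetical helpers, not Kinetix's confirmed API, and we assume the sparse ±1 reward, under which a positive return marks success.

```python
import numpy as np

# Sketch of evaluating a trained agent on the handmade levels as a
# held-out set. load_level and run_episodes are hypothetical helpers.
def evaluate(agent_params, level_names, num_episodes=16):
    solve_rates = {}
    for name in level_names:
        level = load_level(name)                                    # hypothetical loader
        returns = run_episodes(agent_params, level, num_episodes)   # -> array of episode returns
        solve_rates[name] = float(np.mean(returns > 0))             # success iff return is positive
    return solve_rates
```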
Kinetix: The Easy-to-Use Environment Creator!
Kinetix: An Open-Ended Benchmark
Finally, we believe Kinetix serves as an ideal environment to study open-ended learning, automatic curriculum learning, and unsupervised environment design. This is because Kinetix is fast, enabling large-scale experiments, and because it is able to represent a wide range of semantically diverse tasks, as opposed to only small variations of the same task (e.g., different obstacle locations in a maze).
We provide functionality to generate random environments, as well as code to run autocurricula methods on this distribution.
We use this to train a general agent on randomly sampled levels and investigate its generalisation capabilities.
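Training on uniformly random levels (domain randomisation) is the simplest point in this design space, and the sampling pattern looks roughly like the sketch below. sample_random_level and env.reset_to_level are hypothetical stand-ins for the generator and reset functions, not the library's confirmed names.

```python
import jax

NUM_ENVS = 4096

def make_train_batch(rng):
    # Domain randomisation: draw a fresh random level per environment
    # instance, then reset each environment into its level.
    # sample_random_level / env.reset_to_level are hypothetical stand-ins.
    level_rngs = jax.random.split(rng, NUM_ENVS)
    levels = jax.vmap(sample_random_level)(level_rngs)
    obs, states = jax.vmap(env.reset_to_level)(level_rngs, levels)
    return obs, states

# Autocurriculum methods (e.g. PLR-style replay) would instead score
# levels by learning potential and preferentially resample those.
```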
Beyond autocurricula and RL generalisation methods, we believe Kinetix serves as an excellent foundation for future study into areas including agent network capacity, plasticity loss, lifelong learning, and multi-task learning.
To cite Kinetix, please use:
@misc{matthews2024Kinetix,
title={Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks},
author={Michael Matthews and Michael Beukman and Chris Lu and Jakob Foerster},
year={2024},
eprint={2410.23208},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.23208}
}
This is based on the Distill Template and the ACCEL Blog. Big thanks to Thomas Foster, Alex Goldie, Matthew Jackson and Andrei Lupu for feedback, discussions and suggestions on this work.