Cleaner Environment
We provide here a JAX jit-able implementation of the Multi-Agent Cleaning environment.
In this environment, multiple agents must cooperatively clean the floor of a room with complex indoor barriers (black). At the beginning of an episode, the whole floor is dirty (green). Every time an agent (red) visits a dirty tile, it is cleaned (white).
The goal is to clean as many tiles as possible in a given time budget.
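The cleaning rule can be illustrated with a short JAX sketch. It assumes agent locations are stored as (row, col) indices into the grid and reuses the tile encoding from the observation description (dirty 0, clean 1, wall 2). The function name clean_visited_tiles is hypothetical and not part of the environment's API.

```python
import jax
import jax.numpy as jnp

# Tile encoding from the observation description: dirty (0), clean (1), wall (2).
DIRTY, CLEAN, WALL = 0, 1, 2


@jax.jit
def clean_visited_tiles(grid: jax.Array, agents_locations: jax.Array) -> jax.Array:
    """Marks every tile currently occupied by an agent as clean.

    Assumes agents_locations holds (row, col) pairs and agents never stand on walls.
    """
    rows, cols = agents_locations[:, 0], agents_locations[:, 1]
    # Functional scatter: agents sharing a tile just write the same value twice.
    return grid.at[rows, cols].set(CLEAN)


grid = jnp.zeros((3, 3), dtype=jnp.int8)  # a fully dirty 3x3 room
agents = jnp.array([[0, 0], [0, 1], [0, 1]])  # two agents share tile (0, 1)
new_grid = clean_visited_tiles(grid, agents)
```

Using `.at[...].set(...)` keeps the update side-effect free, which is what allows it to run under `jax.jit`.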
For each new episode, a new maze is randomly generated using the recursive division method. Agents always start in the top-left corner of the maze.
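For readers unfamiliar with recursive division, here is a host-side Python sketch of one common variant. It splits the current chamber with a horizontal or vertical wall containing a single gap, then recurses on both halves. This is not the environment's own generator, since plain Python recursion is not jit-able. The function name and the odd/even placement rule are illustrative assumptions. Walls are placed on odd indices and gaps on even indices, so no later wall can seal an earlier gap and every open tile stays reachable from the top-left corner.

```python
import random

DIRTY, WALL = 0, 2  # tile encoding from the observation description


def recursive_division(num_rows: int, num_cols: int, seed: int = 0) -> list[list[int]]:
    """Host-side (non-jit) sketch of recursive-division maze generation."""
    rng = random.Random(seed)
    grid = [[DIRTY] * num_cols for _ in range(num_rows)]

    def divide(r0: int, r1: int, c0: int, c1: int) -> None:
        # The chamber covers rows r0..r1 and columns c0..c1 (inclusive).
        # Candidate walls sit strictly inside the chamber, on odd indices only.
        wall_rows = [r for r in range(r0 + 1, r1) if r % 2 == 1]
        wall_cols = [c for c in range(c0 + 1, c1) if c % 2 == 1]
        if not wall_rows and not wall_cols:
            return  # chamber too small to split further
        if wall_rows and wall_cols:
            height, width = r1 - r0, c1 - c0
            # Split across the taller/wider dimension; break ties randomly.
            horizontal = height > width or (height == width and rng.random() < 0.5)
        else:
            horizontal = bool(wall_rows)
        if horizontal:
            r = rng.choice(wall_rows)
            # Gaps sit on even columns (c0 is always even), so one always exists.
            gap = rng.choice([c for c in range(c0, c1 + 1) if c % 2 == 0])
            for c in range(c0, c1 + 1):
                if c != gap:
                    grid[r][c] = WALL
            divide(r0, r - 1, c0, c1)
            divide(r + 1, r1, c0, c1)
        else:
            c = rng.choice(wall_cols)
            gap = rng.choice([r for r in range(r0, r1 + 1) if r % 2 == 0])
            for r in range(r0, r1 + 1):
                if r != gap:
                    grid[r][c] = WALL
            divide(r0, r1, c0, c - 1)
            divide(r0, r1, c + 1, c1)

    divide(0, num_rows - 1, 0, num_cols - 1)
    return grid


maze = recursive_division(10, 10, seed=3)
```

Because row 0 and column 0 are even, the top-left tile is never walled in this variant, which matches the fixed starting position of the agents.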
Observation
The observation seen by the agent is a NamedTuple containing the following:
- grid: jax array (int8) of shape (num_rows, num_cols), array representing the grid; each tile is either dirty (0), clean (1), or a wall (2).
- agents_locations: jax array (int) of shape (num_agents, 2), array specifying the x and y coordinates of every agent.
- action_mask: jax array (bool) of shape (num_agents, 4), array specifying, for each agent, which actions (up, right, down, left) are legal.
- step_count: jax array (int32) of shape (), number of steps elapsed in the current episode.
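The fields above can be mirrored with a typing.NamedTuple. The sketch below is hypothetical and is not the environment's own Observation class. It assumes agents_locations holds (row, col) indices and that up decreases the row index. It also assumes an action is legal when its target tile lies inside the grid and is not a wall.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

DIRTY, CLEAN, WALL = 0, 1, 2
# Assumed (row, col) offsets for up, right, down, left; "up" decreases the row.
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])


class Observation(NamedTuple):
    grid: jax.Array              # int8, (num_rows, num_cols): 0 dirty, 1 clean, 2 wall
    agents_locations: jax.Array  # int, (num_agents, 2)
    action_mask: jax.Array       # bool, (num_agents, 4): up, right, down, left
    step_count: jax.Array        # int32, ()


def compute_action_mask(grid: jax.Array, agents_locations: jax.Array) -> jax.Array:
    """An action is legal if its target tile is inside the grid and not a wall."""
    targets = agents_locations[:, None, :] + MOVES[None, :, :]  # (num_agents, 4, 2)
    num_rows, num_cols = grid.shape
    inside = (
        (targets[..., 0] >= 0) & (targets[..., 0] < num_rows)
        & (targets[..., 1] >= 0) & (targets[..., 1] < num_cols)
    )
    # Clip so out-of-bounds targets index safely; `inside` masks them out anyway.
    rows = jnp.clip(targets[..., 0], 0, num_rows - 1)
    cols = jnp.clip(targets[..., 1], 0, num_cols - 1)
    return inside & (grid[rows, cols] != WALL)


def make_observation(grid: jax.Array, agents_locations: jax.Array, step_count: int) -> Observation:
    return Observation(
        grid=grid,
        agents_locations=agents_locations,
        action_mask=compute_action_mask(grid, agents_locations),
        step_count=jnp.asarray(step_count, dtype=jnp.int32),
    )


# Three agents in the top-left corner of a dirty 4x4 room with a wall at (1, 0).
grid = jnp.zeros((4, 4), dtype=jnp.int8).at[1, 0].set(WALL)
obs = make_observation(grid, jnp.zeros((3, 2), dtype=jnp.int32), 0)
```

In this example each agent can only move right: up and left leave the grid, and down hits the wall.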
Action
The action space is a MultiDiscreteArray containing an integer value in [0, 1, 2, 3] for each agent. Each agent can take one of four actions: up (0), right (1), down (2), or left (3).
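Because action_mask marks the legal actions of each agent, a random policy can sample one MultiDiscrete action for all agents at once by assigning illegal actions a logit of -inf. This is an illustrative sketch. The function name is hypothetical, and it assumes every agent has at least one legal action (otherwise the sampled index is meaningless).

```python
import jax
import jax.numpy as jnp

UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3


@jax.jit
def sample_legal_actions(key: jax.Array, action_mask: jax.Array) -> jax.Array:
    """Samples one action per agent, uniformly among that agent's legal actions.

    action_mask: bool array of shape (num_agents, 4), columns ordered up, right, down, left.
    Returns an int array of shape (num_agents,).
    """
    # Legal actions share a logit of 0; illegal ones get -inf and are never drawn.
    logits = jnp.where(action_mask, 0.0, -jnp.inf)
    return jax.random.categorical(key, logits, axis=-1)


mask = jnp.array([[False, True, False, False],   # agent 0 can only go right
                  [True, False, True, False]])   # agent 1 can go up or down
actions = sample_legal_actions(jax.random.PRNGKey(0), mask)
```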
The episode terminates if any agent meets one of the following conditions:
- An invalid action is taken, or
- An action is blocked by a wall.
In both cases, the agent's position remains unchanged.
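The movement and termination rules can be sketched as a single jit-able function. This is an illustration under stated assumptions, not the environment's step function. Locations are (row, col) with up decreasing the row. An out-of-range action id or a move off the grid counts as invalid, and a move onto a wall tile (2) counts as blocked. The name move_agents is hypothetical.

```python
import jax
import jax.numpy as jnp

WALL = 2
# Assumed (row, col) offsets for up, right, down, left.
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])


@jax.jit
def move_agents(grid: jax.Array, agents_locations: jax.Array, actions: jax.Array):
    """Moves each agent one tile; invalid or blocked actions leave it in place.

    Returns the new locations and a flag that is True if any agent's action was
    invalid or blocked, i.e. the episode should terminate.
    """
    num_rows, num_cols = grid.shape
    in_range = (actions >= 0) & (actions < 4)
    targets = agents_locations + MOVES[jnp.clip(actions, 0, 3)]
    inside = ((targets >= 0) & (targets < jnp.array([num_rows, num_cols]))).all(axis=-1)
    # Clip before indexing so off-grid targets do not read out of bounds.
    safe = jnp.clip(targets, 0, jnp.array([num_rows - 1, num_cols - 1]))
    free = grid[safe[:, 0], safe[:, 1]] != WALL
    valid = in_range & inside & free
    new_locations = jnp.where(valid[:, None], targets, agents_locations)
    terminated = ~valid.all()
    return new_locations, terminated


grid = jnp.zeros((3, 3), dtype=jnp.int8).at[1, 0].set(WALL)
start = jnp.zeros((2, 2), dtype=jnp.int32)
locs, done = move_agents(grid, start, jnp.array([1, 2]))  # right is free, down hits the wall
```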
Reward
The reward is global and shared among the agents. It is equal to the number of tiles cleaned during the time step, minus a per-step penalty (0.5 by default) that encourages the agents to clean the maze faster.
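Given snapshots of the grid before and after a step, the shared reward can be computed as below. This is a sketch: the function name is hypothetical, and penalty mirrors the default of 0.5 stated above. A tile counts once even when several agents step onto it together, because the count is taken over tiles rather than agents.

```python
import jax
import jax.numpy as jnp

DIRTY = 0


@jax.jit
def shared_reward(prev_grid: jax.Array, new_grid: jax.Array, penalty: float = 0.5) -> jax.Array:
    """Tiles that went from dirty to clean this step, minus a constant penalty."""
    num_cleaned = jnp.sum((prev_grid == DIRTY) & (new_grid != DIRTY))
    return num_cleaned.astype(jnp.float32) - penalty


prev = jnp.zeros((2, 2), dtype=jnp.int8)     # fully dirty
new = prev.at[0, 0].set(1).at[0, 1].set(1)   # two tiles cleaned this step
reward = shared_reward(prev, new)            # 2 - 0.5
idle_reward = shared_reward(prev, prev)      # nothing cleaned: only the penalty
```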
Registered Versions 📖
- Cleaner-v0, a room of size 10x10 with 3 agents.