Cleaner Environment

We provide a JAX jit-able implementation of the Multi-Agent Cleaning environment.

In this environment, multiple agents must cooperatively clean the floor of a room with complex indoor barriers (black). At the beginning of an episode, the whole floor is dirty (green). Every time an agent (red) visits a dirty tile, that tile becomes clean (white).

The goal is to clean as many tiles as possible in a given time budget.

For each new episode, a new maze is randomly generated using the recursive division method. Agents always start in the top-left corner of the maze.
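Recursive division repeatedly splits a rectangular chamber with a wall that has a single gap, then recurses on the two halves. The following is a minimal plain-Python sketch of that idea, not the environment's jit-able generator; the function name `recursive_division_maze` and the even/odd placement rule are assumptions made for illustration. Walls are placed only on odd rows or columns and gaps only on even ones, so a later wall can never seal an earlier gap, every free tile stays reachable, and the top-left tile (0, 0), where the agents start, is never a wall.

```python
import random

import jax.numpy as jnp

# Tile encoding from the Observation section: dirty (0), clean (1), wall (2).
DIRTY, CLEAN, WALL = 0, 1, 2


def recursive_division_maze(num_rows: int, num_cols: int, seed: int = 0) -> jnp.ndarray:
    """Hypothetical plain-Python sketch (not jit-able) of recursive division.

    Walls only go on odd rows/columns and gaps only on even ones, so tiles with
    two even coordinates (including (0, 0)) are never walls and no gap is ever
    blocked by a later wall.
    """
    rng = random.Random(seed)
    grid = [[DIRTY] * num_cols for _ in range(num_rows)]

    def divide(r0: int, r1: int, c0: int, c1: int) -> None:
        # Candidate wall lines strictly inside the chamber [r0, r1] x [c0, c1].
        wall_rows = [r for r in range(r0 + 1, r1) if r % 2 == 1]
        wall_cols = [c for c in range(c0 + 1, c1) if c % 2 == 1]
        if not wall_rows and not wall_cols:
            return  # chamber too small to split further
        horizontal = bool(wall_rows) and (not wall_cols or rng.random() < 0.5)
        if horizontal:
            wr = rng.choice(wall_rows)
            gap = rng.choice([c for c in range(c0, c1 + 1) if c % 2 == 0])
            for c in range(c0, c1 + 1):
                if c != gap:
                    grid[wr][c] = WALL
            divide(r0, wr - 1, c0, c1)  # chamber above the wall
            divide(wr + 1, r1, c0, c1)  # chamber below the wall
        else:
            wc = rng.choice(wall_cols)
            gap = rng.choice([r for r in range(r0, r1 + 1) if r % 2 == 0])
            for r in range(r0, r1 + 1):
                if r != gap:
                    grid[r][wc] = WALL
            divide(r0, r1, c0, wc - 1)  # chamber left of the wall
            divide(r0, r1, wc + 1, c1)  # chamber right of the wall

    divide(0, num_rows - 1, 0, num_cols - 1)
    return jnp.asarray(grid, dtype=jnp.int8)
```

For example, `recursive_division_maze(10, 10, seed=1)` returns a 10x10 grid, the room size used by Cleaner-v0. A jit-able generator would have to replace the Python recursion with JAX control flow, for instance an explicit stack of chambers processed inside `jax.lax.while_loop`.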

Observation

The observation seen by the agents is a NamedTuple containing the following fields:

grid: jax array (int8) of shape (num_rows, num_cols), array representing the grid, where each tile is either dirty (0), clean (1), or a wall (2).
agents_locations: jax array (int) of shape (num_agents, 2), array specifying the x and y coordinates of every agent.
action_mask: jax array (bool) of shape (num_agents, 4), array specifying, for each agent, which action (up, right, down, left) is legal.
step_count: jax array (int32) of shape (), number of steps elapsed in the current episode.
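The fields above can be mirrored with a `NamedTuple`, and the action mask can be derived from the grid and the agent locations. The sketch below is hypothetical: `Observation` and `compute_action_mask` are illustrative names, the first coordinate of each location is assumed to be the row index, and, because all agents start on the same tile, agents are assumed never to block each other, so only the grid bounds and walls are checked.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

DIRTY, CLEAN, WALL = 0, 1, 2
# Assumed (row, col) offsets for up (0), right (1), down (2), left (3).
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]], dtype=jnp.int32)


class Observation(NamedTuple):
    """Hypothetical mirror of the observation fields listed above."""
    grid: jax.Array              # (num_rows, num_cols) int8
    agents_locations: jax.Array  # (num_agents, 2) int
    action_mask: jax.Array       # (num_agents, 4) bool
    step_count: jax.Array        # () int32


def compute_action_mask(grid: jax.Array, agents_locations: jax.Array) -> jax.Array:
    """A move is legal if its target tile is inside the grid and not a wall."""
    num_rows, num_cols = grid.shape
    targets = agents_locations[:, None, :] + MOVES[None, :, :]  # (num_agents, 4, 2)
    rows, cols = targets[..., 0], targets[..., 1]
    in_bounds = (rows >= 0) & (rows < num_rows) & (cols >= 0) & (cols < num_cols)
    # Clip so the lookup stays in range; out-of-bounds moves are masked by in_bounds.
    tiles = grid[jnp.clip(rows, 0, num_rows - 1), jnp.clip(cols, 0, num_cols - 1)]
    return in_bounds & (tiles != WALL)
```

All shapes are static, so `jax.jit(compute_action_mask)` compiles without changes.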

Action

The action space is a MultiDiscreteArray containing an integer value in [0, 1, 2, 3] for each agent. Each agent can take one of four actions: up (0), right (1), down (2), or left (3).
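Given an action mask like the one in the observation, a uniformly random legal action can be drawn for every agent in a single jit-compatible call by giving illegal actions a logit of minus infinity. This is a minimal sketch, assuming each agent has at least one legal action; `sample_legal_actions` is a hypothetical helper, not part of the environment's API.

```python
import jax
import jax.numpy as jnp


def sample_legal_actions(key: jax.Array, action_mask: jax.Array) -> jax.Array:
    """Hypothetical helper: one action in {0, 1, 2, 3} per agent, uniform over legal ones."""
    # Legal actions share logit 0; illegal ones get -inf and are never sampled.
    logits = jnp.where(action_mask, 0.0, -jnp.inf)
    return jax.random.categorical(key, logits, axis=-1)  # shape (num_agents,)
```

The result is an array with one integer per agent, which matches the per-agent layout of the MultiDiscreteArray action space.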

Besides ending once the time budget is exhausted, the episode terminates early if any agent meets one of the following conditions:

An invalid action is taken, or
An action is blocked by a wall.

In both cases, the agent's position remains unchanged.
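The movement and early-termination rule can be sketched as follows. This is an illustrative version, not the environment's source: `move_agents` is a hypothetical name, the (row, column) offsets for up, right, down, and left are assumed, and actions outside [0, 3] are treated as invalid alongside moves that the action mask forbids.

```python
import jax
import jax.numpy as jnp

# Assumed (row, col) offsets for up (0), right (1), down (2), left (3).
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]], dtype=jnp.int32)


def move_agents(agents_locations: jax.Array, actions: jax.Array, action_mask: jax.Array):
    """Hypothetical sketch: illegal moves leave that agent in place.

    Returns the new locations and a flag that is True when any agent acted
    illegally, i.e. when the episode should terminate early.
    """
    num_agents = agents_locations.shape[0]
    in_range = (actions >= 0) & (actions < 4)  # out-of-range actions are invalid
    safe_actions = jnp.clip(actions, 0, 3)     # keep indexing in bounds
    is_valid = in_range & action_mask[jnp.arange(num_agents), safe_actions]
    proposed = agents_locations + MOVES[safe_actions]
    new_locations = jnp.where(is_valid[:, None], proposed, agents_locations)
    terminate = ~jnp.all(is_valid)
    return new_locations, terminate
```

For instance, with one agent at (0, 0) and a wall to its right, choosing right (1) keeps that agent at (0, 0) and sets the termination flag, even if every other agent moved legally.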

Reward

The reward is global and shared among the agents. It equals the number of tiles cleaned during the time step minus a fixed per-step penalty (0.5 by default), which encourages the agents to clean the maze faster.
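The shared reward can be sketched by marking the tiles under the agents as clean and counting how many of them were dirty before the step. This is a minimal illustration under assumed names (`clean_and_reward`, `penalty_per_timestep`); in this sketch a dirty tile reached by several agents in the same step is counted only once.

```python
import jax.numpy as jnp

DIRTY, CLEAN, WALL = 0, 1, 2


def clean_and_reward(grid, agents_locations, penalty_per_timestep=0.5):
    """Hypothetical sketch: clean tiles under the agents, return (new_grid, reward)."""
    rows, cols = agents_locations[:, 0], agents_locations[:, 1]
    new_grid = grid.at[rows, cols].set(CLEAN)
    # Tiles that went from dirty to clean this step; duplicates count once.
    num_cleaned = jnp.sum((grid == DIRTY) & (new_grid == CLEAN))
    reward = num_cleaned.astype(jnp.float32) - penalty_per_timestep
    return new_grid, reward
```

As a worked case, if the agents clean 2 new tiles in a step, the reward is 2 - 0.5 = 1.5; a step that cleans nothing yields -0.5.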

Registered Versions 📖

Cleaner-v0, a room of size 10x10 with 3 agents.