
🚁 Search & Rescue

Multi-agent environment modelling a group of agents searching a 2D space for multiple targets. Agents are individually rewarded for finding a target that has not previously been detected.

Each agent visualises a local region around itself, represented as a simple segmented view of locations of other agents and targets in the vicinity. The environment is updated in the following sequence:

  • The velocities of the searching agents are updated, and consequently their positions.
  • The positions of targets are updated.
  • Targets within detection range and within an agent's view cone are marked as found.
  • Agents are rewarded for locating previously unfound targets.
  • Local views of the environment are generated for each searching agent.

The agents are allotted a fixed number of steps to locate the targets. The search space is a uniform square space, wrapped at the boundaries.
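As a concrete illustration of the wrapped boundaries, here is a minimal JAX sketch; the function name wrap and the unit-square env_size are assumptions for illustration, not the environment's API:

```python
import jax.numpy as jnp

# Illustrative only: toroidal wrapping of positions in a square search space.
# A position that steps past an edge re-enters from the opposite side.
def wrap(positions: jnp.ndarray, env_size: float = 1.0) -> jnp.ndarray:
    return positions % env_size

print(wrap(jnp.array([[1.05, -0.10]])))  # approximately [[0.05 0.9]]
```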

Many aspects of the environment can be customised:

  • Agent observations can be customised by implementing the ObservationFn interface.
  • Rewards can be customised by implementing the RewardFn interface.
  • Target dynamics can be customised to model various search scenarios by implementing the TargetDynamics interface (a sketch follows this list).
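As a rough illustration of the kind of behaviour a custom TargetDynamics could encode, the following sketch implements a simple random walk. The class name, call signature, and parameter values are assumptions made for this example and may not match the actual interface:

```python
import jax
import jax.numpy as jnp

class RandomWalk:
    """Illustrative target dynamics: each target takes a small random step.

    The call signature (key, target_positions, env_size) is an assumption for
    this sketch and may differ from the real TargetDynamics interface.
    """

    def __init__(self, step_size: float = 0.01):
        self.step_size = step_size

    def __call__(self, key: jax.Array, target_positions: jnp.ndarray, env_size: float) -> jnp.ndarray:
        # Each target takes a small uniform random step and wraps at the boundary.
        noise = jax.random.uniform(key, target_positions.shape, minval=-1.0, maxval=1.0)
        return (target_positions + self.step_size * noise) % env_size

# Example: advance two targets one step in a unit-square space.
dynamics = RandomWalk()
positions = dynamics(jax.random.PRNGKey(0), jnp.zeros((2, 2)), env_size=1.0)
```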

Observations

  • searcher_views: jax array (float) of shape (num_searchers, channels, num_vision). Each agent generates an independent observation, an array of values representing the distance along a ray from the agent to the nearest neighbour or target, with each cell representing a ray angle (with num_vision rays evenly distributed over the agent's field of vision). For example, if an agent sees another agent straight ahead and num_vision = 5 then the observation array could be

    [-1.0, -1.0, 0.5, -1.0, -1.0]

    where -1.0 indicates there are no agents along that ray, and 0.5 is the normalised distance to the other agent. Channels in the segmented view are used to differentiate between different agents/targets and can be customised. By default, the view has three channels representing other agents, found targets, and unlocated targets respectively.
  • targets_remaining: float in the range [0, 1]. The normalised number of targets remaining to be detected (i.e. 1.0 when no targets have been found).
  • step: int in the range [0, time_limit]. The current simulation step.
  • positions: jax array (float) of shape (num_searchers, 2). Agent coordinates.
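To make the view layout concrete, the following sketch builds a hypothetical single-agent view with the three default channels and finds the nearest detection in each channel; the values are made up for illustration:

```python
import jax.numpy as jnp

# Hypothetical view for one agent: 3 channels x 5 vision rays.
# Channel 0: other agents, channel 1: found targets, channel 2: unfound targets.
view = jnp.array([
    [-1.0, -1.0, 0.5, -1.0, -1.0],   # another agent straight ahead
    [-1.0, -1.0, -1.0, -1.0, -1.0],  # no found targets in view
    [-1.0, 0.2, -1.0, -1.0, -1.0],   # an unfound target off to one side
])

# Treat -1.0 (no detection) as infinitely far, then take the closest
# detection along any ray for each channel.
distances = jnp.where(view >= 0.0, view, jnp.inf)
nearest = distances.min(axis=-1)
print(nearest)  # -> [0.5 inf 0.2]
```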

Actions

Jax array (float) of shape (num_searchers, 2) in the range [-1, 1]. Each entry in the array represents an update to the corresponding agent's velocity in the next step. Searching agents update their velocity each step by rotating and accelerating/decelerating, where the values are [rotation, acceleration]. Values are clipped to the range [-1, 1] and then scaled by the max rotation and acceleration parameters, i.e. the new values each step are given by

heading = heading + max_rotation * action[0]

and speed

speed = speed + max_acceleration * action[1]

Once applied, agent speeds are clipped to a fixed range given by the min_speed and max_speed parameters.
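Putting these pieces together, a minimal sketch of a single agent's velocity update might look like the following; the parameter values are placeholders for illustration, not the environment's defaults:

```python
import jax.numpy as jnp

def update_velocity(heading, speed, action,
                    max_rotation=0.1, max_acceleration=0.005,
                    min_speed=0.01, max_speed=0.05):
    # Clip the raw action, rotate the heading, then accelerate and clip the speed.
    action = jnp.clip(action, -1.0, 1.0)
    heading = heading + max_rotation * action[0]
    speed = jnp.clip(speed + max_acceleration * action[1], min_speed, max_speed)
    return heading, speed

# Example: turn slightly while accelerating at full throttle.
new_heading, new_speed = update_velocity(0.0, 0.02, jnp.array([0.5, 1.0]))
```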

Rewards

Jax array (float) of shape (num_searchers,). Rewards are generated for each agent individually.

Agents are rewarded +1 for locating a target that has not already been detected. If multiple agents newly detect the same target within the same step, the reward is split between the locating agents by default. Rewards also decrease linearly over time, from +1 to 0 at the final step. The reward function can be customised by implementing the RewardFn interface.
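The default behaviour described above could be sketched as follows; the function and argument names are illustrative stand-ins, not the RewardFn interface itself:

```python
import jax.numpy as jnp

def shared_decaying_reward(newly_found, step, time_limit):
    # newly_found: bool array (num_searchers, num_targets), True where an agent
    # newly detects a target this step (names and shapes are assumptions).
    detections_per_target = newly_found.sum(axis=0)
    # Split the +1 reward for a target between all agents that found it this step.
    share = jnp.where(
        detections_per_target > 0, 1.0 / jnp.maximum(detections_per_target, 1), 0.0
    )
    # Scale rewards down linearly from +1 at the start to 0 at the final step.
    decay = 1.0 - step / time_limit
    return (newly_found * share).sum(axis=1) * decay

# Two agents find the same target in the same step, halfway through the episode.
rewards = shared_decaying_reward(
    jnp.array([[True, False], [True, False]]), step=50, time_limit=100
)
print(rewards)  # -> [0.25 0.25]
```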