# 🚁 Search & Rescue
Multi-agent environment, modelling a group of agents searching a 2d environment for multiple targets. Agents are individually rewarded for finding a target that has not previously been detected.
Each agent visualises a local region around itself, represented as a simple segmented view of locations of other agents and targets in the vicinity. The environment is updated in the following sequence:
- The velocities of searching agents are updated, and consequently their positions.
- The positions of targets are updated.
- Targets within detection range, and within an agent's view cone, are marked as found.
- Agents are rewarded for locating previously unfound targets.
- Local views of the environment are generated for each searching agent.
The agents are allotted a fixed number of steps to locate the targets. The search space is a uniform square space, wrapped at the boundaries.
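A rough sketch of this per-step ordering is shown below, using toy stand-in state (plain JAX arrays, simplified detection that ignores the view cone, and distances that ignore boundary wrapping). None of the names here are the environment's actual API.

```python
import jax
import jax.numpy as jnp

def toy_step(key, searcher_pos, searcher_vel, target_pos, found, detection_range=0.05):
    # 1. Update searcher positions (velocities here are left as given),
    #    wrapping at the boundaries of the unit square.
    searcher_pos = (searcher_pos + searcher_vel) % 1.0
    # 2. Update target positions, here with a small random drift.
    target_pos = (target_pos + 0.01 * jax.random.normal(key, target_pos.shape)) % 1.0
    # 3. Mark targets within detection range of any searcher as found
    #    (the real environment also checks the agent's view cone).
    dists = jnp.linalg.norm(searcher_pos[:, None, :] - target_pos[None, :, :], axis=-1)
    newly_found = (dists < detection_range).any(axis=0) & ~found
    # 4. Reward the closest searcher for each target that was not previously found.
    closest = jnp.argmin(dists, axis=0)
    rewards = jnp.zeros(searcher_pos.shape[0]).at[closest].add(newly_found.astype(jnp.float32))
    found = found | newly_found
    # 5. Local views of the environment would be generated here for each searcher.
    return searcher_pos, target_pos, found, rewards
```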
Many aspects of the environment can be customised:
- Agent observations can be customised by implementing the ObservationFn interface.
- Rewards can be customised by implementing the RewardFn interface.
- Target dynamics can be customised to model various search scenarios by implementing the TargetDynamics interface.
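For example, custom target dynamics might move targets with a small random walk. The sketch below only illustrates the idea; the TargetDynamics name comes from the text above, but the call signature is an assumption, not the documented interface.

```python
import jax
import jax.numpy as jnp

class RandomWalkDynamics:
    """Hypothetical target dynamics: targets drift with a small random walk,
    wrapped at the boundaries of the unit square."""

    def __init__(self, step_size: float = 0.01):
        self.step_size = step_size

    def __call__(self, key, target_positions):
        # target_positions: (num_targets, 2) array of coordinates in [0, 1).
        noise = self.step_size * jax.random.normal(key, target_positions.shape)
        return (target_positions + noise) % 1.0
```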
## Observations
- searcher_views: jax array (float) of shape (num_searchers, channels, num_vision). Each agent generates an independent observation, an array of values representing the distance along a ray from the agent to the nearest neighbour or target, with each cell representing a ray angle (with num_vision rays evenly distributed over the agent's field of vision). For example, if an agent sees another agent straight ahead and num_vision = 5, then the observation array could be

      [-1.0, -1.0, 0.5, -1.0, -1.0]

  where -1.0 indicates there are no agents along that ray, and 0.5 is the normalised
  distance to the other agent. Channels in the segmented view are used to differentiate
  between different agents/targets and can be customised. By default, the view has three
  channels representing other agents, found targets, and unlocated targets respectively
  (a short sketch of reading this array follows the list below).
- targets_remaining: float in the range [0, 1]. The normalised number of targets
remaining to be detected (i.e. 1.0 when no targets have been found).
- step: int in the range [0, time_limit]. The current simulation step.
- positions: jax array (float) of shape (num_searchers, 2). Agent coordinates.
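As referenced above, one way to read the per-agent view array is to pick the nearest visible entry in a given channel. This is only an interpretation sketch; the channel index assumes the default ordering described in the searcher_views entry.

```python
import jax.numpy as jnp

def nearest_unlocated_target(searcher_views):
    # searcher_views: (num_searchers, channels, num_vision); channel 2 is taken
    # to be the "unlocated targets" channel, per the default ordering above.
    unlocated = searcher_views[:, 2, :]
    seen = unlocated >= 0.0                       # -1.0 means nothing along that ray
    dists = jnp.where(seen, unlocated, jnp.inf)
    # Nearest visible unlocated target per agent, or -1.0 if none is visible.
    return jnp.where(seen.any(axis=1), dists.min(axis=1), -1.0)

# An agent seeing an unlocated target straight ahead at normalised distance 0.5:
views = jnp.full((1, 3, 5), -1.0).at[0, 2, 2].set(0.5)
print(nearest_unlocated_target(views))  # [0.5]
```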
## Actions
Jax array (float) of shape (num_searchers, 2) in the range [-1, 1]. Each entry in the
array represents an update of each agent's velocity in the next step. Searching agents
update their velocity each step by rotating and accelerating/decelerating, where the
values are [rotation, acceleration]. Values are clipped to the range [-1, 1]
and then scaled by max rotation and acceleration parameters, i.e. the new heading each
step is given by

    heading = heading + max_rotation * rotation

and the new speed by

    speed = speed + max_acceleration * acceleration

Once applied, agent speeds are clipped to the fixed range given by the min_speed and
max_speed parameters.
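Putting this together, a sketch of the velocity update might look like the following. The parameter names follow the text above, but the default values shown are placeholders, not the environment's actual defaults.

```python
import jax.numpy as jnp

def apply_actions(headings, speeds, actions,
                  max_rotation=0.1, max_acceleration=0.005,
                  min_speed=0.01, max_speed=0.05):
    # Clip raw actions to [-1, 1], then scale by the max rotation/acceleration.
    actions = jnp.clip(actions, -1.0, 1.0)
    headings = headings + max_rotation * actions[:, 0]
    # Speeds are clipped to the [min_speed, max_speed] range once updated.
    speeds = jnp.clip(speeds + max_acceleration * actions[:, 1], min_speed, max_speed)
    return headings, speeds
```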
## Rewards
Jax array (float) of shape (num_searchers,). Rewards are generated for each agent individually.
Agents are rewarded +1 for locating a target that has not already been detected. It is possible
for multiple agents to newly detect the same target inside a step; in this case the reward is
split between the locating agents by default. Rewards also decrease linearly over time by
default, from +1 to 0 at the final step. The reward function can be customised by
implementing the RewardFn interface.
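A sketch of this default scheme is below. The split and decay behaviour follow the description above, but the detection array shape and the exact decay formula are assumptions.

```python
import jax.numpy as jnp

def default_reward_sketch(detections, step, time_limit):
    # detections: (num_searchers, num_targets) bool, True where an agent newly
    # detected that target this step.
    n_detectors = jnp.maximum(detections.sum(axis=0), 1)   # avoid dividing by zero
    shared = (detections / n_detectors).sum(axis=1)        # split +1 between co-detectors
    decay = 1.0 - step / time_limit                        # +1 at step 0, 0 at the final step
    return shared * decay
```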