SearchAndRescue
Bases: Environment
A multi-agent search environment
Environment modelling a collection of agents collectively searching for a set of targets in a 2d environment. Agents are rewarded (individually) for coming within a fixed range of a target that has not already been detected. Agents visualise their local environment (i.e. the location of other agents and targets) via a simple segmented view model. The environment area is a uniform square space with wrapped boundaries.
An episode will terminate if all targets have been located by the team of searching agents.
- observation: `Observation`
    - searcher_views: jax array (float) of shape (num_searchers, channels, num_vision). Individual local views of the positions of other agents and targets, where channels can be used to differentiate between agent and target types. Each entry in the view indicates the distance to another agent/target along a ray from the agent, and is -1.0 if nothing is in range along the ray. The view model can be customised by implementing the `ObservationFn` interface.
    - targets_remaining: (float) number of targets remaining to be found, scaled to the range [0, 1] (i.e. a value of 1.0 indicates all the targets are still to be found).
    - step: (int) current simulation step.
    - positions: jax array (float) of shape (num_searchers, 2) of search agent positions.
- action: jax array (float) of shape (num_searchers, 2) of individual agent actions. Each agent's action rotates and accelerates/decelerates the agent as [rotation, acceleration] on the range [-1, 1]. These values are then scaled to update agent velocities within given parameters (i.e. a value of ±1 is the maximum rotation/acceleration).
- reward: jax array (float) of shape (num_searchers,) of individual agent rewards. A reward of +1 is granted when an agent comes into contact range with a target that has not yet been found and the target is within the searcher's view cone. It is possible for multiple agents to newly find the same target within a given step; by default the reward is then split between the locating agents. By default, rewards granted decrease linearly over time, with zero reward granted at the environment time limit. These defaults can be modified by flags in `IndividualRewardFn`, or further customised by implementing the `RewardFn` interface.
- state: `State`
    - searchers: `AgentState`
        - pos: jax array (float) of shape (num_searchers, 2) in the range [0, env_size].
        - heading: jax array (float) of shape (num_searchers,) in the range [0, 2π].
        - speed: jax array (float) of shape (num_searchers,) in the range [min_speed, max_speed].
    - targets: `TargetState`
        - pos: jax array (float) of shape (num_targets, 2) in the range [0, env_size].
        - vel: jax array (float) of shape (num_targets, 2).
        - found: jax array (bool) of shape (num_targets,) flag indicating whether a target has been located by an agent.
    - key: jax array (uint32) of shape (2,).
    - step: int representing the current simulation step.
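A minimal usage sketch, assuming the environment is exposed as `jumanji.environments.SearchAndRescue` and follows the standard Jumanji `reset`/`step` interface documented below:

```python
import jax

from jumanji.environments import SearchAndRescue

env = SearchAndRescue()
key = jax.random.PRNGKey(0)

# Reset the environment to an initial state.
state, timestep = jax.jit(env.reset)(key)

# Generate a placeholder action for every searcher and step the environment.
actions = env.action_spec.generate_value()  # shape (num_searchers, 2)
state, timestep = jax.jit(env.step)(state, actions)
```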
Instantiates a `SearchAndRescue` environment.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
target_contact_range | `float` | Range at which a searcher will 'find' a target. | `0.02` |
searcher_max_rotate | `float` | Maximum rotation searcher agents can turn within a step. Should be a value in [0, 1] representing a fraction of π radians. | `0.25` |
searcher_max_accelerate | `float` | Magnitude of the maximum acceleration/deceleration a searcher agent can apply within a step. | `0.005` |
searcher_min_speed | `float` | Minimum speed a searcher agent can move at. | `0.005` |
searcher_max_speed | `float` | Maximum speed a searcher agent can move at. | `0.02` |
time_limit | `int` | Maximum number of environment steps allowed for search. | `400` |
viewer | `Optional[Viewer[State]]` | Viewer used for rendering the environment. | `None` |
target_dynamics | `Optional[TargetDynamics]` | Target object dynamics model, implemented as a `TargetDynamics` instance. | `None` |
generator | `Optional[Generator]` | Initial state `Generator` instance. | `None` |
reward_fn | `Optional[RewardFn]` | Reward aggregation function. Defaults to `IndividualRewardFn`. | `None` |
observation | `Optional[ObservationFn]` | Agent observation view generation function, implementing the `ObservationFn` interface. | `None` |
Source code in jumanji/environments/swarms/search_and_rescue/env.py
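As an illustrative sketch, the environment can be instantiated with custom parameters (the values here are arbitrary, not recommendations):

```python
from jumanji.environments import SearchAndRescue

env = SearchAndRescue(
    target_contact_range=0.05,   # wider detection range than the default
    searcher_max_rotate=0.2,
    searcher_max_accelerate=0.01,
    searcher_min_speed=0.01,
    searcher_max_speed=0.05,
    time_limit=200,
)
```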
action_spec (cached property)
Returns the action spec.
2d array of individual agent actions. Each agent's action is an array representing [rotation, acceleration] in the range [-1, 1].
Returns:

Name | Type | Description |
---|---|---|
action_spec | `BoundedArray` | Action array spec. |
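Continuing from the usage sketch above, a sketch of sampling uniform random actions within the spec's bounds (the `minimum`/`maximum` attribute access is an assumption about the `BoundedArray` spec):

```python
import jax

spec = env.action_spec  # BoundedArray of shape (num_searchers, 2)
key = jax.random.PRNGKey(42)

# Sample random steering actions within the [-1, 1] bounds of the spec.
actions = jax.random.uniform(
    key, spec.shape, minval=spec.minimum, maxval=spec.maximum
)
```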
observation_spec (cached property)
Returns the observation spec.
Local searcher agent views representing the distance to the closest neighbouring agents and targets in the environment.
Returns:

Name | Type | Description |
---|---|---|
observation_spec | `Spec[Observation]` | Search-and-rescue observation spec. |
reward_spec (cached property)
Returns the reward spec.
Array of individual rewards for each agent.
Returns:

Name | Type | Description |
---|---|---|
reward_spec | `BoundedArray` | Reward array spec. |
animate(states, interval=100, save_path=None)
Create an animation from a sequence of environment states.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
states | `Sequence[State]` | Sequence of environment states corresponding to consecutive timesteps. | required |
interval | `int` | Delay between frames in milliseconds. | `100` |
save_path | `Optional[str]` | The path where the animation file should be saved. If None, the plot will not be saved. | `None` |
Returns:

Type | Description |
---|---|
`FuncAnimation` | Animation that can be saved as a GIF or MP4, or rendered with HTML. |
Source code in jumanji/environments/swarms/search_and_rescue/env.py
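A sketch of collecting states during a short rollout and saving them as a GIF; collecting into a Python list is one way to build the required `Sequence[State]`:

```python
import jax

key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)

# Collect a short trajectory of states to animate.
states = [state]
for _ in range(100):
    actions = env.action_spec.generate_value()
    state, timestep = jax.jit(env.step)(state, actions)
    states.append(state)

# Save the trajectory as a GIF (omit save_path to skip saving).
env.animate(states, interval=100, save_path="search_and_rescue.gif")
```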
close()
Perform any necessary cleanup.
Source code in jumanji/environments/swarms/search_and_rescue/env.py
render(state)
Render a frame of the environment for a given state using matplotlib.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
state | `State` | State object. | required |
Source code in jumanji/environments/swarms/search_and_rescue/env.py
reset(key)
Initialise searcher and target states.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
key | `PRNGKey` | Random key used to reset the environment. | required |
Returns:

Name | Type | Description |
---|---|---|
state | `State` | Initial environment state. |
timestep | `TimeStep[Observation]` | TimeStep with individual search agent views. |
Source code in jumanji/environments/swarms/search_and_rescue/env.py
step(state, actions)
Environment update.
Update searcher velocities and consequently their positions, mark found targets, and generate rewards and local observations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
state | `State` | Environment state. | required |
actions | `Array` | 2d array of searcher steering actions. | required |
Returns:

Name | Type | Description |
---|---|---|
state | `State` | Updated searcher and target positions and velocities. |
timestep | `TimeStep[Observation]` | Transition timestep with individual agent local observations. |
Source code in jumanji/environments/swarms/search_and_rescue/env.py
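Because `reset` and `step` are pure functions and `State` is a JAX pytree, a fixed-length rollout can be compiled end-to-end with `jax.lax.scan`. A sketch with a placeholder uniform-random policy (the policy and step count are assumptions for illustration):

```python
import jax

NUM_STEPS = 400  # matches the default time_limit

def rollout(key):
    key, reset_key = jax.random.split(key)
    state, timestep = env.reset(reset_key)

    def body(carry, step_key):
        state, _ = carry
        # Placeholder policy: uniform random steering actions in [-1, 1].
        num_searchers = state.searchers.pos.shape[0]
        actions = jax.random.uniform(
            step_key, (num_searchers, 2), minval=-1.0, maxval=1.0
        )
        state, timestep = env.step(state, actions)
        return (state, timestep), timestep.reward

    step_keys = jax.random.split(key, NUM_STEPS)
    (state, timestep), rewards = jax.lax.scan(body, (state, timestep), step_keys)
    return state, rewards  # rewards has shape (NUM_STEPS, num_searchers)

final_state, rewards = jax.jit(rollout)(jax.random.PRNGKey(0))
```

Note this sketch runs a fixed number of steps regardless of early termination; training loops that should restart episodes automatically can wrap the environment in an auto-reset wrapper instead.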