CVRP
Bases: Environment[State, DiscreteArray, Observation]
Capacitated Vehicle Routing Problem (CVRP) environment as described in [1].
-
observation:
Observation
- coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
- demands: jax array (float) of shape (num_nodes + 1,) the associated cost of each node and the depot (0.0 for the depot).
- unvisited_nodes: jax array (bool) of shape (num_nodes + 1,) indicates nodes that remain to be visited.
- position: jax array (int32) of shape () the index of the last visited node.
- trajectory: jax array (int32) of shape (2 * num_nodes,) array of node indices defining the route (set to DEPOT_IDX if not filled yet).
- capacity: jax array (float) of shape () the current capacity of the vehicle.
- action_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> invalid/valid action).
-
action: jax array (int32) of shape () [0, ..., num_nodes] -> node to visit. 0 corresponds to visiting the depot.
-
reward: jax array (float) of shape (), could be either:
- dense: the negative distance between the current node and the chosen next node to go to. For the last node, it also includes the distance to the depot to complete the tour.
- sparse: the negative tour length at the end of the episode. The tour length is defined
as the sum of the distances between consecutive nodes.
In both cases, the reward is a large negative penalty of
-2 * num_nodes * sqrt(2)
if the action is invalid, e.g. a previously selected node other than the depot is selected again.
-
episode termination:
- if no action can be performed, i.e. all nodes have been visited.
- if an invalid action is taken, i.e. a previously visited city other than the depot is chosen.
-
state:
State
- coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
- demands: jax array (int32) of shape (num_nodes + 1,) the associated cost of each node and the depot (0.0 for the depot).
- position: jax array (int32) the index of the last visited node.
- capacity: jax array (int32) the current capacity of the vehicle.
- visited_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> not visited/visited).
- trajectory: jax array (int32) of shape (2 * num_nodes,) identifiers of the nodes that have been visited (set to DEPOT_IDX if not filled yet).
- num_visits: int32 number of actions that have been taken (i.e., unique visits).
[1] Toth P., Vigo D. (2014). "Vehicle routing: problems, methods, and applications".
1 2 3 4 5 6 7 8 |
|
Instantiates a CVRP
environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator
|
Optional[Generator]
|
|
None
|
reward_fn
|
Optional[RewardFn]
|
|
None
|
viewer
|
Optional[Viewer[State]]
|
|
None
|
Source code in jumanji/environments/routing/cvrp/env.py
99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 |
|
action_spec: specs.DiscreteArray
cached
property
#
Returns the action spec.
Returns:
Name | Type | Description |
---|---|---|
action_spec |
DiscreteArray
|
a |
observation_spec: specs.Spec[Observation]
cached
property
#
Returns the observation spec.
Returns:
Type | Description |
---|---|
Spec[Observation]
|
Spec for the |
Spec[Observation]
|
|
Spec[Observation]
|
|
Spec[Observation]
|
|
Spec[Observation]
|
|
Spec[Observation]
|
|
Spec[Observation]
|
|
Spec[Observation]
|
|
animate(states, interval=200, save_path=None)
#
Creates an animated gif of the CVRP environment based on the sequence of states.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
states
|
Sequence[State]
|
sequence of environment states corresponding to consecutive timesteps. |
required |
interval
|
int
|
delay between frames in milliseconds, default to 200. |
200
|
save_path
|
Optional[str]
|
the path where the animation file should be saved. If it is None, the plot will not be saved. |
None
|
Returns:
Type | Description |
---|---|
FuncAnimation
|
animation.FuncAnimation: the animation object that was created. |
Source code in jumanji/environments/routing/cvrp/env.py
284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 |
|
close()
#
Perform any necessary cleanup.
Environments will automatically :meth:close()
themselves when
garbage collected or when the program exits.
Source code in jumanji/environments/routing/cvrp/env.py
303 304 305 306 307 308 309 |
|
render(state)
#
Render the given state of the environment. This rendering shows the layout of the tour so far with the cities as circles, and the depot as a square.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state
|
State
|
environment state to render. |
required |
Returns:
Name | Type | Description |
---|---|---|
rgb_array |
Optional[ArrayNumpy]
|
the RGB image of the state as an array. |
Source code in jumanji/environments/routing/cvrp/env.py
272 273 274 275 276 277 278 279 280 281 282 |
|
reset(key)
#
Resets the environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key
|
PRNGKey
|
used to randomly generate the coordinates. |
required |
Returns:
Name | Type | Description |
---|---|---|
state |
State
|
|
timestep |
TimeStep[Observation]
|
|
Source code in jumanji/environments/routing/cvrp/env.py
146 147 148 149 150 151 152 153 154 155 156 157 158 159 |
|
step(state, action)
#
Run one timestep of the environment's dynamics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state
|
State
|
|
required |
action
|
Numeric
|
jax array (int32) of shape () containing the index of the next node to visit. |
required |
Returns:
Type | Description |
---|---|
Tuple[State, TimeStep[Observation]]
|
state, timestep: next state of the environment and timestep to be observed. |
Source code in jumanji/environments/routing/cvrp/env.py
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 |
|