CVRP
CVRP (Environment)
#
Capacitated Vehicle Routing Problem (CVRP) environment as described in [1].
-
observation:
Observation
- coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
- demands: jax array (float) of shape (num_nodes + 1,) the associated cost of each node and the depot (0.0 for the depot).
- unvisited_nodes: jax array (bool) of shape (num_nodes + 1,) indicates nodes that remain to be visited.
- position: jax array (int32) of shape () the index of the last visited node.
- trajectory: jax array (int32) of shape (2 * num_nodes,) array of node indices defining the route (set to DEPOT_IDX if not filled yet).
- capacity: jax array (float) of shape () the current capacity of the vehicle.
- action_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> invalid/valid action).
-
action: jax array (int32) of shape () [0, ..., num_nodes] -> node to visit. 0 corresponds to visiting the depot.
-
reward: jax array (float) of shape (), could be either:
- dense: the negative distance between the current node and the chosen next node to go to. For the last node, it also includes the distance to the depot to complete the tour.
- sparse: the negative tour length at the end of the episode. The tour length is defined
as the sum of the distances between consecutive nodes.
In both cases, the reward is a large negative penalty of
-2 * num_nodes * sqrt(2)
if the action is invalid, e.g. a previously selected node other than the depot is selected again.
-
episode termination:
- if no action can be performed, i.e. all nodes have been visited.
- if an invalid action is taken, i.e. a previously visited city other than the depot is chosen.
-
state:
State
- coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
- demands: jax array (int32) of shape (num_nodes + 1,) the associated cost of each node and the depot (0.0 for the depot).
- position: jax array (int32) the index of the last visited node.
- capacity: jax array (int32) the current capacity of the vehicle.
- visited_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> not visited/visited).
- trajectory: jax array (int32) of shape (2 * num_nodes,) identifiers of the nodes that have been visited (set to DEPOT_IDX if not filled yet).
- num_visits: int32 number of actions that have been taken (i.e., unique visits).
[1] Toth P., Vigo D. (2014). "Vehicle routing: problems, methods, and applications".
1 2 3 4 5 6 7 8 |
|
observation_spec: jumanji.specs.Spec[jumanji.environments.routing.cvrp.types.Observation]
cached
property
writable
#
Returns the observation spec.
Returns:
Type | Description |
---|---|
Spec for the `Observation` whose fields are |
|
action_spec: DiscreteArray
cached
property
writable
#
Returns the action spec.
Returns:
Type | Description |
---|---|
action_spec |
a |
__init__(self, generator: Optional[jumanji.environments.routing.cvrp.generator.Generator] = None, reward_fn: Optional[jumanji.environments.routing.cvrp.reward.RewardFn] = None, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.routing.cvrp.types.State]] = None)
special
#
Instantiates a CVRP
environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator |
Optional[jumanji.environments.routing.cvrp.generator.Generator] |
|
None |
reward_fn |
Optional[jumanji.environments.routing.cvrp.reward.RewardFn] |
|
None |
viewer |
Optional[jumanji.viewer.Viewer[jumanji.environments.routing.cvrp.types.State]] |
|
None |
reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.routing.cvrp.types.State, jumanji.types.TimeStep[jumanji.environments.routing.cvrp.types.Observation]]
#
Resets the environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key |
PRNGKeyArray |
used to randomly generate the coordinates. |
required |
Returns:
Type | Description |
---|---|
state |
|
step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number, float, int]) -> Tuple[jumanji.environments.routing.cvrp.types.State, jumanji.types.TimeStep[jumanji.environments.routing.cvrp.types.Observation]]
#
Run one timestep of the environment's dynamics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state |
State |
|
required |
action |
Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number, float, int] |
jax array (int32) of shape () containing the index of the next node to visit. |
required |
Returns:
Type | Description |
---|---|
state, timestep |
next state of the environment and timestep to be observed. |