CVRP (Environment) #

Capacitated Vehicle Routing Problem (CVRP) environment as described in [1].

  • observation: Observation

    • coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
    • demands: jax array (float) of shape (num_nodes + 1,) the demand of each node and the depot (0.0 for the depot).
    • unvisited_nodes: jax array (bool) of shape (num_nodes + 1,) indicates nodes that remain to be visited.
    • position: jax array (int32) of shape () the index of the last visited node.
    • trajectory: jax array (int32) of shape (2 * num_nodes,) array of node indices defining the route (set to DEPOT_IDX if not filled yet).
    • capacity: jax array (float) of shape () the current capacity of the vehicle.
    • action_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> invalid/valid action).
  • action: jax array (int32) of shape (), an integer in [0, ..., num_nodes] giving the next node to visit; 0 corresponds to visiting the depot.

  • reward: jax array (float) of shape (), could be either:

    • dense: the negative distance between the current node and the chosen next node. For the last node, the reward also includes the distance back to the depot to complete the tour.
    • sparse: the negative tour length at the end of the episode, where the tour length is the sum of the distances between consecutive nodes.

    In both cases, an invalid action, e.g. selecting a previously visited node other than the depot, incurs a large negative penalty of -2 * num_nodes * sqrt(2). A dense-reward sketch is given after this list.
  • episode termination:

    • if no action can be performed, i.e. all nodes have been visited.
    • if an invalid action is taken, i.e. a previously visited node other than the depot is chosen.
  • state: State

    • coordinates: jax array (float) of shape (num_nodes + 1, 2) the coordinates of each node and the depot.
    • demands: jax array (int32) of shape (num_nodes + 1,) the demand of each node and the depot (0 for the depot).
    • position: jax array (int32) of shape () the index of the last visited node.
    • capacity: jax array (int32) of shape () the current capacity of the vehicle.
    • visited_mask: jax array (bool) of shape (num_nodes + 1,) binary mask (False/True <--> not visited/visited).
    • trajectory: jax array (int32) of shape (2 * num_nodes,) identifiers of the nodes that have been visited (set to DEPOT_IDX if not filled yet).
    • num_visits: int32 number of actions that have been taken so far.
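The invalid-action penalty and the dense reward described above can be made concrete with a small sketch. The function below is an illustrative re-implementation, not the library's DenseReward class; the name compute_dense_reward and its signature are hypothetical, and the return-to-depot term for the final step is noted but omitted for brevity.

import jax.numpy as jnp

def compute_dense_reward(coordinates, position, action, is_valid, num_nodes):
    # Negative Euclidean distance from the current node to the chosen node.
    # (The actual dense reward also adds the distance back to the depot on
    # the final step of the tour.)
    distance = jnp.linalg.norm(coordinates[action] - coordinates[position])
    # Invalid actions, e.g. revisiting a non-depot node, incur the penalty.
    penalty = -2.0 * num_nodes * jnp.sqrt(2.0)
    return jnp.where(is_valid, -distance, penalty)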

[1] Toth P., Vigo D. (2014). "Vehicle routing: problems, methods, and applications".

import jax

from jumanji.environments import CVRP
env = CVRP()
key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)
env.render(state)
action = env.action_spec.generate_value()
state, timestep = jax.jit(env.step)(state, action)
env.render(state)
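
Instead of the dummy action from the spec, a valid action can be selected from the observation's action_mask. A minimal sketch, assuming the observation fields listed above:

import jax
import jax.numpy as jnp

# argmax over a boolean mask returns the index of the first True entry,
# i.e. the first valid node (or the depot, index 0, when it is valid).
action = jnp.argmax(timestep.observation.action_mask)
state, timestep = jax.jit(env.step)(state, action)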

observation_spec: jumanji.specs.Spec[jumanji.environments.routing.cvrp.types.Observation] cached property writable #

Returns the observation spec.

Returns:

Type Description
Spec for the `Observation` whose fields are:
  • coordinates: BoundedArray (float) of shape (num_nodes + 1, 2).
  • demands: BoundedArray (float) of shape (num_nodes + 1,).
  • unvisited_nodes: BoundedArray (bool) of shape (num_nodes + 1,).
  • position: DiscreteArray (num_values = num_nodes + 1) of shape ().
  • trajectory: BoundedArray (int32) of shape (2 * num_nodes,).
  • capacity: BoundedArray (float) of shape ().
  • action_mask: BoundedArray (bool) of shape (num_nodes + 1,).
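
A dummy observation conforming to this spec can be generated, which is handy for smoke tests; a minimal sketch:

obs = env.observation_spec.generate_value()
print(obs.coordinates.shape)  # (num_nodes + 1, 2)
print(obs.action_mask.shape)  # (num_nodes + 1,)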

action_spec: DiscreteArray cached property writable #

Returns the action spec.

Returns:

Type Description
action_spec

a specs.DiscreteArray spec.
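
The discrete spec exposes the size of the action space, which covers the depot plus all nodes; a short sketch:

num_actions = env.action_spec.num_values  # num_nodes + 1, i.e. 21 by default
action = env.action_spec.generate_value()  # conforming dummy action (index 0 selects the depot)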

__init__(self, generator: Optional[jumanji.environments.routing.cvrp.generator.Generator] = None, reward_fn: Optional[jumanji.environments.routing.cvrp.reward.RewardFn] = None, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.routing.cvrp.types.State]] = None) special #

Instantiates a CVRP environment.

Parameters:

Name Type Description Default
generator Optional[jumanji.environments.routing.cvrp.generator.Generator]

Generator whose __call__ instantiates an environment instance. The default option is 'UniformGenerator', which randomly generates CVRP instances with 20 nodes sampled from a uniform distribution, a maximum vehicle capacity of 30, and a maximum node demand of 10.

None
reward_fn Optional[jumanji.environments.routing.cvrp.reward.RewardFn]

RewardFn whose __call__ method computes the reward of an environment transition. The function must compute the reward based on the current state, the chosen action, the next state and whether the action is valid. Implemented options are [DenseReward, SparseReward]. Defaults to DenseReward.

None
viewer Optional[jumanji.viewer.Viewer[jumanji.environments.routing.cvrp.types.State]]

Viewer used for rendering. Defaults to CVRPViewer with "human" render mode.

None
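
The defaults can be overridden by passing a generator and a reward function. Below is a sketch of a larger, sparse-reward instance; the UniformGenerator constructor arguments are assumed from the defaults described above:

from jumanji.environments import CVRP
from jumanji.environments.routing.cvrp.generator import UniformGenerator
from jumanji.environments.routing.cvrp.reward import SparseReward

# Assumed parameter names for UniformGenerator: num_nodes, max_capacity, max_demand.
env = CVRP(
    generator=UniformGenerator(num_nodes=50, max_capacity=40, max_demand=10),
    reward_fn=SparseReward(),
)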

reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.routing.cvrp.types.State, jumanji.types.TimeStep[jumanji.environments.routing.cvrp.types.Observation]] #

Resets the environment.

Parameters:

Name Type Description Default
key PRNGKeyArray

used to randomly generate the coordinates.

required

Returns:

Type Description
state, timestep

State object corresponding to the new state of the environment, and TimeStep object corresponding to the first timestep returned by the environment.
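
Because reset is a pure function of the key, it composes with standard JAX transformations; for instance, a batch of independent instances can be created with jax.vmap. A minimal sketch:

import jax

keys = jax.random.split(jax.random.PRNGKey(0), 8)  # 8 independent instances
states, timesteps = jax.vmap(env.reset)(keys)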

step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number, float, int]) -> Tuple[jumanji.environments.routing.cvrp.types.State, jumanji.types.TimeStep[jumanji.environments.routing.cvrp.types.Observation]] #

Run one timestep of the environment's dynamics.

Parameters:

Name Type Description Default
state State

State object containing the dynamics of the environment.

required
action Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number, float, int]

jax array (int32) of shape () containing the index of the next node to visit.

required

Returns:

Type Description
state, timestep

next state of the environment and timestep to be observed.
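
Since step is functional, a whole episode can be rolled out by feeding the returned state back in until the timestep is terminal. A minimal sketch that samples uniformly among the valid actions given by the mask:

import jax
import jax.numpy as jnp

step_fn = jax.jit(env.step)
state, timestep = jax.jit(env.reset)(jax.random.PRNGKey(0))
key = jax.random.PRNGKey(1)
while not timestep.last():
    key, subkey = jax.random.split(key)
    # Invalid actions get -inf logits so they are never sampled.
    logits = jnp.where(timestep.observation.action_mask, 0.0, -jnp.inf)
    action = jax.random.categorical(subkey, logits)
    state, timestep = step_fn(state, action)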

