Capacitated Vehicle Routing Problem (CVRP) Environment#

We provide here a Jax JIT-able implementation of the capacitated vehicle routing problem (CVRP) which is a specific type of VRP.

CVRP is a classic combinatorial optimization problem. Given a set of nodes with specific demands, a depot node, and a vehicle with limited capacity, the goal is to determine the shortest route between the nodes such that each node (excluding depot) is visited exactly once and has its demand covered. The problem is NP-complete, thus there is no known algorithm both correct and fast (i.e., that runs in polynomial time) for any instance of the problem.

A new problem instance is generated by resetting the environment. The problem instance contains coordinates for each node sampled from a uniform distribution between 0 and 1, and each node (except for depot) has a specific demand which is an integer value sampled from a uniform distribution between 1 and the maximum demand (which is a parameter of the CVRP environment). The number of nodes with demand is a parameter of the environment.

Observation#

The observation given to the agent provides information on the problem layout, the visited/unvisited cities and the current position of the agent as well as the current capacity.

coordinates: jax array (float) of shape (num_nodes + 1, 2), array of coordinates of each city node and the depot node.
demands: jax array (float) of shape (num_nodes + 1,), array of the demands of each city node and the depot node whose demand is set to 0.
unvisited_nodes: jax array (bool) of shape (num_nodes + 1,), array denoting which nodes remain to be visited.
position: jax array (int32) of shape (), identifier (index) of the current visited node (city or depot).
trajectory: jax array (int32) of shape (2 * num_nodes,), identifiers of the nodes that have been visited (set to DEPOT_IDX if not filled yet).
capacity: jax array (float) of shape (), current capacity of the vehicle.
action_mask: jax array (bool) of shape (num_nodes + 1,), array denoting which actions are possible (True) and which are not (False).

Action#

The action space is a DiscreteArray of integer values in the range of [0, num_nodes]. An action is the index of the next node to visit, and an action value of 0 corresponds to visiting the depot.

Reward#

The reward could be either:

Dense: the negative distance between the current node and the chosen next node to go to. For the last node, it also includes the distance to the depot to complete the tour.
Sparse: the negative tour length at the end of the episode. The tour length is defined as the sum of the distances between consecutive nodes.

In both cases, the reward is a large negative penalty of -2 * num_nodes * sqrt(2) if the action is invalid, e.g. a previously selected node other than the depot is selected again.

Registered Versions 📖#

CVRP-v1: CVRP problem with 20 randomly generated nodes, a maximum capacity of 30, a maximum demand for each node of 10 and a dense reward function.

Last update: 2024-03-29