Knapskack Environment#
We provide here a Jax JIT-able implementation of the knapskack problem.
The knapsack problem is a famous problem in combinatorial optimization. The goal is to determine, given a set of items, each with a weight and a value, which items to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
The decision problem form of the knapsack problem is NP-complete, thus there is no known algorithm both correct and fast (polynomial-time) in all cases.
When the environment is reset, a new problem instance is generated, by sampling weights and values from a uniform distribution between 0 and 1. The weight limit of the knapsack is a parameter of the environment. A trajectory terminates when no further item can be added to the knapsack or the chosen action is invalid.
Observation#
The observation given to the agent provides information regarding the weights and the values of all the items, as well as, which items have been packed into the knapsack.
-
weights
: jax array (float) of shape(num_items,)
, array of weights of the items to be packed into the knapsack. -
values
: jax array (float) of shape(num_items,)
, array of values of the items to be packed into the knapsack. -
packed_items
: jax array (bool) of shape(num_items,)
, array of binary values denoting which items are already packed into the knapsack. -
action_mask
: jax array (bool) of shape(num_items,)
, array of binary values denoting which items can be packed into the knapsack.
Action#
The action space is a DiscreteArray
of integer values in the range of [0, num_items-1]
. An
action is the index of the next item to pack.
Reward#
The reward can be either:
-
Dense: the value of the item to pack at the current timestep.
-
Sparse: the sum of the values of the items packed in the bag at the end of the episode.
In both cases, the reward is 0 if the action is invalid, i.e. an item that was previously selected is selected again or has a weight larger than the bag capacity.
Registered Versions 📖#
Knapsack-v1
: Knapsack problem with 50 randomly generated items, a total budget of 12.5 and a dense reward function.