JobShop
JobShop (Environment)
The Job Shop Scheduling Problem, as described in [1], is one of the best known
combinatorial optimization problems. We are given `num_jobs` jobs, each consisting
of at most `max_num_ops` operations (ops), which need to be processed on
`num_machines` machines. Each op has a specific machine that it needs to be
processed on and a duration (which must be less than or equal to
`max_op_duration`). The goal is to minimise the total length of the schedule,
also known as the makespan.
[1] https://developers.google.com/optimization/scheduling/job_shop.
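To make the problem statement concrete, here is a minimal sketch in plain Python (not the JAX code used by the environment) that evaluates the makespan of a dispatch order on a small instance. The instance, the `makespan` helper and the dispatch-order encoding are illustrative assumptions and are not part of the `JobShop` API.

```python
# Hypothetical toy instance: each job is a list of (machine_id, duration) ops
# that must run in the given order. It is not produced by the JobShop generator.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]


def makespan(jobs, dispatch_order):
    """Schedule ops greedily in `dispatch_order` and return the makespan.

    `dispatch_order` lists job ids; the k-th occurrence of a job id dispatches
    that job's k-th op at the earliest time at which both the job and the
    required machine are free.
    """
    num_machines = 1 + max(m for job in jobs for m, _ in job)
    job_free_at = [0] * len(jobs)
    machine_free_at = [0] * num_machines
    next_op = [0] * len(jobs)
    for job_id in dispatch_order:
        machine_id, duration = jobs[job_id][next_op[job_id]]
        start = max(job_free_at[job_id], machine_free_at[machine_id])
        end = start + duration
        job_free_at[job_id] = end
        machine_free_at[machine_id] = end
        next_op[job_id] += 1
    return max(job_free_at)


interleaved = makespan(JOBS, [0, 1, 2, 0, 1, 2, 0, 1])
job_by_job = makespan(JOBS, [0, 0, 0, 1, 1, 1, 2, 2])
print(interleaved, job_by_job)  # 11 19
```

Interleaving the jobs lets the machines work in parallel, which is why the first order yields a shorter schedule than dispatching one job after another.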
- observation: `Observation`
    - `ops_machine_ids`: jax array (int32) of shape (num_jobs, max_num_ops), id of the machine each operation must be processed on.
    - `ops_durations`: jax array (int32) of shape (num_jobs, max_num_ops), processing time of each operation.
    - `ops_mask`: jax array (bool) of shape (num_jobs, max_num_ops), indicating which operations have yet to be scheduled.
    - `machines_job_ids`: jax array (int32) of shape (num_machines,), id of the job (or no-op) that each machine is processing.
    - `machines_remaining_times`: jax array (int32) of shape (num_machines,), specifying, for each machine, the number of time steps until it is available.
    - `action_mask`: jax array (bool) of shape (num_machines, num_jobs + 1), indicating which job(s) (or no-op) can legally be scheduled on each machine.
- action: jax array (int32) of shape (num_machines,).
- reward: jax array (float) of shape (). A reward of `-1` is given each time step. If all machines are simultaneously idle or the agent selects an invalid action, the agent is given a large penalty of `-num_jobs * max_num_ops * max_op_duration`, whose magnitude is an upper bound on the makespan.
- episode termination:
    - Finished schedule: all operations (and thus all jobs) have been processed.
    - Illegal action: the agent ignores the action mask and takes an illegal action.
    - Simultaneously idle: all machines are inactive at the same time.
- state: `State`
    - `ops_machine_ids`: same as observation.
    - `ops_durations`: same as observation.
    - `ops_mask`: same as observation.
    - `machines_job_ids`: same as observation.
    - `machines_remaining_times`: same as observation.
    - `action_mask`: same as observation.
    - `step_count`: jax array (int32) of shape (), the number of time steps in the episode so far.
    - `scheduled_times`: jax array (int32) of shape (num_jobs, max_num_ops), specifying the time step at which each op (scheduled so far) was scheduled.
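The relationship between the observation fields and `action_mask` can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the environment's JAX implementation: it assumes ops of a job are scheduled in order, that a busy machine can only take a no-op, that a job already running on some machine cannot be scheduled elsewhere, that the no-op is always legal, and that padded ops use machine id `-1`. The function name `sketch_action_mask` is hypothetical.

```python
def sketch_action_mask(ops_machine_ids, ops_mask, machines_job_ids,
                       machines_remaining_times):
    """Return a (num_machines, num_jobs + 1) nested list of booleans."""
    num_jobs = len(ops_machine_ids)
    num_machines = len(machines_job_ids)
    no_op = num_jobs  # the last index stands for the no-op

    # Jobs currently being processed on some machine (assumed unavailable).
    busy_jobs = {
        job
        for job, remaining in zip(machines_job_ids, machines_remaining_times)
        if job != no_op and remaining > 0
    }

    mask = []
    for machine_id in range(num_machines):
        machine_idle = machines_remaining_times[machine_id] == 0
        row = []
        for job_id in range(num_jobs):
            # Index of the first op still to be scheduled, if any.
            next_op = next(
                (k for k, pending in enumerate(ops_mask[job_id]) if pending),
                None,
            )
            legal = (
                machine_idle
                and job_id not in busy_jobs
                and next_op is not None
                and ops_machine_ids[job_id][next_op] == machine_id
            )
            row.append(legal)
        row.append(True)  # assumption: the no-op is always legal
        mask.append(row)
    return mask


# Initial state of a hypothetical 3-job, 3-machine instance (padding = -1).
initial_mask = sketch_action_mask(
    ops_machine_ids=[[0, 1, 2], [0, 2, 1], [1, 2, -1]],
    ops_mask=[[True, True, True], [True, True, True], [True, True, False]],
    machines_job_ids=[3, 3, 3],
    machines_remaining_times=[0, 0, 0],
)
print(initial_mask)
# [[True, True, False, True], [False, False, True, True], [False, False, False, True]]
```

In this example, machine 0 may start job 0 or job 1, machine 1 may start job 2, and machine 2 can only take a no-op because no job's first op requires it.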
observation_spec: jumanji.specs.Spec[jumanji.environments.packing.job_shop.types.Observation]
cached property (writable)
Specifications of the observation of the JobShop environment.
Returns:
Type | Description |
---|---|
jumanji.specs.Spec[jumanji.environments.packing.job_shop.types.Observation] | Spec containing the specifications for all the `Observation` fields. |
action_spec: MultiDiscreteArray
cached property (writable)
Specifications of the action in the JobShop environment. The action gives each
machine a job id in 0, 1, ..., num_jobs, where the last value (num_jobs)
corresponds to a no-op.
Returns:
Type | Description |
---|---|
action_spec |
a |
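The action encoding described above can be illustrated with a small plain-Python helper. `describe_action` is a hypothetical name introduced here for illustration; it is not part of the `JobShop` API and only decodes the per-machine job ids, treating `num_jobs` as the no-op.

```python
def describe_action(action, num_jobs):
    """Translate a per-machine action into readable assignments."""
    no_op = num_jobs  # the last value of the range stands for the no-op
    lines = []
    for machine_id, job_id in enumerate(action):
        if not 0 <= job_id <= num_jobs:
            raise ValueError(f"job id {job_id} is outside 0..{num_jobs}")
        target = "no-op" if job_id == no_op else f"job {job_id}"
        lines.append(f"machine {machine_id}: {target}")
    return lines


decoded = describe_action([0, 3, 2], num_jobs=3)
print(decoded)
# ['machine 0: job 0', 'machine 1: no-op', 'machine 2: job 2']
```

Whether a given assignment is legal in a particular state is a separate question that is answered by the `action_mask` field of the observation.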
__init__(self, generator: Optional[jumanji.environments.packing.job_shop.generator.Generator] = None, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.packing.job_shop.types.State]] = None)
special
Instantiate a JobShop environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Optional[jumanji.environments.packing.job_shop.generator.Generator] | | None |
viewer | Optional[jumanji.viewer.Viewer[jumanji.environments.packing.job_shop.types.State]] | | None |
reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.packing.job_shop.types.State, jumanji.types.TimeStep[jumanji.environments.packing.job_shop.types.Observation]]
Resets the environment by creating a new problem instance and initialising the state and timestep.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | PRNGKeyArray | random key used to reset the environment. | required |
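Returns:
Type | Description |
---|---|
Tuple[jumanji.environments.packing.job_shop.types.State, jumanji.types.TimeStep[jumanji.environments.packing.job_shop.types.Observation]] | state: the environment state after the reset. timestep: the first timestep returned by the environment after the reset. |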
step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.packing.job_shop.types.State, jumanji.types.TimeStep[jumanji.environments.packing.job_shop.types.Observation]]
Updates the status of all machines, the status of the operations, and increments the time step. It updates the environment state and the timestep (which contains the new observation). It calculates the reward based on the three terminal conditions:
- The action provided by the agent is invalid.
- The schedule has finished.
- All machines do a no-op that leads to all machines being simultaneously idle.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | State | the environment state. | required |
action | Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number] | the action to take. | required |
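Returns:
Type | Description |
---|---|
Tuple[jumanji.environments.packing.job_shop.types.State, jumanji.types.TimeStep[jumanji.environments.packing.job_shop.types.Observation]] | state: the updated environment state. timestep: the updated timestep. |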
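The reward rule stated in the environment description (a reward of `-1` per time step, replaced by a penalty of `-num_jobs * max_num_ops * max_op_duration` on an invalid action or when all machines are simultaneously idle) can be sketched in plain Python. The function `sketch_step_reward` and the example durations are hypothetical; the environment itself computes this with JAX arrays inside `step`.

```python
def sketch_step_reward(invalid_action, all_machines_idle,
                       num_jobs, max_num_ops, max_op_duration):
    """Return the per-step reward described in the text as a float."""
    penalty = -float(num_jobs * max_num_ops * max_op_duration)
    if invalid_action or all_machines_idle:
        return penalty
    return -1.0


# Hypothetical instance: 3 jobs, up to 3 ops each, durations of at most 4.
durations = [[3, 2, 2], [2, 1, 4], [4, 3, 0]]  # 0 marks a padded op here
normal = sketch_step_reward(False, False, 3, 3, 4)
penalised = sketch_step_reward(True, False, 3, 3, 4)
print(normal, penalised)  # -1.0 -36.0

# Running every op one after another takes sum(durations) time steps, which
# cannot exceed num_jobs * max_num_ops * max_op_duration.
total_duration = sum(d for job in durations for d in job)
print(total_duration <= -penalised)  # True
```

The final check illustrates why the penalty's magnitude bounds the makespan: even a fully sequential schedule is no longer than the sum of all op durations, and each of the at most `num_jobs * max_num_ops` ops lasts at most `max_op_duration`.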