SlidingTilePuzzle
SlidingTilePuzzle (Environment)
Environment for the Sliding Tile Puzzle problem.
The problem is a combinatorial optimization task in which the goal is to slide the empty tile around until all the tiles are arranged in order. For more information, see https://en.wikipedia.org/wiki/Sliding_puzzle.
- observation: Observation
    - puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    - empty_tile_position: Tuple of int32, representing the position of the empty tile.
    - action_mask: jax array (bool) of shape (4,), indicating which actions are valid in the current state of the environment.
- action: int32, representing the direction to move the empty tile (up, down, left, right).
- reward: float. A dense reward is provided based on the arrangement of the tiles. It equals the negative count of positions at which the current state of the puzzle differs from the goal state (the correctly arranged tiles), so each incorrectly placed tile contributes -1 to the reward.
- episode termination: when the puzzle is solved, or when the maximum number of steps (time_limit) is reached.
- state: State
    - puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    - empty_tile_position: Tuple of int32, representing the position of the empty tile.
    - key: jax array (uint32) of shape (2,), random key used to generate random numbers at each step and for auto-reset.
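The dense reward and the action mask described above can be sketched as follows. This is a minimal illustration, not the library's implementation. It makes three assumptions that the text above does not confirm: the empty tile is encoded as 0, the goal state is the tiles 1..N*N-1 in row-major order with the empty tile last, and the action indices follow the listed order (up, down, left, right). The helper names goal_state, dense_reward and action_mask are hypothetical. NumPy is used to keep the sketch dependency-light, whereas the environment itself works on JAX arrays.

```python
import numpy as np

# ASSUMPTION: action indices follow the order listed in the docstring
# (0=up, 1=down, 2=left, 3=right); the real mapping may differ.
MOVES = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])


def goal_state(n):
    """ASSUMED goal: tiles 1..n*n-1 in row-major order, empty tile (0) last."""
    return np.roll(np.arange(n * n), -1).reshape(n, n)


def dense_reward(puzzle, goal):
    """Negative count of cells that differ from the goal (-1 per misplaced cell)."""
    return -float(np.sum(puzzle != goal))


def action_mask(empty_pos, n):
    """Boolean array of shape (4,): True where the move keeps the empty tile on the board."""
    new_pos = np.asarray(empty_pos) + MOVES  # shape (4, 2), one candidate per action
    return np.all((new_pos >= 0) & (new_pos < n), axis=1)


goal = goal_state(3)
puzzle = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 8]])  # one move away from the goal
print(dense_reward(puzzle, goal))  # two cells differ from the goal
print(action_mask((2, 1), 3))      # moving down would leave the board
```

Under these assumptions the reward is 0 only when every cell matches the goal, which is the condition for the puzzle being solved.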
observation_spec: jumanji.specs.Spec[jumanji.environments.logic.sliding_tile_puzzle.types.Observation] (cached property, writable)
Returns the observation spec.
action_spec: DiscreteArray (cached property, writable)
Returns the action spec.
__init__(self, generator: Optional[jumanji.environments.logic.sliding_tile_puzzle.generator.Generator] = None, reward_fn: Optional[jumanji.environments.logic.sliding_tile_puzzle.reward.RewardFn] = None, time_limit: int = 500, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.sliding_tile_puzzle.types.State]] = None) -> None
Instantiate a SlidingTilePuzzle environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Optional[jumanji.environments.logic.sliding_tile_puzzle.generator.Generator] | callable to instantiate environment instances. Defaults to | None |
reward_fn | Optional[jumanji.environments.logic.sliding_tile_puzzle.reward.RewardFn] | RewardFn whose | None |
time_limit | int | maximum number of steps before the episode is terminated, default to 500. | 500 |
viewer | Optional[jumanji.viewer.Viewer[jumanji.environments.logic.sliding_tile_puzzle.types.State]] | environment viewer for rendering. | None |
reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.logic.sliding_tile_puzzle.types.State, jumanji.types.TimeStep[jumanji.environments.logic.sliding_tile_puzzle.types.Observation]]
Resets the environment to an initial state.
step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.logic.sliding_tile_puzzle.types.State, jumanji.types.TimeStep[jumanji.environments.logic.sliding_tile_puzzle.types.Observation]]
Updates the environment state after the agent takes an action.
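To illustrate what a state update of this kind involves, here is a rough sketch of a single move: the empty tile swaps places with the neighbouring tile in the chosen direction. It is not the library's step implementation. The helper name apply_move is hypothetical, and the sketch assumes that the empty tile is encoded as 0, that the action indices follow the order (up, down, left, right), and that an off-board move leaves the puzzle unchanged. NumPy stands in for JAX arrays here.

```python
import numpy as np

# ASSUMPTION: 0=up, 1=down, 2=left, 3=right; the real index mapping may differ.
MOVES = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])


def apply_move(puzzle, empty_pos, action):
    """Return (new_puzzle, new_empty_pos) after moving the empty tile.

    ASSUMPTION: a move that would leave the board is a no-op.
    """
    n = puzzle.shape[0]
    r, c = empty_pos
    nr, nc = (int(v) for v in np.asarray(empty_pos) + MOVES[action])
    if not (0 <= nr < n and 0 <= nc < n):
        return puzzle, (r, c)
    new_puzzle = puzzle.copy()
    new_puzzle[r, c] = puzzle[nr, nc]  # neighbouring tile slides into the gap
    new_puzzle[nr, nc] = 0             # ASSUMPTION: 0 encodes the empty tile
    return new_puzzle, (nr, nc)


puzzle = np.array([[1, 2, 3], [4, 5, 6], [7, 0, 8]])
solved, pos = apply_move(puzzle, (2, 1), 3)  # move the empty tile right
print(solved)
print(pos)
```

The sketch returns a new array rather than mutating its input, mirroring the functional style in which step takes a state and returns a new one.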