SlidingTilePuzzle

SlidingTilePuzzle (Environment)

Environment for the Sliding Tile Puzzle problem.

The problem is a combinatorial optimization task in which the goal is to move the empty tile around until all the tiles are arranged in order. For more information, see https://en.wikipedia.org/wiki/Sliding_puzzle.

  • observation: Observation

    • puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    • empty_tile_position: Tuple of int32, representing the position of the empty tile.
    • action_mask: jax array (bool) of shape (4,), indicating which actions are valid in the current state of the environment.
  • action: int32, representing the direction to move the empty tile (up, down, left, right)

  • reward: float, a dense reward based on the arrangement of the tiles. It equals the negative of the number of positions at which the current state of the puzzle differs from the goal state (correctly arranged tiles), so each incorrectly placed tile contributes -1 to the reward.

  • episode termination: when the puzzle is solved, or when the maximum number of steps (time_limit) is reached.

  • state: State

    • puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    • empty_tile_position: Tuple of int32, representing the position of the empty tile.
    • key: jax array (uint32) of shape (2,), random key used to generate random numbers at each step and for auto-reset.
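
The dense reward and the action mask described above can be sketched in a few lines of JAX. This is a minimal illustration and not the library's implementation: the helper names (solved_puzzle, dense_reward, action_mask), the goal layout (tiles in row-major order with the empty tile, encoded as 0, in the last cell) and the action ordering (up, down, left, right, taken from the order in which the directions are listed above) are all assumptions.

```python
import jax.numpy as jnp

# Assumed action ordering: up, down, left, right (row/column offsets of the empty tile).
MOVES = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]], dtype=jnp.int32)


def solved_puzzle(n: int) -> jnp.ndarray:
    """Hypothetical goal state: 1..n*n-1 in row-major order, empty tile (0) last."""
    return jnp.roll(jnp.arange(n * n, dtype=jnp.int32), -1).reshape(n, n)


def dense_reward(puzzle: jnp.ndarray, goal: jnp.ndarray) -> jnp.ndarray:
    """Negative count of cells that differ from the goal (the empty cell counts too here)."""
    return -jnp.sum(puzzle != goal).astype(jnp.float32)


def action_mask(empty_tile_position, n: int) -> jnp.ndarray:
    """Boolean array of shape (4,): True where the move keeps the empty tile on the board."""
    new_positions = jnp.asarray(empty_tile_position, dtype=jnp.int32) + MOVES
    return jnp.all((new_positions >= 0) & (new_positions < n), axis=1)


goal = solved_puzzle(3)
# One move away from solved: the empty tile sits at row 2, column 1.
puzzle = jnp.array([[1, 2, 3], [4, 5, 6], [7, 0, 8]], dtype=jnp.int32)
print(dense_reward(puzzle, goal))  # two cells differ from the goal
print(action_mask((2, 1), 3))      # moving down would leave the board
```

Under these assumptions, a puzzle that is one move from solved still has two mismatched cells, because both the empty cell and the tile next to it are out of place.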

observation_spec: jumanji.specs.Spec[jumanji.environments.logic.sliding_tile_puzzle.types.Observation] (cached property, writable)

Returns the observation spec.

action_spec: DiscreteArray (cached property, writable)

Returns the action spec.

__init__(self, generator: Optional[jumanji.environments.logic.sliding_tile_puzzle.generator.Generator] = None, reward_fn: Optional[jumanji.environments.logic.sliding_tile_puzzle.reward.RewardFn] = None, time_limit: int = 500, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.sliding_tile_puzzle.types.State]] = None) -> None (special)

Instantiate a SlidingTilePuzzle environment.

Parameters:

  • generator: Optional[jumanji.environments.logic.sliding_tile_puzzle.generator.Generator], default None. Callable to instantiate environment instances. Defaults to RandomWalkGenerator, which generates shuffled puzzles with a size of 5x5.

  • reward_fn: Optional[jumanji.environments.logic.sliding_tile_puzzle.reward.RewardFn], default None. RewardFn whose __call__ method computes the reward of an environment transition. The function must compute the reward based on the current state, the chosen action and the next state. Implemented options are [DenseRewardFn, SparseRewardFn]. Defaults to DenseRewardFn.

  • time_limit: int, default 500. Maximum number of steps before the episode is terminated.

  • viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.sliding_tile_puzzle.types.State]], default None. Environment viewer for rendering.

reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.logic.sliding_tile_puzzle.types.State, jumanji.types.TimeStep[jumanji.environments.logic.sliding_tile_puzzle.types.Observation]]

Resets the environment to an initial state.

step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.logic.sliding_tile_puzzle.types.State, jumanji.types.TimeStep[jumanji.environments.logic.sliding_tile_puzzle.types.Observation]]

Updates the environment state after the agent takes an action.


Last update: 2024-11-01