SlidingTilePuzzle
Bases: Environment[State, DiscreteArray, Observation]
Environment for the Sliding Tile Puzzle problem.
The problem is a combinatorial optimization task in which the goal is to move the empty tile around until all the tiles are arranged in order. For more information, see https://en.wikipedia.org/wiki/Sliding_puzzle.
- observation: Observation
    - puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    - empty_tile_position: Tuple of int32, representing the position of the empty tile.
    - action_mask: jax array (bool) of shape (4,), indicating which actions are valid in the current state of the environment.
- action: int32, representing the direction in which to move the empty tile (up, down, left, right).
- reward: float, a dense reward based on the arrangement of the tiles. It equals the negative number of positions at which the current state of the puzzle differs from the goal state (correctly arranged tiles), so each incorrectly placed tile contributes -1 to the reward.
- episode termination: if the puzzle is solved.
- state: State
    - puzzle: jax array (int32) of shape (N, N), representing the current state of the puzzle.
    - empty_tile_position: Tuple of int32, representing the position of the empty tile.
    - key: jax array (uint32) of shape (2,), random key used to generate random numbers at each step and for auto-reset.
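The dense reward described above can be illustrated with a small sketch. This is plain Python on nested lists rather than the environment's own JAX code, and the names `solved_board` and `dense_reward` are hypothetical. The goal layout (tiles 1 to N*N-1 in row-major order with the empty tile, written as 0, in the last cell) is an assumption, as is counting the empty cell in the comparison.

```python
from typing import List

Board = List[List[int]]


def solved_board(n: int) -> Board:
    """Assumed goal layout: 1..n*n-1 in row-major order, 0 (empty) last."""
    flat = list(range(1, n * n)) + [0]
    return [flat[r * n:(r + 1) * n] for r in range(n)]


def dense_reward(puzzle: Board, goal: Board) -> float:
    """Negative count of cells whose value differs from the goal."""
    mismatches = sum(
        cell != target
        for row, goal_row in zip(puzzle, goal)
        for cell, target in zip(row, goal_row)
    )
    return float(-mismatches)


goal = solved_board(3)
one_move_away = [[1, 2, 3], [4, 5, 6], [7, 0, 8]]
print(dense_reward(goal, goal))           # 0.0: nothing is out of place
print(dense_reward(one_move_away, goal))  # -2.0: the 8 and the empty cell differ
```

Under these assumptions the reward reaches 0 only when the puzzle is solved, which matches the termination condition listed above.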
Instantiate a SlidingTilePuzzle environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Optional[Generator] | callable to instantiate environment instances. Defaults to | None |
reward_fn | Optional[RewardFn] | RewardFn whose | None |
time_limit | int | maximum number of steps before the episode is terminated, defaults to 500. | 500 |
viewer | Optional[Viewer[State]] | environment viewer for rendering. | None |
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py
action_spec: specs.DiscreteArray (cached property)
Returns the action spec.
observation_spec: specs.Spec[Observation] (cached property)
Returns the observation spec.
animate(states, interval=200, save_path=None)
Creates an animated gif of the puzzle board based on the sequence of game states.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
states | Sequence[State] | is a list of | required |
interval | int | the delay between frames in milliseconds, defaults to 200. | 200 |
save_path | Optional[str] | the path where the animation file should be saved. If it is None, the plot | None |
Returns:
Type | Description |
---|---|
FuncAnimation | animation.FuncAnimation: the animation object that was created. |
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py
close()
Perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py
render(state)
Renders the current state of the puzzle board.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | State | is the current game state to be rendered. | required |
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py
reset(key)
Resets the environment to an initial state.
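This page does not say how the initial state is produced. One common way to obtain a solvable starting position, sketched below purely as an assumption, is to walk the empty tile randomly away from the solved board. The `scramble` helper, its `num_moves` and `seed` parameters, and the move ordering are all hypothetical. The real method takes a JAX random key, which an integer seed stands in for here.

```python
import random
from typing import List, Tuple

Board = List[List[int]]

# Row/column offsets for the empty tile; the order (up, down, left, right) is assumed.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def scramble(n: int, num_moves: int, seed: int) -> Tuple[Board, Tuple[int, int]]:
    """Random-walk the empty tile from the solved board, so the result stays solvable."""
    rng = random.Random(seed)
    flat = list(range(1, n * n)) + [0]  # assumed goal: 0 (empty) in the last cell
    board = [flat[r * n:(r + 1) * n] for r in range(n)]
    row, col = n - 1, n - 1
    for _ in range(num_moves):
        # Only consider moves that keep the empty tile on the board.
        candidates = [
            (row + dr, col + dc)
            for dr, dc in MOVES
            if 0 <= row + dr < n and 0 <= col + dc < n
        ]
        new_row, new_col = rng.choice(candidates)
        board[row][col], board[new_row][new_col] = board[new_row][new_col], 0
        row, col = new_row, new_col
    return board, (row, col)


puzzle, empty_tile_position = scramble(n=3, num_moves=20, seed=0)
print(puzzle, empty_tile_position)
```

Because every scrambling move is reversible, any board produced this way can be returned to the solved state.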
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py
step(state, action)
Updates the environment state after the agent takes an action.
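As a rough illustration of the state update, the sketch below moves the empty tile on a nested-list board and derives a four-entry action mask. It is plain Python, not the environment's JAX implementation. The index order of the actions (0 = up, 1 = down, 2 = left, 3 = right) and the choice to leave the board unchanged on an invalid action are assumptions, and `action_mask` and `apply_action` are hypothetical helper names.

```python
from typing import List, Tuple

Board = List[List[int]]
Position = Tuple[int, int]

# Row/column offsets for the empty tile; the order (up, down, left, right) is assumed.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def action_mask(empty: Position, n: int) -> List[bool]:
    """True for each direction that keeps the empty tile on the n x n board."""
    row, col = empty
    return [0 <= row + dr < n and 0 <= col + dc < n for dr, dc in MOVES]


def apply_action(puzzle: Board, empty: Position, action: int) -> Tuple[Board, Position]:
    """Swap the empty tile with its neighbour in the chosen direction."""
    n = len(puzzle)
    if not action_mask(empty, n)[action]:
        return puzzle, empty  # assumption: an invalid action is a no-op
    row, col = empty
    dr, dc = MOVES[action]
    new_row, new_col = row + dr, col + dc
    board = [list(r) for r in puzzle]  # copy, so the input board is not mutated
    board[row][col], board[new_row][new_col] = board[new_row][new_col], 0
    return board, (new_row, new_col)


puzzle = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
print(action_mask((2, 2), 3))  # [True, False, True, False]
moved, empty = apply_action(puzzle, (2, 2), 0)
print(moved, empty)  # [[1, 2, 3], [4, 5, 0], [7, 8, 6]] (1, 2)
```

Returning a new board instead of mutating the input mirrors the functional style of `step(state, action)`, which takes a state and returns an updated one.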
Source code in jumanji/environments/logic/sliding_tile_puzzle/env.py