Game2048
Game2048 (Environment)
Environment for the game 2048. The game is played on a board of size board_size x board_size (4x4 by default), on which the player moves the tiles up, down, left, or right. The goal is to combine tiles with the same number into a tile with twice the value; the player wins by creating a tile with the value 2048.
- observation: Observation
    - board: jax array (int32) of shape (board_size, board_size), the current state of the board. An empty tile is represented by zero, whereas a non-empty tile is an exponent of 2, e.g. 1, 2, 3, 4, ... (corresponding to 2, 4, 8, 16, ...).
    - action_mask: jax array (bool) of shape (4,), indicates which actions are valid in the current state of the environment.
- action: jax array (int32) of shape (). Takes a value in [0, 1, 2, 3], representing the actions up, right, down, and left, respectively.
- reward: jax array (float) of shape (). The reward is 0 except when the player combines tiles to create a new tile with twice the value. In this case, the reward is the value of the new tile.
- episode termination:
    - if no more valid moves exist (this can happen when the board is full).
- state: State
    - board: same as observation.
    - step_count: jax array (int32) of shape (), the number of time steps in the episode so far.
    - action_mask: same as observation.
    - score: jax array (int32) of shape (), the sum of all tile values on the board.
    - key: jax array (uint32) of shape (2,), random key used to generate random numbers at each step and for auto-reset.
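The exponent encoding of `board` can be easy to misread, so here is a minimal sketch of decoding it into face values and checking the win condition. The board contents and the helper names `tile_values` and `has_won` are hypothetical and are not part of the Jumanji API.

```python
import jax.numpy as jnp

# Hypothetical 4x4 board in the exponent encoding: 0 = empty, k = tile 2**k.
board = jnp.array(
    [[0, 1, 2, 0],
     [3, 0, 0, 1],
     [0, 0, 11, 0],
     [1, 0, 0, 4]],
    dtype=jnp.int32,
)


def tile_values(board):
    """Convert exponents to face values, keeping empty cells at 0."""
    return jnp.where(board > 0, 2 ** board, 0)


def has_won(board, target_exponent=11):
    """True once any tile reaches 2**target_exponent (2048 by default)."""
    return jnp.any(board >= target_exponent)


values = tile_values(board)  # e.g. exponent 11 becomes 2048
won = has_won(board)
```

Working on exponents keeps the stored integers small; the face values are only needed for display or for computing rewards.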
observation_spec: jumanji.specs.Spec[jumanji.environments.logic.game_2048.types.Observation]
cached
property
writable
Specifications of the observation of the Game2048 environment.
Returns:
Type | Description |
---|---|
jumanji.specs.Spec[jumanji.environments.logic.game_2048.types.Observation] | Spec containing all the specifications for all the `Observation` fields |
action_spec: DiscreteArray
cached
property
writable
Returns the action spec.
4 actions: [0, 1, 2, 3] -> [Up, Right, Down, Left].
Returns:
Type | Description |
---|---|
DiscreteArray | action_spec |
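As an illustration of how the action indices and the `action_mask` from the observation fit together, the following sketch samples uniformly among the valid actions. The mask values and the helper `sample_valid_action` are hypothetical; only standard `jax.random` calls are used, and this is not taken from Jumanji itself.

```python
import jax
import jax.numpy as jnp

# Index -> name, following the action spec: [0, 1, 2, 3] -> [Up, Right, Down, Left].
ACTION_NAMES = ("up", "right", "down", "left")


def sample_valid_action(key, action_mask):
    """Sample uniformly among the actions whose mask entry is True."""
    # Invalid actions get a logit of -inf, so they are never selected.
    logits = jnp.where(action_mask, 0.0, -jnp.inf)
    return jax.random.categorical(key, logits).astype(jnp.int32)


key = jax.random.PRNGKey(0)
action_mask = jnp.array([False, True, False, True])  # hypothetical mask
action = sample_valid_action(key, action_mask)

# An all-False mask means no valid move exists, i.e. the episode is over.
no_moves_left = ~jnp.any(action_mask)
```

Masking the logits rather than rejecting invalid samples keeps the function compatible with `jax.jit` and `jax.vmap`.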
__init__(self, board_size: int = 4, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.game_2048.types.State]] = None) -> None
special
Initialize the 2048 game.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
board_size | int | size of the board. Defaults to 4. | 4 |
viewer | Optional[jumanji.viewer.Viewer[jumanji.environments.logic.game_2048.types.State]] | | None |
reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.logic.game_2048.types.State, jumanji.types.TimeStep[jumanji.environments.logic.game_2048.types.Observation]]
Resets the environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | PRNGKeyArray | random number generator key. | required |
Returns:
Name | Description |
---|---|
state | the new state of the environment. |
timestep | the first timestep returned by the environment. |
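Since `reset` takes a PRNG key, and the state stores one as a uint32 array of shape (2,), a short sketch of creating and splitting keys may help. It assumes JAX's default raw key representation as produced by `jax.random.PRNGKey`; the variable names and the batch size of 8 are illustrative only.

```python
import jax

# A raw PRNG key: a uint32 array of shape (2,) under JAX's default settings.
key = jax.random.PRNGKey(42)

# Split before use so that resetting and action sampling draw from independent streams.
reset_key, policy_key = jax.random.split(key)

# One key per environment instance, e.g. for resetting a batch of 8 episodes.
batch_keys = jax.random.split(reset_key, 8)  # shape (8, 2)
```

A key such as `reset_key` is what would be passed to `reset`; reusing the same key reproduces the same initial board.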
step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.logic.game_2048.types.State, jumanji.types.TimeStep[jumanji.environments.logic.game_2048.types.Observation]]
Updates the environment state after the agent takes an action.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | State | the current state of the environment. | required |
action | Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number] | the action taken by the agent. | required |
Returns:
Name | Description |
---|---|
state | the new state of the environment. |
timestep | the next timestep. |
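To make the effect of a move and the resulting reward concrete, here is a plain-Python sketch of a left move applied to a single row of exponents. It is not Jumanji's implementation (which operates on JAX arrays). The helper `move_row_left` is hypothetical, and the sketch assumes the usual 2048 rule that each tile merges at most once per move, and that the reward is the face value of each newly created tile.

```python
def move_row_left(row):
    """Slide and merge one row of exponents to the left.

    Returns the new row and the reward (sum of the face values of merged tiles).
    """
    tiles = [t for t in row if t != 0]  # drop empty cells
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] + 1)    # equal exponents merge into exponent + 1
            reward += 2 ** (tiles[i] + 1)  # reward is the new tile's face value
            i += 2                         # assumption: one merge per tile per move
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells on the right to restore the row length.
    return merged + [0] * (len(row) - len(merged)), reward


# Row holding tiles 2, 2, 4 and one empty cell.
new_row, reward = move_row_left([1, 1, 2, 0])
```

Here the two 2-tiles merge into a 4 (exponent 2) for a reward of 4, and the existing 4 slides next to it without merging again in the same move. The other three directions can be obtained by reversing or transposing the board before applying the same routine.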