
Game2048 (Environment) #

Environment for the game 2048. The game is played on a board of size board_size x board_size (4x4 by default), on which the player takes actions to move the tiles up, down, left, or right. The goal is to combine tiles with the same value to create a tile with twice that value, until a tile with a value of at least 2048 is created, which counts as a win.

  • observation: Observation

    • board: jax array (int32) of shape (board_size, board_size), the current state of the board. An empty tile is represented by zero, whereas a non-empty tile holds an exponent of 2, e.g. 1, 2, 3, 4, ... (corresponding to 2, 4, 8, 16, ...); see the decoding sketch after the usage example below.
    • action_mask: jax array (bool) of shape (4,) indicates which actions are valid in the current state of the environment.
  • action: jax array (int32) of shape (), taking a value in [0, 1, 2, 3] representing the actions up, right, down, and left, respectively.

  • reward: jax array (float) of shape (). The reward is 0 except when the player combines tiles to create a new tile with twice the value. In this case, the reward is the value of the new tile.

  • episode termination:

    • if no more valid moves exist (this can happen when the board is full).
  • state: State

    • board: same as observation.
    • step_count: jax array (int32) of shape (), the number of time steps in the episode so far.
    • action_mask: same as observation.
    • score: jax array (int32) of shape (), the sum of all tile values on the board.
    • key: jax array (uint32) of shape (2,), random key used to generate random numbers at each step and for auto-reset.
import jax

from jumanji.environments import Game2048

env = Game2048()
key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)
env.render(state)
action = env.action_spec.generate_value()  # placeholder action (0, i.e. up)
state, timestep = jax.jit(env.step)(state, action)
env.render(state)
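
The board stores exponents rather than tile values. A minimal sketch of decoding a state into human-readable tiles and listing the currently valid moves (assuming state is the State returned by reset or step above):

import jax.numpy as jnp

# Non-empty cells hold exponents of 2; decode them into actual tile values.
tiles = jnp.where(state.board > 0, 2 ** state.board, 0)
print(tiles)

# Indices of the legal moves in the current state: 0=up, 1=right, 2=down, 3=left.
valid_actions = jnp.where(state.action_mask)[0]
print(valid_actions)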

observation_spec: jumanji.specs.Spec[jumanji.environments.logic.game_2048.types.Observation] cached property writable #

Specifications of the observation of the Game2048 environment.

Returns:

Spec containing the specifications for all the `Observation` fields:
  • board: Array (jnp.int32) of shape (board_size, board_size).
  • action_mask: BoundedArray (bool) of shape (4,).
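
A rough illustration of inspecting this spec (generate_value is the same helper used for the action spec below; the printed shapes follow the field descriptions above):

obs_spec = env.observation_spec
dummy_obs = obs_spec.generate_value()   # an Observation filled with placeholder values
print(dummy_obs.board.shape)            # (board_size, board_size)
print(dummy_obs.action_mask.shape)      # (4,)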

action_spec: DiscreteArray cached property writable #

Returns the action spec.

4 actions: [0, 1, 2, 3] -> [Up, Right, Down, Left].

Returns:

  • action_spec: DiscreteArray spec object.
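
A small sketch of mapping an action index back to its name (the ACTION_NAMES list below is illustrative, not part of the environment API):

action_spec = env.action_spec
ACTION_NAMES = ["up", "right", "down", "left"]  # order follows the spec: 0..3
print(action_spec.num_values)                            # 4
print(ACTION_NAMES[int(action_spec.generate_value())])   # "up"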

__init__(self, board_size: int = 4, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.game_2048.types.State]] = None) -> None special #

Initialize the 2048 game.

Parameters:

  • board_size (int): size of the board. Defaults to 4.
  • viewer (Optional[jumanji.viewer.Viewer[jumanji.environments.logic.game_2048.types.State]]): viewer used for rendering. Defaults to None, in which case a Game2048Viewer is used.
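
For instance, a larger board can be requested at construction time (a minimal sketch; leaving viewer as None uses the default Game2048Viewer):

from jumanji.environments import Game2048

env = Game2048(board_size=5)  # play on a 5x5 board instead of the default 4x4
# Observations and specs now use boards of shape (5, 5).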

reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.logic.game_2048.types.State, jumanji.types.TimeStep[jumanji.environments.logic.game_2048.types.Observation]] #

Resets the environment.

Parameters:

  • key (PRNGKeyArray): random number generator key. Required.

Returns:

  • state: the new state of the environment.
  • timestep: the first timestep returned by the environment.
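
Because reset is a pure function of the key, it can be jitted and vmapped. A sketch of resetting a batch of independent boards (the batch size of 8 is arbitrary):

import jax

keys = jax.random.split(jax.random.PRNGKey(0), 8)  # one key per environment instance
states, timesteps = jax.vmap(env.reset)(keys)      # batched reset
print(timesteps.observation.board.shape)           # (8, board_size, board_size)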

step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.logic.game_2048.types.State, jumanji.types.TimeStep[jumanji.environments.logic.game_2048.types.Observation]] #

Updates the environment state after the agent takes an action.

Parameters:

  • state (State): the current state of the environment. Required.
  • action (Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]): the action taken by the agent. Required.

Returns:

  • state: the new state of the environment.
  • timestep: the next timestep.
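
A short usage sketch combining step with the observation's action mask, sampling uniformly among the currently valid moves (this random policy is purely illustrative):

import jax
import jax.numpy as jnp

step_fn = jax.jit(env.step)
key = jax.random.PRNGKey(42)
state, timestep = jax.jit(env.reset)(key)

for _ in range(10):
    key, subkey = jax.random.split(key)
    mask = timestep.observation.action_mask
    # Sample uniformly among the valid actions: 0=up, 1=right, 2=down, 3=left.
    action = jax.random.choice(subkey, jnp.arange(4), p=mask / mask.sum())
    state, timestep = step_fn(state, action)
    print(timestep.reward)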


Last update: 2024-11-01