
Minesweeper

Minesweeper (Environment) #

A JAX implementation of the minesweeper game.

  • observation: Observation

    • board: jax array (int32) of shape (num_rows, num_cols): each cell contains -1 if not yet explored, or otherwise the number of mines in the 8 adjacent squares.
    • action_mask: jax array (bool) of shape (num_rows, num_cols): indicates which actions are valid (not yet explored squares).
    • num_mines: jax array (int32) of shape (), indicates the number of mines to locate.
    • step_count: jax array (int32) of shape (): specifies how many timesteps have elapsed since environment reset.
  • action: multi discrete array containing the square to explore (row and col).

  • reward: jax array (float32): Configurable function of state and action. By default: 1 for each timestep in which a valid action reveals a square that is not a mine; 0 for revealing a mine or for selecting an already revealed square, either of which terminates the episode.

  • episode termination: Configurable function of state, next_state, and action. By default: Stop the episode if a mine is explored, an invalid action is selected (exploring an already explored square), or the board is solved.

  • state: State

    • board: jax array (int32) of shape (num_rows, num_cols): each cell contains -1 if not yet explored, or otherwise the number of mines in the 8 adjacent squares.
    • step_count: jax array (int32) of shape (): specifies how many timesteps have elapsed since environment reset.
    • flat_mine_locations: jax array (int32) of shape (num_mines,): indicates the (flat) locations of all the mines on the board, each an index into the flattened board of num_rows * num_cols squares.
    • key: jax array (uint32) of shape (2,) used for seeding the sampling of mine placement on reset.
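import jax
from jumanji.environments import Minesweeper

# Instantiate the environment with its default generator, reward and done functions.
env = Minesweeper()
key = jax.random.PRNGKey(0)
# Reset places the mines and returns the initial state and timestep.
state, timestep = jax.jit(env.reset)(key)
env.render(state)
# Generate an action that conforms to the action spec and take one step.
action = env.action_spec.generate_value()
state, timestep = jax.jit(env.step)(state, action)
env.render(state)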
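The board encoding described above (-1 for an unexplored square, otherwise the number of mines in the 8 adjacent squares) and the action mask can be illustrated with a small sketch. It is plain Python rather than JAX so that it runs without dependencies, and it is not the library's implementation. The helper names `count_adjacent_mines` and `action_mask` are hypothetical, and the row-major flat index (`row * num_cols + col`) is an assumption.

```python
UNEXPLORED = -1  # value the documentation gives for squares not yet explored


def count_adjacent_mines(row, col, num_rows, num_cols, flat_mine_locations):
    """Count the mines in the (up to) 8 squares adjacent to (row, col)."""
    mines = set(flat_mine_locations)
    count = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the square itself
            r, c = row + d_row, col + d_col
            if 0 <= r < num_rows and 0 <= c < num_cols:
                # Assumption: flat index = row * num_cols + col (row-major).
                if r * num_cols + c in mines:
                    count += 1
    return count


def action_mask(board):
    """True where a square is still unexplored, i.e. where an action is valid."""
    return [[cell == UNEXPLORED for cell in row] for row in board]


# A 3x3 board with mines at flat indices 0 and 4, i.e. squares (0, 0) and (1, 1).
num_rows, num_cols = 3, 3
flat_mine_locations = [0, 4]
board = [[UNEXPLORED] * num_cols for _ in range(num_rows)]

# Explore the safe square (0, 1): it now shows its adjacent-mine count.
board[0][1] = count_adjacent_mines(0, 1, num_rows, num_cols, flat_mine_locations)
mask = action_mask(board)
```

After exploring (0, 1), that square is no longer a valid action, while every other square remains selectable.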

observation_spec: jumanji.specs.Spec[jumanji.environments.logic.minesweeper.types.Observation] cached property writable #

Specifications of the observation of the Minesweeper environment.

Returns:

Type Description
Spec for the `Observation` whose fields are
  • board: BoundedArray (int32) of shape (num_rows, num_cols).
  • action_mask: BoundedArray (bool) of shape (num_rows, num_cols).
  • num_mines: BoundedArray (int32) of shape ().
  • step_count: BoundedArray (int32) of shape ().

action_spec: MultiDiscreteArray cached property writable #

Returns the action spec. An action consists of the row and column indices of the square to be explored.

Returns:

Type Description
MultiDiscreteArray

action_spec: a specs.MultiDiscreteArray object.
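Actions are given as a (row, col) pair, whereas the state stores mine positions as flat indices. A minimal sketch of converting between the two follows. It assumes row-major flattening, which this page does not state explicitly, and the function names `to_flat_index` and `from_flat_index` are hypothetical.

```python
def to_flat_index(row, col, num_cols):
    """Map a (row, col) action to a flat board index (assumed row-major)."""
    return row * num_cols + col


def from_flat_index(index, num_cols):
    """Map a flat board index back to a (row, col) pair (assumed row-major)."""
    return divmod(index, num_cols)


# On the default 10-column board, square (2, 3) has flat index 23.
flat = to_flat_index(2, 3, num_cols=10)
row_col = from_flat_index(flat, num_cols=10)
```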

__init__(self, generator: Optional[jumanji.environments.logic.minesweeper.generator.Generator] = None, reward_function: Optional[jumanji.environments.logic.minesweeper.reward.RewardFn] = None, done_function: Optional[jumanji.environments.logic.minesweeper.done.DoneFn] = None, viewer: Optional[jumanji.viewer.Viewer[jumanji.environments.logic.minesweeper.types.State]] = None) special #

Instantiate a Minesweeper environment.

Parameters:

Name Type Description Default
generator Optional[jumanji.environments.logic.minesweeper.generator.Generator]

Generator to generate problem instances on environment reset. Implemented options are [SamplingGenerator]. Defaults to SamplingGenerator. The generator has the following attributes:
  • num_rows: number of rows, i.e. height of the board. Defaults to 10.
  • num_cols: number of columns, i.e. width of the board. Defaults to 10.
  • num_mines: number of mines generated. Defaults to 10.

None
reward_function Optional[jumanji.environments.logic.minesweeper.reward.RewardFn]

RewardFn whose __call__ method computes the reward of an environment transition based on the given current state and selected action. Implemented options are [DefaultRewardFn]. Defaults to DefaultRewardFn, giving a reward of 1.0 for revealing an empty square, 0.0 for revealing a mine, and 0.0 for an invalid action (selecting an already revealed square).

None
done_function Optional[jumanji.environments.logic.minesweeper.done.DoneFn]

DoneFn whose __call__ method computes the done signal given the current state, action taken, and next state. Implemented options are [DefaultDoneFn]. Defaults to DefaultDoneFn, ending the episode on solving the board, revealing a mine, or picking an invalid action.

None
viewer Optional[jumanji.viewer.Viewer[jumanji.environments.logic.minesweeper.types.State]]

Viewer to support rendering and animation methods. Implemented options are [MinesweeperViewer]. Defaults to MinesweeperViewer.

None
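The default reward and termination rules described for `reward_function` and `done_function` can be sketched in plain Python as follows. This is an illustration of the documented behaviour, not the code of DefaultRewardFn or DefaultDoneFn. The names `sketch_reward` and `sketch_done` are hypothetical, the flat index is assumed to be row-major, and "solved" is assumed to mean that the only unexplored squares left are the mines.

```python
UNEXPLORED = -1  # value the documentation gives for squares not yet explored


def is_valid(board, action):
    """An action is valid if it targets a square that is still unexplored."""
    row, col = action
    return board[row][col] == UNEXPLORED


def hits_mine(board, flat_mine_locations, action):
    """True if the chosen square holds a mine (assumed row-major flat index)."""
    row, col = action
    return row * len(board[0]) + col in set(flat_mine_locations)


def is_solved(board, num_mines):
    """Assumption: solved once only the mine squares remain unexplored."""
    unexplored = sum(cell == UNEXPLORED for row in board for cell in row)
    return unexplored == num_mines


def sketch_reward(board, flat_mine_locations, action):
    """1.0 for revealing an empty square, 0.0 for a mine or an invalid action."""
    if not is_valid(board, action) or hits_mine(board, flat_mine_locations, action):
        return 0.0
    return 1.0


def sketch_done(board, flat_mine_locations, action, next_board):
    """End on an invalid action, a revealed mine, or a solved board."""
    return (
        not is_valid(board, action)
        or hits_mine(board, flat_mine_locations, action)
        or is_solved(next_board, len(flat_mine_locations))
    )


# A 2x2 board with a single mine at flat index 3, i.e. square (1, 1).
mines = [3]
start = [[UNEXPLORED, UNEXPLORED], [UNEXPLORED, UNEXPLORED]]
after_first = [[1, UNEXPLORED], [UNEXPLORED, UNEXPLORED]]  # (0, 0) explored

safe_reward = sketch_reward(start, mines, (0, 0))
mine_reward = sketch_reward(start, mines, (1, 1))
continues = sketch_done(start, mines, (0, 0), after_first)
```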

reset(self, key: PRNGKeyArray) -> Tuple[jumanji.environments.logic.minesweeper.types.State, jumanji.types.TimeStep[jumanji.environments.logic.minesweeper.types.Observation]] #

Resets the environment.

Parameters:

Name Type Description Default
key PRNGKeyArray

Random key used to sample the mine placement.

required

Returns:

Type Description
Tuple[State, TimeStep[Observation]]

state: State corresponding to the new state of the environment. timestep: TimeStep corresponding to the first timestep returned by the environment.

step(self, state: State, action: Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]) -> Tuple[jumanji.environments.logic.minesweeper.types.State, jumanji.types.TimeStep[jumanji.environments.logic.minesweeper.types.Observation]] #

Run one timestep of the environment's dynamics.

Parameters:

Name Type Description Default
state State

State object containing the dynamics of the environment.

required
action Union[jax.Array, numpy.ndarray, numpy.bool_, numpy.number]

Array containing the row and column of the square to be explored.

required

Returns:

Type Description
Tuple[State, TimeStep[Observation]]

next_state: State corresponding to the next state of the environment. next_timestep: TimeStep corresponding to the timestep returned by the environment.
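Because selecting an already explored square ends the episode by default, an agent would typically restrict its choice to squares the action mask marks as valid. The sketch below shows one way to pick a random valid (row, col) action from a mask given as nested lists. It is plain Python for illustration only; the helper name `sample_valid_action` is hypothetical, and a JAX agent would operate on the `action_mask` array from the observation instead.

```python
import random


def sample_valid_action(action_mask, rng):
    """Return a random (row, col) whose mask entry is True."""
    valid = [
        (row, col)
        for row, mask_row in enumerate(action_mask)
        for col, is_valid in enumerate(mask_row)
        if is_valid
    ]
    if not valid:
        raise ValueError("no valid actions left")
    return rng.choice(valid)


# Only square (0, 1) is still unexplored in this 2x2 mask.
mask = [[False, True], [False, False]]
action = sample_valid_action(mask, random.Random(0))
```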


Last update: 2024-03-29