Sudoku Environment#

We provide here a Jax JIT-able implementation of the Sudoku puzzle game.

Observation#

The observation given to the agent consists of:

board: jax array (int32) of shape (9,9): empty cells are represented by -1, and filled cells are represented by 0-8.
action_mask: jax array (bool) of shape (9,9,9): indicates which actions are valid.

Action#

The action space is a MultiDiscreteArray of integer values representing coordinates of the square to explore and the digits to write in the cell, e.g. [3, 6, 8] for writing the digit 9 in the cell located on the fourth row and seventh column.

Reward#

The reward is 1 at the end of the episode if the board is correctly solved, and 0 in every other case.

Termination#

An episode terminates when there are no more legal actions available, this could happen if the board is solved or if the agent finds itself in a dead-end.

Registered Versions 📖#

Sudoku-v0, the classic game on a 9x9 grid, 10000 random puzzles with mixed difficulty are included by default.
Sudoku-very-easy-v0, the classic game on a 9x9 grid, only 1000 very-easy random puzzles (>46 clues) included by default.

Using custom puzzle instances#

If one wants to include its own database of puzzles, the DatabaseGenerator can be initialized with any collection of puzzles using the argument custom_boards. Some references for databases of puzzle of various difficulties: - https://www.kaggle.com/datasets/rohanrao/sudoku - https://www.kaggle.com/datasets/informoney/4-million-sudoku-puzzles-easytohard

Difficulty level as a function of number of clues#

Adapted from An Algorithm for Generating only Desired Permutations for Solving Sudoku Puzzle.