
Level-Based Foraging

Bases: Environment[State, MultiDiscreteArray, Observation]

An implementation of the Level-Based Foraging environment where agents need to cooperate to collect food and split the reward.

Original implementation: https://github.com/semitable/lb-foraging

  • observation: Observation

    • agent_views: Depending on the observer passed to __init__, it can be a GridObserver or a VectorObserver.
      • GridObserver: Returns an agent's view with a shape of (num_agents, 3, 2 * fov + 1, 2 * fov + 1).
      • VectorObserver: Returns an agent's view with a shape of (num_agents, 3 * (num_food + num_agents)).
    • action_mask: JAX array (bool) of shape (num_agents, 6) indicating for each agent which of the six actions (no-op, up, down, left, right, load) are allowed.
    • step_count: int32, the number of steps since the beginning of the episode.
  • action: JAX array (int32) of shape (num_agents,). The valid actions for each agent are (0: noop, 1: up, 2: down, 3: left, 4: right, 5: load).

  • reward: JAX array (float) of shape (num_agents,). When one or more agents load a food item, the food's level is awarded to those agents, weighted by each agent's level (see the worked sketch after this list). The reward is then normalized so that, once all food items have been collected, the rewards sum to one.

  • Episode Termination:

    • All food items have been eaten.
    • The number of steps reaches the time limit (in which case the episode is truncated rather than terminated).
  • state: State

    • agents: Stacked Pytree of Agent objects of length num_agents.
      • Agent:
        • id: JAX array (int32) of shape ().
        • position: JAX array (int32) of shape (2,).
        • level: JAX array (int32) of shape ().
        • loading: JAX array (bool) of shape ().
    • food_items: Stacked Pytree of Food objects of length num_food.
      • Food:
        • id: JAX array (int32) of shape ().
        • position: JAX array (int32) of shape (2,).
        • level: JAX array (int32) of shape ().
        • eaten: JAX array (bool) of shape ().
    • step_count: JAX array (int32) of shape (), the number of steps since the beginning of the episode.
    • key: JAX array (uint) of shape (2,), the JAX random generation key. Ignored since the environment is deterministic.
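
As a hand-worked sketch of the reward normalization (illustrative values, not taken from the library): suppose two agents with levels 1 and 2 both load a level-3 food, and that food is the only one on the grid.

import jax.numpy as jnp

# Hypothetical values for illustration only.
adj_loading_agents_levels = jnp.array([1.0, 2.0])  # levels of the two loading agents
food_level = 3.0                                    # level of the food being eaten
total_food_level = 3.0                              # sum of all food levels on the grid

# Per-agent reward, weighted by agent level and normalized as described above:
reward = adj_loading_agents_levels * food_level / (
    jnp.sum(adj_loading_agents_levels) * total_food_level
)
# reward == [1/3, 2/3]; since this is the only food, the rewards sum to 1.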

Example:

import jax

from jumanji.environments import LevelBasedForaging
env = LevelBasedForaging()
key = jax.random.key(0)
state, timestep = jax.jit(env.reset)(key)
env.render(state)
action = env.action_spec.generate_value()
state, timestep = jax.jit(env.step)(state, action)
env.render(state)

Initialization Args:

  • generator: A Generator object that generates the initial state of the environment. Defaults to a RandomGenerator with the following parameters:
    • grid_size: 8
    • fov: 8 (full observation of the grid)
    • num_agents: 2
    • num_food: 2
    • max_agent_level: 2
    • force_coop: True
  • time_limit: The maximum number of steps in an episode. Defaults to 100.
  • grid_observation: If True, the observer generates a grid observation (default is False).
  • normalize_reward: If True, normalizes the reward (default is True).
  • penalty: The penalty value (default is 0.0).
  • viewer: Viewer to render the environment. Defaults to LevelBasedForagingViewer.
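
For instance, a custom instance might be built as below (a minimal sketch: the RandomGenerator import path is assumed from the source layout shown on this page, and only the generator arguments used in __init__ below are passed).

import jax

from jumanji.environments import LevelBasedForaging
# Assumed import path, based on jumanji/environments/routing/lbf/env.py shown below.
from jumanji.environments.routing.lbf.generator import RandomGenerator

# A larger, partially observable instance that uses grid observations.
env = LevelBasedForaging(
    generator=RandomGenerator(grid_size=12, fov=3, num_agents=4, num_food=4, force_coop=False),
    time_limit=200,
    grid_observation=True,
    normalize_reward=True,
    penalty=0.0,
)
state, timestep = jax.jit(env.reset)(jax.random.key(0))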

Source code in jumanji/environments/routing/lbf/env.py
def __init__(
    self,
    generator: Optional[RandomGenerator] = None,
    viewer: Optional[Viewer[State]] = None,
    time_limit: int = 100,
    grid_observation: bool = False,
    normalize_reward: bool = True,
    penalty: float = 0.0,
) -> None:
    self._generator = generator or RandomGenerator(
        grid_size=8,
        fov=8,
        num_agents=2,
        num_food=2,
        force_coop=True,
    )
    self.time_limit = time_limit
    self.grid_size: int = self._generator.grid_size
    self.num_agents: int = self._generator.num_agents
    self.num_food: int = self._generator.num_food
    self.fov = self._generator.fov
    self.normalize_reward = normalize_reward
    self.penalty = penalty

    self._observer: Union[VectorObserver, GridObserver]
    if not grid_observation:
        self._observer = VectorObserver(
            fov=self.fov,
            grid_size=self.grid_size,
            num_agents=self.num_agents,
            num_food=self.num_food,
        )
    else:
        self._observer = GridObserver(
            fov=self.fov,
            grid_size=self.grid_size,
            num_agents=self.num_agents,
            num_food=self.num_food,
        )

    super().__init__()

    # create viewer for rendering environment
    self._viewer = viewer or LevelBasedForagingViewer(self.grid_size, "LevelBasedForaging")

action_spec: specs.MultiDiscreteArray cached property #

Returns the action spec for the Level Based Foraging environment.

Returns:

  • specs.MultiDiscreteArray: Action spec for the environment with shape (num_agents,).
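
For example, the spec can be used to build joint actions (a minimal sketch, assuming an env instance like the one constructed in the example above):

import jax

spec = env.action_spec                # one value in {0, ..., 5} per agent
noop_actions = spec.generate_value()  # all zeros, i.e. no-op for every agent
# A uniformly random joint action, one of the 6 actions per agent:
random_actions = jax.random.randint(jax.random.key(1), (env.num_agents,), minval=0, maxval=6)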

discount_spec: specs.BoundedArray cached property #

Describes the discount returned by the environment.

Returns:

  • discount_spec: a specs.BoundedArray spec.

observation_spec: specs.Spec[Observation] cached property #

Specifications of the observation of the environment.

The spec's shape depends on the observer passed to __init__.

The GridObserver returns an agent's view with a shape of (num_agents, 3, 2 * fov + 1, 2 * fov + 1). The VectorObserver returns an agent's view with a shape of (num_agents, 3 * num_food + 3 * num_agents). See a more detailed description of the observations in the docs of GridObserver and VectorObserver.

Returns:

  • specs.Spec[Observation]: Spec for the Observation with fields agent_views, action_mask, and step_count.
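
A quick sanity check against the spec might look like the following (a minimal sketch, assuming the spec exposes a dm_env-style validate method and an env instance as in the example above):

import jax

state, timestep = jax.jit(env.reset)(jax.random.key(0))
# validate is assumed here; it raises if shapes or dtypes do not match the spec.
env.observation_spec.validate(timestep.observation)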

reward_spec: specs.Array cached property #

Returns the reward specification for the LevelBasedForaging environment.

Since this is a multi-agent environment each agent gets its own reward.

Returns:

  • specs.Array: Reward specification of shape (num_agents,) for the environment.

animate(states, interval=200, save_path=None) #

Creates an animation from a sequence of states.

Parameters:

  • states (Sequence[State]): Sequence of State corresponding to subsequent timesteps. Required.
  • interval (int): Delay between frames in milliseconds. Defaults to 200.
  • save_path (Optional[str]): The path where the animation file should be saved. Defaults to None.

Returns:

  • matplotlib.animation.FuncAnimation: Animation object that can be saved as a GIF, MP4, or rendered with HTML.

Source code in jumanji/environments/routing/lbf/env.py
def animate(
    self,
    states: Sequence[State],
    interval: int = 200,
    save_path: Optional[str] = None,
) -> matplotlib.animation.FuncAnimation:
    """Creates an animation from a sequence of states.

    Args:
        states (Sequence[State]): Sequence of `State` corresponding to subsequent timesteps.
        interval (int): Delay between frames in milliseconds, default to 200.
        save_path (Optional[str]): The path where the animation file should be saved.

    Returns:
        matplotlib.animation.FuncAnimation: Animation object that can be saved as a GIF, MP4,
        or rendered with HTML.
    """
    return self._viewer.animate(states=states, interval=interval, save_path=save_path)
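
A typical use is to collect the states visited during a short rollout and animate them (an illustrative sketch, assuming an env instance as in the example above; the random actions stand in for a real policy):

import jax

key = jax.random.key(0)
state, timestep = jax.jit(env.reset)(key)
states = [state]
for step_key in jax.random.split(key, 10):
    actions = jax.random.randint(step_key, (env.num_agents,), minval=0, maxval=6)
    state, timestep = jax.jit(env.step)(state, actions)
    states.append(state)

animation = env.animate(states, interval=200, save_path="lbf_rollout.gif")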

close() #

Perform any necessary cleanup.

Source code in jumanji/environments/routing/lbf/env.py
def close(self) -> None:
    """Perform any necessary cleanup."""
    self._viewer.close()

get_reward(food_items, adj_loading_agents_levels, eaten_this_step) #

Returns a reward for all agents given all food items.

Parameters:

  • food_items (Food): All the food items in the environment. Required.
  • adj_loading_agents_levels (Array): The level of all agents adjacent to all foods. Required.
  • eaten_this_step (Array): Whether the food was eaten or not (this step). Required.
Source code in jumanji/environments/routing/lbf/env.py
def get_reward(
    self,
    food_items: Food,
    adj_loading_agents_levels: chex.Array,
    eaten_this_step: chex.Array,
) -> chex.Array:
    """Returns a reward for all agents given all food items.

    Args:
        food_items (Food): All the food items in the environment.
        adj_loading_agents_levels (chex.Array): The level of all agents adjacent to all foods.
        eaten_this_step (chex.Array): Whether the food was eaten or not (this step).
    """

    def get_reward_per_food(
        food: Food,
        adj_loading_agents_levels: chex.Array,
        eaten_this_step: chex.Array,
    ) -> chex.Array:
        """Returns the reward for all agents given a single food."""

        # If the food has already been eaten or is not loaded, the sum will be equal to 0
        sum_agents_levels = jnp.sum(adj_loading_agents_levels)

        # Penalize agents for not being able to cooperate and eat food
        penalty = jnp.where(
            (sum_agents_levels != 0) & (sum_agents_levels < food.level),
            self.penalty,
            0,
        )

        # Zero out all agents if food was not eaten and add penalty
        reward = (adj_loading_agents_levels * eaten_this_step * food.level) - penalty

        # jnp.nan_to_num: Used in the case where no agents are adjacent to the food
        normalizer = sum_agents_levels * total_food_level
        reward = jnp.where(self.normalize_reward, jnp.nan_to_num(reward / normalizer), reward)

        return reward

    # Get reward per food for all food items,
    # then sum it on the agent dimension to get reward per agent.
    total_food_level = jnp.sum(food_items.level)
    reward_per_food = jax.vmap(get_reward_per_food, in_axes=(0, 0, 0))(
        food_items, adj_loading_agents_levels, eaten_this_step
    )
    return jnp.sum(reward_per_food, axis=0)

render(state) #

Renders the current state of the LevelBasedForaging environment.

Parameters:

  • state (State): The current environment state to be rendered. Required.

Returns:

  • Optional[NDArray]: Rendered environment state.

Source code in jumanji/environments/routing/lbf/env.py
def render(self, state: State) -> Optional[NDArray]:
    """Renders the current state of the `LevelBasedForaging` environment.

    Args:
        state (State): The current environment state to be rendered.

    Returns:
        Optional[NDArray]: Rendered environment state.
    """
    return self._viewer.render(state)

reset(key) #

Resets the environment.

Parameters:

  • key (PRNGKey): Used to randomly generate the new State. Required.

Returns:

  • Tuple[State, TimeStep]: State object corresponding to the new initial state of the environment and TimeStep object corresponding to the initial timestep.

Source code in jumanji/environments/routing/lbf/env.py
def reset(self, key: chex.PRNGKey) -> Tuple[State, TimeStep]:
    """Resets the environment.

    Args:
        key (chex.PRNGKey): Used to randomly generate the new `State`.

    Returns:
        Tuple[State, TimeStep]: `State` object corresponding to the new initial state
        of the environment and `TimeStep` object corresponding to the initial timestep.
    """
    state = self._generator(key)
    observation = self._observer.state_to_observation(state)
    timestep = restart(observation, shape=self.num_agents)
    timestep.extras = self._get_extra_info(state, timestep)

    return state, timestep

step(state, actions) #

Simulate one step of the environment.

Parameters:

  • state (State): State containing the dynamics of the environment. Required.
  • actions (Array): Array containing the actions to take for each agent. Required.

Returns:

  • Tuple[State, TimeStep]: State object corresponding to the next state and TimeStep object corresponding to the timestep returned by the environment.

Source code in jumanji/environments/routing/lbf/env.py
def step(self, state: State, actions: chex.Array) -> Tuple[State, TimeStep]:
    """Simulate one step of the environment.

    Args:
        state (State): State  containing the dynamics of the environment.
        actions (chex.Array): Array containing the actions to take for each agent.

    Returns:
        Tuple[State, TimeStep]: `State` object corresponding to the next state and
        `TimeStep` object corresponding the timestep returned by the environment.
    """
    # Move agents, fix collisions that may happen and set loading status.
    moved_agents = utils.update_agent_positions(
        state.agents, actions, state.food_items, self.grid_size
    )

    # Eat the food
    food_items, eaten_this_step, adj_loading_agents_levels = jax.vmap(
        utils.eat_food, (None, 0)
    )(moved_agents, state.food_items)

    reward = self.get_reward(food_items, adj_loading_agents_levels, eaten_this_step)

    state = State(
        agents=moved_agents,
        food_items=food_items,
        step_count=state.step_count + 1,
        key=state.key,
    )
    observation = self._observer.state_to_observation(state)

    # First condition is termination, second is truncation.
    terminate = jnp.all(state.food_items.eaten)
    truncate = state.step_count >= self.time_limit

    timestep = jax.lax.switch(
        terminate + 2 * truncate,
        [
            # !terminate !trunc
            lambda rew, obs: transition(reward=rew, observation=obs, shape=self.num_agents),
            # terminate !truncate
            lambda rew, obs: termination(reward=rew, observation=obs, shape=self.num_agents),
            # !terminate truncate
            lambda rew, obs: truncation(reward=rew, observation=obs, shape=self.num_agents),
            # terminate truncate
            lambda rew, obs: termination(reward=rew, observation=obs, shape=self.num_agents),
        ],
        reward,
        observation,
    )
    timestep.extras = self._get_extra_info(state, timestep)

    return state, timestep
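
Putting reset and step together, a whole rollout can be compiled with jax.lax.scan (a minimal sketch with a uniformly random policy; it keeps stepping past episode end, which in practice would usually be handled with an auto-reset wrapper):

import jax
import jax.numpy as jnp

from jumanji.environments import LevelBasedForaging

env = LevelBasedForaging()

def rollout(key: jax.Array, num_steps: int = 100) -> jax.Array:
    """Roll out a random policy and return the per-agent sum of rewards."""
    keys = jax.random.split(key, num_steps + 1)
    state, _ = env.reset(keys[0])

    def step_fn(state, step_key):
        # One of the 6 actions (no-op, up, down, left, right, load) per agent.
        actions = jax.random.randint(step_key, (env.num_agents,), minval=0, maxval=6)
        next_state, timestep = env.step(state, actions)
        return next_state, timestep.reward

    _, rewards = jax.lax.scan(step_fn, state, keys[1:])  # rewards: (num_steps, num_agents)
    return jnp.sum(rewards, axis=0)

per_agent_returns = jax.jit(rollout)(jax.random.key(0))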