Multi-goal API

The robotic environments use an extension of the core Gymnasium API by inheriting from the GoalEnv class. This API requires the environments to have a dictionary observation space containing three keys:

  • observation - The actual observation of the environment

  • desired_goal - The goal that the agent has to achieve

  • achieved_goal - The goal that the agent has currently achieved. The objective of the environment is for this value to be as close as possible to desired_goal

This API also exposes the reward function, as well as the terminated and truncated signals, so that their values can be re-computed with different goals. This functionality is useful for algorithms that use Hindsight Experience Replay (HER).

The following example demonstrates how the exposed reward, terminated, and truncated functions can be used to re-compute the values with substituted goals. The info dictionary can be used to store additional information that may be necessary to re-compute the reward, but that is independent of the goal, e.g. state derived from the simulation.

import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)

env = gym.make("FetchReach-v3")
env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# The following always has to hold:
assert reward == env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
assert truncated == env.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
assert terminated == env.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)

# However goals can also be substituted:
substitute_goal = obs["achieved_goal"].copy()
substitute_reward = env.compute_reward(obs["achieved_goal"], substitute_goal, info)
substitute_terminated = env.compute_terminated(obs["achieved_goal"], substitute_goal, info)
substitute_truncated = env.compute_truncated(obs["achieved_goal"], substitute_goal, info)
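For example, a HER-style relabeling step can replace the desired goal of a stored transition with a goal that was actually achieved later in the episode and re-compute the reward and termination signals. The sketch below is only illustrative; the transition layout and the relabel_with_achieved_goal helper are hypothetical and not part of the Gymnasium-Robotics API.

# Illustrative HER-style relabeling (hypothetical helper, not part of the API).
# A stored transition is assumed to be a dict with "obs", "action" and "info".
def relabel_with_achieved_goal(env, transition, future_achieved_goal):
    obs, info = transition["obs"], transition["info"]
    return {
        "obs": obs,
        "action": transition["action"],
        "desired_goal": future_achieved_goal,
        "reward": env.compute_reward(obs["achieved_goal"], future_achieved_goal, info),
        "terminated": env.compute_terminated(obs["achieved_goal"], future_achieved_goal, info),
        "truncated": env.compute_truncated(obs["achieved_goal"], future_achieved_goal, info),
        "info": info,
    }

A HER replay buffer would typically apply such a relabeling to a fraction of the transitions it samples.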

GoalEnv

The GoalEnv class can also be used for custom environments.

class gymnasium_robotics.core.GoalEnv

A goal-based environment.

It functions just as any regular Gymnasium environment but it imposes a required structure on the observation_space. More concretely, the observation space is required to contain at least three elements, namely observation, desired_goal, and achieved_goal. Here, desired_goal specifies the goal that the agent should attempt to achieve, achieved_goal is the goal that it has currently achieved, and observation contains the actual observation of the environment as usual.

  • compute_reward() - Externalizes the reward function by taking the achieved and desired goal, as well as extra information. Returns reward.

  • compute_terminated() - Returns boolean termination depending on the achieved and desired goal, as well as extra information.

  • compute_truncated() - Returns boolean truncation depending on the achieved and desired goal, as well as extra information.
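As a rough sketch of how this structure can be used for a custom environment, the toy example below subclasses GoalEnv, declares the dictionary observation space, and implements the three compute methods. The task (moving a 2D point towards a random target), the success threshold, and all other specifics are hypothetical placeholders rather than part of any packaged environment.

import numpy as np
from gymnasium import spaces
from gymnasium_robotics.core import GoalEnv

class ReachPointEnv(GoalEnv):
    """Toy goal-based environment: move a 2D point towards a random target."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float64)
        self.observation_space = spaces.Dict(
            {
                "observation": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
                "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
                "desired_goal": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
            }
        )

    def _get_obs(self):
        return {
            "observation": self.pos.copy(),
            "achieved_goal": self.pos.copy(),
            "desired_goal": self.goal.copy(),
        }

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # GoalEnv.reset also validates the observation space
        self.pos = np.zeros(2)
        self.goal = self.np_random.uniform(-1.0, 1.0, size=2)
        return self._get_obs(), {}

    def step(self, action):
        self.pos = self.pos + 0.1 * np.clip(action, -1.0, 1.0)
        obs = self._get_obs()
        info = {}
        reward = self.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
        terminated = self.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)
        truncated = self.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
        return obs, reward, terminated, truncated, info

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
        return -(distance > self.threshold).astype(np.float64)

    def compute_terminated(self, achieved_goal, desired_goal, info):
        # Episode ends when the point is within the success threshold of the goal.
        return bool(np.linalg.norm(achieved_goal - desired_goal) <= self.threshold)

    def compute_truncated(self, achieved_goal, desired_goal, info):
        # Truncation is left to an external TimeLimit wrapper in this sketch.
        return False

Because the compute methods depend only on the achieved and desired goals (plus info), they can be called again later with substituted goals, exactly as in the example at the top of this page.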

Methods

gymnasium_robotics.core.GoalEnv.compute_reward(self, achieved_goal, desired_goal, info)

Compute the step reward. This externalizes the reward function and makes it dependent on a desired goal and the one that was achieved.

If you wish to include additional rewards that are independent of the goal, you can include the necessary values to derive them in ‘info’ and compute them accordingly.

Parameters:
  • achieved_goal (object) – the goal that was achieved during execution

  • desired_goal (object) – the desired goal that we asked the agent to attempt to achieve

  • info (dict) – an info dictionary with additional information

Returns:
  • float – The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

    ob, reward, terminated, truncated, info = env.step(action)
    assert reward == env.compute_reward(ob["achieved_goal"], ob["desired_goal"], info)
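As an illustration only (not the implementation used by the packaged environments), a sparse goal-distance reward of this kind is often written so that it broadcasts over batches of goals, which lets HER-style relabeling re-compute rewards for many transitions at once. The threshold and both helper functions below are hypothetical.

import numpy as np

# Hypothetical sparse reward: 0 if the achieved goal is within `threshold`
# of the desired goal, -1 otherwise. Works on single goals and on batches.
def sparse_goal_reward(achieved_goal, desired_goal, info, threshold=0.05):
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    return -(distance > threshold).astype(np.float64)

# A dense variant simply returns the negative distance to the goal.
def dense_goal_reward(achieved_goal, desired_goal, info):
    return -np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)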

gymnasium_robotics.core.GoalEnv.compute_terminated(self, achieved_goal, desired_goal, info)

Compute the step termination. Allows customization of the termination states depending on the desired and the achieved goal.

If you wish to determine termination states independent of the goal, you can include the necessary values to derive them in ‘info’ and compute them accordingly. The environment reaches a termination state when that state ends the episode in an episodic task.

More information can be found in: https://farama.org/New-Step-API#theory

Parameters:
  • achieved_goal (object) – the goal that was achieved during execution

  • desired_goal (object) – the desired goal that we asked the agent to attempt to achieve

  • info (dict) – an info dictionary with additional information

Returns:
  • bool – The termination state that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

    ob, reward, terminated, truncated, info = env.step(action)
    assert terminated == env.compute_terminated(ob["achieved_goal"], ob["desired_goal"], info)
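A hypothetical implementation might, for example, terminate the episode once the achieved goal is close enough to the desired goal; many goal-based tasks instead treat the problem as infinite-horizon and always return False here, leaving episode length to truncation. The helper and threshold below are illustrative assumptions.

import numpy as np

# Hypothetical success-based termination: end the episode once the goal is reached.
def success_terminated(achieved_goal, desired_goal, info, threshold=0.05):
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    return bool(distance <= threshold)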

gymnasium_robotics.core.GoalEnv.compute_truncated(self, achieved_goal, desired_goal, info)

Compute the step truncation. Allows customization of the truncation states depending on the desired and the achieved goal.

If you wish to determine truncated states independent of the goal, you can include the necessary values to derive them in ‘info’ and compute them accordingly. Truncated states are those that are out of the scope of the Markov Decision Process (MDP), such as time constraints in a continuing task.

More information can be found in: https://farama.org/New-Step-API#theory

Parameters:
  • achieved_goal (object) – the goal that was achieved during execution

  • desired_goal (object) – the desired goal that we asked the agent to attempt to achieve

  • info (dict) – an info dictionary with additional information

Returns:
  • bool – The truncated state that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:

    ob, reward, terminated, truncated, info = env.step(action)
    assert truncated == env.compute_truncated(ob["achieved_goal"], ob["desired_goal"], info)
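Truncation is usually tied to quantities outside the goal itself, such as an elapsed step counter or an external TimeLimit wrapper. A hypothetical time-based variant might look like the sketch below; the "elapsed_steps" info key and the step limit are assumptions of this sketch, not keys provided by the packaged environments.

# Hypothetical time-based truncation: cut the episode after `max_episode_steps`,
# reading the elapsed step count from the info dictionary.
def time_limit_truncated(achieved_goal, desired_goal, info, max_episode_steps=50):
    return info.get("elapsed_steps", 0) >= max_episode_steps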