Multi-goal API¶
The robotic environments use an extension of the core Gymnasium API by inheriting from the GoalEnv class. The new API forces the environments to have a dictionary observation space that contains 3 keys:
observation
- The actual observation of the environment.
desired_goal
- The goal that the agent has to achieve.
achieved_goal
- The goal that the agent has currently achieved instead. The objective of the environments is for this value to be close to desired_goal.
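As a quick illustration (using the same FetchReach-v3 environment as in the example below, chosen here only for concreteness), the observation space of a goal-based environment is a gymnasium.spaces.Dict containing exactly these keys, and reset() and step() return dictionary observations with the same structure:
import gymnasium as gym
import gymnasium_robotics
gym.register_envs(gymnasium_robotics)
env = gym.make("FetchReach-v3")
# The observation space is a Dict space with the three required keys.
print(env.observation_space)  # Dict('achieved_goal': Box(...), 'desired_goal': Box(...), 'observation': Box(...))
obs, info = env.reset(seed=42)
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']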
This API also exposes the reward function, as well as the terminated and truncated signals, so that their values can be re-computed with different goals. This functionality is useful for algorithms that use Hindsight Experience Replay (HER).
The following example demonstrates how the exposed reward, terminated, and truncated functions can be used to re-compute the values with substituted goals. The info dictionary can be used to store additional information that may be necessary to re-compute the reward, but that is independent of the goal, e.g. state derived from the simulation.
import gymnasium as gym
import gymnasium_robotics
gym.register_envs(gymnasium_robotics)
env = gym.make("FetchReach-v3")
env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# The following always has to hold (the GoalEnv methods are defined on the base
# environment, so they are accessed through env.unwrapped to bypass the wrappers
# added by gym.make):
assert reward == env.unwrapped.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
assert truncated == env.unwrapped.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
assert terminated == env.unwrapped.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)
# However goals can also be substituted:
substitute_goal = obs["achieved_goal"].copy()
substitute_reward = env.unwrapped.compute_reward(obs["achieved_goal"], substitute_goal, info)
substitute_terminated = env.unwrapped.compute_terminated(obs["achieved_goal"], substitute_goal, info)
substitute_truncated = env.unwrapped.compute_truncated(obs["achieved_goal"], substitute_goal, info)
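Building on the example above, the following is a minimal sketch of how Hindsight Experience Replay could relabel stored transitions with the episode's final achieved goal (the so-called "final" strategy). The transition collection and relabeling loop are illustrative only and are not part of the API:
# Collect one episode of transitions (continuing with the env created above).
episode = []
obs, info = env.reset(seed=0)
for _ in range(50):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, info = env.step(action)
    episode.append((obs, action, reward, next_obs, info))
    obs = next_obs
    if terminated or truncated:
        break
# HER "final" strategy: pretend the goal was the last achieved goal of the episode.
new_goal = episode[-1][3]["achieved_goal"].copy()
relabeled = []
for obs_t, action_t, _, next_obs_t, info_t in episode:
    new_reward = env.unwrapped.compute_reward(next_obs_t["achieved_goal"], new_goal, info_t)
    new_terminated = env.unwrapped.compute_terminated(next_obs_t["achieved_goal"], new_goal, info_t)
    relabeled.append((obs_t, action_t, new_reward, next_obs_t, new_terminated))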
GoalEnv¶
The GoalEnv class can also be used for custom environments (a minimal sketch follows the class description below).
- class gymnasium_robotics.core.GoalEnv¶
A goal-based environment.
It functions just as any regular Gymnasium environment but it imposes a required structure on the observation_space. More concretely, the observation space is required to contain at least three elements, namely observation, desired_goal, and achieved_goal. Here, desired_goal specifies the goal that the agent should attempt to achieve. achieved_goal is the goal that the agent has currently achieved instead. observation contains the actual observations of the environment as per usual.
compute_reward()
- Externalizes the reward function by taking the achieved and desired goal, as well as extra information. Returns reward.
compute_terminated()
- Returns boolean termination depending on the achieved and desired goal, as well as extra information.
compute_truncated()
- Returns boolean truncation depending on the achieved and desired goal, as well as extra information.
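The following is a minimal sketch of a custom goal-based environment built on GoalEnv. The toy task (moving a point on a 2D plane towards a random target) and names such as PointReachEnv and distance_threshold are invented here for illustration and are not part of the API:
import numpy as np
from gymnasium import spaces
from gymnasium_robotics.core import GoalEnv

class PointReachEnv(GoalEnv):
    """Toy goal-based environment: move a point on a 2D plane towards a target."""

    def __init__(self, distance_threshold=0.05):
        self.distance_threshold = distance_threshold  # assumed attribute name, illustration only
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float64)
        self.observation_space = spaces.Dict(
            {
                "observation": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
                "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
                "desired_goal": spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float64),
            }
        )

    def _get_obs(self):
        return {
            "observation": self.position.copy(),
            "achieved_goal": self.position.copy(),
            "desired_goal": self.goal.copy(),
        }

    def reset(self, *, seed=None, options=None):
        # GoalEnv.reset seeds np_random and checks the observation space structure.
        super().reset(seed=seed)
        self.position = np.zeros(2)
        self.goal = self.np_random.uniform(-1.0, 1.0, size=2)
        return self._get_obs(), {}

    def step(self, action):
        self.position = self.position + 0.1 * np.clip(action, -1.0, 1.0)
        obs = self._get_obs()
        info = {}
        reward = self.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
        terminated = self.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)
        truncated = self.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
        return obs, reward, terminated, truncated, info

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
        return -(distance > self.distance_threshold).astype(np.float64)

    def compute_terminated(self, achieved_goal, desired_goal, info):
        # Terminate once the goal is reached.
        return bool(np.linalg.norm(achieved_goal - desired_goal) <= self.distance_threshold)

    def compute_truncated(self, achieved_goal, desired_goal, info):
        # Truncation (e.g. a time limit) is left to a TimeLimit wrapper here.
        return False

env = PointReachEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())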
Methods¶
- gymnasium_robotics.core.GoalEnv.compute_reward(self, achieved_goal, desired_goal, info)¶
Compute the step reward. This externalizes the reward function and makes it dependent on a desired goal and the one that was achieved.
If you wish to include additional rewards that are independent of the goal, you can include the necessary values to derive it in ‘info’ and compute it accordingly.
- Parameters:
achieved_goal (object) – the goal that was achieved during execution
desired_goal (object) – the desired goal that we asked the agent to attempt to achieve
info (dict) – an info dictionary with additional information
- Returns:
float – The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:
ob, reward, terminated, truncated, info = env.step()
assert reward == env.compute_reward(ob['achieved_goal'], ob['desired_goal'], info)
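As a usage note, some HER implementations relabel whole batches of transitions at once by calling compute_reward with arrays of goals. This is not required by the GoalEnv interface, but writing the method in a vectorized way makes such relabeling cheap. A sketch, with the 0.05 distance threshold chosen purely for illustration:
import numpy as np

def compute_reward(self, achieved_goal, desired_goal, info):
    # Works for a single goal pair of shape (goal_dim,) as well as a batch of
    # shape (batch_size, goal_dim), because the norm is taken over the last axis.
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    return -(distance > 0.05).astype(np.float64)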
- gymnasium_robotics.core.GoalEnv.compute_terminated(self, achieved_goal, desired_goal, info)¶
Compute the step termination. Allows customization of the termination states depending on the desired and the achieved goal.
If you wish to determine termination states independent of the goal, you can include the necessary values to derive them in 'info' and compute them accordingly. The environment reaches a termination state when this state leads to the end of an episode in an episodic task, i.e. a terminal state that is part of the Markov Decision Process (MDP); this is in contrast to truncation, which ends an episode for reasons outside the MDP.
More information can be found in: https://farama.org/New-Step-API#theory
- Parameters:
achieved_goal (object) – the goal that was achieved during execution
desired_goal (object) – the desired goal that we asked the agent to attempt to achieve
info (dict) – an info dictionary with additional information
- Returns:
bool – The termination state that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:
ob, reward, terminated, truncated, info = env.step()
assert terminated == env.compute_terminated(ob['achieved_goal'], ob['desired_goal'], info)
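For instance, a goal-conditioned termination check might look like the sketch below; the distance_threshold attribute is assumed for illustration. Note that many of the included tasks (e.g. the Fetch environments) instead always return False here and end episodes only through truncation:
import numpy as np

def compute_terminated(self, achieved_goal, desired_goal, info):
    # Terminate the episode once the achieved goal lies within a small distance of the desired goal.
    return bool(np.linalg.norm(achieved_goal - desired_goal) <= self.distance_threshold)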
- gymnasium_robotics.core.GoalEnv.compute_truncated(self, achieved_goal, desired_goal, info)¶
Compute the step truncation. Allows customization of the truncated states depending on the desired and the achieved goal.
If you wish to determine truncated states independent of the goal, you can include necessary values to derive it in ‘info’ and compute it accordingly. Truncated states are those that are out of the scope of the Markov Decision Process (MDP) such as time constraints in a continuing task.
More information can be found in: https://farama.org/New-Step-API#theory
- Parameters:
achieved_goal (object) – the goal that was achieved during execution
desired_goal (object) – the desired goal that we asked the agent to attempt to achieve
info (dict) – an info dictionary with additional information
- Returns:
bool – The truncated state that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:
ob, reward, terminated, truncated, info = env.step()
assert truncated == env.compute_truncated(ob['achieved_goal'], ob['desired_goal'], info)
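As a sketch, a goal-independent time limit could be implemented by carrying the elapsed step count in the info dictionary. The 'elapsed_steps' key and the max_episode_steps attribute are assumptions made for this illustration; the bundled environments typically rely on the gymnasium TimeLimit wrapper for truncation instead:
def compute_truncated(self, achieved_goal, desired_goal, info):
    # Truncate when a time limit is reached; the limit does not depend on the goal,
    # so the step counter is carried in the info dictionary (assumed key).
    return info.get("elapsed_steps", 0) >= self.max_episode_steps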