MaMuJoCo (Multi-Agent MuJoCo)#


MaMuJoCo was introduced in “FACMAC: Factored Multi-Agent Centralised Policy Gradients”.

There are two types of environments: (1) multi-agent factorizations of Gymnasium/MuJoCo tasks and (2) new complex MuJoCo tasks meant to be solved with multi-agent algorithms.

Gymnasium-Robotics/MaMuJoCo represents the first easy-to-use framework for research on agent factorization.


MaMuJoCo mainly uses the PettingZoo.ParallelAPI, but also supports a few extra functions:

gymnasium_robotics.mamujoco_v0.parallel_env.map_local_actions_to_global_action(self, actions: dict[str, ndarray]) → ndarray#

Maps multi-agent actions into the single-agent action space.

Parameters:
actions – A dict representing the action of each agent

Returns:
The action of the whole domain (what the equivalent single-agent action would be)

Raises:
AssertionError – If the agent action factorization is badly defined (an action is defined twice or not defined at all)

gymnasium_robotics.mamujoco_v0.parallel_env.map_global_action_to_local_actions(self, action: ndarray) → dict[str, ndarray]#

Maps a single-agent action into the multi-agent action spaces.

Parameters:
action – An array representing the action of the single agent for this domain

Returns:
A dictionary of actions to be performed by each agent

Raises:
AssertionError – If the agent action factorization sizes are badly defined
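The two action mappings above are inverses of each other. The following is a minimal sketch of that idea, not the library's implementation: plain lists stand in for `ndarray`, and the names `local_to_global`, `global_to_local`, and `agent_action_indices` are hypothetical helpers introduced only for illustration.

```python
# Illustrative only: plain lists stand in for numpy arrays, and
# `agent_action_indices` is a hypothetical description of which global
# action slots each agent controls.


def local_to_global(local_actions, agent_action_indices, global_size):
    """Scatter each agent's action values into one global action list."""
    global_action = [None] * global_size
    for agent, values in local_actions.items():
        indices = agent_action_indices[agent]
        assert len(values) == len(indices), f"{agent}: wrong action size"
        for index, value in zip(indices, values):
            # A slot written twice means the factorization overlaps.
            assert global_action[index] is None, f"action {index} is double defined"
            global_action[index] = value
    # A slot never written means the factorization leaves an action undefined.
    assert None not in global_action, "some action is not defined"
    return global_action


def global_to_local(global_action, agent_action_indices):
    """Gather each agent's action values out of the global action list."""
    covered = sum(len(indices) for indices in agent_action_indices.values())
    assert covered == len(global_action), "factorization sizes do not match"
    return {
        agent: [global_action[index] for index in indices]
        for agent, indices in agent_action_indices.items()
    }


# Two agents sharing a 4-dimensional global action space.
agent_action_indices = {"agent_0": [0, 1], "agent_1": [2, 3]}
local_actions = {"agent_0": [0.1, -0.2], "agent_1": [0.3, 0.4]}

global_action = local_to_global(local_actions, agent_action_indices, 4)
round_trip = global_to_local(global_action, agent_action_indices)
```

In this sketch a factorization that assigns the same slot to two agents, or leaves a slot unassigned, trips an `AssertionError`, mirroring the failure conditions documented above.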

gymnasium_robotics.mamujoco_v0.parallel_env.map_global_state_to_local_observations(self, global_state: ndarray) → dict[str, ndarray]#

Maps a single-agent observation into the multi-agent observation spaces.

Parameters:
global_state – The global state (generated from MaMuJoCo.state())

Returns:
A dictionary of states that would be observed by each agent given the ‘global_state’

gymnasium_robotics.mamujoco_v0.parallel_env.map_local_observation_to_global_state(self, local_observations: dict[str, ndarray]) → ndarray#

Maps multi-agent observations into the single-agent observation space.

NOT IMPLEMENTED; use MaMuJoCo.state() instead.

Parameters:
local_observations – The local observations of each agent (generated from MaMuJoCo.step())

Returns:
The global observation that corresponds to a single agent (what you would get with MaMuJoCo.state())

gymnasium_robotics.mamujoco_v0.get_parts_and_edges(label: str, partitioning: str | None) → tuple[list[tuple[Node, ...]], list[HyperEdge], list[Node]]#

Gets the MuJoCo graph (nodes and edges) given an optional partitioning.

Parameters:
  • label – The MuJoCo task to partition

  • partitioning – The partitioning scheme

Returns:
The partition of the MuJoCo graph nodes, the graph edges, and the global nodes

MaMuJoCo also supports the PettingZoo.AECAPI but does not expose extra functions.


gymnasium_robotics.mamujoco_v0.parallel_env.__init__(self, scenario: str, agent_conf: str | None, agent_obsk: int | None = 1, agent_factorization: dict | None = None, local_categories: list[list[str]] | None = None, global_categories: tuple[str, ...] | None = None, render_mode: str | None = None, **kwargs)#


  • scenario – The Task/Environment, valid values: “Ant”, “HalfCheetah”, “Hopper”, “HumanoidStandup”, “Humanoid”, “Reacher”, “Swimmer”, “Pusher”, “Walker2d”, “InvertedPendulum”, “InvertedDoublePendulum”, “ManySegmentSwimmer”, “ManySegmentAnt”, “CoupledHalfCheetah”

  • agent_conf – ‘${Number Of Agents}x${Number Of Segments per Agent}${Optionally Additional options}’, e.g. ‘1x6’, ‘2x4’, ‘2x4d’. If it is set to None, the task becomes single-agent (the agent observes the entire environment and performs all the actions).

  • agent_obsk – Number of nearest joints to observe. If set to 0, each agent observes only its local state; if set to 1, it observes its local state plus 1 joint over; if set to 2, it observes its local state plus 2 joints over. If it is set to None, the task becomes single-agent (the agent observes the entire environment and performs all the actions). The default value is 1.

  • agent_factorization – A custom factorization of the MuJoCo environment (overrides agent_conf); see the “How to create new agent factorizations” section.

  • local_categories – The categories of local observations for each observation depth. It takes the form of a list where the k-th element is the list of items observable at the k-th depth. For example, if it is set to [["qpos", "qvel"], ["qvel"]], each agent observes its own position and velocity elements and its neighbors’ velocity elements. For the default, check the “observation space” section of each environment’s page.

  • global_categories – The categories of observations extracted from the global observable space. For example, if it is set to ("qpos",), then out of the globally observable items of the environment only the position items will be observed. For the default, check the “observation space” section of each environment’s page.

  • render_mode – See Gymnasium/MuJoCo; valid values: ‘human’, ‘rgb_array’, ‘depth_array’

  • kwargs – Additional arguments passed to the Gymnasium/MuJoCo environment. Note: arguments that change the observation space will not work.

  • Raises – NotImplementedError: When the scenario is not supported (not one of the valid values)
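To make the interaction between agent_obsk and local_categories concrete, here is a rough sketch of how "joints up to k steps away" could be computed with a breadth-first search. It is an illustration under simplifying assumptions, not MaMuJoCo's code: joints are plain strings, edges are simple pairs rather than HyperEdge objects, and `joints_within_k` and the five-joint chain are hypothetical.

```python
from collections import deque


def joints_within_k(own_joints, edges, k):
    """Group joints by graph distance (0..k) from an agent's own joints."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {joint: 0 for joint in own_joints}
    queue = deque(own_joints)
    while queue:
        joint = queue.popleft()
        if distance[joint] == k:
            continue  # do not expand past the requested depth
        for other in sorted(neighbours.get(joint, ())):
            if other not in distance:
                distance[other] = distance[joint] + 1
                queue.append(other)

    by_depth = [[] for _ in range(k + 1)]
    for joint, depth in distance.items():
        by_depth[depth].append(joint)
    return [sorted(joints) for joints in by_depth]


# Hypothetical chain of five joints: j0 - j1 - j2 - j3 - j4.
chain_edges = [("j0", "j1"), ("j1", "j2"), ("j2", "j3"), ("j3", "j4")]

# With k = 1 and these categories, the agent owning j1 sees its own
# position and velocity, plus the velocity of its direct neighbours.
local_categories = [["qpos", "qvel"], ["qvel"]]
observed = [
    (joint, category)
    for depth, joints in enumerate(joints_within_k(["j1"], chain_edges, 1))
    for joint in joints
    for category in local_categories[depth]
]
```

The depth index into `local_categories` is what lets nearer joints contribute more observation categories than farther ones.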

How to create new agent factorizations#

Example ‘Ant-v4’, ‘8x1’#

In this example, we will create an agent factorization that is not present in Gymnasium-Robotics/MaMuJoCo: the “Ant”/‘8x1’ factorization, where each agent controls a single joint/action (first implemented by safe-MaMuJoCo).

First, we load the MaMuJoCo graph:

>>> from gymnasium_robotics.mamujoco_v0 import get_parts_and_edges
>>> unpartioned_nodes, edges, global_nodes = get_parts_and_edges('Ant-v4', None)

unpartioned_nodes contains the nodes of the MaMuJoCo graph, edges contains the edges of the graph, and global_nodes contains a set of observations for all agents.

To create our ‘8x1’ partition we will need to partition the unpartioned_nodes:

>>> unpartioned_nodes
[(hip1, ankle1, hip2, ankle2, hip3, ankle3, hip4, ankle4)]
>>> partioned_nodes = [(node,) for node in unpartioned_nodes[0]]
>>> partioned_nodes
[(hip1,), (ankle1,), (hip2,), (ankle2,), (hip3,), (ankle3,), (hip4,), (ankle4,)]
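The same idea generalizes to other ‘NxM’ layouts. The sketch below is a hedged illustration rather than part of MaMuJoCo: strings stand in for Node objects, `parse_agent_conf` and `partition_evenly` are hypothetical helpers, and grouping consecutive nodes is an assumption that need not match how built-in configurations (for example those with extra options such as ‘2x4d’) group joints.

```python
import re


def parse_agent_conf(agent_conf):
    """Split a string such as '2x4d' into (agents, segments per agent, options)."""
    match = re.fullmatch(r"(\d+)x(\d+)(.*)", agent_conf)
    assert match is not None, f"unrecognised agent_conf: {agent_conf!r}"
    return int(match.group(1)), int(match.group(2)), match.group(3)


def partition_evenly(nodes, n_agents, nodes_per_agent):
    """Split a flat tuple of nodes into n_agents tuples of consecutive nodes."""
    assert n_agents * nodes_per_agent == len(nodes), (
        "partition must cover every node exactly once"
    )
    return [
        tuple(nodes[start:start + nodes_per_agent])
        for start in range(0, len(nodes), nodes_per_agent)
    ]


# Strings stand in for the Node objects returned by get_parts_and_edges.
ant_joints = ("hip1", "ankle1", "hip2", "ankle2", "hip3", "ankle3", "hip4", "ankle4")

n_agents, per_agent, options = parse_agent_conf("8x1")
eight_by_one = partition_evenly(ant_joints, n_agents, per_agent)
two_by_four = partition_evenly(ant_joints, 2, 4)
```

Custom factorizations that are not consecutive groupings can still be written out by hand, as in the walkthrough above.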

Finally package the partitions and create our environment:

>>> from gymnasium_robotics import mamujoco_v0
>>> my_agent_factorization = {"partition": partioned_nodes, "edges": edges, "globals": global_nodes}
>>> gym_env = mamujoco_v0.parallel_env('Ant', '8x1', agent_factorization=my_agent_factorization)

Version History#

v0: Initial version release; uses Gymnasium.MuJoCo-v4 and is a fork of the original multiagent_mujoco