Manipulate Pen¶

Description¶

This environment was introduced in “Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research”.

The environment is based on the same robot hand as in the HandReach environment, the Shadow Dexterous Hand. The task to be solved is very similar to that in the HandManipulateBlock environment, but in this case a pen is placed on the palm of the hand. The task is to then manipulate the pen such that a target pose is achieved. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (in quaternions). In addition, variations of this environment can be used with increasing levels of difficulty:

HandManipulatePenRotate-v1: Random target rotation x and y axes of the pen and no target rotation around the z axis. No target position.
HandManipulatePenFull-v1: Random target rotation x and y axes of the pen and no target rotation around the z axis. Random target position.

Action Space¶

The action space is a Box(-1.0, 1.0, (20,), float32). The control actions are absolute angular positions of the actuated joints (non-coupled). The input of the control actions is set to a range between -1 and 1 by scaling the actual actuator angle ranges. The elements of the action array are the following:

Num	Action	Control Min	Control Max	Angle Min	Angle Max	Name (in corresponding XML file)	Joint	Unit
0	Angular position of the horizontal wrist joint (radial/ulnar deviation)	-1	1	-0.489 (rad)	0.14 (rad)	robot0:A_WRJ1	hinge	angle (rad)
1	Angular position of the horizontal wrist joint (flexion/extension)	-1	1	-0.698 (rad)	0.489 (rad)	robot0:A_WRJ0	hinge	angle (rad)
2	Horizontal angular position of the MCP joint of the forefinger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_FFJ3	hinge	angle (rad)
3	Vertical angular position of the MCP joint of the forefinger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_FFJ2	hinge	angle (rad)
4	Angular position of the PIP joint of the forefinger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_FFJ1	hinge	angle (rad)
5	Horizontal angular position of the MCP joint of the middle finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_MFJ3	hinge	angle (rad)
6	Vertical angular position of the MCP joint of the middle finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_MFJ2	hinge	angle (rad)
7	Angular position of the PIP joint of the middle finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_MFJ1	hinge	angle (rad)
8	Horizontal angular position of the MCP joint of the ring finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_RFJ3	hinge	angle (rad)
9	Vertical angular position of the MCP joint of the ring finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_RFJ2	hinge	angle (rad)
10	Angular position of the PIP joint of the ring finger	-1	1	0 (rad)	1.571 (rad)	robot0:A_RFJ1	hinge	angle (rad)
11	Angular position of the CMC joint of the little finger	-1	1	0 (rad)	0.785(rad)	robot0:A_LFJ4	hinge	angle (rad)
12	Horizontal angular position of the MCP joint of the little finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_LFJ3	hinge	angle (rad)
13	Vertical angular position of the MCP joint of the little finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_LFJ2	hinge	angle (rad)
14	Angular position of the PIP joint of the little finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_LFJ1	hinge	angle (rad)
15	Horizontal angular position of the CMC joint of the thumb finger	-1	1	-1.047 (rad)	1.047 (rad)	robot0:A_THJ4	hinge	angle (rad)
16	Vertical Angular position of the CMC joint of the thumb finger	-1	1	0 (rad)	1.222 (rad)	robot0:A_THJ3	hinge	angle (rad)
17	Horizontal angular position of the MCP joint of the thumb finger (adduction/abduction)	-1	1	-0.209 (rad)	0.209(rad)	robot0:A_THJ2	hinge	angle (rad)
18	Vertical angular position of the MCP joint of the thumb finger (flexion/extension)	-1	1	-0.524 (rad)	0.524 (rad)	robot0:A_THJ1	hinge	angle (rad)
19	Angular position of the IP joint of the thumb finger (flexion/extension)	-1	1	-1.571 (rad)	0 (rad)	robot0:A_THJ0	hinge	angle (rad)

Observation Space¶

The observation is a goal-aware observation space. It consists of a dictionary with information about the robot’s joint and pen states, as well as information about the goal. The dictionary consists of the following 3 keys:

observation: its value is an ndarray of shape (61,). It consists of kinematic information of the pen and finger joints. The elements of the array correspond to the following:

Num	Observation	Min	Max	Joint Name (in corresponding XML file)	Joint Type	Unit
0	Angular position of the horizontal wrist joint	-Inf	Inf	robot0:WRJ1	hinge	angle (rad)
1	Angular position of the vertical wrist joint	-Inf	Inf	robot0:WRJ0	hinge	angle (rad)
2	Horizontal angular position of the MCP joint of the forefinger	-Inf	Inf	robot0:FFJ3	hinge	angle (rad)
3	Vertical angular position of the MCP joint of the forefinge	-Inf	Inf	robot0:FFJ2	hinge	angle (rad)
4	Angular position of the PIP joint of the forefinger	-Inf	Inf	robot0:FFJ1	hinge	angle (rad)
5	Angular position of the DIP joint of the forefinger	-Inf	Inf	robot0:FFJ0	hinge	angle (rad)
6	Horizontal angular position of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ3	hinge	angle (rad)
7	Vertical angular position of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ2	hinge	angle (rad)
8	Angular position of the PIP joint of the middle finger	-Inf	Inf	robot0:MFJ1	hinge	angle (rad)
9	Angular position of the DIP joint of the middle finger	-Inf	Inf	robot0:MFJ0	hinge	angle (rad)
10	Horizontal angular position of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ3	hinge	angle (rad)
11	Vertical angular position of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ2	hinge	angle (rad)
12	Angular position of the PIP joint of the ring finger	-Inf	Inf	robot0:RFJ1	hinge	angle (rad)
13	Angular position of the DIP joint of the ring finger	-Inf	Inf	robot0:RFJ0	hinge	angle (rad)
14	Angular position of the CMC joint of the little finger	-Inf	Inf	robot0:LFJ4	hinge	angle (rad)
15	Horizontal angular position of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ3	hinge	angle (rad)
16	Vertical angular position of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ2	hinge	angle (rad)
17	Angular position of the PIP joint of the little finger	-Inf	Inf	robot0:LFJ1	hinge	angle (rad)
18	Angular position of the DIP joint of the little finger	-Inf	Inf	robot0:LFJ0	hinge	angle (rad)
19	Horizontal angular position of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ4	hinge	angle (rad)
20	Vertical Angular position of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ3	hinge	angle (rad)
21	Horizontal angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ2	hinge	angle (rad)
22	Vertical angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ1	hinge	angle (rad)
23	Angular position of the IP joint of the thumb finger	-Inf	Inf	robot0:THJ0	hinge	angle (rad)
24	Angular velocity of the horizontal wrist joint	-Inf	Inf	robot0:WRJ1	hinge	angular velocity (rad/s)
25	Angular velocity of the vertical wrist joint	-Inf	Inf	robot0:WRJ0	hinge	angular velocity (rad/s)
26	Horizontal angular velocity of the MCP joint of the forefinger	-Inf	Inf	robot0:FFJ3	hinge	angular velocity (rad/s)
27	Vertical angular velocity of the MCP joint of the forefinge	-Inf	Inf	robot0:FFJ2	hinge	angular velocity (rad/s)
28	Angular velocity of the PIP joint of the forefinger	-Inf	Inf	robot0:FFJ1	hinge	angular velocity (rad/s)
29	Angular velocity of the DIP joint of the forefinger	-Inf	Inf	robot0:FFJ0	hinge	angular velocity (rad/s)
30	Horizontal angular velocity of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ3	hinge	angular velocity (rad/s)
31	Vertical angular velocity of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ2	hinge	angular velocity (rad/s)
32	Angular velocity of the PIP joint of the middle finger	-Inf	Inf	robot0:MFJ1	hinge	angular velocity (rad/s)
33	Angular velocity of the DIP joint of the middle finger	-Inf	Inf	robot0:MFJ0	hinge	angular velocity (rad/s)
34	Horizontal angular velocity of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ3	hinge	angular velocity (rad/s)
35	Vertical angular velocity of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ2	hinge	angular velocity (rad/s)
36	Angular velocity of the PIP joint of the ring finger	-Inf	Inf	robot0:RFJ1	hinge	angular velocity (rad/s)
37	Angular velocity of the DIP joint of the ring finger	-Inf	Inf	robot0:RFJ0	hinge	angular velocity (rad/s)
38	Angular velocity of the CMC joint of the little finger	-Inf	Inf	robot0:LFJ4	hinge	angular velocity (rad/s)
39	Horizontal angular velocity of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ3	hinge	angular velocity (rad/s)
40	Vertical angular velocity of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ2	hinge	angular velocity (rad/s)
41	Angular velocity of the PIP joint of the little finger	-Inf	Inf	robot0:LFJ1	hinge	angular velocity (rad/s)
42	Angular velocity of the DIP joint of the little finger	-Inf	Inf	robot0:LFJ0	hinge	angular velocity (rad/s)
43	Horizontal angular velocity of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ4	hinge	angular velocity (rad/s)
44	Vertical Angular velocity of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ3	hinge	angular velocity (rad/s)
45	Horizontal angular velocity of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ2	hinge	angular velocity (rad/s)
46	Vertical angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ1	hinge	angular velocity (rad/s)
47	Angular velocity of the IP joint of the thumb finger	-Inf	Inf	robot0:THJ0	hinge	angular velocity (rad/s)
48	Linear velocity of the pen in x direction	-Inf	Inf	object:joint	free	velocity (m/s)
49	Linear velocity of the pen in y direction	-Inf	Inf	object:joint	free	velocity (m/s)
50	Linear velocity of the pen in z direction	-Inf	Inf	object:joint	free	velocity (m/s)
51	Angular velocity of the pen in x axis	-Inf	Inf	object:joint	free	angular velocity (rad/s)
52	Angular velocity of the pen in y axis	-Inf	Inf	object:joint	free	angular velocity (rad/s)
53	Angular velocity of the pen in z axis	-Inf	Inf	object:joint	free	angular velocity (rad/s)
54	Position of the pen in the x coordinate	-Inf	Inf	object:joint	free	position (m)
55	Position of the pen in the y coordinate	-Inf	Inf	object:joint	free	position (m)
56	Position of the pen in the z coordinate	-Inf	Inf	object:joint	free	position (m)
57	w component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
58	x component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
59	y component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
60	z component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-

desired_goal: this key represents the final goal to be achieved. In this environment it is a 7-dimensional ndarray, (7,), that consists of the pose information of the pen. The elements of the array are the following:

Num	Observation	Min	Max	Joint Name (in corresponding XML file)	Joint Type	Unit
0	Target x coordinate of the pen	-Inf	Inf	target:joint	free	position (m)
1	Target y coordinate of the pen	-Inf	Inf	target:joint	free	position (m)
2	Target z coordinate of the pen	-Inf	Inf	target:joint	free	position (m)
3	Target w component of the quaternion orientation of the pen	-Inf	Inf	target:joint	free	-
4	Target x component of the quaternion orientation of the pen	-Inf	Inf	target:joint	free	-
5	Target y component of the quaternion orientation of the pen	-Inf	Inf	target:joint	free	-
6	Target z component of the quaternion orientation of the pen	-Inf	Inf	target:joint	free	-

achieved_goal: this key represents the current state of the pen, as if it would have achieved a goal. This is useful for goal orientated learning algorithms such as those that use Hindsight Experience Replay (HER). The value is an ndarray with shape (7,). The elements of the array are the following:

Num	Observation	Min	Max	Joint Name (in corresponding XML file)	Joint Type	Unit
0	Current x coordinate of the pen	-Inf	Inf	object:joint	free	position (m)
1	Current y coordinate of the pen	-Inf	Inf	object:joint	free	position (m)
2	Current z coordinate of the pen	-Inf	Inf	object:joint	free	position (m)
3	Current w component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
4	Current x component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
5	Current y component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-
6	Current z component of the quaternion orientation of the pen	-Inf	Inf	object:joint	free	-

Rewards¶

The reward can be initialized as sparse or dense:

sparse: the returned reward can have two values: -1 if the pen hasn’t reached its final target pose, and 0 if the pen is in its final target pose. The pen is considered to have reached its final goal if the theta angle difference (theta angle of the 3D axis angle representation is less than 0.1 and if the Euclidean distance to the target position is also less than 0.01 m.
dense: the returned reward is the negative summation of the Euclidean distance to the pen’s target and the theta angle difference to the target orientation. The positional distance is multiplied by a factor of 10 to avoid being dominated by the rotational difference.

To initialize this environment with one of the mentioned reward functions the type of reward must be specified in the id string when the environment is initialized. For sparse reward the id is the default of the environment, HandManipulatePen-v1. However, for dense reward the id must be modified to HandManipulatePenDense-v1 and initialized as follows:

import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)

env = gym.make('HandManipulatePen-v1')

The rest of the id’s of the other environment variations follow the same convention to select between a sparse or dense reward function.

Starting State¶

When the environment is reset the joints of the hand are initialized to their resting position with a 0 displacement. The pen’s position and orientation are randomly selected. The initial position is set to (x,y,z)=(1, 0.87, 0.2) and an offset is added to each coordinate sampled from a normal distribution with 0 mean and 0.005 standard deviation. While the initial orientation is set to (w,x,y,z)=(1,0,0,0) and an axis is randomly selected depending on the environment variation to add an angle offset sampled from a uniform distribution with range [-pi, pi].

The target pose of the pen is obtained by adding a random offset to the initial pen pose. For the position the offset is sampled from a uniform distribution with range [(x_min, x_max), (y_min,y_max), (z_min, z_max)] = [(-0.04, 0.04), (-0.06, 0.02), (0.0, 0.06)]. The orientation offset is sampled from a uniform distribution with range [-pi,pi] and added to one of the Euler axis depending on the environment variation.

Episode End¶

The episode will be truncated when the duration reaches a total of max_episode_steps which by default is set to 50 timesteps. The episode is never terminated since the task is continuing with infinite horizon.

Arguments¶

To increase/decrease the maximum number of timesteps before the episode is truncated the max_episode_steps argument can be set at initialization. The default value is 50. For example, to increase the total number of timesteps to 100 make the environment as follows:

import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)

env = gym.make('HandManipulatePen-v1', max_episode_steps=100)

The same applies for the other environment variations.

Version History¶

v1: the environment depends on the newest mujoco python bindings maintained by the MuJoCo team in Deepmind.
v0: the environment depends on mujoco_py which is no longer maintained.