Reach#

Description#

This environment was introduced in “Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research”.

The environment is based on the Shadow Dexterous Hand, which is an antropomorphic robotic hand with 24 joints. The goal of the task is for the fingertips of the hand to reach a predefined target Cartesian position. The hand has a total of 20 motor controlled degrees of freedom out of the 24 joints. The thumb has 5 joints and 5 DoF while the rest of the fingers have 4 joints and 3 DoF (each finger’s distal joint is coupled with a tendon to its middle joint just like a human hand, so that the middle joint angle is always greater or equal to the distal joint angle). The control frequency of the actuators is of f = 25 Hz. This is achieved by applying the same action in 20 subsequent simulator step (with a time step of dt = 0.002 s) before returning the control to the robot.

The kinematics of the Shadow Dexterous Hand resembles that of the human hand. The robot hand has 2 degrees of freedom for the wrist to perform the radial/lunar deviation movements (WRJ1) and flexion/extension (WRJ0). Each finger has three joints in common. The joint closer to the palm is called metacarpophalangeal (MCP) and has a total of 2 degrees of freedom each. In the robot they are defined as FFJ3, MFJ3, RFJ3, LFJ3, and THJ2 (forefinger, middle finger, ring finger, little finger, and thumb respectively) for the adduction/abduction degree of freedom, and FFJ2, MFJ2, RFJ2, LFJ2, THJ1 for the flexion/extension DoF. The middle joint in the fingers is known as proximal interphalangea (PIP), which in the robot hand correspond to FFJ1, MFJ1, RFJ1, and LFJ1. This joint is also responsible for flexion/extension. The last joint in common is the most distant to the palm, called distal interphalangeal (DIP) and in the robot hand FFJ0, MFJ0, RFJ0, and LFJ0. This joint is not actuated but coupled to the PIP joints by tendons in MuJoCo.

In the robot hand an extra joint is added to the little finger LFJ4 in order to perform the opposition movement with the thumb. Also the the human thumb has two different joints than the rest of the fingers. The carpometacarpal (CMC) joint located close to the palm area, THJ4 and THJ3in the robot. And the interphalangeal joint which is in the same location as the DIP but in this case actuated. This joint is the THJ0 in the robot hand.

Action Space#

The action space is a Box(-1.0, 1.0, (20,), float32). The control actions are absolute angular positions of the actuated joints (non-coupled). The input of the control actions is set to a range between -1 and 1 by scaling the actual actuator angle ranges. The elements of the action array are the following:

Num	Action	Control Min	Control Max	Angle Min	Angle Max	Name (in corresponding XML file)	Joint	Unit
0	Angular position of the horizontal wrist joint (radial/ulnar deviation)	-1	1	-0.489 (rad)	0.14 (rad)	robot0:A_WRJ1	hinge	angle (rad)
1	Angular position of the horizontal wrist joint (flexion/extension)	-1	1	-0.698 (rad)	0.489 (rad)	robot0:A_WRJ0	hinge	angle (rad)
2	Horizontal angular position of the MCP joint of the forefinger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_FFJ3	hinge	angle (rad)
3	Vertical angular position of the MCP joint of the forefinger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_FFJ2	hinge	angle (rad)
4	Angular position of the PIP joint of the forefinger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_FFJ1	hinge	angle (rad)
5	Horizontal angular position of the MCP joint of the middle finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_MFJ3	hinge	angle (rad)
6	Vertical angular position of the MCP joint of the middle finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_MFJ2	hinge	angle (rad)
7	Angular position of the PIP joint of the middle finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_MFJ1	hinge	angle (rad)
8	Horizontal angular position of the MCP joint of the ring finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_RFJ3	hinge	angle (rad)
9	Vertical angular position of the MCP joint of the ring finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_RFJ2	hinge	angle (rad)
10	Angular position of the PIP joint of the ring finger	-1	1	0 (rad)	1.571 (rad)	robot0:A_RFJ1	hinge	angle (rad)
11	Angular position of the CMC joint of the little finger	-1	1	0 (rad)	0.785(rad)	robot0:A_LFJ4	hinge	angle (rad)
12	Horizontal angular position of the MCP joint of the little finger (adduction/abduction)	-1	1	-0.349 (rad)	0.349(rad)	robot0:A_LFJ3	hinge	angle (rad)
13	Vertical angular position of the MCP joint of the little finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_LFJ2	hinge	angle (rad)
14	Angular position of the PIP joint of the little finger (flexion/extension)	-1	1	0 (rad)	1.571 (rad)	robot0:A_LFJ1	hinge	angle (rad)
15	Horizontal angular position of the CMC joint of the thumb finger	-1	1	-1.047 (rad)	1.047 (rad)	robot0:A_THJ4	hinge	angle (rad)
16	Vertical Angular position of the CMC joint of the thumb finger	-1	1	0 (rad)	1.222 (rad)	robot0:A_THJ3	hinge	angle (rad)
17	Horizontal angular position of the MCP joint of the thumb finger (adduction/abduction)	-1	1	-0.209 (rad)	0.209(rad)	robot0:A_THJ2	hinge	angle (rad)
18	Vertical angular position of the MCP joint of the thumb finger (flexion/extension)	-1	1	-0.524 (rad)	0.524 (rad)	robot0:A_THJ1	hinge	angle (rad)
19	Angular position of the IP joint of the thumb finger (flexion/extension)	-1	1	-1.571 (rad)	0 (rad)	robot0:A_THJ0	hinge	angle (rad)

Observation Space#

The observation is a goal-aware observation space. It consists of a dictionary with information about the robot’s joint and finger states, as well as information about the goal. The finger tip observations are derived from Mujoco bodies known as sites attached to the body of interest such as the finger tips. The dictionary consists of the following 3 keys:

observation: its value is an ndarray of shape (63,). It consists of kinematic information of the block object and gripper. The elements of the array correspond to the following:

Num	Observation	Min	Max	Joint Name (in corresponding XML file)	Site Name (in corresponding XML file)	Joint Type	Unit
0	Angular position of the horizontal wrist joint	-Inf	Inf	robot0:WRJ1	-	hinge	angle (rad)
1	Angular position of the vertical wrist joint	-Inf	Inf	robot0:WRJ0	-	hinge	angle (rad)
2	Horizontal angular position of the MCP joint of the forefinger	-Inf	Inf	robot0:FFJ3	-	hinge	angle (rad)
3	Vertical angular position of the MCP joint of the forefinge	-Inf	Inf	robot0:FFJ2	-	hinge	angle (rad)
4	Angular position of the PIP joint of the forefinger	-Inf	Inf	robot0:FFJ1	-	hinge	angle (rad)
5	Angular position of the DIP joint of the forefinger	-Inf	Inf	robot0:FFJ0	-	hinge	angle (rad)
6	Horizontal angular position of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ3	-	hinge	angle (rad)
7	Vertical angular position of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ2	-	hinge	angle (rad)
8	Angular position of the PIP joint of the middle finger	-Inf	Inf	robot0:MFJ1	-	hinge	angle (rad)
9	Angular position of the DIP joint of the middle finger	-Inf	Inf	robot0:MFJ0	-	hinge	angle (rad)
10	Horizontal angular position of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ3	-	hinge	angle (rad)
11	Vertical angular position of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ2	-	hinge	angle (rad)
12	Angular position of the PIP joint of the ring finger	-Inf	Inf	robot0:RFJ1	-	hinge	angle (rad)
13	Angular position of the DIP joint of the ring finger	-Inf	Inf	robot0:RFJ0	-	hinge	angle (rad)
14	Angular position of the CMC joint of the little finger	-Inf	Inf	robot0:LFJ4	-	hinge	angle (rad)
15	Horizontal angular position of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ3	-	hinge	angle (rad)
16	Vertical angular position of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ2	-	hinge	angle (rad)
17	Angular position of the PIP joint of the little finger	-Inf	Inf	robot0:LFJ1	-	hinge	angle (rad)
18	Angular position of the DIP joint of the little finger	-Inf	Inf	robot0:LFJ0	-	hinge	angle (rad)
19	Horizontal angular position of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ4	-	hinge	angle (rad)
20	Vertical Angular position of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ3	-	hinge	angle (rad)
21	Horizontal angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ2	-	hinge	angle (rad)
22	Vertical angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ1	-	hinge	angle (rad)
23	Angular position of the IP joint of the thumb finger	-Inf	Inf	robot0:THJ0	-	hinge	angle (rad)
24	Angular velocity of the horizontal wrist joint	-Inf	Inf	robot0:WRJ1	-	hinge	angular velocity (rad/s)
25	Angular velocity of the vertical wrist joint	-Inf	Inf	robot0:WRJ0	-	hinge	angular velocity (rad/s)
26	Horizontal angular velocity of the MCP joint of the forefinger	-Inf	Inf	robot0:FFJ3	-	hinge	angular velocity (rad/s)
27	Vertical angular velocity of the MCP joint of the forefinge	-Inf	Inf	robot0:FFJ2	-	hinge	angular velocity (rad/s)
28	Angular velocity of the PIP joint of the forefinger	-Inf	Inf	robot0:FFJ1	-	hinge	angular velocity (rad/s)
29	Angular velocity of the DIP joint of the forefinger	-Inf	Inf	robot0:FFJ0	-	hinge	angular velocity (rad/s)
30	Horizontal angular velocity of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ3	-	hinge	angular velocity (rad/s)
31	Vertical angular velocity of the MCP joint of the middle finger	-Inf	Inf	robot0:MFJ2	-	hinge	angular velocity (rad/s)
32	Angular velocity of the PIP joint of the middle finger	-Inf	Inf	robot0:MFJ1	-	hinge	angular velocity (rad/s)
33	Angular velocity of the DIP joint of the middle finger	-Inf	Inf	robot0:MFJ0	-	hinge	angular velocity (rad/s)
34	Horizontal angular velocity of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ3	-	hinge	angular velocity (rad/s)
35	Vertical angular velocity of the MCP joint of the ring finger	-Inf	Inf	robot0:RFJ2	-	hinge	angular velocity (rad/s)
36	Angular velocity of the PIP joint of the ring finger	-Inf	Inf	robot0:RFJ1	-	hinge	angular velocity (rad/s)
37	Angular velocity of the DIP joint of the ring finger	-Inf	Inf	robot0:RFJ0	-	hinge	angular velocity (rad/s)
38	Angular velocity of the CMC joint of the little finger	-Inf	Inf	robot0:LFJ4	-	hinge	angular velocity (rad/s)
39	Horizontal angular velocity of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ3	-	hinge	angular velocity (rad/s)
40	Vertical angular velocity of the MCP joint of the little finger	-Inf	Inf	robot0:LFJ2	-	hinge	angular velocity (rad/s)
41	Angular velocity of the PIP joint of the little finger	-Inf	Inf	robot0:LFJ1	-	hinge	angular velocity (rad/s)
42	Angular velocity of the DIP joint of the little finger	-Inf	Inf	robot0:LFJ0	-	hinge	angular velocity (rad/s)
43	Horizontal angular velocity of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ4	-	hinge	angular velocity (rad/s)
44	Vertical Angular velocity of the CMC joint of the thumb finger	-Inf	Inf	robot0:THJ3	-	hinge	angular velocity (rad/s)
45	Horizontal angular velocity of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ2	-	hinge	angular velocity (rad/s)
46	Vertical angular position of the MCP joint of the thumb finger	-Inf	Inf	robot0:THJ1	-	hinge	angular velocity (rad/s)
47	Angular velocity of the IP joint of the thumb finger	-Inf	Inf	robot0:THJ0	-	hinge	angular velocity (rad/s)
48	x coordinate of the tip of the forefinger	-Inf	Inf	-	robot0:S_fftip	-	position (m)
49	y coordinate of the tip of the forefinger	-Inf	Inf	-	robot0:S_fftip	-	position (m)
50	z coordinate of the tip of the forefinger	-Inf	Inf	-	robot0:S_fftip	-	position (m)
51	x coordinate of the tip of the middle finger	-Inf	Inf	-	robot0:S_mftip	-	position (m)
52	y coordinate of the tip of the middle finger	-Inf	Inf	-	robot0:S_mftip	-	position (m)
53	z coordinate of the tip of the middle finger	-Inf	Inf	-	robot0:S_mftip	-	position (m)
54	x coordinate of the tip of the ring finger	-Inf	Inf	-	robot0:S_rftip	-	position (m)
55	y coordinate of the tip of the ring finger	-Inf	Inf	-	robot0:S_rftip	-	position (m)
56	z coordinate of the tip of the ring finger	-Inf	Inf	-	robot0:S_rftip	-	position (m)
57	x coordinate of the tip of the little finger	-Inf	Inf	-	robot0:S_lftip	-	position (m)
58	y coordinate of the tip of the little finger	-Inf	Inf	-	robot0:S_lftip	-	position (m)
59	z coordinate of the tip of the little finger	-Inf	Inf	-	robot0:S_lftip	-	position (m)
60	x coordinate of the tip of the thumb finger	-Inf	Inf	-	robot0:S_thtip	-	position (m)
61	y coordinate of the tip of the thumb finger	-Inf	Inf	-	robot0:S_thtip	-	position (m)
62	z coordinate of the tip of the thumb finger	-Inf	Inf	-	robot0:S_thtip	-	position (m)

desired_goal: this key represents the final goal to be achieved. In this environment it is a 15-dimensional ndarray, (15,), that consists of the 15 cartesian coordinates of the desired final finger tip position [x,y,z]. The elements of the array are the following:

Num	Observation	Min	Max	Site Name (in corresponding XML file)	Unit
0	Target x coordinate of the tip of the forefinger	-Inf	Inf	target0	position (m)
1	Target y coordinate of the tip of the forefinger	-Inf	Inf	target0	position (m)
2	Target z coordinate of the tip of the forefinger	-Inf	Inf	target0	position (m)
3	Target x coordinate of the tip of the middle finger	-Inf	Inf	target1	position (m)
4	Target y coordinate of the tip of the middle finger	-Inf	Inf	target1	position (m)
5	Target z coordinate of the tip of the middle finger	-Inf	Inf	target1	position (m)
6	Target x coordinate of the tip of the ring finger	-Inf	Inf	target2	position (m)
7	Target y coordinate of the tip of the ring finger	-Inf	Inf	target2	position (m)
8	Target z coordinate of the tip of the ring finger	-Inf	Inf	target2	position (m)
9	Target x coordinate of the tip of the little finger	-Inf	Inf	target3	position (m)
10	Target y coordinate of the tip of the little finger	-Inf	Inf	target3	position (m)
11	Target z coordinate of the tip of the little finger	-Inf	Inf	target3	position (m)
12	Target x coordinate of the tip of the thumb finger	-Inf	Inf	target4	position (m)
13	Target y coordinate of the tip of the thumb finger	-Inf	Inf	target4	position (m)
14	Target z coordinate of the tip of the thumb finger	-Inf	Inf	target4	position (m)

achieved_goal: this key represents the current state of the fingers, as if it would have achieved a goal. This is useful for goal orientated learning algorithms such as those that use Hindsight Experience Replay (HER). The value is an ndarray with shape (15,). The elements of the array are the following:

Num	Observation	Min	Max	Site Name (in corresponding XML file)	Unit
0	Current x coordinate of the tip of the forefinger	-Inf	Inf	robot0:S_fftip	position (m)
1	Current y coordinate of the tip of the forefinger	-Inf	Inf	robot0:S_fftip	position (m)
2	Current z coordinate of the tip of the forefinger	-Inf	Inf	robot0:S_fftip	position (m)
3	Current x coordinate of the tip of the middle finger	-Inf	Inf	robot0:S_mftip	position (m)
4	Current y coordinate of the tip of the middle finger	-Inf	Inf	robot0:S_mftip	position (m)
5	Current z coordinate of the tip of the middle finger	-Inf	Inf	robot0:S_mftip	position (m)
6	Current x coordinate of the tip of the ring finger	-Inf	Inf	robot0:S_rftip	position (m)
7	Current y coordinate of the tip of the ring finger	-Inf	Inf	robot0:S_rftip	position (m)
8	Current z coordinate of the tip of the ring finger	-Inf	Inf	robot0:S_rftip	position (m)
9	Current x coordinate of the tip of the little finger	-Inf	Inf	robot0:S_lftip	position (m)
10	Current y coordinate of the tip of the little finger	-Inf	Inf	robot0:S_lftip	position (m)
11	Current z coordinate of the tip of the little finger	-Inf	Inf	robot0:S_lftip	position (m)
12	Current x coordinate of the tip of the thumb finger	-Inf	Inf	robot0:S_thtip	position (m)
13	Current y coordinate of the tip of the thumb finger	-Inf	Inf	robot0:S_thtip	position (m)
14	Current z coordinate of the tip of the thumb finger	-Inf	Inf	robot0:S_thtip	position (m)

Rewards#

The reward can be initialized as sparse or dense:

sparse: the returned reward can have two values: -1 if the fingers haven’t reached their final target position, and 0 if the fingers are in their final target position (the fingers are considered to have reached their goal if the 2-nom between the achieved goal vector and the desired goal vector is lower than 0.01).
dense: the returned reward is the negative 2-norm distance between the achieved goal vector and the desired goal vector.

To initialize this environment with one of the mentioned reward functions the type of reward must be specified in the id string when the environment is initialized. For sparse reward the id is the default of the environment, HandReach-v1. However, for dense reward the id must be modified to HandReachDense-v1 and initialized as follows:

import gymnasium as gym

env = gym.make('HandReachDense-v1')

Starting State#

When the environment is reset the joints of the hand are initialized with the following angles (rad):

Joint Name (in corresponding XML file)	Angle (rad)
robot0:WRJ1	-0.16514339750464327
robot0:WRJ0	-0.31973286565062153
robot0:FFJ3	0.14340512546557435
robot0:FFJ2	0.32028208333591573
robot0:FFJ1	0.7126053607727917
robot0:FFJ0	0.6705281001412586
robot0:MFJ3	0.000246444303701037
robot0:MFJ2	0.3152655251085491
robot0:MFJ1	0.7659800313729842
robot0:MFJ0	0.7323156897425923
robot0:RFJ3	0.00038520700007378114
robot0:RFJ2	0.36743546201985233
robot0:RFJ1	0.7119514095008576
robot0:RFJ0	0.6699446327514138
robot0:LFJ4	0.0525442258033891
robot0:LFJ3	-0.13615534724474673
robot0:LFJ2	0.39872030433433003
robot0:LFJ1	0.7415570009679252
robot0:LFJ0	0.704096378652974
robot0:THJ4	0.003673823825070126
robot0:THJ3	0.5506291436028695
robot0:THJ2	-0.014515151997119306
robot0:THJ1	-0.0015229223564485414
robot0:THJ0	-0.7894883021600622

For the target cartersian position of the fingers there are two possible initializations chosen randomly. With a probability of 10 % the episodes goal will be to keep the initial position of the finger tips for an indefinete perido of time. The initial position of the finger tips will then be:

Finger Tip	Coordinate	Position (m)
Forefinger	x	0.99
Forefinger	y	0.8
Forefinger	z	0.15
Middle	x	1.02
Middle	y	0.8
Middle	z	0.15
Ring	x	1.04
Ring	y	0.81
Ring	z	0.155
Little	x	1.07
Little	y	0.82
Little	z	0.16
Thumb	x	0.95
Thumb	y	0.84
Thumb	z	0.16

In the other possible episode intializaitons one of the fingers is randomly selected to meet the tip of the thumb over the palm of the hand. The rest of the finger tips must maintain the initial positions mentioned before.

Episode End#

The episode will be truncated when the duration reaches a total of max_episode_steps which by default is set to 50 timesteps. The episode is never terminated since the task is continuing with infinite horizon.

Arguments#

To increase/decrease the maximum number of timesteps before the episode is truncated the max_episode_steps argument can be set at initialization. The default value is 50. For example, to increase the total number of timesteps to 100 make the environment as follows:

import gymnasium as gym

env = gym.make('HandReach-v1', max_episode_steps=100)

Version History#

v1: the environment depends on the newest mujoco python bindings maintained by the MuJoCo team in Deepmind.
v0: the environment depends on mujoco_py which is no longer maintained.