Manipulate Pen#

../../../_images/manipulate_pen.gif

Description#

This environment was introduced in “Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research”.

The environment is based on the same robot hand as in the HandReach environment, the Shadow Dexterous Hand. The task to be solved is very similar to that in the HandManipulateBlock environment, but in this case a pen is placed on the palm of the hand. The task is to then manipulate the pen such that a target pose is achieved. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (in quaternions). In addition, variations of this environment can be used with increasing levels of difficulty:

  • HandManipulatePenRotate-v1: Random target rotation x and y axes of the pen and no target rotation around the z axis. No target position.

  • HandManipulatePenFull-v1: Random target rotation x and y axes of the pen and no target rotation around the z axis. Random target position.

Action Space#

The action space is a Box(-1.0, 1.0, (20,), float32). The control actions are absolute angular positions of the actuated joints (non-coupled). The input of the control actions is set to a range between -1 and 1 by scaling the actual actuator angle ranges. The elements of the action array are the following:

Num

Action

Control Min

Control Max

Angle Min

Angle Max

Name (in corresponding XML file)

Joint

Unit

0

Angular position of the horizontal wrist joint (radial/ulnar deviation)

-1

1

-0.489 (rad)

0.14 (rad)

robot0:A_WRJ1

hinge

angle (rad)

1

Angular position of the horizontal wrist joint (flexion/extension)

-1

1

-0.698 (rad)

0.489 (rad)

robot0:A_WRJ0

hinge

angle (rad)

2

Horizontal angular position of the MCP joint of the forefinger (adduction/abduction)

-1

1

-0.349 (rad)

0.349(rad)

robot0:A_FFJ3

hinge

angle (rad)

3

Vertical angular position of the MCP joint of the forefinger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_FFJ2

hinge

angle (rad)

4

Angular position of the PIP joint of the forefinger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_FFJ1

hinge

angle (rad)

5

Horizontal angular position of the MCP joint of the middle finger (adduction/abduction)

-1

1

-0.349 (rad)

0.349(rad)

robot0:A_MFJ3

hinge

angle (rad)

6

Vertical angular position of the MCP joint of the middle finger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_MFJ2

hinge

angle (rad)

7

Angular position of the PIP joint of the middle finger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_MFJ1

hinge

angle (rad)

8

Horizontal angular position of the MCP joint of the ring finger (adduction/abduction)

-1

1

-0.349 (rad)

0.349(rad)

robot0:A_RFJ3

hinge

angle (rad)

9

Vertical angular position of the MCP joint of the ring finger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_RFJ2

hinge

angle (rad)

10

Angular position of the PIP joint of the ring finger

-1

1

0 (rad)

1.571 (rad)

robot0:A_RFJ1

hinge

angle (rad)

11

Angular position of the CMC joint of the little finger

-1

1

0 (rad)

0.785(rad)

robot0:A_LFJ4

hinge

angle (rad)

12

Horizontal angular position of the MCP joint of the little finger (adduction/abduction)

-1

1

-0.349 (rad)

0.349(rad)

robot0:A_LFJ3

hinge

angle (rad)

13

Vertical angular position of the MCP joint of the little finger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_LFJ2

hinge

angle (rad)

14

Angular position of the PIP joint of the little finger (flexion/extension)

-1

1

0 (rad)

1.571 (rad)

robot0:A_LFJ1

hinge

angle (rad)

15

Horizontal angular position of the CMC joint of the thumb finger

-1

1

-1.047 (rad)

1.047 (rad)

robot0:A_THJ4

hinge

angle (rad)

16

Vertical Angular position of the CMC joint of the thumb finger

-1

1

0 (rad)

1.222 (rad)

robot0:A_THJ3

hinge

angle (rad)

17

Horizontal angular position of the MCP joint of the thumb finger (adduction/abduction)

-1

1

-0.209 (rad)

0.209(rad)

robot0:A_THJ2

hinge

angle (rad)

18

Vertical angular position of the MCP joint of the thumb finger (flexion/extension)

-1

1

-0.524 (rad)

0.524 (rad)

robot0:A_THJ1

hinge

angle (rad)

19

Angular position of the IP joint of the thumb finger (flexion/extension)

-1

1

-1.571 (rad)

0 (rad)

robot0:A_THJ0

hinge

angle (rad)

Observation Space#

The observation is a goal-aware observation space. It consists of a dictionary with information about the robot’s joint and pen states, as well as information about the goal. The dictionary consists of the following 3 keys:

  • observation: its value is an ndarray of shape (61,). It consists of kinematic information of the pen and finger joints. The elements of the array correspond to the following:

Num

Observation

Min

Max

Joint Name (in corresponding XML file)

Joint Type

Unit

0

Angular position of the horizontal wrist joint

-Inf

Inf

robot0:WRJ1

hinge

angle (rad)

1

Angular position of the vertical wrist joint

-Inf

Inf

robot0:WRJ0

hinge

angle (rad)

2

Horizontal angular position of the MCP joint of the forefinger

-Inf

Inf

robot0:FFJ3

hinge

angle (rad)

3

Vertical angular position of the MCP joint of the forefinge

-Inf

Inf

robot0:FFJ2

hinge

angle (rad)

4

Angular position of the PIP joint of the forefinger

-Inf

Inf

robot0:FFJ1

hinge

angle (rad)

5

Angular position of the DIP joint of the forefinger

-Inf

Inf

robot0:FFJ0

hinge

angle (rad)

6

Horizontal angular position of the MCP joint of the middle finger

-Inf

Inf

robot0:MFJ3

hinge

angle (rad)

7

Vertical angular position of the MCP joint of the middle finger

-Inf

Inf

robot0:MFJ2

hinge

angle (rad)

8

Angular position of the PIP joint of the middle finger

-Inf

Inf

robot0:MFJ1

hinge

angle (rad)

9

Angular position of the DIP joint of the middle finger

-Inf

Inf

robot0:MFJ0

hinge

angle (rad)

10

Horizontal angular position of the MCP joint of the ring finger

-Inf

Inf

robot0:RFJ3

hinge

angle (rad)

11

Vertical angular position of the MCP joint of the ring finger

-Inf

Inf

robot0:RFJ2

hinge

angle (rad)

12

Angular position of the PIP joint of the ring finger

-Inf

Inf

robot0:RFJ1

hinge

angle (rad)

13

Angular position of the DIP joint of the ring finger

-Inf

Inf

robot0:RFJ0

hinge

angle (rad)

14

Angular position of the CMC joint of the little finger

-Inf

Inf

robot0:LFJ4

hinge

angle (rad)

15

Horizontal angular position of the MCP joint of the little finger

-Inf

Inf

robot0:LFJ3

hinge

angle (rad)

16

Vertical angular position of the MCP joint of the little finger

-Inf

Inf

robot0:LFJ2

hinge

angle (rad)

17

Angular position of the PIP joint of the little finger

-Inf

Inf

robot0:LFJ1

hinge

angle (rad)

18

Angular position of the DIP joint of the little finger

-Inf

Inf

robot0:LFJ0

hinge

angle (rad)

19

Horizontal angular position of the CMC joint of the thumb finger

-Inf

Inf

robot0:THJ4

hinge

angle (rad)

20

Vertical Angular position of the CMC joint of the thumb finger

-Inf

Inf

robot0:THJ3

hinge

angle (rad)

21

Horizontal angular position of the MCP joint of the thumb finger

-Inf

Inf

robot0:THJ2

hinge

angle (rad)

22

Vertical angular position of the MCP joint of the thumb finger

-Inf

Inf

robot0:THJ1

hinge

angle (rad)

23

Angular position of the IP joint of the thumb finger

-Inf

Inf

robot0:THJ0

hinge

angle (rad)

24

Angular velocity of the horizontal wrist joint

-Inf

Inf

robot0:WRJ1

hinge

angular velocity (rad/s)

25

Angular velocity of the vertical wrist joint

-Inf

Inf

robot0:WRJ0

hinge

angular velocity (rad/s)

26

Horizontal angular velocity of the MCP joint of the forefinger

-Inf

Inf

robot0:FFJ3

hinge

angular velocity (rad/s)

27

Vertical angular velocity of the MCP joint of the forefinge

-Inf

Inf

robot0:FFJ2

hinge

angular velocity (rad/s)

28

Angular velocity of the PIP joint of the forefinger

-Inf

Inf

robot0:FFJ1

hinge

angular velocity (rad/s)

29

Angular velocity of the DIP joint of the forefinger

-Inf

Inf

robot0:FFJ0

hinge

angular velocity (rad/s)

30

Horizontal angular velocity of the MCP joint of the middle finger

-Inf

Inf

robot0:MFJ3

hinge

angular velocity (rad/s)

31

Vertical angular velocity of the MCP joint of the middle finger

-Inf

Inf

robot0:MFJ2

hinge

angular velocity (rad/s)

32

Angular velocity of the PIP joint of the middle finger

-Inf

Inf

robot0:MFJ1

hinge

angular velocity (rad/s)

33

Angular velocity of the DIP joint of the middle finger

-Inf

Inf

robot0:MFJ0

hinge

angular velocity (rad/s)

34

Horizontal angular velocity of the MCP joint of the ring finger

-Inf

Inf

robot0:RFJ3

hinge

angular velocity (rad/s)

35

Vertical angular velocity of the MCP joint of the ring finger

-Inf

Inf

robot0:RFJ2

hinge

angular velocity (rad/s)

36

Angular velocity of the PIP joint of the ring finger

-Inf

Inf

robot0:RFJ1

hinge

angular velocity (rad/s)

37

Angular velocity of the DIP joint of the ring finger

-Inf

Inf

robot0:RFJ0

hinge

angular velocity (rad/s)

38

Angular velocity of the CMC joint of the little finger

-Inf

Inf

robot0:LFJ4

hinge

angular velocity (rad/s)

39

Horizontal angular velocity of the MCP joint of the little finger

-Inf

Inf

robot0:LFJ3

hinge

angular velocity (rad/s)

40

Vertical angular velocity of the MCP joint of the little finger

-Inf

Inf

robot0:LFJ2

hinge

angular velocity (rad/s)

41

Angular velocity of the PIP joint of the little finger

-Inf

Inf

robot0:LFJ1

hinge

angular velocity (rad/s)

42

Angular velocity of the DIP joint of the little finger

-Inf

Inf

robot0:LFJ0

hinge

angular velocity (rad/s)

43

Horizontal angular velocity of the CMC joint of the thumb finger

-Inf

Inf

robot0:THJ4

hinge

angular velocity (rad/s)

44

Vertical Angular velocity of the CMC joint of the thumb finger

-Inf

Inf

robot0:THJ3

hinge

angular velocity (rad/s)

45

Horizontal angular velocity of the MCP joint of the thumb finger

-Inf

Inf

robot0:THJ2

hinge

angular velocity (rad/s)

46

Vertical angular position of the MCP joint of the thumb finger

-Inf

Inf

robot0:THJ1

hinge

angular velocity (rad/s)

47

Angular velocity of the IP joint of the thumb finger

-Inf

Inf

robot0:THJ0

hinge

angular velocity (rad/s)

48

Linear velocity of the pen in x direction

-Inf

Inf

object:joint

free

velocity (m/s)

49

Linear velocity of the pen in y direction

-Inf

Inf

object:joint

free

velocity (m/s)

50

Linear velocity of the pen in z direction

-Inf

Inf

object:joint

free

velocity (m/s)

51

Angular velocity of the pen in x axis

-Inf

Inf

object:joint

free

angular velocity (rad/s)

52

Angular velocity of the pen in y axis

-Inf

Inf

object:joint

free

angular velocity (rad/s)

53

Angular velocity of the pen in z axis

-Inf

Inf

object:joint

free

angular velocity (rad/s)

54

Position of the pen in the x coordinate

-Inf

Inf

object:joint

free

position (m)

55

Position of the pen in the y coordinate

-Inf

Inf

object:joint

free

position (m)

56

Position of the pen in the z coordinate

-Inf

Inf

object:joint

free

position (m)

57

w component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

58

x component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

59

y component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

60

z component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

  • desired_goal: this key represents the final goal to be achieved. In this environment it is a 7-dimensional ndarray, (7,), that consists of the pose information of the pen. The elements of the array are the following:

Num

Observation

Min

Max

Joint Name (in corresponding XML file)

Joint Type

Unit

0

Target x coordinate of the pen

-Inf

Inf

target:joint

free

position (m)

1

Target y coordinate of the pen

-Inf

Inf

target:joint

free

position (m)

2

Target z coordinate of the pen

-Inf

Inf

target:joint

free

position (m)

3

Target w component of the quaternion orientation of the pen

-Inf

Inf

target:joint

free

-

4

Target x component of the quaternion orientation of the pen

-Inf

Inf

target:joint

free

-

5

Target y component of the quaternion orientation of the pen

-Inf

Inf

target:joint

free

-

6

Target z component of the quaternion orientation of the pen

-Inf

Inf

target:joint

free

-

  • achieved_goal: this key represents the current state of the pen, as if it would have achieved a goal. This is useful for goal orientated learning algorithms such as those that use Hindsight Experience Replay (HER). The value is an ndarray with shape (7,). The elements of the array are the following:

Num

Observation

Min

Max

Joint Name (in corresponding XML file)

Joint Type

Unit

0

Current x coordinate of the pen

-Inf

Inf

object:joint

free

position (m)

1

Current y coordinate of the pen

-Inf

Inf

object:joint

free

position (m)

2

Current z coordinate of the pen

-Inf

Inf

object:joint

free

position (m)

3

Current w component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

4

Current x component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

5

Current y component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

6

Current z component of the quaternion orientation of the pen

-Inf

Inf

object:joint

free

-

Rewards#

The reward can be initialized as sparse or dense:

  • sparse: the returned reward can have two values: -1 if the pen hasn’t reached its final target pose, and 0 if the pen is in its final target pose. The pen is considered to have reached its final goal if the theta angle difference (theta angle of the 3D axis angle representation is less than 0.1 and if the Euclidean distance to the target position is also less than 0.01 m.

  • dense: the returned reward is the negative summation of the Euclidean distance to the pen’s target and the theta angle difference to the target orientation. The positional distance is multiplied by a factor of 10 to avoid being dominated by the rotational difference.

To initialize this environment with one of the mentioned reward functions the type of reward must be specified in the id string when the environment is initialized. For sparse reward the id is the default of the environment, HandManipulatePen-v1. However, for dense reward the id must be modified to HandManipulatePenDense-v1 and initialized as follows:

import gymnasium as gym

env = gym.make('HandManipulatePen-v1')

The rest of the id’s of the other environment variations follow the same convention to select between a sparse or dense reward function.

Starting State#

When the environment is reset the joints of the hand are initialized to their resting position with a 0 displacement. The pen’s position and orientation are randomly selected. The initial position is set to (x,y,z)=(1, 0.87, 0.2) and an offset is added to each coordinate sampled from a normal distribution with 0 mean and 0.005 standard deviation. While the initial orientation is set to (w,x,y,z)=(1,0,0,0) and an axis is randomly selected depending on the environment variation to add an angle offset sampled from a uniform distribution with range [-pi, pi].

The target pose of the pen is obtained by adding a random offset to the initial pen pose. For the position the offset is sampled from a uniform distribution with range [(x_min, x_max), (y_min,y_max), (z_min, z_max)] = [(-0.04, 0.04), (-0.06, 0.02), (0.0, 0.06)]. The orientation offset is sampled from a uniform distribution with range [-pi,pi] and added to one of the Euler axis depending on the environment variation.

Episode End#

The episode will be truncated when the duration reaches a total of max_episode_steps which by default is set to 50 timesteps. The episode is never terminated since the task is continuing with infinite horizon.

Arguments#

To increase/decrease the maximum number of timesteps before the episode is truncated the max_episode_steps argument can be set at initialization. The default value is 50. For example, to increase the total number of timesteps to 100 make the environment as follows:

import gymnasium as gym

env = gym.make('HandManipulatePen-v1', max_episode_steps=100)

The same applies for the other environment variations.

Version History#

  • v1: the environment depends on the newest mujoco python bindings maintained by the MuJoCo team in Deepmind.

  • v0: the environment depends on mujoco_py which is no longer maintained.