HANDS: a multimodal dataset for modeling toward human grasp intent inference in prosthetic hands


Abstract 

Please cite:
Han, M., Günay, S.Y., Schirner, G. et al. HANDS: a multimodal dataset for modeling toward human grasp intent inference in prosthetic hands. Intel Serv Robotics 13, 179–185 (2020). https://doi.org/10.1007/s11370-019-00293-8

Upper limb and hand functionality is critical to many activities of daily living, and the amputation of one can lead to significant functionality loss for individuals. From this perspective, advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user, but more importantly from the improved capability to infer human intent from multimodal sensor data and thereby give the robotic hand perception of its operational context. Such multimodal sensor data may include various environment sensors, such as vision, as well as human physiology and behavior sensors, such as electromyography (EMG) and inertial measurement units (IMUs). A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to support prosthetic hand motion planning and control. In this paper, we present a dataset of this type, gathered in anticipation of cameras being built into prosthetic hands, where computer vision methods will need to assess this hand-view visual evidence in order to estimate human intent. Specifically, paired images from the human eye-view and the hand-view of various objects placed at different orientations were captured at the initial state of grasping trials, followed by paired video, EMG, and IMU recordings from the arm of the human during a grasp, lift, put-down, and retract style trial structure. For each trial, based on eye-view images of the scene showing the hand and the object on a table, multiple humans were asked to sort, in decreasing order of preference, five grasp types appropriate for the object in its given configuration relative to the hand. The potential utility of paired eye-view and hand-view images was illustrated by training a convolutional neural network to process hand-view images in order to predict the eye-view labels assigned by humans.

Instructions: 

The dataset file contains 5 folders:

1. Eye-View_ImagesForLabeling
Files are named imxxx.jpg, where xxx ranges from 1 to 413.
These are all the eye-view images; they were used only for label collection by the labellers, NOT for training the CNN.
Taken with a webcam (Logitech Webcam C600, 1600x1200 resolution).

2. Hand-View_RawImagesBeforePreprocessing
Files are named imxxx_hand.jpg, where xxx ranges from 1 to 413.
These are all the raw hand-view images, which were NOT segmented or pre-processed and were used NEITHER for labelling NOR for training.
Taken with a GoPro camera (GoPro Hero Session, 3648x2736 resolution).

3. Labels
_LabellingRules.txt gives the rule mapping label indices to the 5 grasp types.
Label_Complete.csv contains the complete label information for all training and testing images.
Test_Label.csv contains the label information for all testing images.
Train_Label.csv contains the label information for all training images.
Labels were collected from 11 labellers, who assigned their labels based on the eye-view images.

4. RawData
Subfolders are named abcd_nn, where abcd is the object name and nn is the orientation number for that object.
These folders contain the raw data collected for each object and each orientation, including images, videos, and EMG files.

5. TraningImages
Files are named imxxx_yy.jpg, where xxx ranges from 1 to 413 and yy ranges from 1 to 11.

xxx and yy correspond to the raw hand-view image and the labeller, respectively (a parsing sketch follows below).

These are all of the segmented and pre-processed hand-view images, which can be used directly for training.
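
For convenience, the naming convention above can be parsed programmatically. The following is a minimal Python sketch; the helper name and regular expression are illustrative and not part of the dataset.

import re
from pathlib import Path

# Parse "imxxx_yy.jpg" into (raw hand-view image index, labeller index).
_NAME_RE = re.compile(r"im(\d+)_(\d+)\.jpg$")

def parse_training_name(path):
    m = _NAME_RE.search(Path(path).name)
    if m is None:
        raise ValueError("Unexpected filename: " + str(path))
    return int(m.group(1)), int(m.group(2))

# Example: group the training images by labeller.
by_labeller = {}
for p in sorted(Path("TraningImages").glob("im*_*.jpg")):
    img_idx, lab_idx = parse_training_name(p)
    by_labeller.setdefault(lab_idx, []).append(p)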

====================================================

Further details on the contents of the folders:

3. Labels
Label_Complete.csv:

image_name: the name of the training image, corresponding to an image in the folder 'TraningImages'

xmin, xmax, ymin, ymax: the bounding-box coordinates of the object's location within the training image

label: the label index, corresponding to '_LabellingRules.txt'

Test_Label.csv: 20% of the images, randomly selected from 'Label_Complete.csv'

Train_Label.csv: 80% of the images, randomly selected from 'Label_Complete.csv'
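
The label CSVs can be read with standard tools. Below is a minimal sketch, assuming each CSV has a header row with the columns listed above (verify against your copy):

import pandas as pd

# Columns assumed: image_name, xmin, xmax, ymin, ymax, label
train_df = pd.read_csv("Labels/Train_Label.csv")
test_df = pd.read_csv("Labels/Test_Label.csv")

# Example: count how many training images received each grasp label.
print(train_df["label"].value_counts())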

4. RawData
Inside each abcd_nn folder:
abcd_nn.JPG: the raw hand-view image of object abcd with orientation nn, taken by the GoPro
abcd_nn_eye.jpg: the raw eye-view image of object abcd with orientation nn, taken by the webcam
abcd_nn.mp4: the raw hand-view grasp video of object abcd with orientation nn, taken by the GoPro
abcd_nn_eye.wmv: the raw eye-view grasp video of object abcd with orientation nn, taken by the webcam

acceleration, duration, emg, gyroscope, orientation: the EMG data and other activity information for the grasp, collected from the Myo armband
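
A minimal sketch for loading one trial's recordings, assuming the acceleration, emg, gyroscope, and orientation files are comma-separated text with one sample per row (the exact file format is not documented here, so check your copy and adjust the reader accordingly):

import numpy as np
from pathlib import Path

def load_trial(trial_dir):
    """Load the Myo recordings for one abcd_nn trial folder."""
    trial = Path(trial_dir)
    signals = {}
    for name in ["acceleration", "emg", "gyroscope", "orientation"]:
        matches = sorted(trial.glob(name + "*"))  # file extension not documented here
        if matches:
            signals[name] = np.loadtxt(matches[0], delimiter=",")
    return signals

# Hypothetical example folder name; the Myo armband provides 8 EMG channels.
# signals = load_trial("RawData/bottle_01")
# print(signals["emg"].shape)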

====================================================

To train a CNN using hand-view images and their corresponding labels:

Load the images directly from the folder 'TraningImages', and load the training and testing labels from 'Train_Label.csv' and 'Test_Label.csv', respectively.
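
For example, here is a minimal PyTorch sketch along these lines. The dataset does not prescribe a framework or architecture; the CSV column names follow the description above, and the network, transforms, and hyperparameters are illustrative assumptions only.

import pandas as pd
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, models
from PIL import Image
from pathlib import Path

class HandViewDataset(Dataset):
    """Hand-view training images paired with grasp labels from a CSV split."""
    def __init__(self, csv_path, image_dir, transform):
        self.df = pd.read_csv(csv_path)      # columns assumed: image_name, ..., label
        self.image_dir = Path(image_dir)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(self.image_dir / row["image_name"]).convert("RGB")
        label = int(row["label"]) - 1        # assuming 1-based label indices in the CSV
        return self.transform(image), label

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_loader = DataLoader(HandViewDataset("Labels/Train_Label.csv", "TraningImages", transform),
                          batch_size=32, shuffle=True)
test_loader = DataLoader(HandViewDataset("Labels/Test_Label.csv", "TraningImages", transform),
                         batch_size=32)

# Illustrative model: a pretrained backbone with a 5-way grasp classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()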


Documentation: README.txt (2.91 KB)