Sequential Storytelling Image Dataset (SSID)

Citation Author(s):
Zainy M. Malakan
Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia
Saeed Anwar
Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
Ghulam Mubashar Hassan
Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia
Ajmal Mian
Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley WA 6009, Australia
Submitted by:
Zainy Malakan
Last updated:
Sat, 08/26/2023 - 06:38
DOI:
10.21227/dbr9-dq51

Abstract 

Visual storytelling, also known as multi-image captioning, is the task of describing a set of images rather than a single image. The Visual Storytelling Task (VST) takes a set of images as input and aims to generate a coherent story relevant to those images. To support expressive and coherent story generation, we present the Sequential Storytelling Image Dataset (SSID), which consists of open-source video frames accompanied by story-like annotations. Each set of five images is provided with four annotations (i.e., stories). The image sets were collected manually from publicly available videos in three domains: documentaries, lifestyle, and movies, and were then annotated manually using Amazon Mechanical Turk. In total, SSID comprises 17,365 images, forming 3,473 unique sets of five images. Each set is associated with four ground truths, giving 13,892 unique ground truths (i.e., written stories), and each ground truth consists of five connected sentences written in the form of a story.
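As a quick consistency check of these figures, 3,473 sets of five images give 3,473 × 5 = 17,365 images, and four stories per set give 3,473 × 4 = 13,892 stories. The minimal Python snippet below only verifies this arithmetic and is not part of the dataset release.

# Sanity check of the counts reported in the abstract.
num_sets = 3_473          # unique sets of five images
images_per_set = 5
stories_per_set = 4       # ground-truth stories per set

assert num_sets * images_per_set == 17_365   # total images
assert num_sets * stories_per_set == 13_892  # total written stories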

Instructions: 

The SSID dataset comprises 17,365 images, forming 3,473 unique sets of five images. Each set of images is associated with four ground truths, giving a total of 13,892 unique ground truths (i.e., written stories), and each ground truth consists of five connected sentences written in the form of a story. Please refer to the attached PDF file for additional instructions, including how to pair each set of images with its annotations; a rough loading sketch is given below.
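The exact file organization and the mapping between image sets and annotations are described in the attached PDF. As a rough illustration only, the following minimal Python sketch assumes a hypothetical layout in which each set of five ordered frames sits in its own folder under images/<set_id>/ and the four stories per set are stored in a single annotations.json file keyed by set ID; all file and field names here are assumptions, not the official structure.

import json
from pathlib import Path

# Hypothetical layout (assumed for illustration; see the attached PDF for the real structure):
#   images/<set_id>/*.jpg   -- the five ordered frames of one image set
#   annotations.json        -- {"<set_id>": ["story 1", "story 2", "story 3", "story 4"], ...}

def load_ssid(root):
    """Yield (five ordered image paths, four stories) for each image set."""
    root = Path(root)
    stories = json.loads((root / "annotations.json").read_text(encoding="utf-8"))
    for set_id, set_stories in sorted(stories.items()):
        image_paths = sorted((root / "images" / set_id).glob("*.jpg"))
        assert len(image_paths) == 5, f"expected 5 images in set {set_id}"
        assert len(set_stories) == 4, f"expected 4 stories for set {set_id}"
        yield image_paths, set_stories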

Funding Agency: 
This research was supported by the Australian Research Council.
Grant Number: 
FT210100268

Comments

Hi, I need access to this dataset for research purposes. Chiranjib

Submitted by Chiranjib Bhatt... on Wed, 07/26/2023 - 04:37

I also need this dataset for research purposes.

Submitted by Dilip kumar on Thu, 12/07/2023 - 18:51

How do I match each set of annotations to the corresponding pictures?

Submitted by zhziqian Li on Tue, 12/26/2023 - 05:59

Hey there, I need this dataset for my final-year project, so can you please provide it?

Submitted by TEJA DALAYAI on Thu, 02/29/2024 - 00:54

Can you please grant access to the dataset for my NLP project?

Submitted by Shantanu Pathak on Mon, 04/08/2024 - 07:27