Abstract

Narrative question answering (QA) involves generating accurate, relevant, and human-like answers to questions based on the comprehension of a story consisting of logically connected paragraphs. However, this problem remains unexplored for the Arabic language because of the lack of Arabic narrative datasets. To address this gap, we present the Arabic-NarrativeQA dataset, the first dataset specifically designed for machine reading comprehension of Arabic stories. The dataset consists of two parts: a translation of the English NarrativeQA dataset and a collection of new question-answer pairs based on Arabic stories. Furthermore, we implement the Arabic-NarrativeQA system using the Ranker-Reader pipeline, exploring and evaluating various approaches at each stage to identify the most effective ones. Finally, we use cross-lingual transfer learning to transfer knowledge from the English NarrativeQA dataset to the Arabic-NarrativeQA system. Experiments show that incorporating cross-lingual transfer learning significantly improves the performance of the reader models. In addition, the evidence information provided with each question in the Arabic-NarrativeQA dataset enables the learnable rankers to identify and select the pertinent paragraphs effectively. To promote further research on this task, we make both the Arabic-NarrativeQA dataset and the pre-trained models publicly available.

Instructions:

This dataset consists of two parts:
Arabic-NarrativeQA-T folder: a translation of the English NarrativeQA dataset
Arabic-NarrativeQA-C folder: a collection of new question-answer pairs based on Arabic stories

Dataset Files

ArabicNarrativeQA.zip (4.11 MB)
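
As a starting point for working with the archive, the following minimal Python sketch groups the archive members by the two folders named in the instructions. It relies only on the archive name and the two folder names given on this page. The internal file names and formats are not documented here, so the sketch makes no assumption about them. The function name `group_members_by_part` is hypothetical, and the sketch also assumes (without confirmation) that the folders may sit under a wrapper directory, so it matches on any path component.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Folder names as given in the instructions on this page.
PARTS = ("Arabic-NarrativeQA-T", "Arabic-NarrativeQA-C")


def group_members_by_part(zip_path):
    """Map each dataset part to the archive members stored under it.

    Members that do not sit under either folder are ignored.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Zip member names always use forward slashes.
            components = PurePosixPath(name).parts
            for part in PARTS:
                if part in components:
                    grouped[part].append(name)
                    break
    return dict(grouped)
```

Calling `group_members_by_part("ArabicNarrativeQA.zip")` on the downloaded archive should return a dictionary keyed by folder name, which makes it easy to check what each part contains before choosing a loader for the actual file format.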

Arabic Narrative Question Answering
