Abstract

The dataset consists of 60,285 character image files, randomly divided into a training set of 54,239 images (90%) and a test set of 6,046 images (10%). Data samples were collected in two phases. In the first phase, a tabular form was distributed and participants were asked to write each character five times; filled-in forms were collected from around 200 individuals aged 12 to 23 years. In the second phase, handwritten sheets such as answer sheets and classroom notes were collected from students in the same age group, for a total of 279 pages written by 279 different individuals. This age group was chosen because older individuals generally do not know how to write the script, as Bangla was the script in use in their time. Restricting the age range therefore captures natural handwriting rather than drawn characters. The samples were collected from schools and colleges in different parts of Imphal. The forms and pages were scanned in grayscale at 300 dpi using a Canon flatbed scanner and saved in TIF format.
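As an illustration of the random 90/10 division described above, the following is a minimal Python sketch. It is not the authors' procedure, which is not documented here: the function name `split_train_test`, the seed, and the file names are hypothetical. Note that a plain 10% cut of 60,285 files yields 6,028 test images rather than the published 6,046, so the original split was presumably made in some other way (for example, per class).

```python
import random


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists.

    Hypothetical helper: the dataset's actual split procedure is not documented.
    """
    shuffled = list(items)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    # Truncate toward zero so the test set never exceeds the requested fraction.
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder file names; the real archive uses its own naming scheme.
file_names = [f"img_{i:05d}.tif" for i in range(60285)]
train_files, test_files = split_train_test(file_names)
print(len(train_files), len(test_files))
```

The published archive already ships with fixed train and test folders, so a split like this is only needed when re-partitioning the images for a different experiment.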

Instructions:

The dataset may be used by citing the paper below:

Hijam D., Saharia S. (2018) Convolutional Neural Network Based Meitei Mayek Handwritten Character Recognition. In: Tiwary U. (eds) Intelligent Human Computer Interaction. IHCI 2018. Lecture Notes in Computer Science, vol 11278. Springer, Cham

Dataset Files

MMHC37Classes(Test and Train).zip (39.12 MB)

Documentation

Attachment	Size
IEEEdataportReadme.pdf	26.87 KB

Meitei Mayek Handwritten Character Dataset (37 classes)
