Abstract

The dataset covers all Telugu characters, including vowels, consonants and compound characters such as Othulu (consonant-consonant combinations) and Guninthamulu (consonant-vowel combinations). Its main objective is to support the recognition of handwritten Telugu characters and, from that, the conversion of handwritten documents into editable electronic copies.

Telugu differs significantly from English in this respect: English has only 26 characters, whereas Telugu has a total of 1,924 characters (Achulu (vowels): 16, Hallulu (consonants): 36, Othulu: 36 and Guninthamulu: 34 × 16 = 544). Hence, recognizing Telugu characters is more complex than recognizing English characters. Furthermore, no existing dataset of Telugu characters covers all the characters in Telugu literature, and even the worldwide encoding standard "Unicode" does not cover every Telugu character. The objective of this work is to present a handwritten Telugu character dataset containing all Telugu characters, assigning a unique label to each character and then assigning a 'Unicode' encoding to each label. If all Telugu and other Indian language characters could be brought into 'Unicode', it would resolve compatibility issues across all major operating systems and word processors.

The dataset is designed to support the recognition of all sorts of handwriting styles. Hence, we collected samples from distinct writers: some are school students (under 15 years), some are above 45 years, and some are engineering graduates aged between 18 and 24. Moreover, we assigned a unique label to each character in Telugu literature; because of phonetic similarity, these labels also support other Indian languages.
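Unicode encodes guninthamulu and othulu as sequences of code points rather than as single code points: a guninthamu is a consonant followed by a dependent vowel sign, and an othu is written with the virama (U+0C4D) between two consonants. A minimal Python sketch of assigning a unique integer label to each character and attaching its Unicode sequence follows, assuming a small illustrative subset of characters. The label keys, the function names (`guninthamu`, `othu`, `build_label_table`) and the numbering order are hypothetical and are not the dataset's actual labeling scheme.

```python
import unicodedata

VIRAMA = "\u0C4D"  # TELUGU SIGN VIRAMA: placed between two consonants to form a conjunct

# Hypothetical label keys for a small illustrative subset (not the dataset's own labels).
VOWELS = {"a": "\u0C05", "aa": "\u0C06", "i": "\u0C07"}          # Achulu
CONSONANTS = {"ka": "\u0C15", "kha": "\u0C16", "ga": "\u0C17"}  # Hallulu
VOWEL_SIGNS = {"aa": "\u0C3E", "i": "\u0C3F", "ii": "\u0C40"}   # dependent vowel signs


def guninthamu(consonant: str, vowel: str) -> str:
    """Consonant followed by a dependent vowel sign, e.g. ka + aa gives U+0C15 U+0C3E."""
    return CONSONANTS[consonant] + VOWEL_SIGNS[vowel]


def othu(first: str, second: str) -> str:
    """Consonant + virama + consonant; a shaping engine typically renders the second as an othu."""
    return CONSONANTS[first] + VIRAMA + CONSONANTS[second]


def build_label_table() -> dict[int, str]:
    """Assign consecutive integer labels to every character in the illustrative subset."""
    entries = list(VOWELS.values()) + list(CONSONANTS.values())
    entries += [guninthamu(c, v) for c in CONSONANTS for v in VOWEL_SIGNS]
    entries += [othu(c1, c2) for c1 in CONSONANTS for c2 in CONSONANTS]
    return dict(enumerate(entries))


if __name__ == "__main__":
    # Print each label with its text, code points and official Unicode character names.
    for label, text in build_label_table().items():
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
        names = ", ".join(unicodedata.name(ch) for ch in text)
        print(label, text, codepoints, names)
```

Under this assumption, a recognizer that outputs labels could convert them directly into editable Unicode text, because every label maps to a code-point sequence that standard Telugu fonts can render.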

Instructions:

All images are the same size. All images were scanned with a scanner, segmented manually and saved as JPEG files.
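After extracting the archive, the stated properties (every file is a JPEG and all images share one size) can be checked with a short script. The sketch below reads each image's width and height directly from the JPEG frame (SOF) header using only the Python standard library, so no imaging package is assumed. The function names `jpeg_size` and `image_sizes` and the extraction directory are hypothetical; the archive's internal folder layout is not described on this page.

```python
import struct
from collections import Counter
from pathlib import Path

# Start-of-frame markers that carry image dimensions (0xC4 DHT, 0xC8 JPG, 0xCC DAC excluded).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first SOF segment of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone markers carry no length
            i += 2
            continue
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                raise ValueError("truncated SOF segment")
            # SOF payload: sample precision (1 byte), height (2 bytes), width (2 bytes).
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length
    raise ValueError("no SOF segment found")


def image_sizes(root: str) -> Counter:
    """Count JPEG files under root by (width, height); a single key means one common size."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            counts[jpeg_size(path.read_bytes())] += 1
    return counts
```

For example, `image_sizes("extracted_dataset")` returning a single entry would confirm that every JPEG in that folder has the same dimensions; any non-JPEG file saved with a `.jpg` extension raises `ValueError`.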

Acknowledgement:

This work was carried out under a Collaborative Research Project sponsored by JNTU Hyderabad, India (project file no. JNTUH/TEQIP-III/CRS/2019/CSE/12), titled "Deep Learning Aided-OCR for Handwritten Telugu Character".

Comments

Submitted by Naresh Muppalaneni on Sat, 02/27/2021 - 01:47

Dataset Files

Telugu Character Dataset: Final Dataset of Telugu Handwritten Chararcters.zip (335.47 MB)

Open Access

Telugu Handwritten Character Dataset