Spam SMS in Dravidian Languages

Citation Author(s):
Ramanujam Elangovan, National Institute of Technology Silchar, Assam, India
Abirami A M, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
Submitted by:
Ramanujam Elangovan
Last updated:
Fri, 06/02/2023 - 01:11
DOI:
10.21227/dcym-pd69

Abstract 

The Dravidian Spam SMS dataset contains spam and ham messages in English, Tamil, Telugu, Kannada, and Malayalam. Nearly 7,700 messages were collected through a Google Form sent to friends and other contacts. Language experts with reading and writing skills in the corresponding language carefully labeled the messages. The dataset also includes Tamil messages written verbatim in English letters, for example, “Nee Nalama”. The ham messages are mostly ordinary messages. The spam messages include business, annoying, and unnecessary messages sent by anonymous users. Detailed information on the dataset is given in the image. Unlike other datasets, this one does not contain users' personal or banking information.

Instructions: 

The dataset is in Excel format and has two columns: the message and its type.
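
A minimal sketch of reading the two-column sheet and counting message types might look like the following. The file path, the use of pandas, the assumption that the first column is the message and the second its type, and the "spam"/"ham" label spellings are all assumptions, not details confirmed by the dataset page. The sample rows are illustrative only, apart from “Nee Nalama”, which is quoted in the abstract.

```python
from collections import Counter


def load_messages(path):
    """Read the sheet into (message, type) pairs.

    Assumes the first column holds the message and the second its type;
    the actual header names are not documented on the dataset page.
    """
    import pandas as pd  # needs pandas plus an Excel engine such as openpyxl

    frame = pd.read_excel(path)         # first sheet, first row taken as header
    frame = frame.iloc[:, :2].dropna()  # keep the two columns, drop empty rows
    return [(str(msg), str(kind))
            for msg, kind in frame.itertuples(index=False, name=None)]


def label_counts(rows):
    """Count messages per type, ignoring case and surrounding whitespace."""
    return Counter(kind.strip().lower() for _, kind in rows)


# Illustrative rows standing in for the real file contents.
sample = [
    ("Nee Nalama", "ham"),
    ("Win a free prize now", "Spam "),
    ("See you at 5", "ham"),
]
counts = label_counts(sample)
print(counts)
```

With the real file, `label_counts(load_messages("spam_sms.xlsx"))` would give the per-type totals, where `spam_sms.xlsx` is a hypothetical file name.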

Comments

I want to use this dataset for learning.

Submitted by Biplab Gorain on Tue, 07/11/2023 - 02:54

Please give me access to this dataset. Could you mail it to rushil.anil.nair@gmail.com?

Submitted by Rushil Nair on Sat, 11/11/2023 - 01:10