Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools

Citation Author(s):
Sandro Mendonça, Universidade Federal do Pará
Yvan Brito, Universidade Federal do Pará
Carlos Gustavo Resque dos Santos, Universidade Federal do Pará
Bianchi Serique Meiguins, Universidade Federal do Pará
Submitted by: Carlos Santos
Last updated: Fri, 03/13/2020 - 17:19
DOI: 10.21227/5aeq-rr34

Abstract 

Datasets used in the article 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. They can be used to test how machine learning and data processing algorithms behave as individual data characteristics change.

Instructions: 
The dataset has essentially two dimensions, one for the class and one for the features. Each variation is specified relative to a default dataset with the following characteristics:

  • 1,000 entries
  • No outliers
  • No missing values
  • Two dimensions (one relevant feature and one class, no bad features)
  • 80% class separation
  • Two classes
  • No class imbalance

Thus, six types of datasets were generated, one for each of the six characteristics of the default dataset. For each type, the system generated four datasets with slight differences in the associated characteristic. For instance, to vary the proportion of outliers, the system created datasets with 10%, 20%, 30%, and 40% of outliers, without changing the other characteristics. The variations of the characteristics are the following:

  • Amount of outliers: [10%, 20%, 30%, 40%, 50%]
  • Class separation: [100%, 90%, 80%, 70%, 60%]
  • Amount of missing values: [10%, 20%, 30%, 40%, 50%]
  • Class imbalance: [50%-50%, 40%-60%, 30%-70%, 20%-80%, 10%-90%]
  • Bad features: [1-1, 1-3, 1-5, 1-7, 1-9]
  • Amount of classes: [2, 12, 22, 32, 42]
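
The one-characteristic-at-a-time design described above can be sketched in Python as a small configuration grid. This is only an illustration, not the generator used in the article: the key names and the function `one_factor_configs` are hypothetical, and the reading of "1-3" as one relevant feature plus three bad features is an assumption. The values are copied from the lists above exactly as they appear.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Default characteristics, as listed in the instructions.
# Key names are hypothetical; the dataset files may label them differently.
DEFAULT: Dict[str, Any] = {
    "entries": 1000,
    "outliers_pct": 0,
    "missing_pct": 0,
    "class_separation_pct": 80,
    "class_balance_pct": (50, 50),
    "features": (1, 0),  # assumed meaning: (relevant features, bad features)
    "classes": 2,
}

# Variation values copied from the list above.
VARIATIONS: Dict[str, List[Any]] = {
    "outliers_pct": [10, 20, 30, 40, 50],
    "class_separation_pct": [100, 90, 80, 70, 60],
    "missing_pct": [10, 20, 30, 40, 50],
    "class_balance_pct": [(50, 50), (40, 60), (30, 70), (20, 80), (10, 90)],
    "features": [(1, 1), (1, 3), (1, 5), (1, 7), (1, 9)],  # assumed reading of "1-1", "1-3", ...
    "classes": [2, 12, 22, 32, 42],
}


def one_factor_configs() -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (characteristic, config) pairs, changing one characteristic at a time."""
    for name, values in VARIATIONS.items():
        for value in values:
            config = dict(DEFAULT)  # copy, so DEFAULT itself is never modified
            config[name] = value
            yield name, config


if __name__ == "__main__":
    for name, config in one_factor_configs():
        print(name, config[name])
```

Note that some listed values coincide with the default (80% class separation, 50%-50% balance, two classes), so the corresponding configurations are identical to the default dataset.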
