clustering data

PROTEIN STRUCTURE AND SYNTHETIC MULTI-VIEW CLUSTERING DATASETS

Multi-View Clustering (MVC) datasets used in the following paper:

Evolutionary Multi-objective Clustering Over Multiple Conflicting Data Views. Authors: Mario Garza-Fabre, Julia Handl, and Adán José-García. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION. Accepted for publication, November 2022.

This entry contains all 420 datasets used in the paper, including:

Categories:
192 Views

Synthetic data 1: A network contains nine communities. The nodes insides the community are closely connected, and the nodes between communities are sparsely connected.

Synthetic data 2:  A network contains nine communities. There are some nodes that connect the nodes in different communities at the same time. 

Categories:
330 Views

The dataset has Gaussian Blobs of varying samples, centers and features.  The number of samples ranges from 500 to 50,000. Similarly, the number of centers varies from 2 to 100, while the number of features varies from 2 to 2048. These different sets of Gaussian blobs can be used for testing clustering algorithms for their scalability and effectiveness. There are two kinds of files inside the compressed sets. Files ending with "_X.csv" consist of datapoints, while the files ending with "_y.csv" represent respective class data.

Categories:
2834 Views