UNS Hoverflies Classification Dataset

Citation Author(s):
Zorica Nedeljković, University of Novi Sad
Jelena Ačanski, University of Novi Sad
Marko Panić, University of Novi Sad
Ante Vujić, University of Novi Sad
Branko Brkljač, University of Novi Sad
Submitted by: Branko Brkljac
Last updated: Thu, 12/12/2019 - 13:38
DOI: 10.21227/0cx1-6g80

Abstract 

 

The dataset was created as a joint effort of two research groups from the University of Novi Sad, aimed at developing vision-based systems for automatic identification of insect species (in particular hoverflies) from the characteristic venation patterns in images of the insects' wings. It consists of high-resolution microscopic wing images of several hoverfly species: a total of 868 wing images of eleven selected hoverfly species from two genera, Chrysotoxum and Melanostoma.

Besides images of the whole wings, the "UNS_Hoverflies" dataset also contains small image patches (64x64 pixels) corresponding to 18 predetermined landmark points in each wing. These patches were systematically collected from the preprocessed wing images and organized in the second root folder, named "Training - test set".

Each wing specimen was uniquely numbered and associated with the corresponding taxonomy group.

 

Instructions: 

 

## University of Novi Sad (UNS), Hoverflies classification dataset - ReadMe file

__________________________________________________________

Version 1.0

Published: December, 2014

by:

## Dataset authors:

* Zorica Nedeljković    (zoricaned14 a_t gmail.com), A1

* Jelena Ačanski    (jelena.acanski a_t dbe.uns.ac.rs), A1

* Marko Panić    (mpanic a_t uns.ac.rs), A2

* Ante Vujić    (ante.vujic a_t dbe.uns.ac.rs), A1

* Branko Brkljač    (brkljacb a_t uns.ac.rs), A2, *corr. auth.

 

The dataset was created as a joint effort of two research groups from the University of Novi Sad, aimed at developing vision-based systems for automatic identification of insect species (in particular hoverflies) from the characteristic venation patterns in images of the insects' wings. At the time of the dataset's development, the authors' affiliations were:

 * A1: Department of Biology and Ecology, Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 2, 21000 Novi Sad, Republic of Serbia

and

* A2: Department of Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, 21000 Novi Sad, Republic of Serbia

University of Novi Sad:   http://www.uns.ac.rs/index.php/en/

 

# Dataset description:

The set of wing images consists of high-resolution microscopic wing images of several hoverfly species. There is a total of 868 wing images of eleven selected hoverfly species from two different genera, Chrysotoxum and Melanostoma. 

The wings were collected from many different geographic locations in the Republic of Serbia over a period of more than two decades. Wing images were obtained from wing specimens mounted on glass microscope slides, using a microscope equipped with a digital camera with an image resolution of 2880 × 1550 pixels, and were originally stored in the TIFF image format.

Each wing specimen was uniquely numbered and associated with the taxonomy group it belongs to. Association of each
wing with a particular species was based on the classification of the insect at the time when it was collected and before
the wings were detached. This classification was done after examination by a skilled expert.  

In the next step, digital images were acquired by biologists under relatively uncontrolled conditions: nonuniform background illumination, variable scene configuration, and no camera calibration. In that sense, the originally obtained digital images were not particularly suitable for exact measurements. Other shortcomings of the samples in the initial image dataset resulted from variable quality of the wing specimens, damaged or badly mounted wings, artifacts, variable wing positions during image acquisition, and dust.

To overcome these limitations and make the images amenable to automatic discrimination of hoverfly species, they were first preprocessed. Preprocessing of each image consisted of rotating it to a unified horizontal position, cropping the wing, and then scaling the cropped wing image. Cropping eliminated unnecessary background containing artifacts, while aspect-ratio-preserving scaling addressed the problem of variable size among wings of the same species. The scaling was performed after computing the average width and average height of all cropped images, which were then resampled to the same width of 1680 pixels using bicubic interpolation. This width was selected based on the prevailing image width among the wing images of the different species.
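As a rough illustration of the aspect-ratio-preserving scaling step, the sketch below computes the output size of a cropped wing image when its width is brought to 1680 pixels. The function name and the example image sizes are hypothetical and not part of the dataset; the actual pixel resampling (bicubic interpolation) would be done by an imaging library and is not shown here.

```python
def scaled_size(width, height, target_width=1680):
    """Return (new_width, new_height) for a resize that keeps the aspect ratio.

    target_width defaults to 1680 px, the common width stated in the ReadMe.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the height by the same factor as the width, rounded to whole pixels.
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


# Hypothetical sizes (width, height) of cropped wing images, in pixels.
cropped_sizes = [(2100, 700), (1680, 560), (1400, 500)]
resized = [scaled_size(w, h) for w, h in cropped_sizes]
for original, new in zip(cropped_sizes, resized):
    print(original, "->", new)
```

After this step all images share the same width, while their heights may still differ slightly depending on the shape of each cropped wing.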

Wing images obtained in this way formed the final wing images dataset used for the sliding-window detector training, its performance evaluation, and subsequent hoverfly species discrimination using the trained landmark points detector, described in [1, 2].

* Besides images of the whole wings (in the folder "Wing images"), the "UNS_Hoverflies" dataset also contains small image patches (64x64 pixels) corresponding to 18 predetermined landmark points in each wing, which were systematically collected and organized in the second root folder, named "Training - test set". Each patch in "Patch_positives" was manually cropped from the preprocessed wing image (i.e. rotated, cropped and scaled to the same predefined image width). However, the images of the whole wings stored in the folder "Wing images" are provided without the scaling step of the preprocessing procedure, and correspond to wing images that were only rotated and cropped.

"Wing images" are organized in two subfolders named "disk_1" and "disk_2", which correspond to the two DVD drives where they were initially stored. Each folder also comes with an additional .xml file containing metadata. In "Wing images", the .xml files contain the average spatial size of the images in the given folder, while in "Training - test set", individual .xml files contain additional data about the created image patches.

For patches corresponding to landmark points ("Patch_positives"), each .xml file contains the intrinsic image coordinates of each landmark point, as well as additional data about the corresponding specimen (who created it, when and where it was gathered, taxonomy, etc.). Landmark points are numbered uniquely from 1 to 18, as also shown in the figures in [1, 2].

In "Patch_negatives", each subfolder is named after the wing identifier, e.g. "W0034_neg", and contains 40 randomly selected image patches taken from any part of the preprocessed image except the 18 landmark points and their immediate surroundings.

Although image patches were generated for all species, only a subset of images corresponding to the species with the highest number of specimens was used in the original classification studies described in [1, 2]. However, in its present form the "UNS_Hoverflies" dataset contains all initially processed wing images and image patches.
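The way "Patch_negatives" are described (40 random 64x64 patches per wing that avoid the landmark points and their surroundings) can be sketched as simple rejection sampling. This is only an illustration under stated assumptions, not the authors' procedure: the landmark coordinates, the image size, the exclusion margin, and the assumption that patches are centred on a point are all hypothetical.

```python
import random

PATCH = 64  # patch side in pixels, as stated in the ReadMe


def patch_box(cx, cy, size=PATCH):
    """Return (left, top, right, bottom) of a size x size patch centred at (cx, cy)."""
    half = size // 2
    return cx - half, cy - half, cx + half, cy + half


def sample_negative_centres(width, height, landmarks, n=40,
                            margin=PATCH, seed=0, max_tries=100_000):
    """Pick n patch centres inside the image that stay at least `margin`
    pixels (Chebyshev distance) away from every landmark point.

    `margin` is an assumed exclusion radius; the ReadMe does not give one.
    """
    rng = random.Random(seed)
    half = PATCH // 2
    centres = []
    tries = 0
    while len(centres) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough negative patches")
        # Keep the whole patch inside the image bounds.
        cx = rng.randint(half, width - half)
        cy = rng.randint(half, height - half)
        if all(max(abs(cx - lx), abs(cy - ly)) >= margin for lx, ly in landmarks):
            centres.append((cx, cy))
    return centres


# Hypothetical landmark coordinates on a hypothetical 1680 x 560 wing image.
landmarks = [(200, 150), (800, 300), (1400, 250)]
centres = sample_negative_centres(1680, 560, landmarks)
boxes = [patch_box(cx, cy) for cx, cy in centres]
print(len(boxes), "negative patch boxes, first:", boxes[0])
```

The returned boxes could then be passed to the crop function of an imaging library to extract the actual patches.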

Besides the data described above, which form the main part of the dataset, the repository also contains the original microscopic images of the insects' wings, stored without any additional processing after acquisition. These files are available in the second .zip archive, denoted by the suffix "unprocessed".

 

Directory structure:

UNS_Hoverflies_Dataset
├── Training - test set
│   ├── Patch_negatives
│   └── Patch_positives
└── Wing images
    ├── disk_1
    └── disk_2

 

UNS_Hoverflies_Dataset_unprocessed
└── Unprocessed wing images
    ├── disk_1
    └── disk_2
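Once the archives are extracted, a quick sanity check of the directory structure above can be done with a short script like the following. It is a minimal sketch that only counts files by extension under each top-level folder; the function name is hypothetical, and nothing is assumed about file names or the .xml schema, which are not specified here.

```python
from collections import Counter
from pathlib import Path


def summarize(root):
    """Count files by (lower-cased) extension under each top-level subfolder of root."""
    root = Path(root)
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Walk the subfolder recursively and tally file extensions.
        counts = Counter(f.suffix.lower() for f in sub.rglob("*") if f.is_file())
        summary[sub.name] = dict(counts)
    return summary
```

For example, calling `summarize("UNS_Hoverflies_Dataset")` (assuming the archive was extracted into the current working directory) would return one entry for "Training - test set" and one for "Wing images".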

 

# How to cite:

We would be glad if you use this dataset. If you do, please consider citing our work as:

BibTeX:

@article{UNShoverfliesDataset2019,
author = {Zorica Nedeljković and Jelena Ačanski and Marko Panić and Ante Vujić and Branko Brkljač},
title = {University of Novi Sad (UNS), Hoverflies classification dataset},
journal = {{IEEE} DataPort},
year = {2019}
}

and/or any of the corresponding original publications:

## References:

[1] Branko Brkljač, Marko Panić, Dubravko Ćulibrk, Vladimir Crnojević, Jelena Ačanski, and Ante Vujić, “Automatic hoverfly species discrimination,” in Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, vol. 2, pp. 108–115, SciTePress, Vilamoura, 2012. https://dblp.org/db/conf/icpram/icpram2012-2.

[2] Vladimir Crnojević, Marko Panić, Branko Brkljač, Dubravko Ćulibrk, Jelena Ačanski, and Ante Vujić, “Image Processing Method for Automatic Discrimination of Hoverfly Species,” Mathematical Problems in Engineering, vol. 2014, Article ID 986271, 12 pages, 2014. https://doi.org/10.1155/2014/986271.

 

** This dataset is published in the IEEE DataPort repository by the authors under the CC BY-NC-SA 4.0 license (for more information, please visit: https://creativecommons.org/licenses/by-nc-sa/4.0/).