Dataset for Characterizing the Occurrence of Dockerfile Smells in Open-Source Software

Citation Author(s):
Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang
Submitted by:
Yang Zhang
Last updated:
Tue, 05/17/2022 - 22:17
DOI:
10.21227/r9v8-4f07

Abstract 

Dockerfiles play an important role in the Docker-based containerization process, but in practice many of them contain smells. This dataset contains a collection of 6,334 projects to help developers gain insight into the occurrence of Dockerfile smells. The projects span 10 popular programming languages: Shell, Makefile, Ruby, PHP, Python, Java, HTML, CSS, JavaScript, and Go.

Instructions: 

This dataset contains 6,334 projects, including their metadata (i.e., names, owner types, creation dates, programming languages, number of stars, and number of contributors) and details of their Dockerfile smells (i.e., number of instructions, number of overall smells, number of DL-smells, and number of SC-smells).

Specifically, the metrics in the CSV dataset are:

  • project: the project name;

  • p_language: project’s programming language;

  • p_contributors_team: number of project contributors (those who have submitted at least one commit);

  • p_created_at: project's creation date;

  • p_owner_type: type of the project owner, i.e., “Organization” or “User”;

  • p_stars: number of project stars;

  • p_github_age: number of days the project had been hosted on GitHub as of April 2018;

  • d_instructions: number of instructions in a Dockerfile; 

  • d_smells: total number of smells in a Dockerfile;

  • d_smells_dl: number of DL-smells in a Dockerfile;

  • d_smells_sc: number of SC-smells in a Dockerfile.
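
The columns listed above can be explored with a few lines of Python. The following is a minimal sketch, not part of the dataset itself. It assumes the CSV has a header row using exactly the column names above and that the count columns hold plain integers. The helper names `load_projects` and `smells_per_instruction_by_language` are hypothetical.

```python
import csv
from collections import defaultdict


def load_projects(csv_file):
    """Read rows from an open CSV file, converting the count columns to int.

    Assumes a header row with the column names listed in this description.
    """
    int_columns = ("d_instructions", "d_smells", "d_smells_dl", "d_smells_sc")
    rows = []
    for row in csv.DictReader(csv_file):
        for col in int_columns:
            row[col] = int(row[col])
        rows.append(row)
    return rows


def smells_per_instruction_by_language(rows):
    """Return {language: total smells / total instructions} over all rows."""
    smells = defaultdict(int)
    instructions = defaultdict(int)
    for row in rows:
        lang = row["p_language"]
        smells[lang] += row["d_smells"]
        instructions[lang] += row["d_instructions"]
    # Skip languages with no instructions to avoid dividing by zero.
    return {
        lang: smells[lang] / instructions[lang]
        for lang in smells
        if instructions[lang] > 0
    }
```

To use it, open the downloaded CSV file (for example with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved), pass the file object to `load_projects`, and pass the result to `smells_per_instruction_by_language`. Normalizing by `d_instructions` is one possible design choice here: it keeps longer Dockerfiles from dominating the per-language comparison.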