ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems

Citation Author(s):
Yuan Gong, University of Notre Dame
Jian Yang, University of Notre Dame
Jacob Huber, University of Notre Dame
Mitchell MacKnight, University of Notre Dame
Christian Poellabauer, University of Notre Dame
Submitted by: Yuan Gong
Last updated: Tue, 06/23/2020 - 23:25
DOI: 10.21227/1mhq-c052

Abstract 

We introduce a new database of voice recordings with the goal of supporting research on the vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCS usage scenarios using modern voice assistant development kits. Specifically, the database contains recordings from four systems (each with a different microphone array) in a variety of environmental conditions, with different forms of background noise and different relative positions between speaker and device. To the best of our knowledge, this is the first publicly available database that has been specifically designed for protecting state-of-the-art voice-controlled systems against replay attacks across a range of conditions and environments.

Instructions: 

The corpus consists of three sets: the core set, the evaluation set, and the complete set. The complete set contains all the data (i.e., complete set = core set + evaluation set) and lets the user split the data freely into training and test sets. The core and evaluation sets suggest a default training/test split. For each set, all *.wav files are in the /data directory and the meta information is in the meta.csv file. The protocol is described in readme.txt. A PyTorch data loader script is provided as an example of how to use the data, and a Python resampling script is provided for converting the dataset to a desired sample rate.
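
The layout described above (a /data directory of *.wav files next to a meta.csv file) can be explored with a short script before turning to the provided PyTorch loader. The following is only a minimal sketch using the Python standard library and is not the data loader or resampling script shipped with the corpus. The function names index_set, wav_info, and random_split are hypothetical, the columns of meta.csv are not assumed (rows are returned as plain lists; see readme.txt for their meaning), and the recordings are assumed to be PCM files that the standard wave module can open.

```python
import csv
import random
import wave
from pathlib import Path


def index_set(set_dir):
    """Return (wav_paths, meta_rows) for one set directory.

    Assumes the layout from the instructions: <set_dir>/data/*.wav
    and <set_dir>/meta.csv. Rows are returned uninterpreted because
    the column layout is defined in readme.txt, not here.
    """
    set_dir = Path(set_dir)
    wav_paths = sorted((set_dir / "data").glob("*.wav"))
    with open(set_dir / "meta.csv", newline="") as f:
        meta_rows = list(csv.reader(f))
    return wav_paths, meta_rows


def wav_info(path):
    """Read basic header fields of a PCM wav file (assumption: PCM)."""
    with wave.open(str(path), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "frames": w.getnframes(),
        }


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (train, test) lists.

    This is a plain file-level split for the complete set; it does
    not keep speakers or environments disjoint.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]
```

For example, index_set can be called on the unpacked complete set, and the resulting file list passed to random_split to obtain a custom training/test partition. A split that keeps speakers or recording environments disjoint would need the metadata columns documented in readme.txt.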

Comments

Thanks for your great work, I'd like to work on this research topic further, which may depend on your dataset!

Submitted by Xinfeng Li on Sun, 04/03/2022 - 00:43

Thanks for your great work, I'd like to work on this research topic further, which may depend on your dataset!

Submitted by Yijie Lou on Thu, 07/07/2022 - 03:44


Documentation

Attachment    Size
protocol      1.35 KB
paper         2.62 MB