Speech Dataset

      The following dataset consists of utterances, recorded using 24 volunteers raised in the Province of Manitoba, Canada. To provide a repeatable set of test words that would cover all of the phonemes, the Edinburg Machine Readable Phonetic Alphabet (MRPA) [KiGr08], consisting of 44 words is used. Each recording consists of one word uttered by the volunteer and recorded in one continuous session.

Categories:
2530 Views

The following dataset consists of samples of acoustic characteristics of 356 Russian-speaking subjects and measured psychological traits. All the recordings (5701 samples) were processed and acoustic characteristics were calculated. 

Categories:
239 Views

100 Speakers each consisting of 5 voice samples for training data and 1 voice sample for testing data. Total of 600 voice samples collected in different audio formats like mpeg, mp4, mp3, ogg etc. These samples were than preprocessed and converted into .wav format. Each voice sample has a time duration of 5-10 seconds due to different lengths tuning of parameters should be done before usage. Whole Dataset size is 600mb and duration is 1 hour 40 minutes. This dataset can be used for speech synthesis, speaker identification. speaker recognition, speech recogniton etc.

Categories:
5123 Views

100 Speakers each consisting of 5 voice samples for training data and 1 voice sample for testing data. Total of 600 voice samples collected in different audio formats like mpeg, mp4, mp3, ogg etc. These samples were than preprocessed and converted into .wav format. Each voice sample has a time duration of 5-10 seconds due to different lengths tuning of parameters should be done before usage. Whole Dataset size is 600mb and duration is 1 hour 40 minutes. This dataset can be used for speech synthesis, speaker identification. speaker recognition, speech recogniton etc.

Categories:
1982 Views

Speech Processing in noisy condition allows researcher to build solutions that work in real world conditions. Environmental noise in Indian conditions are very different from typical noise seen in most western countries. This dataset is a collection of various noises, both indoor and outdoor ollected over a period of several months. The audio files are of the format RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 11025 Hz and have been recorded using the Dialogic CTI card.

Categories:
979 Views

This is the noisy-speech test set used in the original Deep Xi paper: https://doi.org/10.1016/j.specom.2019.06.002. The clean speech and noise used to create the noisy-speech set are also available.

Categories:
857 Views

Noisy-speech set used to test Deep Xi (https://github.com/anicolson/DeepXi). The clean speech and noise used to create the noisy-speech set are also available. The clean-speech recordings are from Librispeech test-clean (http://www.openslr.org/12/).

Categories:
615 Views