Modan Tailleur, Julien Pinquier, Laurent Millot, Corsin Vogel, Mathieu Lagrange
Contact: modan.tailleur@ls2n.fr

Abstract

We present the Extreme Vocal Dataset, a dataset of extreme vocal techniques for the development and evaluation of machine learning models. It consists of 665 audio excerpts ranging from 1s to 30s, for a total of 1h28min of audio (53min of distorted voice and 35min of clear voice). The recordings contain only voice, with no processing effects and no background instruments.

The experiment code is available in this GitHub repository.
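The size figures quoted in the abstract (number of excerpts and total duration) can be re-checked on a local copy of the data. The sketch below is only an illustration: it assumes the excerpts are stored as PCM WAV files under a single folder, and the folder name `path/to/dataset` and the helper names `total_duration_seconds` and `format_hms` are hypothetical, not part of the dataset or its repository.

```python
import wave
from pathlib import Path


def total_duration_seconds(root):
    """Return (number of WAV files, summed duration in seconds) under root.

    Assumption: files are PCM WAV, readable by the standard-library wave module.
    """
    count = 0
    seconds = 0.0
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / sampling rate.
            seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, seconds


def format_hms(seconds):
    """Format a duration in seconds as hours and minutes, e.g. '1h28min'."""
    minutes = int(round(seconds / 60))
    return f"{minutes // 60}h{minutes % 60:02d}min"


if __name__ == "__main__":
    # Hypothetical location of a local copy of the dataset.
    n_files, duration = total_duration_seconds("path/to/dataset")
    print(n_files, "excerpts,", format_hms(duration))
```

If the files use another container or codec, the same loop applies with a different reader for the frame count and sampling rate.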

Extreme vocal techniques

The three extreme distortion techniques below, as well as the clear (non-distorted) voice, were recorded in three different ranges (high, mid and low).

Vocal Technique High Mid Low
Clear Voice
Black Shriek -
Death Growl -
Hardcore Scream

The Grind Inhale technique was recorded in only one range, because most singers considered it dangerous for their voice.

Grind Inhale

Extreme vocal effects

Vocal Technique
Pig Squeal
Deep Gutturals
Tunnel Throat