We present the Extreme Vocal Dataset, a dataset of extreme vocal techniques for the development and evaluation of machine learning models. It consists of 665 audio excerpts ranging from 1 s to 30 s, for a total of 1 h 28 min of audio (53 min of distorted voice and 35 min of clear voice). The recordings contain only voice, with no processing effects and no background instruments.
Experiment code is available in this GitHub repository.
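The figures quoted above (665 excerpts, 1 s to 30 s each, 1 h 28 min in total) can be checked against a local copy of the data. The following is a minimal sketch and is not taken from the authors' repository. It assumes the excerpts are stored as PCM WAV files somewhere under a single root folder, which is an assumption about the file format and layout. The helper names `wav_duration_seconds`, `summarize_durations` and `format_hms` are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_durations(root):
    """Count the WAV files under `root` and report min, max and total duration.

    Assumption: every excerpt is a .wav file; other formats are ignored.
    """
    durations = [wav_duration_seconds(p) for p in sorted(Path(root).rglob("*.wav"))]
    if not durations:
        return {"count": 0, "min_s": None, "max_s": None, "total_s": 0.0}
    return {
        "count": len(durations),
        "min_s": min(durations),
        "max_s": max(durations),
        "total_s": sum(durations),
    }


def format_hms(seconds):
    """Format a duration in seconds as hours and minutes, e.g. '1h28min'."""
    total_minutes = int(seconds // 60)
    return f"{total_minutes // 60}h{total_minutes % 60:02d}min"
```

Calling `summarize_durations("path/to/dataset")` (a placeholder path) and passing `total_s` to `format_hms` gives values that can be compared with the numbers stated above. If the files are distributed in another format, the duration helper would need to be replaced with a suitable audio reader.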
The three extreme distortion techniques below, as well as clear (non-distorted) voice, were recorded in three different ranges (high, mid and low).
| Vocal Technique | High | Mid | Low |
| Clear Voice | | | |
| Black Shriek | - | | |
| Death Growl | - | | |
| Hardcore Scream | | | |
The Grind Inhale technique was recorded in only one range, because most singers considered it dangerous for their voice.
| Grind Inhale |
| Vocal Technique | |
| Pig Squeal | |
| Deep Gutturals | |
| Tunnel Throat |