Datasets

What is a dataset?

A dataset, at least in TTS (text-to-speech), is a combination of:

  • Recorded audio files in WAVE format (one per sentence)
  • A CSV file that holds the transcript of each recording and maps it to its audio file

The best-known format is LJSpeech, which serves as the de facto standard in the TTS field. All “Thorsten” datasets are freely available in this format.
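
As a rough illustration of the layout described above, here is a minimal Python sketch that reads an LJSpeech-style metadata file. It assumes a pipe-delimited `metadata.csv` whose first column is the audio file ID and whose second column is the transcript, with the audio stored as `wavs/<id>.wav`. The function name `load_metadata` is made up for this example, so please check the layout of the dataset you actually downloaded before relying on it.

```python
import csv
from pathlib import Path


def load_metadata(dataset_dir):
    """Return a list of (wav_path, transcript) pairs from an LJSpeech-style dataset.

    Assumption: metadata.csv is pipe-delimited, column 0 is the file ID
    and column 1 is the transcript; audio lives in wavs/<id>.wav.
    """
    dataset_dir = Path(dataset_dir)
    pairs = []
    with open(dataset_dir / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: quotation marks inside a transcript are treated as plain text.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            file_id, text = row[0], row[1]
            pairs.append((dataset_dir / "wavs" / f"{file_id}.wav", text))
    return pairs
```

Any extra columns (for example a normalized transcript) are ignored by this sketch.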

Why do I need a dataset?

It depends. Do you just want to test the available TTS model? Yes? Then the simple answer is “you don’t”.

However, if you would like to train your own TTS model on my recordings and experiment with what feels like 1,000 parameters, then one or more of my datasets are a good basis for this.

Please keep in mind I’m not a professional speaker, just a guy who donates his voice.

So please don’t have exaggerated expectations 😉

Audio samples

These are samples from all available voice datasets.

Thorsten-Voice Dataset 2021.02 (Neutral)
Thorsten-Voice Dataset 2021.06 (Emotional)
Order: Angry, Disgusted, Amused, Drunk, Surprised, Sleepy, Whisper
Thorsten-Voice Dataset 2022.10 (Neutral)
Thorsten-Voice Dataset 2023.09 (Hessisch)
(German dialect from the southern state of Hessen)

Thorsten-Voice Dataset 2021.02 (Neutral)

Number of recordings: 22,668
Audio duration: 23+ hours
Sample rate: 22,050 Hz
Channels: Mono
Normalization: -24 dB
Sentence length (min/avg/max): 2 / 52 / 180 characters
Speaking rate (average): 14 characters / second
Questions: 2,780
Exclamations: 1,840

@dataset{muller_thorsten_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Neutral) Dataset},
  month        = feb,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
}

Download: https://zenodo.org/record/5525342
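
If you want to recompute figures like the ones listed for this dataset (sentence length, speaking rate, number of questions and exclamations), a small Python sketch could look like the following. It is only an approximation under stated assumptions: the function name `transcript_stats` is hypothetical, questions and exclamations are counted by their final punctuation mark, and the speaking rate is taken as total characters divided by total audio seconds. How the published numbers were actually calculated is not documented here.

```python
from statistics import mean


def transcript_stats(transcripts, total_seconds=None):
    """Summarize a non-empty list of transcript strings."""
    lengths = [len(t) for t in transcripts]
    stats = {
        "recordings": len(transcripts),
        "min_chars": min(lengths),
        "avg_chars": round(mean(lengths)),
        "max_chars": max(lengths),
        # Assumption: a sentence type is identified by its last character.
        "questions": sum(t.rstrip().endswith("?") for t in transcripts),
        "exclamations": sum(t.rstrip().endswith("!") for t in transcripts),
    }
    if total_seconds:
        # Speaking rate = all characters divided by the total audio duration.
        stats["chars_per_second"] = round(sum(lengths) / total_seconds, 1)
    return stats


print(transcript_stats(["Hallo!", "Wie geht es dir?", "Gut."], total_seconds=2.0))
```

The transcripts would come from the dataset's CSV file, and `total_seconds` from summing the durations of the audio files.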

Thorsten-Voice Dataset 2021.06 (Emotional)

The emotional dataset consists of 300 distinct sentences. I recorded each of them in the following eight emotions:

  • Neutral
  • Disgusted
  • Furious
  • Amused
  • Surprised
  • Sleepy
  • Whispering
  • Drunk (I was sober during the recording)

Number of recordings: 2,400
Sample rate: not stated
Channels: Mono
Normalization: -24 dB
Sentence length (min/max): 59 / 148 characters

@dataset{muller_thorsten_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Emotional) Dataset},
  month        = jun,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
}

Download: https://zenodo.org/record/5525023

Thorsten-Voice Dataset 2022.10 (Neutral)

Number of recordings: 12,432
Audio duration: 11+ hours
Sample rate: 22,050 Hz
Channels: Mono
Normalization: -24 dB
Speaking rate (average): 17.5 characters / second

@dataset{muller_thorsten_2022_7265581,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {ThorstenVoice Dataset 2022.10},
  month        = oct,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.7265581},
  url          = {https://doi.org/10.5281/zenodo.7265581}
}

Check out my release video about this dataset.

More information and download: https://zenodo.org/record/7265581
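
After downloading, you may want to confirm that the audio files match the properties listed above (22,050 Hz sample rate, mono). The following sketch uses only Python's standard `wave` module. The helper names `wav_info` and `matches_spec` are invented for this example, and it assumes the files are uncompressed PCM WAV files, which is what the `wave` module can read.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, channels, duration


def matches_spec(path, expected_rate=22050, expected_channels=1):
    """True if the file has the expected sample rate and channel count."""
    rate, channels, _ = wav_info(path)
    return rate == expected_rate and channels == expected_channels
```

Summing the third value of `wav_info` over all files gives the total audio duration of a dataset.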

Thorsten-Voice Dataset 2023.09 (Hessisch)

Number of recordings: 2,108
Audio duration: approx. 2 hours
Sample rate: 22,050 Hz
Channels: Mono
Normalization: -24 dB

Download: https://doi.org/10.5281/zenodo.10511260

@dataset{muller_2024_10511260,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice Dataset 2023.09 Hessisch},
  month        = jan,
  year         = 2024,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.10511260},
  url          = {https://doi.org/10.5281/zenodo.10511260}
}