Datasets

What is a dataset?

A dataset, at least in TTS (text-to-speech), is a combination of:

  • WAVE audio recordings (one file per sentence)
  • A CSV file that holds the transcription of each recording and maps it to its audio file.

The best-known layout is the LJSpeech format, which serves as the de facto standard in the TTS field. All “Thorsten” datasets are freely available in this format.
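
As a rough illustration of how the two parts fit together, here is a minimal sketch of reading an LJSpeech-style dataset. It assumes the usual LJSpeech layout: a pipe-delimited `metadata.csv` whose first column is the recording ID and whose second column is the transcription, with the audio files in a `wavs/` folder next to it. The function name `load_ljspeech_metadata` and the exact file names are assumptions for illustration, so check the downloaded archive before relying on them.

```python
import csv
from pathlib import Path


def load_ljspeech_metadata(dataset_dir):
    """Return (wav_path, text) pairs from an LJSpeech-style dataset folder.

    Assumed layout (not verified against the actual archives):
      dataset_dir/metadata.csv   lines of the form "id|transcription[|...]"
      dataset_dir/wavs/<id>.wav  one recording per sentence
    """
    dataset_dir = Path(dataset_dir)
    entries = []
    with open(dataset_dir / "metadata.csv", encoding="utf-8", newline="") as f:
        # The separator is "|" and quote characters are ordinary text.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            file_id, text = row[0], row[1]
            entries.append((dataset_dir / "wavs" / f"{file_id}.wav", text))
    return entries
```

Calling `load_ljspeech_metadata("path/to/dataset")` would then give a list you can loop over to pair each transcription with its audio file.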

Why do I need a dataset?

It depends. Do you just want to turn text into speech with the available TTS model? Then the simple answer is: you don’t need one at all.

However, if you would like to train your own TTS model based on my recordings and experiment with what feels like 1,000 parameters, then one or both of my datasets are a good basis for this.

Please keep in mind I’m not a professional speaker, just a guy who donates his voice.

So please don’t have exaggerated expectations 😉

Thorsten (neutral) – Version 2020:

Number of recordings: 22,668
Audio duration: 23+ hours
Sample rate: 22,050 Hz
Channels: Mono
Normalization: -24 dB
Sentence length (min/avg/max): 2 / 52 / 180 characters
Speaking rate (average): 14 characters / second
Questions: 2,780
Exclamations: 1,840

@dataset{muller_thorsten_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Neutral) Dataset},
  month        = feb,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
}

Download: https://zenodo.org/record/5525342
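
The sentence-length and speaking-rate figures listed for this dataset are simple arithmetic over the transcriptions and audio durations. The sketch below shows one plausible way to compute them, assuming sentence length is counted in characters and speaking rate is total characters divided by total audio seconds; the page does not say exactly how the published numbers were derived. The function name `dataset_stats` and the sample values are made up for illustration.

```python
def dataset_stats(samples):
    """Compute length and speed statistics.

    samples: iterable of (text, duration_seconds) pairs.
    """
    samples = list(samples)
    lengths = [len(text) for text, _ in samples]
    total_chars = sum(lengths)
    total_seconds = sum(duration for _, duration in samples)
    return {
        "min_chars": min(lengths),
        "avg_chars": total_chars / len(lengths),
        "max_chars": max(lengths),
        "chars_per_second": total_chars / total_seconds,
    }


# Made-up sample values, for illustration only (not taken from the dataset).
demo = [("Hallo.", 0.5), ("Wie geht es dir heute?", 1.5)]
stats = dataset_stats(demo)
```

As a quick plausibility check on the published numbers: 22,668 recordings at an average of 52 characters is about 1.18 million characters, which at 14 characters per second comes to roughly 23.4 hours of audio.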

Thorsten (emotional) – Version 2021:

The emotional dataset consists of 300 distinct sentences. Each of them is spoken by me in the following eight emotions.

  • Neutral
  • Disgusted
  • Furious
  • Amused
  • Surprised
  • Sleepy
  • Whispering
  • Drunk (I was sober during the recording)

Number of recordings: 2,400
Sample rate: (not specified)
Channels: Mono
Normalization: -24 dB
Sentence length (min/max): 59 / 148 characters

@dataset{muller_thorsten_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten - Open German Voice (Emotional) Dataset},
  month        = jun,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
}

Download: https://zenodo.org/record/5525023
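
After downloading, it can be worth confirming the audio properties (sample rate, channel count, duration) directly from the files before training. Here is a small sketch using Python's standard `wave` module, which reads uncompressed PCM WAVE files. The helper name `wav_properties` is an assumption for illustration and is not part of the dataset or any TTS toolkit.

```python
import wave


def wav_properties(path):
    """Read sample rate, channel count and duration from a PCM WAVE file."""
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / sample_rate
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_seconds": duration,
    }
```

A mono recording should report `channels == 1`. Summing `duration_seconds` over all files gives the total audio duration of a dataset.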

Interested in Open Voice Technology? Take a look at my YouTube channel on the topic.