After I (again) invested months of my free time in audio recordings (this time with a good microphone and recording setup) and Dominik applied his “audio magic”, things really got started for both of us.
We have tried (and are still trying) various configurations, but we want to share our current result with you.
- More than 12,000 mono audio recordings made by me at a sample rate of 22 kHz
- Trained with Coqui TTS (0.5.0)
- Tacotron2 DDC (TTS-model)
- HiFi-GAN (vocoder) – Thanks, Olaf, for supporting us with compute power.
- Lots of love 🙂
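
For anyone preparing their own recordings in a similar way, here is a minimal sketch of how the audio format from the list above (mono, 22 kHz) could be checked before training. It is not part of our actual pipeline. The function names and the folder layout are made up for illustration, and it assumes that "22 kHz" means the common 22,050 Hz sample rate. It only uses Python's standard `wave` module.

```python
import wave
from pathlib import Path

EXPECTED_CHANNELS = 1      # mono, as stated in the list above
EXPECTED_RATE_HZ = 22050   # assumption: "22 kHz" means the common 22,050 Hz rate


def check_recording(path, channels=EXPECTED_CHANNELS, rate=EXPECTED_RATE_HZ):
    """Return a list of format problems for one WAV file (empty list = OK)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getframerate() != rate:
            problems.append(f"{wav.getframerate()} Hz, expected {rate} Hz")
    return problems


def check_dataset(wav_dir):
    """Map each non-conforming file name in wav_dir to its list of problems."""
    report = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        problems = check_recording(path)
        if problems:
            report[path.name] = problems
    return report
```

Calling `check_dataset("wavs")` on a (hypothetical) folder named `wavs` returns an empty dictionary if every file matches, otherwise the file names together with what is wrong.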
Of course, this “Thorsten” model can still be run offline and is available free of charge under the CC0 license.
But how does it sound?
There is no release date yet for the model and the underlying dataset, as the fine-tuning work is still ongoing. However, we are closer to the goal than to the beginning :-).