Collect high-quality audio-text pairs. Most modern frameworks like Mozilla TTS or Tortoise require the LJSpeech format (22,050Hz, 16-bit Mono WAV) with corresponding transcriptions in a metadata.csv file.
For a quick demonstration of setting up high-performance TTS locally, watch this installation guide: Faster Tortoise TTS Generation - Deepspeed Installation Jarods Journey YouTube• Oct 21, 2023 TTS.rar
Tools like DeepSpeed can increase generation speed by 2x to 10x for models like Tortoise TTS. Collect high-quality audio-text pairs