XTTS is a voice generation model that lets you clone a voice into different languages from a single short audio clip of about 3 seconds.

XTTS builds on previous research, such as Tortoise, and adds architectural innovations and training that make cross-language voice cloning and multilingual speech generation possible.
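As a rough illustration of cross-language cloning, here is a minimal sketch that assumes the Coqui `TTS` Python package is installed and that the model is published under the name `tts_models/multilingual/multi-dataset/xtts_v2`. The helper names `build_clone_request` and `clone_voice` are hypothetical and exist only for this example. The only library calls used are `TTS(model_name)` and `tts_to_file`.

```python
from pathlib import Path


def build_clone_request(text: str, speaker_wav: str, language: str,
                        out_path: str = "output.wav") -> dict:
    """Collect keyword arguments for Coqui's `TTS.tts_to_file` (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,                # what the cloned voice should say
        "speaker_wav": speaker_wav,  # short reference clip of the voice to clone
        "language": language,        # target language code, e.g. "en" or "fr"
        "file_path": out_path,       # where the synthesized audio is written
    }


def clone_voice(request: dict,
                model_name: str = "tts_models/multilingual/multi-dataset/xtts_v2") -> str:
    """Synthesize speech in the reference speaker's voice (hypothetical helper)."""
    # Imported lazily so build_clone_request works without the package installed.
    # Assumption: the Coqui TTS package provides TTS.api.TTS.
    from TTS.api import TTS

    if not Path(request["speaker_wav"]).is_file():
        raise FileNotFoundError(request["speaker_wav"])

    tts = TTS(model_name)       # assumed model name; fetched on first use
    tts.tts_to_file(**request)  # writes the audio to request["file_path"]
    return request["file_path"]
```

For example, `clone_voice(build_clone_request("Bonjour tout le monde.", "speaker.wav", "fr"))` would speak French text in the voice from `speaker.wav`, where `speaker.wav` is a placeholder for your own reference recording.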

