Chatterbox Turbo
Use the Chatterbox Turbo plugin for high-quality English TTS with emotion tags. Install models, select a voice, and generate speech.
Chatterbox Turbo is a text-to-speech (TTS) plugin based on the Chatterbox Turbo model. It provides high-quality English voice cloning and supports emotion tags (e.g. clear throat, sigh, laugh) for more natural speech.
After the plugin is installed, follow the steps below to download models, choose a voice, and generate audio.
1. Install models
Open the Chatterbox plugin in Vidpai.
In the right sidebar, find the Model section and click Model Management.
In the Available Models list:
Choose one TTS model based on your available resources:
Chatterbox-Turbo-fp16 (1.15 GB): best quality and speed; recommended if you have enough memory.
Chatterbox-Turbo-8bit (900 MB): good quality with lower memory use.
Chatterbox-Turbo-4bit (750 MB): smallest footprint; suitable for limited resources.
Also install S3TokenizerV2 (460 MB). It is required for Chatterbox TTS, and synthesis will not work without it.
For each model you need, click Download (or use Import if you have the files locally). Wait until installation finishes.
Close Model Management when done. The installed model will appear in the Model dropdown.
2. Select a voice
In the Voice section at the top of the right sidebar:
Open the Voice dropdown and pick a speaker (e.g. Alice).
Use + Add Voice or Voice Manage to add or manage custom voices if needed.
3. Enter text and generate
In the left panel, enter or paste the text you want to convert to speech, or use Import File to load a TXT or MD file.
(Optional) Use the EMOTION TAGS buttons (clear throat, sigh, chuckle, laugh, etc.) to insert emotion markers into the text for more natural delivery.
(Optional) Set Output Filename and Output Folder to change the file name and save location. Enable Save to Library if you also want the audio added to your library.
Click Generate to start synthesis. When it finishes, the audio is saved to the output folder (and to the library if enabled).
Model overview
| Model | Size | Use case |
| --- | --- | --- |
| Chatterbox-Turbo-fp16 | 1.15 GB | Best quality and speed; default choice when resources allow. |
| Chatterbox-Turbo-8bit | 900 MB | Balanced quality and memory. |
| Chatterbox-Turbo-4bit | 750 MB | Minimal memory and disk; for constrained devices. |
| S3TokenizerV2 | 460 MB | Required tokenizer; install for TTS to work. |
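Every setup needs one TTS model plus S3TokenizerV2, so the total download is the model size plus 460 MB. The following is a minimal sketch of that arithmetic, assuming a simple disk budget; `total_download_mb` and `pick_model` are hypothetical helpers for illustration, not part of Vidpai or the plugin. Sizes come from the table above, with 1.15 GB treated as 1150 MB.

```python
from typing import Optional

# Sizes in MB as listed in the model overview table (1.15 GB taken as 1150 MB).
TOKENIZER_MB = 460  # S3TokenizerV2, required alongside any TTS model

# Ordered from highest to lowest quality, per the table's use-case notes.
TTS_MODELS_MB = [
    ("Chatterbox-Turbo-fp16", 1150),
    ("Chatterbox-Turbo-8bit", 900),
    ("Chatterbox-Turbo-4bit", 750),
]


def total_download_mb(model_mb: int) -> int:
    """Total download for one TTS model plus the required tokenizer."""
    return model_mb + TOKENIZER_MB


def pick_model(budget_mb: int) -> Optional[str]:
    """Hypothetical helper: return the highest-quality model whose download
    (model + tokenizer) fits within budget_mb, or None if none fits."""
    for name, size_mb in TTS_MODELS_MB:
        if total_download_mb(size_mb) <= budget_mb:
            return name
    return None


if __name__ == "__main__":
    for name, size_mb in TTS_MODELS_MB:
        print(f"{name}: {total_download_mb(size_mb)} MB with tokenizer")
    print(pick_model(1500))
```

For example, a 1500 MB budget fits Chatterbox-Turbo-8bit (1360 MB in total) but not Chatterbox-Turbo-fp16 (1610 MB in total). Actual memory use while generating may differ from download size.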
If downloading from the app is slow or fails, use MANUAL DOWNLOAD LINK in Model Management (Download Link 1 or 2) to get the files, then Import them into Vidpai.