Qwen3-TTS Studio
A professional, turnkey AI voice generator powered by Alibaba's Qwen3-TTS
Features
- Zero Configuration — One-click setup installs everything automatically (Python, dependencies, models)
- 8 Built-in Voices — English, Chinese, Japanese, and Korean speakers
- Style Control — Presets for Natural, Cheerful, Calm, Professional, and more
- Custom Instructions — Fine-tune voice emotion and delivery with text prompts
- Recording Library — Browse, play, and manage your generated audio files
- Modern Dark UI — Clean, professional interface inspired by Spotify and Discord
- GPU Accelerated — Automatic NVIDIA CUDA detection for faster generation
- Offline Ready — Works completely offline after initial setup
Requirements
- OS: Windows 10/11 (64-bit)
- RAM: 8GB minimum, 16GB recommended
- Storage: 15GB free space for models
- GPU: NVIDIA GPU with 6GB+ VRAM recommended (CPU mode available but slower)
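Note: No separate Python installation is required for the app's runtime. The Setup Wizard downloads and manages its own embedded Python environment. Launching from source with the python command (Option 2 under Installation) does assume an interpreter is already on your PATH.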
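The storage and GPU requirements above can be checked before installing. The following is a minimal sketch, not the app's actual detection code: the function names are hypothetical, and it assumes PyTorch's `torch.cuda.is_available()` is a reasonable stand-in for the "automatic NVIDIA CUDA detection" the app performs.

```python
import shutil

REQUIRED_FREE_GB = 15  # storage figure from the Requirements list


def has_enough_disk(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the drive holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


def pick_device():
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # only importable once the AI components are installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Enough disk space:", has_enough_disk())
print("Device:", pick_device())
```

If PyTorch is not installed yet, the sketch falls back to "cpu" rather than failing, which mirrors the CPU mode mentioned above.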
Installation
Option 1: Download Release (Recommended)
- Download the latest release from the Releases page
- Extract to any folder
- Run qwen3_tts_studio.py
Option 2: Clone Repository
git clone https://github.com/yourusername/qwen3-tts-studio.git
cd qwen3-tts-studio
python qwen3_tts_studio.py
First-Time Setup
On first launch, the Setup Wizard will guide you through installation:
- Python Environment — Downloads embedded Python 3.12 (~25MB)
- AI Components — Installs PyTorch, Transformers, etc. (~3GB)
- Voice Tokenizer — Required for all voices (~500MB)
- Voice Model — Standard quality model (~7GB)
Click "Install Everything" and wait for the downloads (roughly 10.5GB in total) to finish. This usually takes 10-20 minutes, depending on your internet speed.
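To see how the component sizes relate to the quoted wait time, the arithmetic can be sketched as follows. The sizes are the approximate figures from the list above; the function names are illustrative, and the estimate covers download time only, not unpacking or installation.

```python
# Approximate component sizes in MB, taken from the Setup Wizard list.
COMPONENT_SIZES_MB = {
    "Python Environment": 25,
    "AI Components": 3_000,
    "Voice Tokenizer": 500,
    "Voice Model": 7_000,
}


def total_download_gb(sizes_mb=COMPONENT_SIZES_MB):
    """Total download size in GB (1 GB = 1000 MB)."""
    return sum(sizes_mb.values()) / 1000


def estimated_minutes(mbps, sizes_mb=COMPONENT_SIZES_MB):
    """Download time in minutes at a sustained speed of `mbps` megabits per second."""
    total_megabits = sum(sizes_mb.values()) * 8
    return total_megabits / mbps / 60


print(f"Total: {total_download_gb():.3f} GB")
print(f"At 100 Mbps: {estimated_minutes(100):.1f} min")
```

By this estimate, a 10-20 minute download corresponds to a sustained connection speed of roughly 70 to 140 Mbps. Slower connections will take proportionally longer.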
Usage
Basic Text-to-Speech
- Type or paste text in the input box
- Select a voice from the dropdown
- Choose a style preset (or enter custom instructions)
- Click "Generate Speech"
- Audio plays automatically and saves to your library
Voices
| Voice | Language | Description |
|---|---|---|
| Ryan | English | Dynamic male voice with strong rhythm |
| Aiden | English | Sunny American male accent |
| Vivian | Chinese | Bright, youthful female voice |
| Serena | Chinese | Warm and gentle female voice |
| Dylan | Chinese | Youthful Beijing male accent |
| Eric | Chinese | Lively Sichuan male accent |
| Anna | Japanese | Playful and expressive female |
| Sohee | Korean | Warm and friendly female |
Style Presets
- Natural — Clear, everyday speech
- Cheerful — Enthusiastic and happy
- Calm — Relaxed and soothing
- Professional — Confident business tone
- Excited — High energy delivery
- Gentle — Soft and tender
- News Anchor — Broadcast style
- Storytelling — Narrative delivery
Custom Style Instructions
For fine-grained control, enter custom instructions like:
- "Speak slowly with a mysterious tone"
- "Sound excited but slightly out of breath"
- "Whisper softly as if telling a secret"
File Locations
| Content | Location |
|---|---|
| Application Data | %LOCALAPPDATA%\Qwen3-TTS\ |
| Voice Models | %LOCALAPPDATA%\Qwen3-TTS\models\ |
| Recordings | Documents\Qwen3-TTS Recordings\ |
| Configuration | %LOCALAPPDATA%\Qwen3-TTS\studio_config.json |
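For scripts that need to find these folders, the table above can be expressed as a small helper. This is a sketch under assumptions: the function name and dictionary keys are hypothetical, and the Documents folder is passed in explicitly because its real location can differ between machines.

```python
from pathlib import PureWindowsPath

APP_DIR_NAME = "Qwen3-TTS"


def studio_paths(local_appdata, documents):
    """Build the locations from the File Locations table.

    `local_appdata` is the expanded value of %LOCALAPPDATA%;
    `documents` is the user's Documents folder.
    """
    app_data = PureWindowsPath(local_appdata) / APP_DIR_NAME
    return {
        "app_data": app_data,
        "models": app_data / "models",
        "recordings": PureWindowsPath(documents) / "Qwen3-TTS Recordings",
        "config": app_data / "studio_config.json",
    }


# Example with illustrative paths for a user named "me".
paths = studio_paths(r"C:\Users\me\AppData\Local", r"C:\Users\me\Documents")
for name, location in paths.items():
    print(f"{name}: {location}")
```

On a real Windows machine, `os.environ["LOCALAPPDATA"]` would typically supply the first argument.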
Troubleshooting
"Network error. Check your internet connection."
- Verify your internet connection is working
- Try disabling VPN if active
- Check if firewall is blocking Python
- The installer retries automatically up to 3 times before reporting this error
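The retry behavior can be pictured with the sketch below. It is not the installer's actual code: the `retry` helper, the growing delay, and the choice to treat "3 times" as three attempts in total are all assumptions made for illustration.

```python
import time


def retry(action, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` tries have failed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as error:  # urllib and socket network errors derive from OSError
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # wait a little longer after each failure
    raise last_error
```

Only network-style errors are retried here. Anything else, such as a full disk reported as a different exception type, surfaces immediately.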
Generation is slow
- With GPU: The first generation loads the model (~1-2 min); subsequent generations are faster
- Without GPU: CPU mode is significantly slower. Consider switching to the lighter CustomVoice-0.6B model
Out of memory errors
- Close other applications to free RAM
- Use the lighter CustomVoice-0.6B model (Settings → Run Setup Wizard)
- Reduce text length per generation
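One way to reduce text length per generation is to split long input at sentence boundaries and generate each piece separately. The sketch below is an assumption about how that could be done, not a feature of the app: the function name and the 300-character limit are arbitrary choices.

```python
import re

# Split after Latin sentence punctuation followed by whitespace,
# or directly after CJK sentence punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_text(text, max_chars=300):
    """Split `text` into chunks of at most `max_chars` characters.

    Sentences are kept whole where possible and rejoined with a single space.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is too long on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be pasted into the input box and generated on its own.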
Setup Wizard won't close
- Ensure all 4 checkmarks are green
- Click "Launch Studio" or "Continue to Studio"
- If stuck, close and reopen the application
Configuration
Settings are stored in studio_config.json:
{
  "selected_voice": "Ryan (English Male)",
  "selected_model": "CustomVoice-1.7B",
  "style_preset": "Natural",
  "volume": 0.8,
  "auto_play": true,
  "setup_complete": true
}
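A script that reads this file should tolerate a missing or hand-edited config. The following is a minimal sketch under stated assumptions: the `load_config` helper is hypothetical, the fallback values simply reuse the example above (they are not confirmed defaults), and the 0.0 to 1.0 volume range is assumed.

```python
import json
from pathlib import Path

# Fallback values based on the example config; assumed, not confirmed defaults.
DEFAULT_CONFIG = {
    "selected_voice": "Ryan (English Male)",
    "selected_model": "CustomVoice-1.7B",
    "style_preset": "Natural",
    "volume": 0.8,
    "auto_play": True,
    "setup_complete": False,
}


def load_config(path):
    """Read studio_config.json, using defaults for missing or unreadable data."""
    config = dict(DEFAULT_CONFIG)
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return config
    if isinstance(stored, dict):
        # Ignore keys this sketch does not know about.
        config.update({k: v for k, v in stored.items() if k in DEFAULT_CONFIG})
    volume = config["volume"]
    if isinstance(volume, bool) or not isinstance(volume, (int, float)):
        volume = DEFAULT_CONFIG["volume"]
    config["volume"] = min(1.0, max(0.0, float(volume)))  # clamp to assumed range
    return config
```

Unknown keys are dropped rather than rejected, so a config written by a newer version of the app would still load.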
Credits
- Qwen3-TTS — Alibaba Qwen Team
- Models — Hugging Face
- UI Framework — CustomTkinter
License
This project is licensed under the MIT License; see the LICENSE file for details.
The underlying Qwen3-TTS models are licensed under Apache 2.0 by Alibaba Cloud.
Made with ❤️ for the AI voice community