CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Full-stack Qwen3-TTS voice cloning application with:

  • CLI tools (voice_clone.py, quick_clone.py) for command-line usage
  • FastAPI backend (backend/) providing REST API
  • React frontend (frontend/) with responsive design and clean UI
  • Docker deployment ready for Hugging Face Spaces and local hosting
  • Qwen3-TTS-12Hz-1.7B-Base model for voice synthesis (up to 0.95 voice similarity with a clean reference)

The project clones a voice from 3-10 seconds of reference audio and supports 10 languages (Chinese, English, Japanese, Korean, etc.).

Deployment Options:

  1. Local development (separate backend + frontend)
  2. Local Docker (unified container with Nginx)
  3. Hugging Face Spaces (public Docker deployment with automatic model download)

System Requirements

Local Development

  • Python: 3.12+ (managed via uv)
  • Node.js: v18+ (frontend uses Yarn)
  • GPU (optional): NVIDIA GPU with CUDA 11.8+ (uses ~4GB VRAM)
  • CPU mode: Works without GPU (8-12GB RAM, slower generation)
  • Model: Qwen3-TTS-12Hz-1.7B-Base (~4.3 GB, auto-downloaded or symlinked)

Docker Deployment

  • Docker: 20.10+
  • RAM: 8-12GB minimum (for model loading)
  • Disk: ~10GB (model + dependencies)
  • Network: For Hugging Face model download on first run

Development Commands

Initial Setup

# Install Python dependencies
uv sync

# Install frontend dependencies
cd frontend && yarn install

# Download/link model (choose one)
./setup.sh           # Full setup with model download
./link_model.sh      # Link to existing model at /path/to/model

Running the Full Stack

Backend (Terminal 1):

./backend/start_server.sh
# or manually:
uv run uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

Frontend (Terminal 2):

cd frontend
yarn dev              # Local only (localhost:3000)
yarn dev --host       # Network accessible (0.0.0.0:3000)

Access:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000

CLI Tools (Alternative to Web UI)

# Interactive mode
uv run python voice_clone.py

# Quick test with predefined config
uv run python quick_clone.py

Testing

# Test backend configuration
uv run python test_backend.py

# Test model import
uv run python -c "from qwen_tts import Qwen3TTSModel; print('Model import successful')"

# Test API endpoints
curl http://localhost:8000/api/status

Building for Production

# Build frontend
cd frontend
yarn build    # Output: frontend/dist/

# Frontend build can be served with any static file server

Docker Deployment

Local Docker (Unified Container)

The project includes a complete Docker setup with Nginx serving both frontend and backend on port 7860.

Build and run:

# Build image
docker build -f Dockerfile -t qwen3-tts-hf .

# Run with model volume (avoids re-downloading)
docker run -d -p 7860:7860 \
  -v /path/to/models/Qwen3-TTS-12Hz-1.7B-Base:/app/models/Qwen3-TTS-12Hz-1.7B-Base:ro \
  --name qwen3-tts qwen3-tts-hf

# Or run without volume (auto-downloads model on first run)
docker run -d -p 7860:7860 --name qwen3-tts qwen3-tts-hf

Access: http://localhost:7860

Architecture:

  • Nginx serves frontend from /app/frontend/dist
  • Nginx reverse-proxies /api/* to backend on port 8000
  • Backend runs with uvicorn on 127.0.0.1:8000
  • Model auto-downloads from Hugging Face Hub if not present
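
The "download only if not present" behavior can be sketched in Python as follows. This is an illustration only: in this project the real logic lives in docker-entrypoint.sh, and the `ensure_model` function and its `download` hook are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable


def ensure_model(model_dir: Path, download: Callable[[Path], None]) -> bool:
    """Make sure model files exist in model_dir.

    Returns True if a download was triggered, False if files were already
    there (for example from a mounted volume). `download` is a hypothetical
    hook that fetches the model into the given directory.
    """
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return False  # files already present: skip the download
    model_dir.mkdir(parents=True, exist_ok=True)
    download(model_dir)
    return True
```

One plausible `download` hook would wrap `huggingface_hub.snapshot_download(repo_id="Qwen/Qwen3-TTS-12Hz-1.7B-Base", local_dir=...)`, but whether the entrypoint script does it this way is an assumption.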

Hugging Face Spaces Deployment

The project is ready for deployment to Hugging Face Spaces with Docker SDK.

Files for HF Spaces:

  • Dockerfile - Multi-stage build (frontend + backend)
  • docker-entrypoint.sh - Startup script with automatic model download
  • .dockerignore - Excludes unnecessary files from build
  • README-HF.md - Hugging Face Spaces documentation

Deploy to HF Spaces:

# 1. Create Space at https://huggingface.co/new-space
#    - Choose Docker SDK
#    - Select CPU basic (free, 16GB RAM) or CPU upgrade (32GB RAM)

# 2. Push to HF Spaces
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
git push hf main

# 3. HF Spaces will automatically:
#    - Build Docker image (~5-10 min)
#    - Download model on first run (~5 min)
#    - Start service on port 7860

Frontend Environment Detection: The frontend automatically detects Hugging Face Spaces environment and adjusts API URLs accordingly:

const API_URL = import.meta.env.VITE_API_URL || (
  window.location.hostname.includes('hf.space') || window.location.hostname.includes('huggingface.co')
    ? '' // HF Spaces: relative path (Nginx proxy)
    : 'http://localhost:8000' // Local development
)

Docker Build Process:

  1. Stage 1 (frontend-builder): Builds React app with yarn build
  2. Stage 2 (production):
    • Installs Python dependencies (CPU-only PyTorch)
    • Copies built frontend to /app/frontend/dist
    • Configures Nginx for unified serving
    • Sets up model auto-download via docker-entrypoint.sh

Architecture

Architecture Diagrams

Local Development (Separate Servers):

┌─────────────────────────────────────────┐
│  Frontend (Vite Dev Server)             │
│  - Port 3000 or 5173                    │
│  - Hot Module Replacement               │
└──────────────┬──────────────────────────┘
               │ HTTP/REST (CORS)
┌──────────────▼──────────────────────────┐
│  Backend (Uvicorn)                      │
│  - Port 8000                            │
│  - FastAPI + CORS middleware            │
└──────────────┬──────────────────────────┘
               │ Python API
┌──────────────▼──────────────────────────┐
│  Qwen3-TTS Model                        │
│  - GPU/CPU auto-detection               │
│  - FlashAttention 2 (optional)          │
└─────────────────────────────────────────┘

Docker Deployment (Unified Container):

┌─────────────────────────────────────────┐
│  Nginx (Port 7860)                      │
│  - Serves frontend static files         │
│  - Reverse proxy /api → backend         │
└──────┬──────────────────────┬───────────┘
       │                      │
       │ Static Files         │ Proxy /api/*
       │                      │
┌──────▼──────────┐    ┌──────▼──────────┐
│  Frontend Dist  │    │  Backend        │
│  /app/frontend/ │    │  127.0.0.1:8000 │
│  dist/          │    │  (Uvicorn)      │
└─────────────────┘    └──────┬──────────┘
                              │
                       ┌──────▼──────────┐
                       │  Qwen3-TTS      │
                       │  CPU-only mode  │
                       │  /app/models/   │
                       └─────────────────┘

Backend API (backend/main.py)

FastAPI application with 7 endpoints:

  • GET / - API info (version, model status, device)
  • GET /api/status - Service status (ready/loading/error)
  • POST /api/upload - Upload reference audio (returns audio_id)
  • POST /api/clone - Generate cloned voice (requires ref_audio_id, ref_text, target_text)
  • GET /api/download/{audio_id} - Download generated audio
  • DELETE /api/audio/{audio_id} - Delete uploaded/generated audio
  • GET /api/cleanup - Clean up old files (default: >24h)

Key Backend Patterns:

  1. Model singleton: Global model instance loaded once at startup
  2. UUID-based file management: All uploads/outputs use UUID filenames
  3. Automatic directory creation: backend/uploads/ and backend/outputs/ created on startup
  4. CORS: Pre-configured for localhost:3000 and localhost:5173 (Vite)
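
The UUID-based file management and on-demand directory creation described above might look roughly like this. It is a minimal sketch using only the standard library; `save_upload` and `find_audio` are hypothetical helper names, not functions confirmed to exist in backend/main.py.

```python
import uuid
from pathlib import Path
from typing import Optional


def save_upload(data: bytes, original_name: str, upload_dir: Path) -> str:
    """Store uploaded bytes under a fresh UUID and return that ID."""
    upload_dir.mkdir(parents=True, exist_ok=True)  # create the directory on demand
    audio_id = str(uuid.uuid4())
    # Keep only the extension of the client-supplied name, never the name itself.
    suffix = Path(original_name).suffix.lower() or ".wav"
    (upload_dir / f"{audio_id}{suffix}").write_bytes(data)
    return audio_id


def find_audio(audio_id: str, *dirs: Path) -> Optional[Path]:
    """Look up a stored file by ID, rejecting anything that is not a UUID."""
    try:
        uuid.UUID(audio_id)  # rejects path-traversal input such as "../../etc/passwd"
    except ValueError:
        return None
    for directory in dirs:
        for match in directory.glob(f"{audio_id}.*"):
            return match
    return None
```

Validating the ID before touching the filesystem keeps the download and delete endpoints from ever resolving a client-supplied path.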

API Request Example:

import requests

BASE_URL = 'http://localhost:8000'

# 1. Upload reference audio (the with-block closes the file handle)
with open('voice.wav', 'rb') as f:
    response = requests.post(f'{BASE_URL}/api/upload', files={'file': f})
response.raise_for_status()
audio_id = response.json()['audio_id']

# 2. Generate cloned voice
response = requests.post(f'{BASE_URL}/api/clone', json={
    'ref_audio_id': audio_id,
    'ref_text': '參考音訊中的文字',    # "the text spoken in the reference audio"
    'target_text': '要生成的新文字',   # "the new text to generate"
    'language': 'Chinese',
    'x_vector_only': False
})
response.raise_for_status()
output_id = response.json()['audio_id']

# 3. Download result
response = requests.get(f'{BASE_URL}/api/download/{output_id}')
response.raise_for_status()
with open('output.wav', 'wb') as f:
    f.write(response.content)
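
Because the model takes a few seconds to load, a client may want to wait for GET /api/status to report ready before uploading. The sketch below polls that endpoint with the standard library. The JSON shape `{"status": "ready"}` is an assumption (the endpoint list only says the status is ready/loading/error), and `fetch_status` and `wait_until_ready` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(base_url: str = "http://localhost:8000") -> str:
    """Return the service state; assumes a JSON body like {"status": "ready"}."""
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=5) as resp:
        return json.load(resp).get("status", "error")


def wait_until_ready(
    get_status: Callable[[], str] = fetch_status,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll until the service reports "ready"; False on "error" or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            state = get_status()
        except OSError:
            state = "loading"  # server is not accepting connections yet
        if state == "ready":
            return True
        if state == "error":
            return False
        time.sleep(interval_s)
    return False
```

Passing the status function as a parameter keeps the polling loop easy to test without a running server.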

Frontend (frontend/src/App.tsx)

Modern single-page React application with responsive design and clean UI:

Features:

  • Responsive layout (mobile/tablet/desktop with Tailwind breakpoints)
  • Increased font sizes for better readability
  • File upload via drag-and-drop or click
  • Audio preview for uploaded reference audio
  • Real-time status updates during generation
  • Clean interface with non-functional links removed

UI Improvements (Latest):

  • Larger fonts across all components (base, lg, xl sizes)
  • Responsive container with max-w-7xl and flexible columns
  • Two-column layout on desktop, single column on mobile
  • Simplified navigation and footer (removed dummy links)
  • Enhanced spacing and padding for better UX

API Integration:

  • Environment-aware API_URL configuration
  • Automatic detection of HF Spaces vs local development
  • Uses native fetch() for all API calls
  • Blob URL management for audio preview with cleanup

Key Frontend Patterns:

  1. useState hooks: Form state (refAudioId, refText, targetText, language, etc.)
  2. useRef hooks: File input, audio players (ref + generated)
  3. useEffect hooks: Blob URL cleanup to prevent memory leaks
  4. Event handlers: handleFileSelect, handleGenerate, handleDrop
  5. Conditional rendering: Upload status, loading states, audio players

CLI Tools (voice_clone.py, quick_clone.py)

Direct Python scripts that bypass the web stack:

  • Load model directly
  • Read local files from reference_audios/
  • Write output to outputs/
  • Useful for batch processing or server-side automation

voice_clone.py modes:

  • interactive_mode(): User prompts for audio selection, text input, language
  • batch_mode(): Generate multiple texts with same voice prompt

Voice Cloning Workflow

Core Workflow (All Interfaces)

  1. Create voice clone prompt from reference audio + text:

    voice_clone_prompt = model.create_voice_clone_prompt(
        ref_audio="path/to/audio.wav",
        ref_text="transcript of the audio",
        x_vector_only_mode=False,  # True = no ref_text needed but lower quality
    )
    
  2. Generate cloned voice:

    wavs, sr = model.generate_voice_clone(
        text="text to synthesize",
        language="Chinese",  # or "English", "Japanese", "Korean"
        voice_clone_prompt=voice_clone_prompt,
    )
    
  3. Save output:

    import soundfile as sf
    sf.write("output.wav", wavs[0], sr)
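
The three steps above can be combined so that the voice prompt is built once and reused for several texts, which is the idea behind batch_mode(). This is a sketch under assumptions: `clone_batch` is a hypothetical helper, the model methods and keyword arguments are the ones shown in the steps above, and starting the counter at 1 is a guess.

```python
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def clone_batch(
    model,
    ref_audio: str,
    ref_text: str,
    texts: Sequence[str],
    language: str,
    out_dir: Path,
    write_wav: Optional[Callable] = None,
) -> List[Path]:
    """Build the voice prompt once, then synthesize every text with it."""
    if write_wav is None:
        import soundfile as sf  # same writer as in step 3
        write_wav = sf.write
    out_dir.mkdir(parents=True, exist_ok=True)
    prompt = model.create_voice_clone_prompt(
        ref_audio=ref_audio, ref_text=ref_text, x_vector_only_mode=False
    )
    paths = []
    for count, text in enumerate(texts, start=1):
        wavs, sr = model.generate_voice_clone(
            text=text, language=language, voice_clone_prompt=prompt
        )
        path = out_dir / f"batch_{count:03d}.wav"  # batch_mode naming pattern
        write_wav(str(path), wavs[0], sr)
        paths.append(path)
    return paths
```

Reusing the prompt avoids re-encoding the reference audio for every sentence.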
    

Model Loading Pattern

All components load the model the same way: try FlashAttention 2 first, then fall back to standard attention if that fails:

import torch
from qwen_tts import Qwen3TTSModel

try:
    model = Qwen3TTSModel.from_pretrained(
        "models/Qwen3-TTS-12Hz-1.7B-Base",
        device_map="cuda:0" if torch.cuda.is_available() else "cpu",
        dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )
except Exception:
    # Fallback to standard attention if FlashAttention 2 unavailable
    model = Qwen3TTSModel.from_pretrained(
        "models/Qwen3-TTS-12Hz-1.7B-Base",
        device_map="cuda:0" if torch.cuda.is_available() else "cpu",
        dtype=torch.bfloat16,
    )

Directory Structure

qwen3clone/
├── backend/                    # FastAPI backend
│   ├── main.py                # API endpoints and model loading
│   ├── start_server.sh        # Backend startup script
│   ├── uploads/               # Temporary uploaded reference audio (created at runtime)
│   └── outputs/               # Generated audio files (created at runtime)
│
├── frontend/                  # React frontend
│   ├── src/
│   │   ├── App.tsx           # Main application (responsive UI, all logic)
│   │   ├── main.tsx          # React entry point
│   │   ├── index.css         # Global styles (Tailwind directives)
│   │   └── vite-env.d.ts     # Vite environment types
│   ├── package.json          # Frontend dependencies
│   ├── tailwind.config.js    # Tailwind color theme (custom palette)
│   ├── vite.config.ts        # Vite dev server config
│   └── dist/                 # Built static files (created by yarn build)
│
├── models/                    # TTS model directory
│   └── Qwen3-TTS-12Hz-1.7B-Base/  # Symlink or actual model files
│
├── reference_audios/          # Input: 3-10s reference audio for CLI tools
├── outputs/                   # Output: CLI-generated .wav files
│
├── voice_clone.py            # CLI interactive tool
├── quick_clone.py            # CLI quick test script
├── test_backend.py           # Backend configuration tests
│
├── Dockerfile                # Docker multi-stage build for HF Spaces
├── docker-entrypoint.sh      # Container startup script (auto model download)
├── .dockerignore             # Docker build exclusions
│
├── README-HF.md              # Hugging Face Spaces documentation
├── CLAUDE.md                 # This file (project guidance for Claude Code)
├── setup.sh                  # Full environment + model setup
├── link_model.sh             # Link to existing model
└── pyproject.toml            # Python dependencies (uv)

Configuration

Backend Configuration (backend/main.py)

import os
from pathlib import Path

# Environment variable support (with defaults)
MODEL_PATH = os.getenv("MODEL_PATH", "models/Qwen3-TTS-12Hz-1.7B-Base")
UPLOAD_DIR = Path(os.getenv("UPLOAD_DIR", "backend/uploads"))
OUTPUT_DIR = Path(os.getenv("OUTPUT_DIR", "backend/outputs"))
USE_CPU = os.getenv("USE_CPU", "false").lower() == "true"

# CORS origins (environment variable or defaults)
cors_origins = os.getenv(
    "CORS_ORIGINS",
    "http://localhost:3000,http://localhost:5173,http://localhost"
).split(",")

Docker Environment Variables:

  • MODEL_PATH: Model directory path (default: /app/models/Qwen3-TTS-12Hz-1.7B-Base)
  • UPLOAD_DIR: Upload directory (default: /app/backend/uploads)
  • OUTPUT_DIR: Output directory (default: /app/backend/outputs)
  • USE_CPU: Force CPU mode (default: true in Docker, auto-detect in dev)
  • CORS_ORIGINS: Comma-separated allowed origins
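
One way to keep these variables in a single place is a small settings object. The sketch below is not the project's code: `Settings` and `load_settings` are hypothetical names, and the defaults are copied from the backend configuration shown above. It also shows how USE_CPU could take precedence over CUDA auto-detection.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    model_path: Path
    upload_dir: Path
    output_dir: Path
    use_cpu: bool
    cors_origins: Tuple[str, ...]

    def device(self, cuda_available: bool) -> str:
        """USE_CPU wins over auto-detection."""
        return "cpu" if self.use_cpu or not cuda_available else "cuda:0"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    origins = env.get(
        "CORS_ORIGINS",
        "http://localhost:3000,http://localhost:5173,http://localhost",
    )
    return Settings(
        model_path=Path(env.get("MODEL_PATH", "models/Qwen3-TTS-12Hz-1.7B-Base")),
        upload_dir=Path(env.get("UPLOAD_DIR", "backend/uploads")),
        output_dir=Path(env.get("OUTPUT_DIR", "backend/outputs")),
        use_cpu=env.get("USE_CPU", "false").strip().lower() == "true",
        # Strip whitespace and drop empty entries such as a trailing comma.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )
```

Stripping each origin matters because a value such as `"https://a.example, https://b.example"` would otherwise yield an origin with a leading space that never matches a request.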

Frontend Configuration (frontend/src/App.tsx)

// Environment-aware configuration (automatic)
const API_URL = import.meta.env.VITE_API_URL || (
  window.location.hostname.includes('hf.space') || window.location.hostname.includes('huggingface.co')
    ? '' // HF Spaces: use relative path
    : 'http://localhost:8000' // Local development
)

Configuration Methods:

  1. Local Development: Uses http://localhost:8000 by default
  2. Docker/HF Spaces: Set VITE_API_URL=/api in Dockerfile (already configured)
  3. Custom Network: Set environment variable VITE_API_URL=http://your-server:8000

For network deployment, update:

  1. Backend CORS allow_origins to include frontend URL
  2. Set VITE_API_URL environment variable or update API_URL constant
  3. Run frontend with yarn dev --host to bind to 0.0.0.0

CLI Configuration (quick_clone.py)

Edit variables at top of file:

REF_AUDIO = "reference_audios/ref_audio.wav"
REF_TEXT = "參考音訊的完整內容"  # "the full content of the reference audio"
LANGUAGE = "Chinese"
TEST_TEXTS = ["要生成的第一句", "要生成的第二句"]  # "first sentence to generate", "second sentence to generate"

Reference Audio Requirements

  • Duration: 3-10 seconds (optimal balance of features vs noise)
  • Content: Single-speaker, clear speech, minimal background noise
  • Format: WAV, MP3, or FLAC
  • Transcript: ref_text must exactly match spoken content for best quality
  • Quality impact: Clean audio + accurate transcript = up to 0.95 similarity
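
The duration requirement can be checked before a clip is sent to the model. The following sketch uses the standard-library `wave` module, so it only handles PCM WAV files (MP3 and FLAC would need another reader); `wav_duration_seconds` and `check_reference` are hypothetical helper names.

```python
import wave
from pathlib import Path
from typing import Optional


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path: Path, min_s: float = 3.0, max_s: float = 10.0) -> Optional[str]:
    """Return a problem description, or None if the duration is in range."""
    duration = wav_duration_seconds(path)
    if duration < min_s:
        return f"too short: {duration:.1f}s (minimum {min_s:g}s)"
    if duration > max_s:
        return f"too long: {duration:.1f}s (maximum {max_s:g}s)"
    return None
```

This covers only the duration rule; speaker count, noise level, and transcript accuracy still have to be judged by listening.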

Performance Metrics

  • RTF (real-time factor, processing time / audio duration): ~0.5-0.6 (2s of audio takes ~1s to generate)
  • Sample rate: 12kHz
  • Voice similarity: Up to 0.95 with quality reference
  • GPU memory: ~4GB VRAM
  • Startup time: ~5-10s (model loading)
  • Supported languages: Chinese, English, Japanese, Korean, + 6 more
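
The real-time factor is the ratio of processing time to audio duration, so values below 1 mean faster than real time. A short worked sketch of that arithmetic, with hypothetical function names:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Rough generation time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# 1 s of compute for 2 s of audio is an RTF of 0.5.
example_rtf = real_time_factor(1.0, 2.0)
# At that rate, a 10 s clip would take about 5 s to generate.
example_estimate = estimated_processing_seconds(10.0, rtf=example_rtf)
```

Actual timings depend on hardware and on whether the model runs on GPU or CPU, so these figures are estimates only.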

Key Implementation Details

CORS and Network Access

Backend CORS is pre-configured for local development. For network deployment:

  1. Update backend allow_origins in main.py:

    allow_origins=["http://10.0.0.85:3000"]  # Your server IP
    
  2. Update frontend API_URL in App.tsx:

    const API_URL = 'http://10.0.0.85:8000'
    
  3. Start backend with --host 0.0.0.0 (already default in start_server.sh)

  4. Start frontend with yarn dev --host to expose on network

File Cleanup

Uploaded and generated files persist until they are removed. Use the cleanup endpoint (manually or from a cron job), or delete the directories' contents:

# Manual cleanup via API
curl "http://localhost:8000/api/cleanup?max_age_hours=24"

# Or delete directories
rm -rf backend/uploads/* backend/outputs/*
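
An age-based cleanup like the one the endpoint performs could be written as below. This is a sketch, not the code in backend/main.py: `cleanup_old_files` is a hypothetical name, and using the file modification time as the age is an assumption.

```python
import time
from pathlib import Path
from typing import Iterable, Optional


def cleanup_old_files(
    directories: Iterable[Path],
    max_age_hours: float = 24.0,
    now: Optional[float] = None,
) -> int:
    """Delete regular files older than max_age_hours; return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = 0
    for directory in directories:
        if not directory.is_dir():
            continue  # directory not created yet: nothing to clean
        for entry in directory.iterdir():
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                entry.unlink()
                removed += 1
    return removed
```

A call such as `cleanup_old_files([Path("backend/uploads"), Path("backend/outputs")])` could also be run from a scheduled job instead of going through the API.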

Model Symlink vs Download

Two options for model setup:

  1. Download (setup.sh): Downloads the ~4.3 GB model to models/
  2. Symlink (link_model.sh): Links to existing model elsewhere
    • Useful if model already downloaded in another project
    • Example: Links to /home/user/models/Qwen3-TTS-12Hz-1.7B-Base

FlashAttention 2 Behavior

  • Automatically attempts to load FlashAttention 2 for faster inference
  • Gracefully falls back to standard attention if unavailable
  • No code changes needed - handled transparently
  • The setup script tries to install flash-attn, but the install may fail on some systems

Output Naming Conventions

  • Backend API: UUID-based (a1b2c3d4-...-xyz.wav)
  • CLI voice_clone.py: {ref_audio_stem}_clone_{count:03d}.wav
  • CLI quick_clone.py: clone_{count:02d}.wav
  • CLI batch_mode: batch_{count:03d}.wav
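
The CLI patterns above are ordinary Python format strings. The helpers below (hypothetical names) show what each pattern expands to:

```python
from pathlib import Path


def voice_clone_name(ref_audio: str, count: int) -> str:
    """voice_clone.py pattern: {ref_audio_stem}_clone_{count:03d}.wav"""
    return f"{Path(ref_audio).stem}_clone_{count:03d}.wav"


def quick_clone_name(count: int) -> str:
    """quick_clone.py pattern: clone_{count:02d}.wav"""
    return f"clone_{count:02d}.wav"


def batch_name(count: int) -> str:
    """batch_mode pattern: batch_{count:03d}.wav"""
    return f"batch_{count:03d}.wav"
```

For example, `voice_clone_name("reference_audios/ref_audio.wav", 1)` gives `ref_audio_clone_001.wav`, and `quick_clone_name(2)` gives `clone_02.wav`.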

Dependencies

Python (pyproject.toml)

  • qwen-tts: Core TTS library
  • torch>=2.0.0: Deep learning framework
  • fastapi>=0.109.0: Web framework
  • uvicorn[standard]>=0.27.0: ASGI server
  • python-multipart>=0.0.6: File upload support
  • soundfile: Audio I/O
  • flash-attn (optional): Accelerated attention

Frontend (package.json)

  • react + react-dom: UI framework
  • lucide-react: Icon library
  • typescript: Type safety
  • vite: Build tool and dev server
  • tailwindcss: Utility-first CSS

Recent Improvements

Frontend UI Enhancements (Latest)

Increased Font Sizes:

  • Navigation: text-xl (from text-lg)
  • Hero title: text-5xl md:text-6xl (from text-5xl)
  • Descriptions: text-xl (from text-lg)
  • Buttons: text-lg (from text-[15px])
  • Form labels: text-base (from text-[13px])
  • Input/textarea: text-base (from text-[13px])
  • Status messages: text-base (from text-[13px])

Responsive Design:

  • Removed fixed width w-[1440px]
  • Added responsive container: max-w-7xl mx-auto
  • Two-column layout on large screens: lg:w-1/2 for each panel
  • Mobile-first with breakpoints: md:, lg: prefixes
  • Responsive padding: px-6 md:px-12 lg:px-24

Cleaned Interface:

  • Removed non-functional navigation links (API docs, GitHub)
  • Removed dummy footer links (features, pricing, tutorials)
  • Removed social media icons (Twitter, LinkedIn, GitHub)
  • Removed "Model Size" dropdown (non-functional)
  • Removed redundant "Choose File" button (upload area is clickable)
  • Simplified footer to logo + copyright only

Audio Preview:

  • Added reference audio preview with play controls
  • Blob URL management with proper cleanup (useEffect)
  • Prevents memory leaks from unreleased object URLs

Troubleshooting

Docker Issues

Container won't start:

# Check logs
docker logs qwen3-tts

# Common issues:
# 1. Port 7860 already in use
docker ps | grep 7860
# 2. Model download failed (network issue)
# 3. Insufficient memory (need 8-12GB RAM)

Model not downloading:

  • Check internet connection
  • Verify Hugging Face Hub is accessible
  • Try manual download: huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base

Frontend shows 404 on API calls:

  • Validate the Nginx configuration: docker exec qwen3-tts nginx -t
  • Check backend is healthy: docker exec qwen3-tts curl http://127.0.0.1:8000/api/status
  • Review API_URL configuration in frontend

Local Development Issues

Backend won't start:

  • Check model exists: ls models/Qwen3-TTS-12Hz-1.7B-Base/
  • If missing, run ./setup.sh or ./link_model.sh
  • Verify Python version: python --version (need 3.12+)

Port already in use:

# Check what's using the port
lsof -i :8000  # Backend
lsof -i :3000  # Frontend (Vite dev server)
lsof -i :7860  # Docker

# Kill process or change port in configuration

Frontend can't connect to backend:

  • Verify backend is running: curl http://localhost:8000/api/status
  • Check CORS settings in backend/main.py
  • Ensure API_URL in frontend/src/App.tsx matches backend address
  • For network access, use yarn dev --host and update CORS origins

Model loading fails:

  • Verify CUDA availability: python -c "import torch; print(torch.cuda.is_available())"
  • Check GPU memory: Should have ~4GB free
  • Try CPU mode: Set USE_CPU=true environment variable
  • CPU mode is slower but works without a GPU (8-12GB RAM needed)