PhoWhisper Runtime
Production runtime for the PhoWhisper ASR API. This package is designed to be
portable: copy the prepared phowhisper/ directory to another Linux host,
install Python dependencies, create .env, validate models/, then run
server_api.py. No training repository is required on the target host.
Repository Contents
Runtime package contents:
- API/runtime code
- Post-processing rules
- Runtime configuration templates
- Model tokenizer/config metadata and prepared checkpoint directories under models/
- Optional ChunkFormer reference ASR package under vendor/chunkformer/
- Optional ChunkFormer launcher and checkpoint under scripts/chunkformer.py and models/chunkformer-model/
- Lightweight validation scripts
Not included in a clean git checkout unless restored through Git LFS or an artifact bundle:
- Primary PhoWhisper and punctuation model weight files
- Optional base-model weight files
Not part of the portable runtime package:
- Environment secrets
- Uploaded audio
- Runtime outputs and logs
If you copy the whole prepared directory from a working machine, model weights
under models/ are carried with it. If you clone from git, run git lfs pull
or restore the model artifact bundle before starting the API.
1. System Requirements
Minimum supported environment:
| Component | Requirement |
|---|---|
| OS | Ubuntu 22.04/24.04 LTS or compatible Linux x86_64 |
| Python | 3.12 |
| Audio tooling | ffmpeg |
| RAM | 16GB minimum, 32GB recommended |
| Disk | 15GB minimum for model artifacts and runtime cache |
| GPU | NVIDIA GPU recommended for production throughput |
CPU execution is supported for validation and low-volume workloads. Production traffic should use a CUDA-capable GPU with a PyTorch build matching the host driver.
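Install host packages. Ubuntu 24.04 provides python3.12 in its default repositories; Ubuntu 22.04 ships Python 3.10, so on that release python3.12 must come from an additional package source or from the Conda environment in section 3: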
sudo apt-get update
sudo apt-get install -y git ffmpeg python3.12 python3.12-venv python3-pip
2. Deployment Layout
Recommended installation path:
/opt/phowhisper
Create the deployment directory:
sudo mkdir -p /opt/phowhisper
sudo chown "$USER":"$USER" /opt/phowhisper
Clone the repository:
git clone <repo-url> /opt/phowhisper
cd /opt/phowhisper
Or copy a prepared runtime folder from another machine. Create the archive on the source host, transfer it to the target host, then extract it under /opt:
cd /path/to/source
tar \
--exclude='phowhisper/.env' \
--exclude='phowhisper/.venv' \
--exclude='phowhisper/tmp/*' \
--exclude='phowhisper/outputs/*' \
--exclude='*/__pycache__' \
-czf phowhisper-runtime.tar.gz phowhisper
tar -xzf phowhisper-runtime.tar.gz -C /opt
cd /opt/phowhisper
3. Python Environment
Use Conda when the target machine standardizes Python runtimes through Conda:
conda create -y -n phowhisper_runtime python=3.12
conda activate phowhisper_runtime
export PYTHONNOUSERSITE=1
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
Use venv when Conda is not available:
python3.12 -m venv .venv
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
Validate PyTorch and CUDA visibility:
PYTHONNOUSERSITE=1 python - <<'PY'
import torch
print("torch_version:", torch.__version__)
print("cuda_available:", torch.cuda.is_available())
print("cuda_device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")
PY
If cuda_available is False on a GPU host, install the PyTorch wheel that matches the installed NVIDIA driver and CUDA runtime.
PYTHONNOUSERSITE=1 is intentional. It prevents the runtime from importing packages from ~/.local and makes dependency validation reflect the active environment only.
4. Model Artifacts
The runtime expects all model paths to live under models/ unless overridden in
.env.
Required for normal API startup:
models/phowhisper-merged/model.safetensors
models/punctuation-bartpho/model.safetensors
models/phowhisper-base/
models/phowhisper-base/ must exist because the server uses its tokenizer and
configuration metadata. Its large pytorch_model.bin weight is only required
when the ASR checkpoint is used as a LoRA adapter or when a base-model fallback
is explicitly needed.
Required only when PHOWHISPER_CRITICAL_COMPARE_PROVIDER=chunkformer:
models/chunkformer-model/pytorch_model.pt
models/chunkformer-model/config.yaml
models/chunkformer-model/global_cmvn
models/chunkformer-model/vocab.txt
models/chunkformer-model/tokenizer/
The runtime already includes scripts/chunkformer.py and vendor/chunkformer/,
so the target host does not need the original ChunkFormer training repository.
Pyannote diarization weights are resolved by pyannote.audio through Hugging Face. For online deployments, provide HF_TOKEN. For offline deployments, pre-populate the Hugging Face cache on the target machine or disable pyannote diarization in .env.
Package artifacts from the source machine:
cd /path/to/source/phowhisper
tar -czf phowhisper-model-artifacts.tar.gz \
models/phowhisper-merged/model.safetensors \
models/punctuation-bartpho/model.safetensors \
models/phowhisper-base \
models/chunkformer-model
Restore artifacts on the target machine:
cd /opt/phowhisper
tar -xzf /path/to/phowhisper-model-artifacts.tar.gz
Validate artifact placement:
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python scripts/check_models.py
With Conda:
conda activate phowhisper_runtime
export PYTHONNOUSERSITE=1
python scripts/check_models.py
Expected output:
model artifacts look ready
If you copied the entire prepared phowhisper/ folder, this check is still the
source of truth. It confirms the copied model layout is usable before the server
loads large checkpoints.
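For hosts where a quick independent check is useful, the layout rules above can be sketched in a few lines of Python. This is an illustration only and not the implementation of scripts/check_models.py; the helper name missing_artifacts and its with_chunkformer flag are hypothetical, and only the paths listed in this section are checked.

```python
from pathlib import Path

# Paths taken from the "Required for normal API startup" list in this section.
REQUIRED_FILES = [
    "models/phowhisper-merged/model.safetensors",
    "models/punctuation-bartpho/model.safetensors",
]
REQUIRED_DIRS = ["models/phowhisper-base"]

# Needed only when PHOWHISPER_CRITICAL_COMPARE_PROVIDER=chunkformer.
CHUNKFORMER_FILES = [
    "models/chunkformer-model/pytorch_model.pt",
    "models/chunkformer-model/config.yaml",
    "models/chunkformer-model/global_cmvn",
    "models/chunkformer-model/vocab.txt",
]
CHUNKFORMER_DIRS = ["models/chunkformer-model/tokenizer"]


def missing_artifacts(root, with_chunkformer=False):
    """Return the relative paths that are absent under root (hypothetical helper)."""
    root = Path(root)
    files = REQUIRED_FILES + (CHUNKFORMER_FILES if with_chunkformer else [])
    dirs = REQUIRED_DIRS + (CHUNKFORMER_DIRS if with_chunkformer else [])
    missing = [p for p in files if not (root / p).is_file()]
    missing += [p for p in dirs if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_artifacts(".")
    print("missing:", gaps if gaps else "none")
```

Run it from the deployment root. It only tests for presence, so scripts/check_models.py remains the source of truth for whether the checkpoints are usable.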
5. Runtime Configuration
Create a local environment file:
cp .env.example .env
chmod 600 .env
Baseline configuration:
PORT=8000
USE_NGROK=false
PHOWHISPER_ASR_MODEL_PATH=models/phowhisper-merged
PHOWHISPER_BASE_MODEL_PATH=models/phowhisper-base
PHOWHISPER_PUNCT_MODEL_PATH=models/punctuation-bartpho
PHOWHISPER_DOMAIN_CONFIG=configs/domain_correction.yaml
PHOWHISPER_CRITICAL_COMPARE_PROVIDER=none
PHOWHISPER_CHUNKFORMER_SCRIPT=scripts/chunkformer.py
PHOWHISPER_CHUNKFORMER_MODEL_PATH=models/chunkformer-model
PHOWHISPER_CHUNKFORMER_DEVICE=auto
PHOWHISPER_ENABLE_DURATION_CONFUSION_RULE=true
PHOWHISPER_USE_VAD=false
PHOWHISPER_TURN_MODE=off
PHOWHISPER_DIARIZATION_SIDECAR=true
PHOWHISPER_NUM_SPEAKERS=2
Keep PHOWHISPER_CRITICAL_COMPARE_PROVIDER=none for the fastest default API
path. Set it to chunkformer only when the secondary ASR comparison is needed.
PHOWHISPER_ENABLE_DURATION_CONFUSION_RULE=true can run independently of
ChunkFormer and fixes duration phrases such as hai mươi lăm in supported
duration contexts.
Pyannote configuration for mono speaker diarization:
HF_TOKEN=<huggingface-token>
PHOWHISPER_ENABLE_PYANNOTE=true
PHOWHISPER_PYANNOTE_MODEL_ID=pyannote/speaker-diarization-community-1
PHOWHISPER_PYANNOTE_FALLBACK_MODEL_IDS=pyannote/speaker-diarization-3.1
Ngrok configuration, only when a public tunnel is required:
USE_NGROK=true
NGROK_AUTHTOKEN=<ngrok-token>
NGROK_REGION=ap
NGROK_DOMAIN=
Do not commit .env. It may contain service tokens and deployment-specific paths.
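A small lint pass over .env can catch inconsistent settings before startup. The sketch below assumes plain KEY=VALUE lines and is not how server_api.py reads its configuration; parse_env and config_warnings are hypothetical helpers, and the warnings only reflect combinations mentioned in this section.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def config_warnings(env):
    """Return warnings for setting combinations this section cautions about."""
    warnings = []
    if env.get("USE_NGROK", "false").lower() == "true" and not env.get("NGROK_AUTHTOKEN"):
        warnings.append("USE_NGROK=true but NGROK_AUTHTOKEN is empty")
    if env.get("PHOWHISPER_ENABLE_PYANNOTE", "false").lower() == "true" and not env.get("HF_TOKEN"):
        # Acceptable only for offline hosts with a pre-populated Hugging Face cache.
        warnings.append("pyannote enabled without HF_TOKEN")
    provider = env.get("PHOWHISPER_CRITICAL_COMPARE_PROVIDER", "none")
    if provider not in ("none", "chunkformer"):
        # Assumption: only the two values named in this README are expected.
        warnings.append(f"compare provider not named in this README: {provider}")
    return warnings


if __name__ == "__main__":
    sample = "PORT=8000\nUSE_NGROK=true\nNGROK_AUTHTOKEN=\n"
    for message in config_warnings(parse_env(sample)):
        print("warning:", message)
```

To lint a real file, pass the contents of .env to parse_env instead of the sample string.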
6. Preflight Checks
Run syntax and artifact checks before starting the service:
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m py_compile \
server_api.py \
scripts/chunkformer.py \
scripts/infer_audio.py \
scripts/check_models.py \
src/domain_correction/*.py \
src/punctuation/*.py \
src/turns/*.py
python scripts/check_models.py
7. Start The API
Foreground start:
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python server_api.py
The service exposes:
GET /health
POST /transcribe
POST /api/transcribe
POST /upload
Health check:
curl -fsS http://127.0.0.1:8000/health
Minimum healthy response fields:
{
"status": "ok",
"asr_ready": true,
"punctuation_ready": true,
"domain_ready": true
}
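Model loading can take a while, so deployment scripts usually need to wait for readiness and not probe only once. A minimal polling sketch using only the standard library follows; the retry count and delay are arbitrary defaults, and is_healthy treats the four fields shown above as the complete readiness condition, which is an assumption.

```python
import json
import time
import urllib.request

READY_FIELDS = ("asr_ready", "punctuation_ready", "domain_ready")


def is_healthy(payload):
    """True when status is "ok" and every readiness flag shown above is true."""
    return payload.get("status") == "ok" and all(
        payload.get(field) is True for field in READY_FIELDS
    )


def wait_until_healthy(url="http://127.0.0.1:8000/health", attempts=30, delay_s=2.0):
    """Poll the health endpoint until it reports ready or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (OSError, ValueError):
            pass  # server not listening yet, or the body was not valid JSON
        time.sleep(delay_s)
    return False
```

Calling wait_until_healthy() after starting the server returns True once all flags are set, or False when the attempts are exhausted.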
8. API Usage
Multipart upload:
curl -X POST \
-F "file=@/path/to/audio.wav" \
http://127.0.0.1:8000/transcribe
Multipart upload with speaker count hint:
curl -X POST \
-F "file=@/path/to/audio.wav" \
-F "num_speakers=2" \
http://127.0.0.1:8000/transcribe
Supported request formats:
| Format | Field |
|---|---|
| multipart/form-data | file |
| multipart/form-data | audio |
| application/json | audio_base64 |
Primary response fields:
| Field | Description |
|---|---|
| text, full_transcription | Final transcript after post-processing |
| text_raw | Raw ASR text |
| segments | Speaker turns returned to clients |
| conversation_text | Timestamped conversation transcript |
| diarization_segments | Raw diarization timeline |
| elapsed_seconds | End-to-end processing time |
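The application/json format has no curl example above, so here is a hedged Python sketch of that path. It assumes audio_base64 carries the standard Base64 encoding of the raw audio file bytes and that the JSON body needs no other fields; build_transcribe_request and transcribe_file are hypothetical helper names, and the timeout is an arbitrary choice.

```python
import base64
import json
import urllib.request


def build_transcribe_request(audio_bytes, base_url="http://127.0.0.1:8000"):
    """Build a POST /transcribe request with the audio in the audio_base64 field."""
    payload = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe_file(path, base_url="http://127.0.0.1:8000", timeout_s=600):
    """Send one audio file and return the decoded JSON response."""
    with open(path, "rb") as fh:
        request = build_transcribe_request(fh.read(), base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

With a running server, transcribe_file("/path/to/audio.wav")["text"] would then hold the final transcript described in the table above.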
9. CLI Inference
Single-file inference:
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python scripts/infer_audio.py /path/to/audio.wav \
--turn-mode off \
--output-json outputs/result.json \
--output-text outputs/result.txt
VAD-based chunking:
export PYTHONNOUSERSITE=1
python scripts/infer_audio.py /path/to/audio.wav \
--vad \
--vad-max-segment-s 18 \
--output-json outputs/result.json
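For a folder of recordings, the single-file command can be wrapped in a short loop. The sketch below reuses only the flags shown above (--turn-mode, --output-json, --output-text); the helper names, the *.wav glob, and the one-output-pair-per-input naming are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_path, out_dir="outputs"):
    """Return the infer_audio.py argument list for one input file."""
    stem = Path(audio_path).stem
    return [
        sys.executable,
        "scripts/infer_audio.py",
        str(audio_path),
        "--turn-mode", "off",
        "--output-json", str(Path(out_dir) / f"{stem}.json"),
        "--output-text", str(Path(out_dir) / f"{stem}.txt"),
    ]


def run_batch(audio_dir, out_dir="outputs"):
    """Run inference sequentially over every .wav file in audio_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        # check=True stops the batch on the first failing file.
        subprocess.run(build_command(wav, out_dir), check=True)
```

Run it from the deployment root with the runtime environment active so that sys.executable points at the interpreter that has the dependencies installed.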
10. systemd Service
Create /etc/systemd/system/phowhisper.service:
sudo tee /etc/systemd/system/phowhisper.service >/dev/null <<'EOF'
[Unit]
Description=PhoWhisper Runtime API
After=network.target
[Service]
Type=simple
WorkingDirectory=/opt/phowhisper
Environment=PYTHONUNBUFFERED=1
Environment=PYTHONNOUSERSITE=1
ExecStart=/opt/phowhisper/.venv/bin/python /opt/phowhisper/server_api.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable phowhisper
sudo systemctl start phowhisper
sudo systemctl status phowhisper --no-pager
Logs:
journalctl -u phowhisper -f
Restart after code/config changes:
sudo systemctl restart phowhisper
11. Diarization Behavior
Default runtime flow:
- Run ASR on the full audio to preserve transcript continuity.
- Run pyannote as a sidecar diarization pass.
- Align ASR timestamps to the diarization timeline.
- Smooth short speaker blips.
- Apply punctuation, capitalization, domain correction, money normalization and phone normalization.
Mono speaker diarization is probabilistic. Accuracy can degrade when speakers overlap, responses are very short, voices are similar, or the recording is heavily compressed. Stereo or dual-channel call recordings are preferred when available.
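The alignment and smoothing steps can be pictured with a simplified sketch. This is not the runtime's algorithm: it assumes ASR and diarization segments are dicts with start and end times in seconds, assigns each ASR segment to the speaker with the largest total overlap, and relabels a short segment when both neighbours agree on a different speaker. The 0.5 s threshold is an arbitrary illustration value.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(asr_segments, diar_segments):
    """Label each ASR segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in asr_segments:
        totals = {}
        for turn in diar_segments:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


def smooth_blips(segments, min_duration_s=0.5):
    """Relabel a short segment when both neighbours share a different speaker."""
    out = [dict(s) for s in segments]
    for i in range(1, len(out) - 1):
        prev_spk, next_spk = out[i - 1]["speaker"], out[i + 1]["speaker"]
        is_short = out[i]["end"] - out[i]["start"] < min_duration_s
        if is_short and prev_spk == next_spk and out[i]["speaker"] != prev_spk:
            out[i]["speaker"] = prev_spk
    return out
```

Overlapping speech and very short replies are the cases where a rule like this guesses wrong most often, which matches the accuracy caveats above.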
12. Artifact And Secret Policy
Do not commit secrets or runtime output:
.env
tmp/
outputs/
Large model files should be copied as deployment artifacts or committed only
through Git LFS. This package intentionally allows
models/chunkformer-model/pytorch_model.pt through Git LFS when ChunkFormer is
part of the portable release.
Recommended artifact storage:
- Hugging Face Hub
- S3-compatible object storage
- Internal artifact registry
- Git LFS, only if the repository is explicitly configured for large files
Git LFS setup, if required:
git lfs install
git lfs track "*.safetensors" "*.bin" "*.pt" "*.pth" "*.ckpt"
git add .gitattributes
13. Troubleshooting
Port already in use:
lsof -tiTCP:8000 -sTCP:LISTEN -n -P
kill <pid>
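Where lsof is not installed, the same question can be answered from Python. This sketch only reports whether something accepts TCP connections on the port; it does not identify the owning process, and port_in_use is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True when a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 in use:", port_in_use(8000))
```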
Missing model artifacts:
python scripts/check_models.py
Pyannote fails to load:
- Verify HF_TOKEN.
- Confirm model access has been accepted on Hugging Face.
- Keep PHOWHISPER_PYANNOTE_FALLBACK_MODEL_IDS=pyannote/speaker-diarization-3.1 enabled for compatibility fallback.
CUDA out of memory:
PHOWHISPER_ASR_BATCH_SIZE=1
PHOWHISPER_PUNCT_BATCH_SIZE=1
CPU-only mode:
PHOWHISPER_ASR_DEVICE=cpu
PHOWHISPER_PUNCT_DEVICE=cpu
CPU-only mode is intended for functional validation or low-throughput jobs.
14. Release Checklist
Before handing over a deployment:
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m py_compile \
server_api.py \
scripts/chunkformer.py \
scripts/infer_audio.py \
scripts/check_models.py \
src/domain_correction/*.py \
src/punctuation/*.py \
src/turns/*.py
python scripts/check_models.py
curl -fsS http://127.0.0.1:8000/health
git status --short
git check-ignore -v .env