PhoWhisper Runtime

Production runtime for the PhoWhisper ASR API. This package is designed to be portable: copy the prepared phowhisper/ directory to another Linux host, install the Python dependencies, create .env, validate models/, then run server_api.py. The target host does not need the training repository.

Repository Contents

Runtime package contents:

  • API/runtime code
  • Post-processing rules
  • Runtime configuration templates
  • Model tokenizer/config metadata and prepared checkpoint directories under models/
  • Optional ChunkFormer reference ASR package under vendor/chunkformer/
  • Optional ChunkFormer launcher (scripts/chunkformer.py) and checkpoint (models/chunkformer-model/)
  • Lightweight validation scripts

Not included in a clean git checkout unless restored through Git LFS or an artifact bundle:

  • Primary PhoWhisper and punctuation model weight files
  • Optional base-model weight files

Not part of the portable runtime package:

  • Environment secrets
  • Uploaded audio
  • Runtime outputs and logs

If you copy the whole prepared directory from a working machine, model weights under models/ are carried with it. If you clone from git, run git lfs pull or restore the model artifact bundle before starting the API.

1. System Requirements

Minimum supported environment:

Component      | Requirement
OS             | Ubuntu 22.04/24.04 LTS or compatible Linux x86_64
Python         | 3.12
Audio tooling  | ffmpeg
RAM            | 16 GB minimum, 32 GB recommended
Disk           | 15 GB minimum for model artifacts and runtime cache
GPU            | NVIDIA GPU recommended for production throughput

CPU execution is supported for validation and low-volume workloads. Production traffic should use a CUDA-capable GPU with a PyTorch build matching the host driver.

Install host packages:

sudo apt-get update
sudo apt-get install -y git ffmpeg python3.12 python3.12-venv python3-pip
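
Once the host packages are installed, the requirements above can be spot-checked from Python. This is a minimal sketch using only the standard library; the helper name check_host and its default thresholds are illustrative and not part of the runtime package, and treating newer Python versions as acceptable is an assumption.

```python
import shutil
import sys


def check_host(path=".", min_python=(3, 12), min_free_gb=15):
    """Return basic host checks as a dict (hypothetical helper, stdlib only)."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        # The requirements table lists Python 3.12; accepting newer
        # versions is an assumption of this sketch.
        "python_ok": sys.version_info[:2] >= min_python,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        # Free space on the filesystem that holds `path`.
        "disk_ok": free_gb >= min_free_gb,
        "free_gb": round(free_gb, 1),
    }
```

Calling check_host("/opt") before copying model artifacts reports whether the target filesystem has the 15 GB listed in the table. RAM and GPU are not checked here.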

2. Deployment Layout

Recommended installation path:

/opt/phowhisper

Create the deployment directory:

sudo mkdir -p /opt/phowhisper
sudo chown "$USER":"$USER" /opt/phowhisper

Clone the repository:

git clone <repo-url> /opt/phowhisper
cd /opt/phowhisper

Or copy a prepared runtime folder from another machine:

cd /path/to/source
tar \
  --exclude='phowhisper/.env' \
  --exclude='phowhisper/.venv' \
  --exclude='phowhisper/tmp/*' \
  --exclude='phowhisper/outputs/*' \
  --exclude='*/__pycache__' \
  -czf phowhisper-runtime.tar.gz phowhisper

tar -xzf phowhisper-runtime.tar.gz -C /opt
cd /opt/phowhisper

3. Python Environment

Use Conda when the target machine standardizes Python runtimes through Conda:

conda create -y -n phowhisper_runtime python=3.12
conda activate phowhisper_runtime
export PYTHONNOUSERSITE=1
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt

Use venv when Conda is not available:

python3.12 -m venv .venv
. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt

Validate PyTorch and CUDA visibility:

PYTHONNOUSERSITE=1 python - <<'PY'
import torch

print("torch_version:", torch.__version__)
print("cuda_available:", torch.cuda.is_available())
print("cuda_device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")
PY

If cuda_available is False on a GPU host, install the PyTorch wheel that matches the installed NVIDIA driver and CUDA runtime.

PYTHONNOUSERSITE=1 is intentional. It prevents the runtime from importing packages from ~/.local and makes dependency validation reflect the active environment only.
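
To confirm the variable took effect, the interpreter's own view of the user site directory can be inspected. A minimal sketch with the standard site module; the function name user_site_status is illustrative.

```python
import site
import sys


def user_site_status():
    """Report whether the per-user site-packages directory is importable."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # False when the user site is disabled, for example by
        # PYTHONNOUSERSITE or the -s interpreter flag.
        "enabled_flag": site.ENABLE_USER_SITE,
        # True only if the directory was actually added to sys.path.
        "on_sys_path": user_site in sys.path,
    }
```

With PYTHONNOUSERSITE=1 exported, on_sys_path should be False, which means packages under ~/.local cannot shadow those installed in the active environment.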

4. Model Artifacts

The runtime expects all model paths to live under models/ unless overridden in .env.

Required for normal API startup:

models/phowhisper-merged/model.safetensors
models/punctuation-bartpho/model.safetensors
models/phowhisper-base/

models/phowhisper-base/ must exist because the server uses its tokenizer and configuration metadata. Its large pytorch_model.bin weight is only required when the ASR checkpoint is used as a LoRA adapter or when a base-model fallback is explicitly needed.

Required only when PHOWHISPER_CRITICAL_COMPARE_PROVIDER=chunkformer:

models/chunkformer-model/pytorch_model.pt
models/chunkformer-model/config.yaml
models/chunkformer-model/global_cmvn
models/chunkformer-model/vocab.txt
models/chunkformer-model/tokenizer/

The runtime already includes scripts/chunkformer.py and vendor/chunkformer/, so the target host does not need the original ChunkFormer training repository.
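
The artifact lists above translate into a simple existence check. The sketch below is not the shipped scripts/check_models.py, whose internals are not shown in this README; it only illustrates the documented layout, and the function name missing_artifacts is hypothetical.

```python
from pathlib import Path

# Paths required for normal API startup, as listed in this section.
REQUIRED = [
    "models/phowhisper-merged/model.safetensors",
    "models/punctuation-bartpho/model.safetensors",
    "models/phowhisper-base",
]

# Paths needed only when the compare provider is set to chunkformer.
CHUNKFORMER_ONLY = [
    "models/chunkformer-model/pytorch_model.pt",
    "models/chunkformer-model/config.yaml",
    "models/chunkformer-model/global_cmvn",
    "models/chunkformer-model/vocab.txt",
    "models/chunkformer-model/tokenizer",
]


def missing_artifacts(root=".", compare_provider="none"):
    """Return the expected relative paths that do not exist under `root`."""
    expected = list(REQUIRED)
    if compare_provider == "chunkformer":
        expected += CHUNKFORMER_ONLY
    return [rel for rel in expected if not (Path(root) / rel).exists()]
```

An empty list from missing_artifacts("/opt/phowhisper") means the layout matches this section. It does not verify that the files are complete or loadable.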

Pyannote diarization weights are resolved by pyannote.audio through Hugging Face. For online deployments, provide HF_TOKEN. For offline deployments, pre-populate the Hugging Face cache on the target machine or disable pyannote diarization in .env.

Package artifacts from the source machine:

cd /path/to/source/phowhisper
tar -czf phowhisper-model-artifacts.tar.gz \
  models/phowhisper-merged/model.safetensors \
  models/punctuation-bartpho/model.safetensors \
  models/phowhisper-base \
  models/chunkformer-model

Restore artifacts on the target machine:

cd /opt/phowhisper
tar -xzf /path/to/phowhisper-model-artifacts.tar.gz

Validate artifact placement:

. .venv/bin/activate
export PYTHONNOUSERSITE=1
python scripts/check_models.py

With Conda:

conda activate phowhisper_runtime
export PYTHONNOUSERSITE=1
python scripts/check_models.py

Expected output:

model artifacts look ready

If you copied the entire prepared phowhisper/ folder, this check is still the source of truth. It confirms the copied model layout is usable before the server loads large checkpoints.

5. Runtime Configuration

Create a local environment file:

cp .env.example .env
chmod 600 .env

Baseline configuration:

PORT=8000
USE_NGROK=false

PHOWHISPER_ASR_MODEL_PATH=models/phowhisper-merged
PHOWHISPER_BASE_MODEL_PATH=models/phowhisper-base
PHOWHISPER_PUNCT_MODEL_PATH=models/punctuation-bartpho
PHOWHISPER_DOMAIN_CONFIG=configs/domain_correction.yaml

PHOWHISPER_CRITICAL_COMPARE_PROVIDER=none
PHOWHISPER_CHUNKFORMER_SCRIPT=scripts/chunkformer.py
PHOWHISPER_CHUNKFORMER_MODEL_PATH=models/chunkformer-model
PHOWHISPER_CHUNKFORMER_DEVICE=auto
PHOWHISPER_ENABLE_DURATION_CONFUSION_RULE=true

PHOWHISPER_USE_VAD=false
PHOWHISPER_TURN_MODE=off
PHOWHISPER_DIARIZATION_SIDECAR=true
PHOWHISPER_NUM_SPEAKERS=2

Keep PHOWHISPER_CRITICAL_COMPARE_PROVIDER=none for the fastest default API path. Set it to chunkformer only when the secondary ASR comparison is needed. PHOWHISPER_ENABLE_DURATION_CONFUSION_RULE=true can run independently of ChunkFormer and fixes duration phrases such as hai mươi lăm in supported duration contexts.
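
A small consistency check can catch a ChunkFormer setting that is enabled without its paths. This is a sketch under stated assumptions: the .env file uses plain KEY=VALUE lines, the provider defaults to none when absent, and only the two provider values mentioned in this README are treated as known. The function names are illustrative.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def config_warnings(values):
    """Return human-readable warnings for inconsistent compare settings."""
    warnings = []
    # Assumption: a missing provider key behaves like "none".
    provider = values.get("PHOWHISPER_CRITICAL_COMPARE_PROVIDER", "none")
    if provider not in ("none", "chunkformer"):
        warnings.append(f"unexpected compare provider: {provider}")
    if provider == "chunkformer":
        for key in (
            "PHOWHISPER_CHUNKFORMER_SCRIPT",
            "PHOWHISPER_CHUNKFORMER_MODEL_PATH",
        ):
            if not values.get(key):
                warnings.append(f"{key} is empty but ChunkFormer is enabled")
    return warnings
```

For example, config_warnings(parse_env(open(".env").read())) returns an empty list for the baseline configuration shown above.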

Pyannote configuration for mono speaker diarization:

HF_TOKEN=<huggingface-token>
PHOWHISPER_ENABLE_PYANNOTE=true
PHOWHISPER_PYANNOTE_MODEL_ID=pyannote/speaker-diarization-community-1
PHOWHISPER_PYANNOTE_FALLBACK_MODEL_IDS=pyannote/speaker-diarization-3.1
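
The primary and fallback model IDs suggest a try-in-order loading strategy. The sketch below shows that pattern in isolation and does not describe how server_api.py implements it. The loader argument is a stand-in for whatever call loads a pyannote pipeline, and the comma-separated format of the fallback variable is an assumption.

```python
def candidate_model_ids(primary, fallbacks_csv):
    """Primary ID first, then fallbacks (comma-separated format is assumed)."""
    fallbacks = [m.strip() for m in fallbacks_csv.split(",") if m.strip()]
    return [primary] + fallbacks


def load_first_available(model_ids, loader):
    """Try each model ID in order; return (model_id, pipeline) on first success."""
    errors = {}
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # sketch only: real code may catch narrower errors
            errors[model_id] = repr(exc)
    raise RuntimeError(f"no diarization model could be loaded: {errors}")
```

Collecting the per-model errors makes it easier to tell an invalid HF_TOKEN apart from a model whose access conditions have not been accepted.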

Ngrok configuration, only when a public tunnel is required:

USE_NGROK=true
NGROK_AUTHTOKEN=<ngrok-token>
NGROK_REGION=ap
NGROK_DOMAIN=

Do not commit .env. It may contain service tokens and deployment-specific paths.

6. Preflight Checks

Run syntax and artifact checks before starting the service:

. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m py_compile \
  server_api.py \
  scripts/chunkformer.py \
  scripts/infer_audio.py \
  scripts/check_models.py \
  src/domain_correction/*.py \
  src/punctuation/*.py \
  src/turns/*.py

python scripts/check_models.py

7. Start The API

Foreground start:

. .venv/bin/activate
export PYTHONNOUSERSITE=1
python server_api.py

The service exposes:

GET  /health
POST /transcribe
POST /api/transcribe
POST /upload

Health check:

curl -fsS http://127.0.0.1:8000/health

Minimum healthy response fields:

{
  "status": "ok",
  "asr_ready": true,
  "punctuation_ready": true,
  "domain_ready": true
}
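
Because large checkpoints take time to load, a deployment script may want to poll the health endpoint until all readiness flags are true. A minimal sketch with the standard library; the function names, the timeout and the polling interval are illustrative choices, not values defined by the runtime.

```python
import json
import time
import urllib.request

# Readiness flags listed in the minimum healthy response above.
REQUIRED_FLAGS = ("asr_ready", "punctuation_ready", "domain_ready")


def is_healthy(payload):
    """True when status is "ok" and every readiness flag is true."""
    return payload.get("status") == "ok" and all(
        payload.get(flag) is True for flag in REQUIRED_FLAGS
    )


def wait_until_healthy(url="http://127.0.0.1:8000/health", timeout_s=300, interval_s=5):
    """Poll the health endpoint until it reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        time.sleep(interval_s)
    return False
```

Extra fields in the response are ignored, so the check keeps working if the server reports more than the minimum set.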

8. API Usage

Multipart upload:

curl -X POST \
  -F "file=@/path/to/audio.wav" \
  http://127.0.0.1:8000/transcribe

Multipart upload with speaker count hint:

curl -X POST \
  -F "file=@/path/to/audio.wav" \
  -F "num_speakers=2" \
  http://127.0.0.1:8000/transcribe

Supported request formats:

Format               | Field
multipart/form-data  | file
multipart/form-data  | audio
application/json     | audio_base64
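
The JSON format can be exercised without curl. This sketch assumes that audio_base64 carries the plain base64 encoding of the audio file bytes, with no data URI prefix, and that no other JSON field is required; neither detail is confirmed by this README. The function names are illustrative.

```python
import base64
import json
import urllib.request


def build_json_request(audio_bytes, url="http://127.0.0.1:8000/transcribe"):
    """Build a POST request carrying the audio as base64 in a JSON body."""
    body = json.dumps(
        {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe_file(path, url="http://127.0.0.1:8000/transcribe"):
    """Send one audio file and return the decoded JSON response."""
    with open(path, "rb") as fh:
        request = build_json_request(fh.read(), url)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Base64 inflates the payload by roughly one third, so multipart upload is usually the lighter option for long recordings.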

Primary response fields:

Field                     | Description
text, full_transcription  | Final transcript after post-processing
text_raw                  | Raw ASR text
segments                  | Speaker turns returned to clients
conversation_text         | Timestamped conversation transcript
diarization_segments      | Raw diarization timeline
elapsed_seconds           | End-to-end processing time
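
A client typically needs only a few of these fields. The sketch below reads the documented top-level keys and makes no assumption about the inner structure of segments, which this README does not specify. The function name summarize_result is illustrative.

```python
def summarize_result(result):
    """Condense a transcription response into a few fields for logging."""
    # text and full_transcription are documented as the same final transcript.
    text = result.get("text") or result.get("full_transcription") or ""
    raw = result.get("text_raw") or ""
    segments = result.get("segments") or []
    return {
        "text": text,
        # True when post-processing altered the raw ASR output.
        "changed_by_postprocessing": text != raw,
        "segment_count": len(segments),
        "elapsed_seconds": result.get("elapsed_seconds"),
    }
```

Comparing text with text_raw is a quick way to see whether punctuation and domain correction changed a given transcript.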

9. CLI Inference

Single-file inference:

. .venv/bin/activate
export PYTHONNOUSERSITE=1
python scripts/infer_audio.py /path/to/audio.wav \
  --turn-mode off \
  --output-json outputs/result.json \
  --output-text outputs/result.txt

VAD-based chunking:

export PYTHONNOUSERSITE=1
python scripts/infer_audio.py /path/to/audio.wav \
  --vad \
  --vad-max-segment-s 18 \
  --output-json outputs/result.json
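
To process a folder of recordings, the single-file command can be wrapped in a loop. This is a sketch that uses only the flags shown above; the helper names, the *.wav filter and the per-file output naming are illustrative choices.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(audio_path, output_dir="outputs"):
    """Build the infer_audio.py command line for one file."""
    audio_path = Path(audio_path)
    out = Path(output_dir)
    return [
        sys.executable, "scripts/infer_audio.py", str(audio_path),
        "--turn-mode", "off",
        "--output-json", str(out / f"{audio_path.stem}.json"),
        "--output-text", str(out / f"{audio_path.stem}.txt"),
    ]


def run_batch(audio_dir, output_dir="outputs"):
    """Run inference on every .wav file in `audio_dir`, stopping on failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for audio_path in sorted(Path(audio_dir).glob("*.wav")):
        subprocess.run(build_infer_command(audio_path, output_dir), check=True)
```

Run it from the package root with the environment activated so that sys.executable points at the right interpreter. Each invocation starts a fresh process, so models are presumably reloaded per file; for large batches the HTTP API is likely the faster path.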

10. systemd Service

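Create /etc/systemd/system/phowhisper.service. The unit below assumes the venv layout from section 3; with Conda, point ExecStart at the python binary of the phowhisper_runtime environment instead: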

sudo tee /etc/systemd/system/phowhisper.service >/dev/null <<'EOF'
[Unit]
Description=PhoWhisper Runtime API
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/phowhisper
Environment=PYTHONUNBUFFERED=1
Environment=PYTHONNOUSERSITE=1
ExecStart=/opt/phowhisper/.venv/bin/python /opt/phowhisper/server_api.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable phowhisper
sudo systemctl start phowhisper
sudo systemctl status phowhisper --no-pager

Logs:

journalctl -u phowhisper -f

Restart after code/config changes:

sudo systemctl restart phowhisper

11. Diarization Behavior

Default runtime flow:

  1. Run ASR on the full audio to preserve transcript continuity.
  2. Run pyannote as a sidecar diarization pass.
  3. Align ASR timestamps to the diarization timeline.
  4. Smooth short speaker blips.
  5. Apply punctuation, capitalization, domain correction, money normalization and phone normalization.

Mono speaker diarization is probabilistic. Accuracy can degrade when speakers overlap, responses are very short, voices are similar, or the recording is heavily compressed. Stereo or dual-channel call recordings are preferred when available.
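
Steps 3 and 4 of the flow above can be illustrated with a maximum-overlap assignment followed by a neighbour-based smoothing pass. This is a sketch of one common approach, not the runtime's actual alignment code; the tuple layouts, function names and the 0.5 second threshold are assumptions.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(asr_segments, diar_segments):
    """Label each ASR segment with the speaker it overlaps the most.

    asr_segments: list of (start, end, text)
    diar_segments: list of (start, end, speaker)
    """
    turns = []
    for start, end, text in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for d_start, d_end, speaker in diar_segments:
            shared = overlap(start, end, d_start, d_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        turns.append((start, end, best_speaker, text))
    return turns


def smooth_blips(turns, min_duration=0.5):
    """Relabel a short turn when both neighbours share another speaker."""
    smoothed = list(turns)
    for i in range(1, len(smoothed) - 1):
        start, end, speaker, text = smoothed[i]
        prev_speaker = smoothed[i - 1][2]
        next_speaker = smoothed[i + 1][2]
        if (
            end - start < min_duration
            and prev_speaker == next_speaker
            and speaker != prev_speaker
        ):
            smoothed[i] = (start, end, prev_speaker, text)
    return smoothed
```

A segment with no diarization overlap keeps the speaker None, which makes gaps visible instead of silently guessing. Very short genuine replies from the other speaker would be relabeled by this smoothing, which is one reason short responses are listed above as a source of error.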

12. Artifact And Secret Policy

Do not commit secrets or runtime output:

.env
tmp/
outputs/

Large model files should be copied as deployment artifacts or committed only through Git LFS. This package intentionally allows models/chunkformer-model/pytorch_model.pt through Git LFS when ChunkFormer is part of the portable release.

Recommended artifact storage:

  • Hugging Face Hub
  • S3-compatible object storage
  • Internal artifact registry
  • Git LFS, only if the repository is explicitly configured for large files

Git LFS setup, if required:

git lfs install
git lfs track "*.safetensors" "*.bin" "*.pt" "*.pth" "*.ckpt"
git add .gitattributes
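
After running git lfs track, the resulting .gitattributes can be checked for the expected patterns. A minimal sketch that parses the file text and looks for the filter=lfs attribute that git lfs track writes; the function names are illustrative.

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns that carry the filter=lfs attribute."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def untracked_patterns(
    gitattributes_text,
    wanted=("*.safetensors", "*.bin", "*.pt", "*.pth", "*.ckpt"),
):
    """Return the wanted patterns that are not routed through Git LFS."""
    tracked = set(lfs_patterns(gitattributes_text))
    return [pattern for pattern in wanted if pattern not in tracked]
```

A non-empty result from untracked_patterns(open(".gitattributes").read()) means a weight file with that extension would be committed as a regular Git object.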

13. Troubleshooting

Port already in use:

lsof -tiTCP:8000 -sTCP:LISTEN -n -P
kill <pid>
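
On hosts where lsof is not installed, a connection attempt is enough to tell whether something is already listening. A minimal sketch with the standard socket module; the function name port_in_use is illustrative, and unlike lsof it does not identify the owning process.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(8000) before starting the API avoids a bind failure at startup when PORT=8000 is already taken.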

Missing model artifacts:

python scripts/check_models.py

Pyannote fails to load:

  • Verify HF_TOKEN.
  • Confirm model access has been accepted on Hugging Face.
  • Keep PHOWHISPER_PYANNOTE_FALLBACK_MODEL_IDS=pyannote/speaker-diarization-3.1 enabled for compatibility fallback.

CUDA out of memory (lower the batch sizes in .env, then restart the service):

PHOWHISPER_ASR_BATCH_SIZE=1
PHOWHISPER_PUNCT_BATCH_SIZE=1

CPU-only mode (set in .env):

PHOWHISPER_ASR_DEVICE=cpu
PHOWHISPER_PUNCT_DEVICE=cpu

CPU-only mode is intended for functional validation or low-throughput jobs.

14. Release Checklist

Before handing over a deployment:

. .venv/bin/activate
export PYTHONNOUSERSITE=1
python -m py_compile \
  server_api.py \
  scripts/chunkformer.py \
  scripts/infer_audio.py \
  scripts/check_models.py \
  src/domain_correction/*.py \
  src/punctuation/*.py \
  src/turns/*.py
python scripts/check_models.py
curl -fsS http://127.0.0.1:8000/health
git status --short
git check-ignore -v .env