Commit 86312ac (verified) by mhenrichsen · 1 parent: 4e55b3b

Add 'Run with vLLM' section (OpenAI-compatible transcription API)

Files changed (1): README.md +52 -0
README.md CHANGED
@@ -66,6 +66,58 @@ print(hyp)
 
  Audio > 35 s is automatically chunked. Input is resampled to 16 kHz internally.
 
+ ## Run with vLLM (OpenAI-compatible API)
+
+ vLLM can serve the model behind an OpenAI-compatible `/v1/audio/transcriptions` endpoint, which is convenient for high-throughput batch transcription and remote serving.
+
+ ### Install
+
+ ```bash
+ pip install "vllm==0.19.0"
+ pip install "vllm[audio]" librosa  # audio deps are required for transcription
+ ```
+
+ ### Start the server
+
+ ```bash
+ vllm serve syvai/hviske-v5.1 --trust-remote-code --host 0.0.0.0 --port 8000
+ ```
+
+ `--trust-remote-code` is required because the model ships custom code. The runner (transcription) is auto-detected; no `--task` flag is needed.
+
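Loading a model of this size can take a while, so a client script may want to wait for the server before sending audio. The following is a minimal sketch, not part of the commit: `wait_until_ready` and `http_health_probe` are hypothetical helper names, the timeout and polling interval are arbitrary, and the `/health` route is assumed to be available on the vLLM server started above.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout_s=300.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout_s elapses.

    Returns True if the probe succeeded, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_health_probe(url="http://localhost:8000/health"):
    """Build a probe that reports True when the URL answers with HTTP 200.

    The /health path is an assumption about the server; adjust if needed.
    """
    def probe():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not up yet.
            return False
    return probe
```

Usage would look like `wait_until_ready(http_health_probe())` before the first transcription request.
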
+ ### Transcribe with curl
+
+ ```bash
+ curl -s http://localhost:8000/v1/audio/transcriptions \
+   -F "file=@your_audio.wav" \
+   -F "model=syvai/hviske-v5.1" \
+   -F "language=da" \
+   -F "temperature=0"
+ ```
+
+ ### Transcribe with Python (`openai` client)
+
+ ```python
+ from openai import OpenAI
+
+ # The client requires an api_key value; "EMPTY" is a placeholder for a local server.
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ with open("your_audio.wav", "rb") as f:
+     resp = client.audio.transcriptions.create(
+         model="syvai/hviske-v5.1",
+         file=f,
+         language="da",
+         temperature=0,
+     )
+ print(resp.text)
+ ```
+
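For the batch use case mentioned at the top of this section, requests can be issued concurrently so the server can batch them. The sketch below is one possible way to do that and is not part of the commit: `transcribe_many` and `make_openai_transcriber` are hypothetical helpers, and `max_workers=8` is an arbitrary starting point rather than a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def transcribe_many(paths: Iterable[Path],
                    transcribe_one: Callable[[Path], str],
                    max_workers: int = 8) -> Dict[str, str]:
    """Run transcribe_one over all paths concurrently; return {path: text}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(transcribe_one, paths))
    return {str(p): t for p, t in zip(paths, texts)}


def make_openai_transcriber(base_url: str = "http://localhost:8000/v1",
                            model: str = "syvai/hviske-v5.1"):
    """Build a transcribe_one callable that sends each file to the local server."""
    from openai import OpenAI  # imported lazily; only needed for real requests

    client = OpenAI(base_url=base_url, api_key="EMPTY")

    def transcribe_one(path: Path) -> str:
        with open(path, "rb") as f:
            resp = client.audio.transcriptions.create(
                model=model, file=f, language="da", temperature=0
            )
        return resp.text

    return transcribe_one
```

A call might look like `transcribe_many(Path("audio").glob("*.wav"), make_openai_transcriber())`, where the `audio` directory is a placeholder. Passing the transcriber in as a callable keeps the concurrency logic independent of the HTTP client.
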
+ **Notes**
+
+ - Set `language="da"` and `temperature=0` for the most accurate, deterministic output.
+ - `response_format` supports `json` (default) and `text`. `verbose_json` is **not** supported and returns HTTP 400.
+ - The endpoint accepts common audio formats (wav, mp3, flac, ogg); audio is resampled to 16 kHz internally.
+
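The `response_format` note can be enforced on the client side so that an unsupported value fails before any audio is uploaded. This is a rough sketch under stated assumptions, not part of the commit: `build_form_fields` and `transcribe` are hypothetical helpers, the 300 s timeout is arbitrary, and the response shapes (a JSON object with a `text` key for `json`, a plain-text body for `text`) are assumed to follow the OpenAI transcription API.

```python
SUPPORTED_FORMATS = ("json", "text")  # per the notes above; verbose_json is rejected


def build_form_fields(model="syvai/hviske-v5.1", language="da",
                      temperature=0, response_format="json"):
    """Return the non-file multipart fields, rejecting unsupported formats early."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"response_format={response_format!r} is not supported; "
            f"use one of {SUPPORTED_FORMATS}"
        )
    return {
        "model": model,
        "language": language,
        "temperature": str(temperature),
        "response_format": response_format,
    }


def transcribe(path, url="http://localhost:8000/v1/audio/transcriptions", **kwargs):
    """POST one audio file and return the transcript as a string."""
    import requests  # third-party; imported lazily so the validator has no dependency

    fields = build_form_fields(**kwargs)
    with open(path, "rb") as f:
        resp = requests.post(url, files={"file": f}, data=fields, timeout=300)
    resp.raise_for_status()
    # Assumed shapes: plain-text body for "text", {"text": ...} for "json".
    if fields["response_format"] == "text":
        return resp.text
    return resp.json()["text"]
```

For example, `transcribe("your_audio.wav", response_format="text")` would return the raw transcript, while `response_format="verbose_json"` raises `ValueError` locally instead of triggering the server-side 400.
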
  ## Training details
 
  - **Architecture:** 2.06B-parameter Conformer encoder-decoder, full fine-tune