How to use nvidia/parakeet-tdt-0.6b-v2 with NeMo:

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
transcriptions = asr_model.transcribe(["file.wav"])
```
## <span style="color:#466f00;">Try via API — No Setup Required</span>
Transcribe audio with the free hosted API on [build.nvidia.com](https://build.nvidia.com/nvidia/parakeet-tdt-0_6b-v2); no GPU, Docker setup, or model download is needed.

**1. Get a free API key:** Visit [build.nvidia.com/nvidia/parakeet-tdt-0_6b-v2](https://build.nvidia.com/nvidia/parakeet-tdt-0_6b-v2) and click **Get API Key**.

**2. Install the Riva client:**

```bash
pip install nvidia-riva-client
```
**3. Transcribe an audio file:**

```python
import riva.client

# Connect to the hosted endpoint over TLS; the function-id selects this model
# and the bearer token is the API key from step 1.
auth = riva.client.Auth(
    uri="grpc.nvcf.nvidia.com:443",
    use_ssl=True,
    metadata_args=[
        ["function-id", "d3fe9151-442b-4204-a70d-5fcc597fd610"],
        ["authorization", "Bearer nvapi-YOUR_API_KEY"],
    ],
)

asr_service = riva.client.ASRService(auth)

# Offline (non-streaming) recognition sends the whole file in one request.
with open("audio.wav", "rb") as f:
    audio = f.read()

config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)

response = asr_service.offline_recognize(audio, config)

# Longer audio can come back as several results (one per segment),
# so join them rather than printing only results[0].
transcript = " ".join(
    r.alternatives[0].transcript.strip() for r in response.results if r.alternatives
)
print(transcript)
```
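The Python sample requests word timestamps (`enable_word_time_offsets=True`) but prints only the transcript. A minimal sketch for reading the timestamps out follows. It assumes each Riva `WordInfo` message exposes `word`, `start_time`, and `end_time`, with times in milliseconds (check the Riva ASR proto reference for your client version); `word_timings` and `print_word_timings` are hypothetical helper names, not part of the Riva client. With a real call, pass the `response` returned by `offline_recognize`.

```python
from types import SimpleNamespace


def word_timings(response):
    """Yield (word, start_s, end_s) for every word in a Riva offline response.

    Assumes each WordInfo exposes `word`, `start_time`, and `end_time`,
    with times in milliseconds.
    """
    for result in response.results:  # long audio may yield several results
        if not result.alternatives:
            continue  # skip segments that carry no hypothesis
        for info in result.alternatives[0].words:
            yield info.word, info.start_time / 1000.0, info.end_time / 1000.0


def print_word_timings(response):
    """Print one line per word as 'start - end : word', times in seconds."""
    for word, start, end in word_timings(response):
        print(f"{start:.2f}s - {end:.2f}s : {word}")


if __name__ == "__main__":
    # Hypothetical stand-in shaped like a response, so the helper runs offline.
    fake_response = SimpleNamespace(results=[
        SimpleNamespace(alternatives=[SimpleNamespace(words=[
            SimpleNamespace(word="Hello", start_time=120, end_time=480),
            SimpleNamespace(word="world.", start_time=520, end_time=900),
        ])]),
    ])
    print_word_timings(fake_response)
    # 0.12s - 0.48s : Hello
    # 0.52s - 0.90s : world.
```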
**Or use the CLI:**

```bash
# Fetch NVIDIA's example client scripts (they use the nvidia-riva-client package from step 2)
git clone https://github.com/nvidia-riva/python-clients.git
# Replace with the API key from step 1
export NVIDIA_API_KEY="nvapi-YOUR_API_KEY"

python python-clients/scripts/asr/transcribe_file_offline.py \
  --server grpc.nvcf.nvidia.com:443 --use-ssl \
  --metadata function-id "d3fe9151-442b-4204-a70d-5fcc597fd610" \
  --metadata "authorization" "Bearer $NVIDIA_API_KEY" \
  --language-code en-US \
  --word-time-offsets --automatic-punctuation \
  --input-file audio.wav
```
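The hosted API accepts only 16-bit mono audio, so a stereo or 24-bit WAV file may be rejected. As a minimal sketch, assuming WAV input, Python's standard-library `wave` module can check the header before upload. `check_wav_format` is a hypothetical helper, not part of the Riva client, and OGG or OPUS files are out of scope here.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems for a WAV file; an empty list means OK.

    Checks only the constraints stated for the hosted API: 16-bit samples
    and a single (mono) channel. OGG/OPUS files are not handled here.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        sample_bits = 8 * wav.getsampwidth()
        if sample_bits != 16:
            problems.append(f"expected 16-bit samples, got {sample_bits}-bit")
        channels = wav.getnchannels()
        if channels != 1:
            problems.append(f"expected mono, got {channels} channels")
    return problems


# Example (audio.wav is a placeholder path):
#   issues = check_wav_format("audio.wav")
#   if issues:
#       raise SystemExit("; ".join(issues))
```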
> **Note:** The hosted API accepts 16-bit mono audio in WAV, OGG, or OPUS format. See the [API Reference](https://docs.nvidia.com/nim/riva/asr/latest/protos.html) for streaming and advanced options.

## <span style="color:#466f00;">Software Integration:</span>