How to use nvidia/parakeet-tdt-0.6b-v3 with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="nvidia/parakeet-tdt-0.6b-v3")

    # Load model directly
    from transformers import AutoModelForMultimodalLM

    model = AutoModelForMultimodalLM.from_pretrained("nvidia/parakeet-tdt-0.6b-v3", dtype="auto")
nithinraok committed
Commit 3f27873 · Parent(s): e6830b1
update
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
README.md CHANGED
@@ -166,7 +166,7 @@ metrics:
 - wer
 ---
 
-# **
+# **<span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>**
 
 <style>
 img {
@@ -178,8 +178,6 @@ img {
 | [](#model-architecture)
 | [](#datasets)
 
-## <span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>
-
 ## <span style="color:#466f00;">Description:</span>
 
 `parakeet-tdt-0.6b-v3` is a 600-million-parameter multilingual automatic speech recognition (ASR) model designed for high-throughput speech-to-text transcription. It extends the [parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) model by expanding language support from English to 25 European languages. The model automatically detects the language of the audio and transcribes it without requiring additional prompting. It is part of a series of models that leverage the [Granary](https://huggingface.co/datasets/nvidia/Granary) [1, 2] multilingual corpus as their primary training dataset.