---
library_name: transformers
license: mit
datasets:
- jacktol/ATC-ASR-Dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: ATC ASR Dataset
      type: jacktol/ATC-ASR-Dataset
    metrics:
    - name: Word Error Rate (WER)
      type: wer
      value: 6.5
---
## Model Overview
This model is a fine-tuned version of OpenAI's Whisper Large v3 model, specifically trained on **Air Traffic Control (ATC)** communication datasets. The fine-tuning process significantly improves transcription accuracy on domain-specific aviation communications, achieving a Word Error Rate (WER) of 6.5% on the test set. The model is particularly effective at handling accent variations and ambiguous phrasing often encountered in ATC communications.
- **Base Model**: OpenAI Whisper Large v3
- **Fine-tuned Model WER**: 6.5%
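For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below implements that standard definition with a word-level edit distance. This card does not say which tool or text normalization produced the 6.5% figure, so the function name `word_error_rate` and the example transmissions are illustrative assumptions, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly; no casing or
    number normalization is applied, so normalize both strings first if needed.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Illustrative transmissions (not taken from the dataset): one wrong digit
# in a ten-word reference is a single substitution.
reference = "delta four five six climb flight level three four zero"
hypothesis = "delta four five six climb flight level three five zero"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a 6.5% WER corresponds to roughly 6 to 7 word errors per 100 reference words.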
## Model Description
This fine-tuned model is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It is fine-tuned using data from:
- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**
The fine-tuned model demonstrates enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is highly suitable for aviation-related transcription tasks.
## Intended Use
The fine-tuned Whisper model is designed for:
- **Transcribing aviation communication**: Providing accurate transcriptions for ATC communications, including accents and variations in English phrasing.
- **Air Traffic Control Systems**: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
- **Research and training**: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
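A minimal sketch of how the model might be used for offline transcription, assuming the `transformers` and `torch` packages are installed and the checkpoint is loaded through the standard `automatic-speech-recognition` pipeline. The helper names `build_transcriber` and `transcribe` are hypothetical conveniences, not part of any library.

```python
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"


def build_transcriber(device=None):
    """Create an ASR pipeline for the fine-tuned checkpoint.

    The model weights are downloaded from the Hugging Face Hub on first use.
    """
    # Imported lazily so the heavy dependencies load only when actually needed.
    import torch
    from transformers import pipeline

    if device is None:
        # Assumption: use the first GPU when available, otherwise the CPU.
        device = "cuda:0" if torch.cuda.is_available() else "cpu"

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=device,
    )


def transcribe(asr, audio):
    """Run the pipeline on one audio input and return the cleaned transcript."""
    result = asr(audio)  # the ASR pipeline returns a dict with a "text" field
    return result["text"].strip()
```

With these helpers, `asr = build_transcriber()` followed by `transcribe(asr, "tower_transmission.wav")` would transcribe a single recording; the file name is a placeholder for any local audio file.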
## Training Procedure
- **Hardware**: Fine-tuning was conducted on two H100 SXM5 GPUs with 80 GB of VRAM each.
- **Epochs**: 3.25
- **Learning Rate**: 1e-5
- **Batch Size**: 10 with no gradient accumulation
- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
- **Evaluation Metric**: Word Error Rate (WER)
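As an illustration of the Gaussian-noise augmentation mentioned above, the following sketch adds white noise to a waveform at a chosen signal-to-noise ratio using NumPy. The card does not give the noise levels or the tooling actually used, so the function name `add_gaussian_noise`, the `snr_db` parameter, and the 20 dB value are assumptions made for the example.

```python
import numpy as np


def add_gaussian_noise(waveform, snr_db, rng=None):
    """Return a copy of `waveform` with white Gaussian noise at `snr_db` dB SNR.

    Assumption: `waveform` is a mono signal given as a 1-D array of floats.
    """
    rng = np.random.default_rng() if rng is None else rng
    waveform = np.asarray(waveform, dtype=np.float32)

    # SNR (dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))

    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return (waveform + noise).astype(np.float32)


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Offline augmentation: the noisy copy would be stored next to the original
# before training, rather than generated on the fly.
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Passing a seeded generator keeps the augmented copies reproducible across runs.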
## Limitations
While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other domains of speech. Additionally, as with most speech-to-text models, transcription accuracy can degrade on extremely poor-quality audio or on heavily accented speech that was absent from, or underrepresented in, the training data.
## References
- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)