How to use jacktol/whisper-large-v3-finetuned-for-ATC with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="jacktol/whisper-large-v3-finetuned-for-ATC")

# Load model directly (Whisper is a speech sequence-to-sequence model)
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("jacktol/whisper-large-v3-finetuned-for-ATC")
model = AutoModelForSpeechSeq2Seq.from_pretrained("jacktol/whisper-large-v3-finetuned-for-ATC")
```

---
library_name: transformers
license: mit
datasets:
- jacktol/ATC-ASR-Dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: ATC ASR Dataset
      type: jacktol/ATC-ASR-Dataset
    metrics:
    - name: Word Error Rate (WER)
      type: wer
      value: 6.5
---

## Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained specifically on **Air Traffic Control (ATC)** communications. Fine-tuning improves transcription accuracy on domain-specific aviation speech, and the model reaches a Word Error Rate (WER) of 6.5% on the test set. It is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- **Base Model**: OpenAI Whisper Large v3 (`openai/whisper-large-v3`)
- **Fine-tuned Model WER**: 6.5%

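For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the evaluation script used for this model. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for the example, and the sample transmissions are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word out of ten reference words.
reference = "delta one two three climb flight level three five zero"
hypothesis = "delta one two three climb flight level three four zero"
wer = word_error_rate(reference, hypothesis)
```

A WER of 6.5% therefore corresponds to roughly 6 to 7 word-level errors per 100 reference words. Reported values depend on the text normalization applied before scoring, which this card does not specify.
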
## Model Description

The model is optimized for the short, distinct transmissions exchanged between pilots and air traffic controllers. It was fine-tuned on data from:

- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**

The fine-tuned model shows improved performance in interpreting varied accents, recognizing non-standard phraseology, and processing noisy or distorted communications, which makes it well suited to aviation-related transcription tasks.

## Intended Use

The fine-tuned Whisper model is designed for:

- **Transcribing aviation communication**: Providing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
- **Air Traffic Control systems**: Assisting with real-time transcription of pilot-ATC conversations to help improve situational awareness.
- **Research and training**: Supporting researchers, developers, and aviation professionals who study ATC communication or develop new tools for aviation safety.

## Training Procedure

- **Hardware**: Fine-tuning was conducted on two H100 SXM5 GPUs (80 GB VRAM each).
- **Epochs**: 3.25
- **Learning Rate**: 1e-5
- **Batch Size**: 10, with no gradient accumulation
- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
- **Evaluation Metric**: Word Error Rate (WER)

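To make the Gaussian-noise augmentation mentioned above concrete, here is a minimal sketch that adds white Gaussian noise to a waveform at a chosen signal-to-noise ratio. It is an illustration only: the function name `add_gaussian_noise`, the 20 dB SNR, the 16 kHz sample rate, and the synthetic test tone are all assumptions, and the card does not state which noise levels or tools were actually used.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng=None):
    """Return a copy of `samples` with white Gaussian noise added at `snr_db` dB SNR."""
    if not samples:
        raise ValueError("samples must not be empty")
    rng = rng or random.Random()

    # Average signal power, then the noise power needed to reach the target SNR.
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)

    return [s + rng.gauss(0.0, sigma) for s in samples]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Seeded generator so the augmented copy is reproducible (assumed 20 dB SNR).
noisy = add_gaussian_noise(tone, snr_db=20.0, rng=random.Random(0))
```

Because the augmentation is described as offline, augmented copies like `noisy` would be generated once and stored alongside the originals rather than being produced on the fly during training.
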
## Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. In addition, as with most speech-to-text models, transcription accuracy can degrade on extremely poor-quality audio or on heavily accented speech that was absent from, or underrepresented in, the training data.

## References

- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)