jacktol/whisper-large-v3-finetuned-for-ATC

	---
	library_name: transformers
	license: mit
	datasets:
	- jacktol/ATC-ASR-Dataset
	language:
	- en
	metrics:
	- wer
	base_model:
	- openai/whisper-large-v3
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
	  results:
	  - task:
	      type: automatic-speech-recognition
	    dataset:
	      name: ATC ASR Dataset
	      type: jacktol/ATC-ASR-Dataset
	    metrics:
	    - name: Word Error Rate (WER)
	      type: wer
	      value: 6.5

	---

	## Model Overview

	This model is a fine-tuned version of OpenAI's Whisper Large v3, trained on Air Traffic Control (ATC) communications. Fine-tuning improves transcription accuracy on domain-specific aviation communications, and the model achieves a Word Error Rate (WER) of 6.5% on the test set. It is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

	- Base Model: OpenAI Whisper Large v3
	- Fine-tuned Model WER: 6.5%
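
For readers unfamiliar with the metric, the sketch below shows how a word error rate is typically computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. It is a minimal illustration, not the evaluation script behind the 6.5% figure. The function name `word_error_rate` and the example transcripts are invented here, and the text normalization used for the reported score is not stated in this card, so none is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of ten reference words.
reference = "delta four five six climb flight level three five zero"
hypothesis = "delta four five six climb flight level three four zero"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 10.0%
```

Because the denominator is the reference length, a single wrong digit in a short transmission moves the score considerably, which is worth keeping in mind when reading WER figures for ATC audio.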

	## Model Description

	This model is optimized for the short, distinct transmissions exchanged between pilots and air traffic controllers. It was fine-tuned on:
	- [ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)

	The fine-tuned model demonstrates enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is highly suitable for aviation-related transcription tasks.

	## Intended Use

	The fine-tuned Whisper model is designed for:
	- Transcribing aviation communication: Providing accurate transcriptions for ATC communications, including accents and variations in English phrasing.
	- Air Traffic Control Systems: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
	- Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
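
A minimal transcription sketch follows, assuming the checkpoint loads through the standard `transformers` automatic-speech-recognition pipeline, as the `library_name` and `pipeline_tag` metadata suggest. The helper names `build_transcriber` and `transcribe` and the audio file name are hypothetical, and this is not an official usage snippet from the model author.

```python
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"


def build_transcriber(device: int = -1):
    """Return an ASR pipeline for this checkpoint (device=-1 runs on CPU)."""
    # Imported lazily so the module can be loaded without transformers/torch installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID, device=device)


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file and return the text with outer whitespace removed."""
    result = asr(audio_path)  # the pipeline returns a dict with a "text" field
    return result["text"].strip()


# Usage (downloads the checkpoint on first call; the file name is hypothetical):
#   asr = build_transcriber()
#   print(transcribe(asr, "tower_clip.wav"))
```

Passing a file path relies on the pipeline's own audio decoding, which requires ffmpeg to be installed. Keeping the pipeline object separate from `transcribe` lets one loaded model serve many short transmissions without being reloaded.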

	## Training Procedure

	- Hardware: Fine-tuning was conducted on two NVIDIA H100 SXM5 GPUs (80 GB VRAM each).
	- Epochs: 3.25
	- Learning Rate: 1e-5
	- Batch Size: 10 with no gradient accumulation
	- Augmentation: Offline data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied to the training set.
	- Evaluation Metric: Word Error Rate (WER)
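
To illustrate the Gaussian-noise augmentation mentioned above, here is a rough NumPy sketch that adds white noise at a chosen signal-to-noise ratio. The function name `add_gaussian_noise`, the 15 dB SNR, and the synthetic test tone are assumptions made for this example; the card does not state the noise levels or tooling actually used. "Offline" means the augmented copies are generated and stored before training rather than produced on the fly.

```python
import numpy as np


def add_gaussian_noise(
    waveform: np.ndarray, snr_db: float, rng: np.random.Generator
) -> np.ndarray:
    """Add white Gaussian noise so the result has roughly the requested SNR (in dB)."""
    signal_power = float(np.mean(waveform.astype(np.float64) ** 2))
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return (waveform + noise).astype(waveform.dtype)


# Synthetic one-second 440 Hz tone at 16 kHz, standing in for a real recording.
rng = np.random.default_rng(0)
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

noisy = add_gaussian_noise(clean, snr_db=15.0, rng=rng)  # 15 dB is an assumed value
```

Lower `snr_db` values produce noisier copies; a real pipeline would likely draw the SNR from a range so that the model sees a variety of channel conditions.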

	## Limitations

	While the fine-tuned model performs well on ATC communications, it may not generalize as effectively to other speech domains. Like most speech-to-text models, it can also lose accuracy on extremely poor-quality audio or on heavily accented speech that was absent from, or underrepresented in, the training data.

	## References
	- [ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)