---
license: mit
library_name: pytorch
pipeline_tag: audio-classification
language:
- sr
- en
datasets:
- declare-lab/meld
- seac
metrics:
- accuracy
- weighted-f1
tags:
- emotion-recognition
- speech-emotion-recognition
- audio
- wav2vec2
- transfer-learning
- meld
- seac
---

# Audio Emotion Recognition (MELD → SEAC, Audio-only)

## Overview

This model performs **speech emotion recognition from audio only**. It uses a **pretrained Wav2Vec2 encoder (frozen)** as a feature extractor, followed by a lightweight classification head.

The model was:

- **Pretrained on:** MELD (English conversational emotions)
- **Fine-tuned on:** SEAC (Serbian emotional speech)
- **Task:** 5-class emotion classification from speech audio

---

## Emotions

The model predicts:

- neutral
- joy
- anger
- sadness
- fear

---

## Architecture

- **Encoder:** `facebook/wav2vec2-base` (frozen)
- **Pooling:** Mean pooling over temporal hidden states
- **Classifier:** Fully connected classification head
- **Training strategy:** Transfer learning (classifier-only fine-tuning)

---

## Transfer Learning Setup

**Stage 1 – Pretraining (MELD)**

- Audio-only emotion classification

**Stage 2 – Fine-tuning (SEAC)**

- Encoder frozen
- Only classification head updated

---

## Evaluation (SEAC Test Set)

| Metric      | Score      |
|-------------|------------|
| Accuracy    | **0.7107** |
| Weighted F1 | **0.7130** |

---

## Notes

- Sampling rate: 16 kHz
- Mean temporal pooling is used to obtain utterance-level embeddings.
- The released weights include only the classification head. The encoder is loaded from `facebook/wav2vec2-base`.

---
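
## Architecture Sketch

The pooling and classification steps described above can be illustrated with a minimal PyTorch sketch. It is not the released implementation: the single linear layer, the `EmotionHead` and `masked_mean_pool` names, the padding mask, and the label order in `EMOTIONS` are all assumptions made for illustration. A random tensor stands in for the encoder output so the example runs without downloading weights; in practice the hidden states would come from the frozen encoder, for example `Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")(...).last_hidden_state` in the `transformers` library.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768  # hidden size of facebook/wav2vec2-base
# Assumption: the index order of the labels is not stated in this card.
EMOTIONS = ["neutral", "joy", "anger", "sadness", "fear"]


def masked_mean_pool(hidden_states, frame_mask):
    """Average frame-level features over time, ignoring padded frames.

    hidden_states: (batch, frames, hidden) float tensor
    frame_mask:    (batch, frames) bool tensor, True for real frames
    """
    mask = frame_mask.unsqueeze(-1).to(hidden_states.dtype)  # (B, T, 1)
    summed = (hidden_states * mask).sum(dim=1)               # (B, H)
    counts = mask.sum(dim=1).clamp(min=1.0)                  # avoid divide-by-zero
    return summed / counts


class EmotionHead(nn.Module):
    """Hypothetical head: one linear layer over the pooled embedding."""

    def __init__(self, hidden_size=HIDDEN_SIZE, num_classes=len(EMOTIONS)):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states, frame_mask):
        pooled = masked_mean_pool(hidden_states, frame_mask)
        return self.classifier(pooled)  # (B, num_classes) logits


# Stand-in for frozen encoder output: 2 utterances, 49 frames each.
torch.manual_seed(0)
hidden_states = torch.randn(2, 49, HIDDEN_SIZE)
frame_mask = torch.ones(2, 49, dtype=torch.bool)
frame_mask[1, 30:] = False  # second utterance is shorter, rest is padding

head = EmotionHead()
with torch.no_grad():
    logits = head(hidden_states, frame_mask)

predicted = [EMOTIONS[i] for i in logits.argmax(dim=-1).tolist()]
```

With an all-`True` mask, `masked_mean_pool` reduces to a plain mean over the time axis, which matches the mean temporal pooling described in the Notes. Because the head here is randomly initialized, the values in `predicted` are meaningless until the released head weights are loaded.

---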