---
license: mit
library_name: pytorch
pipeline_tag: audio-classification
language:
- sr
- en
datasets:
- declare-lab/meld
- seac
metrics:
- accuracy
- weighted-f1
tags:
- emotion-recognition
- speech-emotion-recognition
- audio
- wav2vec2
- transfer-learning
- meld
- seac
---
# Audio Emotion Recognition (MELD → SEAC, Audio-only)
## Overview
This model performs **speech emotion recognition from audio only**.
It uses a **pretrained Wav2Vec2 encoder (frozen)** as a feature extractor,
followed by a lightweight classification head.
Training and task summary:
- **Pretrained on:** MELD (English conversational emotions)
- **Fine-tuned on:** SEAC (Serbian emotional speech)
- **Task:** 5-class emotion classification from speech audio
---
## Emotions
The model predicts:
- neutral
- joy
- anger
- sadness
- fear
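
Turning the classifier's raw outputs into one of these labels can be sketched as follows. The index order of `LABELS` is an assumption (this card lists the classes but does not state which index maps to which emotion), and `decode` is a hypothetical helper, so the mapping should be checked against the released head before relying on it.

```python
import math

# ASSUMPTION: index order follows the list in this card; the card does not confirm it.
LABELS = ["neutral", "joy", "anger", "sadness", "fear"]


def decode(logits):
    """Return (label, probability) for one utterance's raw class logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    peak = max(logits)
    # Softmax with the max subtracted for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Calling `decode([0.1, 2.0, -1.0, 0.3, 0.0])` picks the class at index 1, which is `joy` under the assumed ordering.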
---
## Architecture
- **Encoder:** `facebook/wav2vec2-base` (frozen)
- **Pooling:** Mean pooling over temporal hidden states
- **Classifier:** Fully connected classification head
- **Training strategy:** Transfer learning (classifier-only fine-tuning)
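
The components above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released implementation: the class name `EmotionHead`, the helper `build_frozen_encoder`, and the choice of a single linear layer are assumptions, since the card does not specify the head's layers. The hidden size of 768 is that of `facebook/wav2vec2-base`.

```python
import torch
import torch.nn as nn


class EmotionHead(nn.Module):
    """Hypothetical head: mean pooling over time, then one linear layer.

    ASSUMPTION: the real head's depth and layer sizes are not stated in the card.
    """

    def __init__(self, hidden_size: int = 768, num_classes: int = 5):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, time, hidden_size) from the frozen encoder
        pooled = hidden_states.mean(dim=1)  # utterance-level embedding
        return self.fc(pooled)  # (batch, num_classes) logits


def build_frozen_encoder(name: str = "facebook/wav2vec2-base") -> nn.Module:
    """Load the pretrained encoder and disable gradients for all its weights."""
    from transformers import Wav2Vec2Model  # imported lazily; downloads weights

    encoder = Wav2Vec2Model.from_pretrained(name)
    encoder.eval()
    for param in encoder.parameters():
        param.requires_grad = False
    return encoder
```

With `transformers` installed, the encoder output would be obtained as `build_frozen_encoder()(input_values).last_hidden_state` inside `torch.no_grad()` and passed to the head. Note that a plain mean over time also averages padded frames in a batch of unequal-length clips; whether the released model masks padding is not stated here.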
---
## Transfer Learning Setup
**Stage 1 – Pretraining (MELD)**
- Audio-only emotion classification

**Stage 2 – Fine-tuning (SEAC)**
- Encoder frozen
- Only classification head updated
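
Classifier-only fine-tuning, as in Stage 2, can be sketched like this. The helpers `freeze` and `train_step` are hypothetical, and the cross-entropy loss and the choice of optimizer are assumptions; the card does not state the actual training hyperparameters. Here `encoder` stands for any callable that returns hidden states of shape `(batch, time, hidden)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def freeze(module: nn.Module) -> nn.Module:
    """Put a module in eval mode and stop gradients from reaching its weights."""
    module.eval()
    for param in module.parameters():
        param.requires_grad = False
    return module


def train_step(encoder, head, optimizer, inputs, labels) -> float:
    """One update of the classification head; the encoder stays untouched."""
    with torch.no_grad():  # no graph is built for the frozen encoder
        hidden = encoder(inputs)  # (batch, time, hidden)
    logits = head(hidden.mean(dim=1))  # mean pooling over time, then classify
    loss = F.cross_entropy(logits, labels)  # ASSUMPTION: loss not stated in card
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The optimizer should be built from the head's parameters only, for example `torch.optim.Adam(head.parameters())`, so that the encoder weights cannot be updated even by accident.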
---
## Evaluation (SEAC Test Set)
| Metric | Score |
|---------------|-------|
| Accuracy | **0.7107** |
| Weighted F1 | **0.7130** |
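
For readers who want to reproduce these two metrics on their own predictions, the definitions can be sketched in plain Python. Weighted F1 here means the per-class F1 averaged with each class weighted by its number of true samples; the function names are illustrative, and this is not necessarily the evaluation script used for the scores above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)
```

Because the weights follow the class distribution, weighted F1 can differ noticeably from macro F1 when the test set is imbalanced.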
---
## Notes
- Sampling rate: 16 kHz
- Mean temporal pooling is used to obtain utterance-level embeddings.
- The released weights include only the classification head;
the encoder must be loaded separately from `facebook/wav2vec2-base`.
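
Since the model expects 16 kHz input, audio recorded at other rates needs resampling first. A minimal preprocessing sketch follows; `prepare_waveform` is a hypothetical helper, the input is assumed to be a `(channels, samples)` or `(samples,)` tensor such as the one returned by `torchaudio.load`, and `torchaudio` is only needed when the rate actually differs.

```python
import torch


def prepare_waveform(
    waveform: torch.Tensor, sample_rate: int, target_rate: int = 16_000
) -> torch.Tensor:
    """Return a mono float32 waveform of shape (samples,) at target_rate."""
    waveform = waveform.to(torch.float32)
    if waveform.dim() == 2:
        # (channels, samples) -> mono by averaging the channels
        waveform = waveform.mean(dim=0)
    if sample_rate != target_rate:
        # Imported lazily so 16 kHz inputs work without torchaudio installed.
        import torchaudio.functional as AF

        waveform = AF.resample(waveform, orig_freq=sample_rate, new_freq=target_rate)
    return waveform
```

Amplitude normalization is not handled here; with `transformers`, that step is typically left to `Wav2Vec2FeatureExtractor`, though this card does not say which preprocessing the released model used.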
---