---
license: mit
library_name: pytorch
pipeline_tag: audio-classification
language:
- sr
- en
datasets:
- declare-lab/meld
- seac
metrics:
- accuracy
- weighted-f1
tags:
- emotion-recognition
- speech-emotion-recognition
- audio
- wav2vec2
- transfer-learning
- meld
- seac
---
# Audio Emotion Recognition (MELD → SEAC, Audio-only)
## Overview
This model performs **speech emotion recognition from audio only**.
It uses a **pretrained Wav2Vec2 encoder (frozen)** as a feature extractor,
followed by a lightweight classification head.
Training data and task:
- **Pretrained on:** MELD (English conversational emotions)
- **Fine-tuned on:** SEAC (Serbian emotional speech)
- **Task:** 5-class emotion classification from speech audio
---
## Emotions
The model predicts:
- neutral
- joy
- anger
- sadness
- fear
---
## Architecture
- **Encoder:** `facebook/wav2vec2-base` (frozen)
- **Pooling:** Mean pooling over temporal hidden states
- **Classifier:** Fully connected classification head
- **Training strategy:** Transfer learning (classifier-only fine-tuning)
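
The components listed above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: the head is assumed to be a single linear layer (the card only says "fully connected"), and the names `EmotionHead`, `load_frozen_encoder`, and `predict_logits` are hypothetical. The encoder loader relies on the `transformers` library and downloads weights when called.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768  # hidden size of facebook/wav2vec2-base
NUM_CLASSES = 5    # neutral, joy, anger, sadness, fear


class EmotionHead(nn.Module):
    """Mean-pool encoder frames over time, then map to class logits.

    Assumption: one linear layer; the real head layout may differ.
    """

    def __init__(self, hidden_size: int = HIDDEN_SIZE, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, time, hidden_size) produced by the encoder
        pooled = hidden_states.mean(dim=1)  # (batch, hidden_size)
        return self.classifier(pooled)      # (batch, num_classes)


def load_frozen_encoder():
    """Load facebook/wav2vec2-base and freeze every parameter."""
    from transformers import Wav2Vec2Model  # lazy import; only needed here

    encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
    encoder.eval()
    for param in encoder.parameters():
        param.requires_grad = False
    return encoder


def predict_logits(encoder, head: EmotionHead, waveform: torch.Tensor) -> torch.Tensor:
    # waveform: (batch, samples) of preprocessed 16 kHz mono audio
    with torch.no_grad():
        hidden_states = encoder(waveform).last_hidden_state
    return head(hidden_states)
```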
---
## Transfer Learning Setup
**Stage 1 – Pretraining (MELD)**
- Audio-only emotion classification

**Stage 2 – Fine-tuning (SEAC)**
- Encoder frozen
- Only classification head updated
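
A classifier-only update of the kind described in Stage 2 might look like the sketch below. To keep it runnable offline, `encoder` and `head` are small hypothetical stand-ins rather than the real Wav2Vec2 model and trained head, and the optimizer choice and learning rate are assumed values, not the settings used for this model.

```python
import torch
from torch import nn

# Stand-ins so the sketch runs offline. In practice `encoder` would be the
# pretrained Wav2Vec2 model and `head` the classifier carried over from MELD.
encoder = nn.Linear(16, 768)  # hypothetical placeholder for the encoder
head = nn.Linear(768, 5)      # hypothetical classification head

# Freeze the encoder: no gradients, and eval mode to disable dropout.
encoder.requires_grad_(False)
encoder.eval()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)  # assumed settings


def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one optimization step that updates the head only."""
    with torch.no_grad():
        embeddings = encoder(features)
    loss = nn.functional.cross_entropy(head(embeddings), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```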
---
## Evaluation (SEAC Test Set)
| Metric | Score |
|---------------|-------|
| Accuracy | **0.7107** |
| Weighted F1 | **0.7130** |
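
For reference, the two metrics in the table can be computed as in the following dependency-free sketch. Weighted F1 here is the per-class F1 averaged with weights proportional to each class's number of true samples; the function names are illustrative, and the evaluation script actually used for the scores above is not shown in this card.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is zero when there are no true positives for this class.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)
```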
---
## Notes
- Input sampling rate: 16 kHz (resample audio recorded at other rates before inference).
- Mean temporal pooling is used to obtain utterance-level embeddings.
- The released weights include only the classification head; the encoder is loaded from `facebook/wav2vec2-base`.
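
Because only the head is distributed, loading amounts to rebuilding the head, restoring its parameters, and obtaining the encoder separately. The round trip below is a rough sketch under assumptions: the head layout (`make_head`) is hypothetical, and an in-memory buffer stands in for the released weights file, whose name is not given here.

```python
import io

import torch
from torch import nn


def make_head() -> nn.Module:
    # Hypothetical head layout (single linear layer): 768 is the hidden size
    # of facebook/wav2vec2-base and 5 is the number of emotion classes.
    return nn.Linear(768, 5)


trained_head = make_head()

# Save only the head's parameters. The BytesIO buffer is a placeholder for
# the released weights file.
buffer = io.BytesIO()
torch.save(trained_head.state_dict(), buffer)
buffer.seek(0)

# At load time, rebuild the head and restore its weights. The encoder is not
# part of this checkpoint; it would be obtained separately, for example with
# transformers.Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").
restored_head = make_head()
restored_head.load_state_dict(torch.load(buffer, map_location="cpu"))
restored_head.eval()
```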
---