PetraMicanovic/audio_meld_seac_finetuned


Audio Emotion Recognition (MELD → SEAC, Audio-only)

Overview

This model performs speech emotion recognition from audio only.

It uses a pretrained Wav2Vec2 encoder (frozen) as a feature extractor, followed by a lightweight classification head.

Training data and task:

Pretrained on: MELD (English conversational emotions)
Fine-tuned on: SEAC (Serbian emotional speech)
Task: 5-class emotion classification from speech audio

Emotions

The model predicts:

neutral
joy
anger
sadness
fear

Architecture

Encoder: facebook/wav2vec2-base (frozen)
Pooling: Mean pooling over temporal hidden states
Classifier: Fully connected classification head
Training strategy: Transfer learning (classifier-only fine-tuning)
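
The pooling and classifier steps above can be sketched in PyTorch as follows. This is a minimal illustration, not the released implementation: the card only says "fully connected", so the single linear layer and the class name MeanPoolClassifier are assumptions. The hidden size of 768 is that of facebook/wav2vec2-base, and a random tensor stands in for the encoder output so the sketch runs offline.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768   # hidden size of facebook/wav2vec2-base
NUM_CLASSES = 5     # neutral, joy, anger, sadness, fear


class MeanPoolClassifier(nn.Module):
    """Mean-pools frame-level encoder states and maps them to class logits."""

    def __init__(self, hidden_size: int = HIDDEN_SIZE, num_classes: int = NUM_CLASSES):
        super().__init__()
        # Assumption: one linear layer; the actual head may be deeper.
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, frames, hidden_size) from the frozen encoder
        pooled = hidden_states.mean(dim=1)   # (batch, hidden_size)
        return self.fc(pooled)               # (batch, num_classes)


# Random tensor standing in for encoder output (2 clips, 49 frames each).
fake_hidden = torch.randn(2, 49, HIDDEN_SIZE)
logits = MeanPoolClassifier()(fake_hidden)
```

Plain averaging treats padded frames like real ones, so for batches of clips with different lengths a masked mean may be preferable; the card does not say which variant was used.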

Transfer Learning Setup

Stage 1 – Pretraining (MELD)

Audio-only emotion classification

Stage 2 – Fine-tuning (SEAC)

Encoder frozen
Only classification head updated
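
A classifier-only update step of the kind described in Stage 2 might look like the sketch below. To keep it runnable without downloading anything, a small linear layer stands in for the Wav2Vec2 encoder; the optimizer (AdamW), the learning rate, and the toy batch are all assumptions, not values taken from the original training run.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the Wav2Vec2 encoder (hypothetical, keeps the sketch offline).
encoder = nn.Linear(16, 768)
head = nn.Linear(768, 5)

# Freeze the encoder: no gradients, and eval mode so dropout stays off.
encoder.requires_grad_(False)
encoder.eval()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

frames = torch.randn(4, 30, 16)          # (batch, frames, features)
labels = torch.tensor([0, 1, 2, 3])      # toy class indices

encoder_before = encoder.weight.detach().clone()
head_before = head.weight.detach().clone()

with torch.no_grad():
    hidden = encoder(frames)             # (batch, frames, 768)
logits = head(hidden.mean(dim=1))        # mean pooling, then classification
loss = nn.functional.cross_entropy(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the encoder weights are unchanged and only the head has moved, which is the behaviour the two bullet points above describe.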

Evaluation (SEAC Test Set)

Metric	Score
Accuracy	0.7107
Weighted F1	0.7130
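
For reference, the two metrics in the table can be computed as in the sketch below. The card does not state which tooling produced the reported scores, so this is only an illustration of the definitions: accuracy is the fraction of correct predictions, and weighted F1 is the per-class F1 averaged with weights proportional to each class's number of true samples. The toy labels are made up for the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        # n > 0 for every label in support, so the denominator is never zero.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)


# Toy example (not SEAC data).
y_true = ["neutral", "neutral", "joy", "anger"]
y_pred = ["neutral", "joy", "joy", "anger"]
acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
```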

Notes

Sampling rate: 16 kHz
Mean temporal pooling is used to obtain utterance-level embeddings.
The released weights include only the classification head. The encoder is loaded from facebook/wav2vec2-base.
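
Because only the head is shipped, loading is a two-part process: build the encoder from its own checkpoint, then restore the head weights on top. The sketch below illustrates the pattern with a stand-in encoder and an in-memory buffer. The class name EmotionModel, the "head." key prefix, and the single linear head are assumptions; the actual checkpoint file name and key layout in this repository are not stated in the card.

```python
import io

import torch
from torch import nn


class EmotionModel(nn.Module):
    """Encoder plus classification head (hypothetical layout)."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(768, 5)


def head_state(model: nn.Module) -> dict:
    """Keep only the parameters that belong to the classification head."""
    return {k: v for k, v in model.state_dict().items() if k.startswith("head.")}


# Save a head-only checkpoint; a small linear layer stands in for the encoder.
trained = EmotionModel(nn.Linear(16, 768))
buffer = io.BytesIO()
torch.save(head_state(trained), buffer)
buffer.seek(0)

# Restore into a fresh model; strict=False tolerates the absent encoder keys.
fresh = EmotionModel(nn.Linear(16, 768))
result = fresh.load_state_dict(torch.load(buffer, weights_only=True), strict=False)
```

In real use the stand-in would be replaced by the encoder loaded with the transformers library, for example Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base"). Checking that result.missing_keys lists only encoder parameters is a quick way to confirm that the head was actually restored.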
