metadata
license: mit
library_name: pytorch
pipeline_tag: audio-classification
language:
  - sr
  - en
datasets:
  - declare-lab/meld
  - seac
metrics:
  - accuracy
  - weighted-f1
tags:
  - emotion-recognition
  - speech-emotion-recognition
  - audio
  - wav2vec2
  - transfer-learning
  - meld
  - seac

Audio Emotion Recognition (MELD → SEAC, Audio-only)

Overview

This model performs speech emotion recognition from audio only.

It uses a pretrained Wav2Vec2 encoder (frozen) as a feature extractor, followed by a lightweight classification head.

Training data and task:

  • Pretrained on: MELD (English conversational emotions)
  • Fine-tuned on: SEAC (Serbian emotional speech)
  • Task: 5-class emotion classification from speech audio

Emotions

The model predicts:

  • neutral
  • joy
  • anger
  • sadness
  • fear
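As a rough illustration of how the five classes could be decoded from the classifier output, the sketch below applies a softmax to a logit vector and picks the most likely label. The index order in `ID2LABEL` is an assumption that simply mirrors the list above; the actual label-to-index mapping used in training is not stated here and should be verified before use. The helper names `softmax` and `decode` are hypothetical.

```python
import math

# ASSUMPTION: this index order mirrors the list above. The real
# label-to-index mapping used during training is not documented here.
ID2LABEL = ["neutral", "joy", "anger", "sadness", "fear"]


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only.
label, confidence = decode([0.2, 2.5, -1.0, 0.3, -0.5])
print(label, round(confidence, 3))
```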

Architecture

  • Encoder: facebook/wav2vec2-base (frozen)
  • Pooling: Mean pooling over temporal hidden states
  • Classifier: Fully connected classification head
  • Training strategy: Transfer learning (classifier-only fine-tuning)
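The pooling and classification steps listed above can be sketched as follows. This is a minimal sketch, not the released implementation: the single linear layer, the class name `MeanPoolClassifier`, and the optional padding mask are assumptions, since only "fully connected classification head" is specified. The hidden size of 768 is that of facebook/wav2vec2-base. To keep the example runnable offline, random tensors stand in for the encoder output; in practice the hidden states would come from the frozen encoder.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768  # hidden size of facebook/wav2vec2-base
NUM_CLASSES = 5    # neutral, joy, anger, sadness, fear


class MeanPoolClassifier(nn.Module):
    """Mean-pools frame-level features, then applies a linear head."""

    def __init__(self, hidden_size=HIDDEN_SIZE, num_classes=NUM_CLASSES):
        super().__init__()
        # ASSUMPTION: one linear layer; the released head may be deeper.
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states, mask=None):
        # hidden_states: (batch, frames, hidden_size)
        # mask (optional): (batch, frames), 1 for real frames, 0 for padding
        if mask is None:
            pooled = hidden_states.mean(dim=1)
        else:
            mask = mask.unsqueeze(-1).to(hidden_states.dtype)
            pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.fc(pooled)


head = MeanPoolClassifier()
frames = torch.randn(2, 49, HIDDEN_SIZE)  # random stand-in for encoder output
logits = head(frames)
print(logits.shape)  # torch.Size([2, 5])
```

Passing a mask matters only when a batch contains padded utterances of different lengths; without it, padding frames would be averaged into the utterance embedding.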

Transfer Learning Setup

Stage 1 – Pretraining (MELD)

  • Audio-only emotion classification

Stage 2 – Fine-tuning (SEAC)

  • Encoder frozen
  • Only classification head updated
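The classifier-only update in Stage 2 can be illustrated with a small PyTorch sketch. Everything concrete here is an assumption made for the sake of a runnable example: a small `nn.Linear` stands in for the Wav2Vec2 encoder, the head is a single linear layer, and the optimizer and learning rate are placeholders rather than the settings actually used.

```python
import torch
from torch import nn

# Hypothetical stand-in for the Wav2Vec2 encoder, used only so the
# example runs without downloading weights.
encoder = nn.Linear(16, 768)
head = nn.Linear(768, 5)  # ASSUMPTION: single-layer head

# Freeze the encoder: no gradients, and eval mode for deterministic layers.
encoder.requires_grad_(False)
encoder.eval()

# Only the head's parameters are handed to the optimizer.
# ASSUMPTION: optimizer choice and learning rate are placeholders.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(4, 10, 16)       # (batch, frames, features), random data
y = torch.tensor([0, 1, 2, 3])   # made-up class indices

encoder_before = encoder.weight.detach().clone()
head_before = head.weight.detach().clone()

with torch.no_grad():
    feats = encoder(x).mean(dim=1)  # mean pooling over frames

loss = nn.functional.cross_entropy(head(feats), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

encoder_unchanged = torch.equal(encoder.weight, encoder_before)
head_changed = not torch.equal(head.weight, head_before)
print(encoder_unchanged, head_changed)
```

Because the encoder never changes, saving only `head.state_dict()` is enough to reproduce the model, provided the same pretrained encoder checkpoint is loaded alongside it.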

Evaluation (SEAC Test Set)

| Metric | Score |
| --- | --- |
| Accuracy | 0.7107 |
| Weighted F1 | 0.7130 |
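For readers unfamiliar with the two metrics, the sketch below shows how accuracy and support-weighted F1 are commonly computed: per-class F1 is averaged with weights proportional to the number of true examples of each class. The function names and the toy labels are illustrative only and are not the SEAC test data; the evaluation script actually used is not described here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels for illustration only (not SEAC data).
y_true = ["joy", "joy", "anger", "neutral"]
y_pred = ["joy", "anger", "anger", "neutral"]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(acc, round(wf1, 4))
```

Weighted F1 can sit slightly above or below accuracy when classes are imbalanced, which is consistent with the two reported scores being close but not identical.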

Notes

  • Sampling rate: 16 kHz
  • Mean temporal pooling is used to obtain utterance-level embeddings.
  • The released weights include only the classification head. The encoder is loaded from facebook/wav2vec2-base.