PetraMicanovic/audio_meld_seac_finetuned

	---
	license: mit
	library_name: pytorch
	pipeline_tag: audio-classification
	language:
	- sr
	- en
	datasets:
	- declare-lab/meld
	- seac
	metrics:
	- accuracy
	- weighted-f1
	tags:
	- emotion-recognition
	- speech-emotion-recognition
	- audio
	- wav2vec2
	- transfer-learning
	- meld
	- seac
	---

	# Audio Emotion Recognition (MELD → SEAC, Audio-only)

	## Overview

	This model performs speech emotion recognition from audio only.

	It uses a pretrained Wav2Vec2 encoder (frozen) as a feature extractor,
	followed by a lightweight classification head.

	Training summary:

	- Stage 1 (pretraining): MELD (English conversational emotions)
	- Stage 2 (fine-tuning): SEAC (Serbian emotional speech)
	- Task: 5-class emotion classification from speech audio

	---

	## Emotions

	The model predicts:

	- neutral
	- joy
	- anger
	- sadness
	- fear
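Mapping a classifier output back to one of these labels needs a class-index table. The card does not state the index order used in training, so the sketch below assumes the order in which the labels are listed above. `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical names; verify the order against the training configuration before relying on it.

```python
# Assumed index order: the order the labels are listed in this card.
# The order actually used during training is not documented here.
ID2LABEL = {0: "neutral", 1: "joy", 2: "anger", 3: "sadness", 4: "fear"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(scores):
    """Return the label whose score (logit or probability) is highest."""
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best]
```

For example, `decode([0.1, 0.2, 2.5, -1.0, 0.0])` picks index 2, which is `anger` under the assumed order.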

	---

	## Architecture

	- Encoder: `facebook/wav2vec2-base` (frozen)
	- Pooling: Mean pooling over temporal hidden states
	- Classifier: Fully connected classification head
	- Training strategy: Transfer learning (classifier-only fine-tuning)
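The pooling and classifier stages listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the released implementation: the card does not give the head's layer sizes, so `EmotionHead`, its `hidden_dim=256`, and the dropout rate are assumptions. The only fixed number is the 768-dimensional hidden size of `facebook/wav2vec2-base`. Random tensors stand in for encoder output so the sketch runs without downloading the encoder.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768  # hidden size of facebook/wav2vec2-base
NUM_CLASSES = 5    # neutral, joy, anger, sadness, fear


def mean_pool(hidden_states, mask=None):
    """Average frame-level states (batch, time, dim) into (batch, dim).

    mask is an optional (batch, time) tensor: 1 for real frames, 0 for padding.
    """
    if mask is None:
        return hidden_states.mean(dim=1)
    mask = mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


class EmotionHead(nn.Module):
    """Hypothetical head; the real layer sizes are not given in the card."""

    def __init__(self, in_dim=HIDDEN_SIZE, hidden_dim=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, hidden_states, mask=None):
        return self.net(mean_pool(hidden_states, mask))


# Stand-in for encoder output: 2 utterances, 49 frames, 768 features each.
frames = torch.randn(2, 49, HIDDEN_SIZE)
logits = EmotionHead()(frames)  # shape (2, 5): one score per emotion
```

In a real pipeline the `frames` tensor would presumably be the `last_hidden_state` returned by `transformers.Wav2Vec2Model` loaded from `facebook/wav2vec2-base`, computed under `torch.no_grad()` because the encoder is frozen.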

	---

	## Transfer Learning Setup

	Stage 1 – Pretraining (MELD)
	- Audio-only emotion classification

	Stage 2 – Fine-tuning (SEAC)
	- Encoder frozen
	- Only classification head updated
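Classifier-only fine-tuning as described in Stage 2 amounts to disabling gradients on the encoder and giving the optimizer only the head's parameters. The sketch below shows one training step under that setup. Small `nn.Linear` layers stand in for the Wav2Vec2 encoder and the classification head, and the optimizer, learning rate, and `freeze` helper are assumptions, since the card does not document the training hyperparameters.

```python
import torch
from torch import nn


def freeze(module):
    """Disable gradients and switch to eval mode so the module is not updated."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()
    return module


# Stand-ins so the sketch runs offline: in practice the encoder would be the
# Wav2Vec2 model and the head the classifier pretrained on MELD.
encoder = freeze(nn.Linear(16, 8))
head = nn.Linear(8, 5)

# Only the head's parameters are handed to the optimizer (assumed AdamW, lr=1e-3).
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

features = torch.randn(4, 16)          # dummy batch of 4 inputs
labels = torch.tensor([0, 1, 2, 3])    # dummy class indices

with torch.no_grad():                  # frozen encoder: no graph is needed
    embeddings = encoder(features)

loss = nn.functional.cross_entropy(head(embeddings), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the encoder parameters have no gradients and are unchanged, while the head has been updated.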

	---

	## Evaluation (SEAC Test Set)

	| Metric      | Score  |
	|-------------|--------|
	| Accuracy    | 0.7107 |
	| Weighted F1 | 0.7130 |
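For reference, the two metrics can be computed as in the sketch below. The card does not say which implementation produced the reported scores, so this assumes the usual definition of weighted F1: per-class F1 averaged with weights proportional to each class's count in the ground truth. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n > 0
        total += f1 * n
    return total / len(y_true)


# Toy data, not taken from SEAC.
y_true = ["joy", "joy", "anger", "fear"]
y_pred = ["joy", "anger", "anger", "fear"]
```

On the toy data both `accuracy(y_true, y_pred)` and `weighted_f1(y_true, y_pred)` evaluate to 0.75. In general the two values differ, as they do in the table above.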

	---

	## Notes

	- Sampling rate: 16 kHz (resample input audio to this rate before inference).
	- Mean temporal pooling is used to obtain utterance-level embeddings.
	- The released weights include only the classification head; the encoder is loaded separately from `facebook/wav2vec2-base`.

	---