PetraMicanovic/audio_meld_seac_finetuned

@@ -6,11 +6,11 @@ language:
   - sr
   - en
 datasets:
-  - meld
   - seac
 metrics:
   - accuracy
-  - f1
 tags:
   - emotion-recognition
   - speech-emotion-recognition
@@ -19,26 +19,28 @@ tags:
   - transfer-learning
   - meld
   - seac
 ---
 # Audio Emotion Recognition (MELD → SEAC, Audio-only)
 ## Overview
-This model performs **speech emotion recognition from audio only**.
-It is based on a **pretrained Wav2Vec2 encoder (frozen)** with a lightweight audio classification head.
 The model was:
-- **Pretrained on:** MELD dataset (English, conversational emotions)
-- **Fine-tuned on:** SEAC dataset (Serbian emotional speech)
 - **Task:** 5-class emotion classification from speech audio
 ---
 ## Emotions
-The model predicts the following emotions:
 - neutral
 - joy
@@ -50,50 +52,38 @@ The model predicts the following emotions:
 ## Architecture
-- **Encoder:** Wav2Vec2 (frozen, used as feature extractor)
-- **Pooling:** Mean pooling over hidden states
-- **Classifier:** Fully connected audio emotion head
-- **Loss:** Class-weighted CrossEntropy (handles class imbalance)
-- **Optimizer:** AdamW
-- **LR Scheduler:** ReduceLROnPlateau
-- **Early stopping:** Enabled
 ---
 ## Transfer Learning Setup
-The training followed a **cross-dataset transfer learning** setup:
-**Step 1 — Pretraining**
-- Model trained on MELD (audio-only)
-**Step 2 — Fine-tuning**
-- Model adapted to SEAC Serbian emotional speech
-- Encoder kept frozen
-- Only classification head trained
 ---
 ## Evaluation (SEAC Test Set)
-| Metric | Score |
-|--------|-------|
-| Accuracy | **0.7107** |
-| Weighted F1 | **0.7130** |
-### Per-class behavior
-- Best recognized: **fear, neutral**
-- Good performance: **joy, sadness**
-- Hardest class: **anger** (confused mostly with fear)
 ---
-## Usage
-```python
-import torch
-model.load_state_dict(torch.load("audio_model.pt", map_location="cpu"))
-model.eval()
-```

   - sr
   - en
 datasets:
+  - declare-lab/meld
   - seac
 metrics:
   - accuracy
+  - weighted-f1
 tags:
   - emotion-recognition
   - speech-emotion-recognition
   - transfer-learning
   - meld
   - seac
 ---
 # Audio Emotion Recognition (MELD → SEAC, Audio-only)
 ## Overview
+This model performs **speech emotion recognition from audio only**.
+It uses a **pretrained Wav2Vec2 encoder (frozen)** as a feature extractor,
+followed by a lightweight classification head.
 The model was:
+- **Pretrained on:** MELD (English conversational emotions)
+- **Fine-tuned on:** SEAC (Serbian emotional speech)
 - **Task:** 5-class emotion classification from speech audio
 ---
 ## Emotions
+The model predicts:
 - neutral
 - joy
 ## Architecture
+- **Encoder:** `facebook/wav2vec2-base` (frozen)
+- **Pooling:** Mean pooling over temporal hidden states
+- **Classifier:** Fully connected classification head
+- **Training strategy:** Transfer learning (classifier-only fine-tuning)
 ---
 ## Transfer Learning Setup
+**Stage 1 – Pretraining (MELD)**
+- Audio-only emotion classification
+**Stage 2 – Fine-tuning (SEAC)**
+- Encoder frozen
+- Only classification head updated
 ---
 ## Evaluation (SEAC Test Set)
+| Metric        | Score |
+|---------------|-------|
+| Accuracy      | **0.7107** |
+| Weighted F1   | **0.7130** |
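
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how accuracy and support-weighted F1 are usually computed. It is not the author's evaluation script: the function names and the toy labels are assumptions made for illustration, and only the two class names visible in this card (`neutral`, `joy`) are used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted label matches the reference."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


# Toy labels (hypothetical, not SEAC data).
y_true = ["neutral", "neutral", "joy", "joy"]
y_pred = ["neutral", "joy", "joy", "joy"]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} weighted_f1={wf1:.4f}")
```

Weighting by support means frequent classes dominate the score, which is why weighted F1 can sit close to accuracy on an imbalanced test set.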
 ---
+## Notes
+- Input audio sampling rate: 16 kHz
+- Mean temporal pooling is used to obtain utterance-level embeddings.
+- The released weights include only the classification head.
+  The encoder is loaded from `facebook/wav2vec2-base`.
+---
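
The notes above say the released weights cover only the classification head, which sits on top of mean-pooled encoder states. The sketch below shows what that pooling and head could look like in PyTorch. It rests on several assumptions: the class name `EmotionHead` and its single linear layer are hypothetical (the card does not describe the head's layers), and a random tensor stands in for the output of the frozen `facebook/wav2vec2-base` encoder, whose hidden size is 768.

```python
import torch
import torch.nn as nn

HIDDEN_SIZE = 768  # hidden size of facebook/wav2vec2-base
NUM_CLASSES = 5    # 5-class task, as stated in this card


class EmotionHead(nn.Module):
    """Hypothetical head; the real layer layout is not given in the card."""

    def __init__(self, hidden_size=HIDDEN_SIZE, num_classes=NUM_CLASSES):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states):
        # hidden_states: (batch, frames, hidden_size) from the frozen encoder.
        pooled = hidden_states.mean(dim=1)  # mean over the time axis
        return self.fc(pooled)              # (batch, num_classes) logits


# Stand-in for encoder output: 2 utterances, 49 frames each (frame count is arbitrary).
hidden_states = torch.randn(2, 49, HIDDEN_SIZE)

head = EmotionHead().eval()
with torch.no_grad():
    logits = head(hidden_states)
pred = logits.argmax(dim=-1)  # predicted class index per utterance
print(logits.shape, pred.shape)
```

In real use, the stand-in tensor would be replaced by the last hidden state of the encoder run on 16 kHz audio, and the head's parameters would be loaded from the released checkpoint. The checkpoint's key names may not match this hypothetical module.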