---
license: other
license_name: lg
license_link: LICENSE
datasets:
- LaurenGurgiolo/9_Facial_Expressions
language:
- en
metrics:
- accuracy
base_model:
- LaurenGurgiolo/vit-micro-facial-expressions
library_name: transformers
tags:
- facial
- expressions
- micro
---

# 😐 ViT Facial Expression Recognition (9-Class Baseline Model)

This repository hosts a Vision Transformer (ViT)-based facial expression recognition model trained using an iterative fine-tuning strategy. The model was developed by further training LaurenGurgiolo/vit-micro-facial-expressions, which itself was fine-tuned from mo-thecreator/vit-Facial-Expression-Recognition.

The objective of this model is to classify facial images into nine distinct facial expression categories using robust transformer-based visual representations.

## 📌 Model Details

- Base model: mo-thecreator/vit-Facial-Expression-Recognition
- Intermediate model: LaurenGurgiolo/vit-micro-facial-expressions
- Architecture: Vision Transformer (ViT)
- Task: Facial Expression Classification
- Final model type: Iteratively fine-tuned baseline model

## 📂 Dataset

### 9_Facial_Expressions Dataset

- Source: LaurenGurgiolo/9_Facial_Expressions
- Task: Multi-class facial expression classification
- Classes: 9 facial expression categories

This dataset was used to further refine the intermediate ViT model through iterative training.

## 🧠 Training Methodology

### Iterative Fine-Tuning (Baseline Model)

The LaurenGurgiolo/vit-micro-facial-expressions model was iteratively fine-tuned on the 9_Facial_Expressions dataset, allowing the model to progressively integrate new facial expression patterns.

Training Configuration:

- Batch size: 16
- Epochs: 10
- Learning rate: 2e-5
- Warmup steps: 500
- Scheduler: Cosine learning rate with restarts (2 cycles)
- Weight decay: 0.01

This iterative training procedure achieved a final accuracy of 75%, which is designated as the baseline performance.

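To make the scheduler settings in the training configuration concrete, here is a minimal pure-Python sketch of a linear warmup followed by cosine decay with hard restarts. It assumes the "cosine learning rate with restarts" entry corresponds to the shape that Transformers provides as `get_cosine_with_hard_restarts_schedule_with_warmup`; the card does not confirm this. The total step count (`TOTAL_STEPS`) is a hypothetical value, since the card does not state the dataset size or the number of optimizer steps.

```python
import math

PEAK_LR = 2e-5        # learning rate from the training configuration
WARMUP_STEPS = 500    # warmup steps from the training configuration
NUM_CYCLES = 2        # restart cycles from the training configuration
TOTAL_STEPS = 5000    # hypothetical: the card does not state the total step count


def lr_at(step, total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step (sketch, not the author's code)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that jumps back to the peak at the start of each cycle
    cycle_position = (NUM_CYCLES * progress) % 1.0
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_position)))


for step in (0, 250, 500, 1625, 2749, 2750, 4999):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate climbs to 2e-5 over the first 500 steps, decays toward zero, restarts at the peak halfway through the remaining steps, and decays again.
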
### Non-Iterative Fine-Tuning (Comparison Model)

For comparison, the pretrained mo-thecreator/vit-Facial-Expression-Recognition model was fine-tuned directly on the 9_Facial_Expressions dataset, skipping the intermediate stage.

- Training approach: Single-stage fine-tuning
- Final accuracy: 66%

This is 9 percentage points below the iterative baseline, which suggests that sequential fine-tuning helps on this dataset.

## 📊 Results Summary

| Training Strategy | Accuracy |
|---|---|
| Iterative fine-tuning | 75% |
| Non-iterative fine-tuning | 66% |

Figure: Training and validation performance across 10 epochs, illustrating stable convergence and improved generalization under iterative training.

## 🧠 Why Iterative Training?

Iterative training is a sequential learning methodology in which a model is trained on multiple datasets one after another. This approach enables:

- Progressive knowledge refinement
- Improved generalization to unseen facial variations
- Enhanced feature discrimination

By exposing the model to increasingly diverse data distributions, iterative training improves adaptability to novel conditions (Mohan, 2024).

## 🧬 Architecture Choice

A Vision Transformer (ViT) architecture was selected for its strong performance on facial expression recognition tasks. Global self-attention lets a ViT relate distant image regions within a single layer, and ViTs have been reported to match or exceed convolutional neural networks (CNNs) in accuracy and generalization on such tasks.

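As an illustration of what "global" self-attention means, the following pure-Python sketch computes single-head scaled dot-product attention over a handful of toy patch vectors. It is a simplification under stated assumptions: queries, keys, and values are the raw tokens (identity projections), whereas an actual ViT uses learned projections, multiple heads, position embeddings, and a class token. The `patches` values are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights, outputs = [], []
    for query in tokens:
        # Score this token against every token in the sequence, itself included
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all token vectors
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, tokens)) for j in range(dim)]
        )
    return weights, outputs


# Hypothetical 2-dimensional embeddings standing in for four image patches
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights, outputs = self_attention(patches)

for i, row in enumerate(weights):
    print(f"patch {i} attends with weights {[round(w, 3) for w in row]}")
```

Every row of `weights` assigns a nonzero weight to every patch, so each output mixes information from the whole image in one step. A convolution, by contrast, only sees a local neighborhood per layer.
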
## 🚀 Usage Example

    from PIL import Image
    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    # Replace the placeholder with the model ID of this repository
    model_id = "your-username/your-model-name"
    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)

    # Convert to RGB so grayscale or RGBA files match the 3-channel input
    image = Image.open("face.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring class and its label name from the model config
    predicted_label = outputs.logits.argmax(dim=-1).item()
    print(predicted_label, model.config.id2label[predicted_label])

## ⚠️ Limitations

Performance may be affected by:

- Low-resolution images
- Occlusions or extreme facial poses
- Unbalanced class distributions

Emotion classification remains inherently subjective.

## 📜 License & Attribution

- Base model: mo-thecreator/vit-Facial-Expression-Recognition
- Datasets: LaurenGurgiolo/9_Facial_Expressions

Please consult the original model and dataset licenses on Hugging Face before use.

## 🙌 Acknowledgements

- Hugging Face for model hosting and tools
- Dataset contributors
- Prior research on Vision Transformers and iterative learning strategies