---
license: apache-2.0
pipeline_tag: image-classification
tags:
- Face
- Face Recognition
- Biometrics
- MOE
- ViT
---

# FaceMoE: Mixture of Experts for Low-Resolution Face Recognition

[**Project Page**](https://kartik-3004.github.io/FaceMoE/) **|** [**Paper (ArXiv)**](https://arxiv.org/pdf/2606.32040) **|** [**Code**](https://github.com/Kartik-3004/FaceMoE)

**ECCV 2026**

**Authors:** Kartik Narayan, Vishal M. Patel

**Affiliation:** Johns Hopkins University

## Abstract

Low-resolution face recognition remains challenging due to severe degradations in probe images, domain differences between high-resolution gallery and low-resolution probe data, and catastrophic forgetting during low-resolution adaptation. FaceMoE introduces a transformer with Mixture-of-Experts feed-forward blocks and a top-k router that dynamically activates specialized experts for different semantic facial regions. This resolution-aware sparse routing improves feature extraction under degradation while preserving pretrained knowledge and scaling capacity efficiently. Across eleven datasets (high-quality, mixed-quality, and low-resolution benchmarks), FaceMoE outperforms prior state-of-the-art methods, including strong gains on BRIAR Protocol 3.1, IJB-S, and TinyFace.

## Motivation and Contributions

The motivation figure highlights three core challenges in low-resolution face recognition (LR-FR): (1) degraded probe frames contain weak identity cues, making feature aggregation difficult; (2) the large domain gap between the high-resolution (HR) gallery and low-resolution (LR) probes changes which facial regions are discriminative; and (3) naive low-resolution fine-tuning can cause catastrophic forgetting.

FaceMoE addresses these challenges by introducing sparse expert FFNs and routing each token through its top-k specialized experts. This improves adaptation to low-resolution data with minimal drop on high-quality and mixed-quality benchmarks.

## FaceMoE Architecture

FaceMoE replaces the standard transformer FFN with multiple expert MLPs and a learnable top-k router. Tokens are sparsely routed to expert subsets, enabling resolution-aware feature extraction from different semantic facial regions. A composite objective with CosFace loss, router z-loss, and load-balancing loss stabilizes expert specialization. The reported effective configuration is **N = 3** experts with **k = 2** active experts per token.

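
The routing described above can be illustrated with a minimal, dependency-free sketch. It is not the FaceMoE implementation: the function names, the toy experts, and the exact loss formulas are assumptions made for illustration. The router z-loss follows the common mean squared log-sum-exp form, and the load-balancing loss follows the common `N * sum(f_i * P_i)` form; the paper may define or weight these terms differently.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(logits, k):
    # Pick the k most probable experts and renormalize their gate weights.
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_ffn(token, logits, experts, k):
    # Gate-weighted sum of the outputs of the k selected experts.
    out = [0.0] * len(token)
    for idx, gate in route_top_k(logits, k):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out

def router_z_loss(batch_logits):
    # Assumed form: mean over tokens of (log-sum-exp of router logits)^2.
    total = 0.0
    for logits in batch_logits:
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse ** 2
    return total / len(batch_logits)

def load_balancing_loss(batch_logits, k):
    # Assumed form: N * sum_i f_i * P_i, where f_i is the fraction of
    # top-k assignments going to expert i and P_i is its mean router probability.
    n = len(batch_logits[0])
    counts = [0] * n
    mean_prob = [0.0] * n
    for logits in batch_logits:
        for i, p in enumerate(softmax(logits)):
            mean_prob[i] += p / len(batch_logits)
        for i, _ in route_top_k(logits, k):
            counts[i] += 1
    total_assignments = k * len(batch_logits)
    frac = [c / total_assignments for c in counts]
    return n * sum(f * p for f, p in zip(frac, mean_prob))

# Hypothetical toy experts standing in for expert MLPs (N = 3).
experts = [
    lambda t: [1.0 * x for x in t],
    lambda t: [2.0 * x for x in t],
    lambda t: [3.0 * x for x in t],
]

token = [1.0, -2.0]
router_logits = [2.0, 0.5, 1.0]  # one logit per expert for this token
selected = route_top_k(router_logits, k=2)
output = moe_ffn(token, router_logits, experts, k=2)
```

With `k = 2` of `N = 3` experts active, each token runs only two expert MLPs. Under the assumed load-balancing form, a perfectly balanced batch yields a loss of 1, and the value grows as routing concentrates on fewer experts.
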
## Usage

You can download the weights using:

```python
from huggingface_hub import hf_hub_download

# Finetuned Weights
hf_hub_download(repo_id="kartiknarayan/FaceMoE", filename="swin4m_exp_3_k_2_briar_full/model.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/FaceMoE", filename="swin4m_exp_3_k_2_tinyface_full/model.pt", local_dir="./weights")
```

## Citation

Coming soon ...

Check our GitHub repo for complete training and inference instructions: https://github.com/Kartik-3004/FaceMoE