How to use AkshatSurolia/DeiT-FaceMask-Finetuned with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-classification", model="AkshatSurolia/DeiT-FaceMask-Finetuned")
    pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

    # Load model directly
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained("AkshatSurolia/DeiT-FaceMask-Finetuned")
    model = AutoModelForImageClassification.from_pretrained("AkshatSurolia/DeiT-FaceMask-Finetuned")

Distilled Data-efficient Image Transformer for Face Mask Detection
A distilled Data-efficient Image Transformer (DeiT) model, pre-trained and then fine-tuned on the self-curated custom Face-Mask18K dataset (18k images, 2 classes) at a resolution of 224x224. The DeiT architecture was first introduced in the paper Training data-efficient image transformers & distillation through attention by Touvron et al.
Model description
This model is a distilled Vision Transformer (ViT). In addition to the class token, it uses a distillation token to learn from a teacher model (a CNN) during both pre-training and fine-tuning. The distillation token is learned through backpropagation by interacting with the class ([CLS]) token and the patch tokens in the self-attention layers.
Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded.
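The token layout described above can be worked out with a little arithmetic. The following is a minimal sketch, not the model's actual implementation: the helper name deit_sequence_length is hypothetical, and the 3-channel (RGB) input is an assumption not stated in this card. It counts the 16x16 patches in a 224x224 image and adds the two extra tokens (class and distillation).

```python
def deit_sequence_length(image_size: int = 224, patch_size: int = 16,
                         extra_tokens: int = 2) -> int:
    """Return the number of tokens seen by the transformer.

    extra_tokens=2 accounts for the class token and the distillation token.
    (Hypothetical helper for illustration only.)
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + extra_tokens


# Values taken from the model card: 224x224 input, 16x16 patches.
num_patches = (224 // 16) ** 2
sequence_length = deit_sequence_length()

# Assumption: RGB input, so each flattened patch holds 16 * 16 * 3 values
# before it is linearly embedded.
values_per_patch = 16 * 16 * 3

print(num_patches, sequence_length, values_per_patch)
```

Under these assumptions, each image becomes 196 patch tokens plus the class and distillation tokens, for a sequence of 198 tokens.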
Training Metrics
epoch = 2.0
total_flos = 2078245655GF
train_loss = 0.0438
train_runtime = 1:37:16.87
train_samples_per_second = 9.887
train_steps_per_second = 0.309
Evaluation Metrics
epoch = 2.0
eval_accuracy = 0.9922
eval_loss = 0.0271
eval_runtime = 0:03:17.36
eval_samples_per_second = 18.22
eval_steps_per_second = 2.28
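As a rough sanity check on the throughput figures above, dividing samples per second by steps per second gives the number of samples processed per step. The sketch below assumes both rates were measured over the same runtime; the resulting figure is an effective batch size (it would include any gradient accumulation or multi-device batching), and it is an inference from the reported numbers, not a setting stated in this card. The helper name samples_per_step is hypothetical.

```python
def samples_per_step(samples_per_second: float, steps_per_second: float) -> float:
    """Effective number of samples per step, inferred from two rates."""
    return samples_per_second / steps_per_second


# Figures copied from the Training Metrics and Evaluation Metrics above.
train_per_step = samples_per_step(9.887, 0.309)
eval_per_step = samples_per_step(18.22, 2.28)

print(f"train: ~{train_per_step:.1f} samples per step")
print(f"eval:  ~{eval_per_step:.1f} samples per step")
```

The ratios come out close to 32 for training and close to 8 for evaluation, which is consistent with common batch-size choices but is not confirmed by the card.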