---
language:
- en
license: apache-2.0
library_name: timm
tags:
- vision
- image-classification
- vit
- mnist
- computer-vision
datasets:
- mnist
metrics:
- accuracy
model-index:
- name: SOTA-Blitz-997
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: MNIST
      type: mnist
    metrics:
    - type: accuracy
      value: 99.72
      name: Test Accuracy
---

# SOTA-Blitz-997

**Near-SOTA Accuracy | 7-Minute T4 Training | Safetensors Native**

---

### Model Overview

**SOTA-Blitz-997** is a fast-training Vision Transformer (ViT) optimized for MNIST handwritten digit classification. Many state-of-the-art MNIST results rely on large ensembles and hours of GPU compute. **SOTA-Blitz-997** was instead engineered to reach near-state-of-the-art accuracy within a single 7-minute training run on a standard NVIDIA T4 GPU, relying on the global self-attention of its Transformer blocks.

### Performance & Proof

The model achieves a verified **99.72% test accuracy**, which means only **28 errors** out of 10,000 test images. This exceeds the commonly quoted human baseline (~97.5%) and shows that a ViT can effectively "solve" a classic computer vision benchmark at very low training cost.

#### Training Logs (Verified Convergence)

| Epoch | Loss | Train Acc | Test Acc | Best Test Acc |
| :--- | :--- | :--- | :--- | :--- |
| 05/30 | 0.6235 | 95.068% | 98.440% | 98.590% |
| 10/30 | 0.5923 | 96.287% | 98.840% | 99.030% |
| 15/30 | 0.5683 | 97.107% | 99.220% | 99.230% |
| 20/30 | 0.5485 | 97.927% | 99.460% | 99.550% |
| 25/30 | 0.5345 | 98.460% | 99.660% | 99.660% |
| **30/30** | **0.5296** | **98.700%** | **99.720%** | **99.720%** |

**Final Performance:** 28 errors on 10,000 test digits, with test-time augmentation (TTA) enabled.

### Technical Specifications

- **Architecture:** Vision Transformer (ViT) with patch embedding and multi-head self-attention.
- **Training Hardware:** NVIDIA T4 GPU (Kaggle).
- **Training Time:** ~7 minutes.
- **Format:** `.safetensors` (zero-copy loading; no pickle deserialization, so loading cannot execute arbitrary code).
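- **License:** Apache 2.0.
- **Architecture Note:** Based on a timm ViT-Small backbone with a custom 1-channel patch embedding layer and 32x32 input resolution.

### Usage

```python
import timm
from safetensors.torch import load_file

# Load the weights (plain tensors; no pickle deserialization involved)
state_dict = load_file("SOTA-Blitz-997.safetensors")

# Rebuild the architecture from the Architecture Note: timm ViT-Small with a
# 1-channel patch embedding, 32x32 inputs and 10 digit classes.
# Assumption: patch_size=4 is not stated on this card; match it to the checkpoint.
model = timm.create_model(
    "vit_small_patch16_224", pretrained=False,
    img_size=32, in_chans=1, num_classes=10, patch_size=4,
)

# Raises a RuntimeError listing any missing/unexpected keys or shape mismatches
model.load_state_dict(state_dict)
model.eval()
```

### Made By

Andy-ML-And-AI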
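### Evaluation Sketch

The 28-error figure is simply the number of misclassified test images: 10,000 × (1 − 0.9972) = 28. The sketch below shows one possible way to reproduce such a count with a loaded model. It is not the author's evaluation script. Zero-padding the 28x28 digits to 32x32, the MNIST mean/std normalization (0.1307 / 0.3081) and the batch size are all assumptions, and no TTA is applied, so its result may differ from the reported 99.72%.

```python
import torch
import torch.nn.functional as F

# Assumption: inputs are 28x28 MNIST digits zero-padded to 32x32 and normalized
# with the commonly used MNIST mean/std. Neither detail is confirmed by this card.
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081


def preprocess(images_uint8: torch.Tensor) -> torch.Tensor:
    """Map a (N, 28, 28) uint8 batch to a normalized (N, 1, 32, 32) float batch."""
    x = images_uint8.float().div(255.0).unsqueeze(1)  # (N, 1, 28, 28) in [0, 1]
    x = F.pad(x, (2, 2, 2, 2), value=0.0)              # (N, 1, 32, 32), black border
    return (x - MNIST_MEAN) / MNIST_STD


@torch.no_grad()
def count_errors(model: torch.nn.Module, images_uint8: torch.Tensor,
                 labels: torch.Tensor, batch_size: int = 512):
    """Return (errors, accuracy_percent) for a classifier over the given digits."""
    model.eval()
    errors = 0
    for start in range(0, len(labels), batch_size):
        batch = preprocess(images_uint8[start:start + batch_size])
        preds = model(batch).argmax(dim=1)  # predicted class per image
        errors += (preds != labels[start:start + batch_size]).sum().item()
    accuracy = 100.0 * (len(labels) - errors) / len(labels)
    return errors, accuracy
```

Pass a model with the weights loaded (see Usage), the 10,000 MNIST test images as a `(N, 28, 28)` uint8 tensor, and their labels; `count_errors` returns the error count and the matching accuracy in percent. Padding rather than resizing keeps the digits at their native scale; if the model was trained on resized inputs, replacing `F.pad` with `F.interpolate` would be the natural swap.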