---
language:
- en
tags:
- llama
- causal-lm
- digit-recognition
- sparse-model
- quantized-model
- int8-quantization
- qat
- model-compression
- 50-percent-sparse
license: apache-2.0
base_model: junzzhu/atomllama-33K-5x5-DigitMesh-sparse
library_name: transformers
pipeline_tag: text-generation
---

# AtomLlama-33K-5x5-DigitMesh-Sparse-Q8

An INT8-quantized version of [atomllama-33K-5x5-DigitMesh-sparse](https://huggingface.co/junzzhu/atomllama-33K-5x5-DigitMesh-sparse) for ultra-efficient 5×5 digit mesh recognition.

## Model Description

This is a **50% sparse + INT8 quantized** variant of the AtomLlama-33K-5x5-DigitMesh model, combining unstructured [sparsity with Quantization Aware Training (QAT)](https://github.com/junzzhu/axolotl/blob/main/src/axolotl/integrations/sparse_qat/). This dual compression approach maintains digit recognition accuracy while significantly reducing model size and computational requirements.

### Key Features

- **Base Model**: [junzzhu/atomllama-33K-5x5-DigitMesh-sparse](https://huggingface.co/junzzhu/atomllama-33K-5x5-DigitMesh-sparse)
- **Sparsity**: ~50% (unstructured)
- **Quantization**: INT8 with [Sparse QAT](https://github.com/junzzhu/axolotl/blob/main/src/axolotl/integrations/sparse_qat/)
- **Parameters**: ~33K total, ~16.5K non-zero
- **Architecture**: LlamaForCausalLM
- **Task**: 5×5 binary digit mesh recognition
- **Compression**: ~3x smaller than the original model (46KB vs. 137KB)

## Usage

### Basic Inference with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_path = "./models/atomllama-33K-5x5-DigitMesh-sparse-q8"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    dtype="auto",
    device_map="auto"
)

# Example: Classify a 5x5 binary digit pattern (digit "0")
pattern = "1 1 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1"
prompt = f"{pattern} "

# Tokenize and generate prediction
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # not accepted by the Llama forward pass

outputs = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=False
)

# Decode only the newly generated token, skipping the prompt tokens
prediction = tokenizer.decode(
    outputs[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True
).strip()

print(f"Predicted digit: {prediction}")  # Expected: "D0"
```

## Compression Details

### Sparsity

- **Type**: Unstructured (weights pruned individually based on importance)
- **Target Sparsity**: 50%
- **Method**: SparseGPT with Hessian-based importance scoring

### Quantization

- **Precision**: INT8 (8-bit integers)
- **Method**: Quantization Aware Training (QAT)
- **Framework**: [Axolotl Sparse QAT Integration](https://github.com/junzzhu/axolotl/blob/main/src/axolotl/integrations/sparse_qat/)

## License

Apache-2.0

## Citation

```bibtex
@misc{atomllama-33k-digitMesh-sparse-q8,
  title={AtomLlama-33K-5x5-DigitMesh-Sparse-Q8: A 50\% Sparse INT8 Quantized Model for Digit Recognition},
  author={Jun Zhu},
  year={2026},
  howpublished={\url{https://huggingface.co/junzzhu/atomllama-33K-5x5-DigitMesh-sparse-q8}}
}
```
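
## Appendix: Building a Prompt from a Grid

The inference example under Usage passes the mesh as 25 space-separated bits followed by a single trailing space. The sketch below shows one way to build that string from a nested list. It assumes the cells are serialized row by row; the model card does not state the order, and the digit "0" example is symmetric, so it cannot confirm it. `grid_to_prompt` and `ZERO` are hypothetical names used for illustration and are not part of the model repository.

```python
from typing import Sequence

GRID_SIZE = 5  # the model card describes 5x5 binary meshes


def grid_to_prompt(grid: Sequence[Sequence[int]]) -> str:
    """Flatten a 5x5 grid of 0/1 cells into the space-separated prompt string.

    Assumption: cells are emitted in row-major order (row by row).
    """
    if len(grid) != GRID_SIZE or any(len(row) != GRID_SIZE for row in grid):
        raise ValueError("expected a 5x5 grid")
    cells = [int(cell) for row in grid for cell in row]
    if any(cell not in (0, 1) for cell in cells):
        raise ValueError("cells must be 0 or 1")
    # The trailing space mirrors the f"{pattern} " prompt in the Usage example.
    return " ".join(str(cell) for cell in cells) + " "


# The digit "0" pattern from the Usage example, written as a grid.
ZERO = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

prompt = grid_to_prompt(ZERO)
print(repr(prompt))
```

The resulting `prompt` can be passed to `tokenizer([prompt], return_tensors="pt")` in place of the hand-written string. Validating the shape and cell values up front avoids silently sending a malformed mesh to the model.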