---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
  results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct

**Quark-50m-Instruct** is a small (≈56M parameters) decoder-only language model, fine-tuned for instruction following. It uses the same architecture as the SmolLM family and was fully pretrained on 5 billion tokens from [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).

- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight-tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026

## Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is **not** competitive with large models on knowledge-intensive tasks, but it is best suited to:

- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference

The architecture closely follows the efficient-small-LM blueprint popularised by SmolLM:

| Component     | Details                               |
|---------------|---------------------------------------|
| Vocab size    | 49,152                                |
| Hidden size   | 384                                   |
| Layers        | 24                                    |
| Attention     | Grouped Query (6 Q heads, 2 KV heads) |
| FFN           | SwiGLU with 1,024 intermediate        |
| Position      | RoPE (θ = 10,000)                     |
| Normalisation | RMSNorm (pre-block)                   |

Total trainable parameters: **≈56 M** (with weight tying).
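
The parameter total can be cross-checked from the architecture table with a few lines of arithmetic. The sketch below is a rough estimate, not the authors' counting script. It assumes a head dimension of hidden size divided by the number of query heads (64), no bias terms, two RMSNorm weight vectors per layer plus one final norm, and an output head that shares the embedding matrix. None of these details are stated in the card. Under those assumptions the count comes out at roughly 56.6 M, in line with the ≈56M figure quoted at the top of the card.

```python
# Rough parameter count derived from the architecture table.
# Assumptions (not stated in the card): head_dim = hidden / n_q_heads,
# no bias terms, two RMSNorm weights per layer plus one final norm,
# and the LM head reuses the embedding matrix (weight tying).

vocab, hidden, layers = 49_152, 384, 24
n_q_heads, n_kv_heads, ffn = 6, 2, 1_024

head_dim = hidden // n_q_heads      # 64
kv_dim = n_kv_heads * head_dim      # 128: K and V are narrower under GQA

embed = vocab * hidden                              # shared with the LM head
attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # Q and O, then K and V
mlp = 3 * hidden * ffn                              # gate, up and down projections
norms = 2 * hidden                                  # pre-attention and pre-FFN RMSNorm
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden         # trailing term: final RMSNorm
print(f"embedding: {embed:,}")
print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

About a third of the total sits in the embedding matrix, which is why weight tying matters at this scale: an untied output head would add another 18.9 M parameters under the same assumptions.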
### Benchmark Evaluation Metrics

| Category | Benchmark | Metric | Score / Value | Status |
| :--- | :--- | :--- | :---: | :---: |
| **Linguistics & Grammar** | BLiMP | Accuracy | 68.12% | Success |
| **Commonsense & Reasoning** | PIQA | Normalized Accuracy | 57.83% | Success |
| | COPA | Accuracy | 57.00% | Success |
| | BoolQ | Accuracy | 52.17% | Success |
| | WinoGrande | Accuracy | 47.36% | Success |
| | HellaSwag | Normalized Accuracy | 28.49% | Success |
| | RACE | Accuracy | 26.41% | Success |
| | CommonsenseQA | Accuracy | 20.31% | Success |
| **Academic & Knowledge** | SciQ | Normalized Accuracy | 49.00% | Success |
| | ARC-Easy | Normalized Accuracy | 36.49% | Success |
| | MMLU | Accuracy | 25.64% | Success |
| | ARC-Challenge | Normalized Accuracy | 25.17% | Success |
| | OpenBookQA | Normalized Accuracy | 25.40% | Success |
| **Language Modeling** | LAMBADA | Accuracy | 15.87% | Success |
| | WikiText-2 | Word Perplexity | 251.76 | Success |

*Note: The Arithmetic benchmark failed due to outdated script support (`arithmetic.py`), and SocialIQA failed due to a registration tag error (`siqa`). The other 15 tasks ran to completion.*

## Uses

### Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat-formatted input (see example below).

### Downstream Use

Because the model is released under the open Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks such as a customer-support bot, a code reviewer, or a story writer.

### Limitations

- Limited world knowledge (pretraining data ends in mid-2025).
- Short context window (2,048 tokens).
- Because of its small size, it makes more factual mistakes than larger models.
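
With a 2,048-token window, a long conversation has to be trimmed before generation. The helper below is a minimal sketch of one way to do that. `fit_to_context` is a hypothetical name, not part of the model's tooling, and it assumes the prompt and the newly generated tokens share the same 2,048-token budget. It keeps the most recent tokens and drops the oldest ones.

```python
def fit_to_context(token_ids, max_context=2048, max_new_tokens=128):
    """Keep the most recent tokens so prompt + generation fits in the window.

    Hypothetical helper: assumes prompt and generated tokens share one
    budget of `max_context` positions.
    """
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than max_context")
    # Negative slicing returns the whole list when it is already short enough.
    return token_ids[-budget:]


# Usage with plain integers standing in for real token ids
long_prompt = list(range(3000))
trimmed = fit_to_context(long_prompt)
print(len(trimmed))   # 1920 tokens left for the prompt
```

Truncating from the left is the simplest policy, but it can cut off a system prompt at the start of the conversation. Dropping whole older turns before applying the chat template is a gentler alternative.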
## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ThingAI/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Render the chat template and tokenize; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```