---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
  results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct

**Quark-50m-Instruct** is a small (≈56M parameters) decoder-only language model fine-tuned for instruction following.
It uses the same architecture as the SmolLM family and was pretrained on 5 billion tokens from
[HuggingFaceTB/smollm‑corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).

- **Model type:** Causal Language Model (LLaMA‑style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight‑tying
- **Pretraining tokens:** 5 B
- **Fine‑tuning:** Instruction‑tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026

## Model Summary

Quark-50m-Instruct is designed as an efficient assistant that runs on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM)
and, for light workloads, on CPU. It is **not** competitive with large models on knowledge‑intensive tasks;
it is best suited to:

- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On‑device / edge inference
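
To put the hardware claim in numbers, here is a rough memory estimate. It is a sketch under stated assumptions, not a measurement: the parameter count is the ≈56M figure from the introduction, the layer and head counts come from the architecture table below, the 2,048-token context comes from the Limitations section, and the head dimension of 64 is assumed to be hidden size divided by the number of query heads. Activations and framework overhead are ignored.

```python
# Rough memory estimate; every constant is taken from this card or assumed.
N_PARAMS = 56_000_000   # "≈56M parameters" from the introduction
N_LAYERS = 24           # architecture table
N_KV_HEADS = 2          # architecture table (grouped-query attention)
HEAD_DIM = 384 // 6     # assumption: hidden size / query heads = 64
CONTEXT = 2_048         # context window from the Limitations section

BYTES_PER_VALUE = {"float32": 4, "float16": 2}
MIB = 1024 ** 2


def weight_bytes(dtype: str) -> int:
    """Memory needed to hold the weights in the given dtype."""
    return N_PARAMS * BYTES_PER_VALUE[dtype]


def kv_cache_bytes(dtype: str, n_kv_heads: int = N_KV_HEADS,
                   seq_len: int = CONTEXT) -> int:
    """KV cache for one sequence: K and V tensors in every layer."""
    return 2 * N_LAYERS * seq_len * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE[dtype]


for dtype in BYTES_PER_VALUE:
    print(
        f"{dtype}: weights ~{weight_bytes(dtype) / MIB:.0f} MiB, "
        f"KV cache at full context ~{kv_cache_bytes(dtype) / MIB:.0f} MiB"
    )
```

Under these assumptions the float16 KV cache for a full 2,048-token sequence is 24 MiB. Calling `kv_cache_bytes("float16", n_kv_heads=6)` shows what the same model would need if every query head had its own KV head: three times as much.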

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

| Component     | Details                                   |
|---------------|-------------------------------------------|
| Vocab size    | 49,152                                    |
| Hidden size   | 384                                       |
| Layers        | 24                                        |
| Attention     | Grouped-query (6 query heads, 2 KV heads) |
| FFN           | SwiGLU, intermediate size 1,024           |
| Position      | RoPE (θ = 10,000)                         |
| Normalisation | RMSNorm (pre‑block)                       |

Total trainable parameters: **≈56 M** (with weight tying).
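
The total can be sanity-checked from the table values alone. The sketch below assumes a standard LLaMA-style layout that the card does not spell out: no bias terms, a head dimension of hidden size divided by query heads (64), two RMSNorm weight vectors per layer plus one final norm, and three SwiGLU projection matrices (gate, up, down). Under those assumptions the count comes to about 56.6M with tied embeddings, in line with the ≈56M figure in the introduction.

```python
# Back-of-the-envelope parameter count from the architecture table.
# Assumptions (not stated in the card): no biases, head_dim = hidden / q_heads,
# two RMSNorm vectors per layer, one final RMSNorm.
VOCAB, HIDDEN, LAYERS = 49_152, 384, 24
Q_HEADS, KV_HEADS, FFN = 6, 2, 1_024
HEAD_DIM = HIDDEN // Q_HEADS  # 64


def count_parameters(tie_embeddings: bool = True) -> int:
    embedding = VOCAB * HIDDEN
    kv_dim = KV_HEADS * HEAD_DIM
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
    # SwiGLU uses gate, up, and down projections.
    feed_forward = 3 * HIDDEN * FFN
    norms = 2 * HIDDEN
    per_layer = attention + feed_forward + norms
    total = embedding + LAYERS * per_layer + HIDDEN  # + final norm
    if not tie_embeddings:
        total += VOCAB * HIDDEN  # separate output projection
    return total


print(f"tied:   {count_parameters(True):,}")
print(f"untied: {count_parameters(False):,}")
```

With these numbers the embedding matrix alone accounts for roughly a third of the tied total, which is why weight tying matters at this scale.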

### Benchmark Evaluation Metrics

| Category | Benchmark | Metric | Score / Value | Status |
| :--- | :--- | :--- | :---: | :---: |
| **Linguistics & Grammar** | BLiMP | Accuracy | 68.12% | Success |
| **Commonsense & Reasoning** | PIQA | Normalized Accuracy | 57.83% | Success |
| | COPA | Accuracy | 57.00% | Success |
| | BoolQ | Accuracy | 52.17% | Success |
| | WinoGrande | Accuracy | 47.36% | Success |
| | HellaSwag | Normalized Accuracy | 28.49% | Success |
| | RACE | Accuracy | 26.41% | Success |
| | CommonsenseQA | Accuracy | 20.31% | Success |
| **Academic & Knowledge** | SciQ | Normalized Accuracy | 49.00% | Success |
| | ARC-Easy | Normalized Accuracy | 36.49% | Success |
| | MMLU | Accuracy | 25.64% | Success |
| | ARC-Challenge | Normalized Accuracy | 25.17% | Success |
| | OpenBookQA | Normalized Accuracy | 25.40% | Success |
| **Language Modeling** | LAMBADA | Accuracy | 15.87% | Success |
| | WikiText-2 | Word Perplexity | 251.76 | Success |

*Note: The Arithmetic benchmark could not be run because its task script (`arithmetic.py`) is outdated, and SocialIQA failed with a task-registration error (`siqa`). The other 15 tasks completed successfully.*
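
When reading the multiple-choice scores, it helps to compare each one with the accuracy of random guessing. The sketch below does that for the table above. The number of answer options per benchmark is an assumption added here (the card does not list it), and ARC and MMLU are treated as uniformly four-way even though a few ARC items have a different number of options. LAMBADA and WikiText-2 are left out because they have no fixed chance level.

```python
# (score in %, assumed number of answer options) for each benchmark above.
RESULTS = {
    "BLiMP": (68.12, 2),
    "PIQA": (57.83, 2),
    "COPA": (57.00, 2),
    "BoolQ": (52.17, 2),
    "WinoGrande": (47.36, 2),
    "HellaSwag": (28.49, 4),
    "RACE": (26.41, 4),
    "CommonsenseQA": (20.31, 5),
    "SciQ": (49.00, 4),
    "ARC-Easy": (36.49, 4),
    "MMLU": (25.64, 4),
    "ARC-Challenge": (25.17, 4),
    "OpenBookQA": (25.40, 4),
}


def margin_over_chance(score: float, n_options: int) -> float:
    """Percentage points above (or below) uniform random guessing."""
    return score - 100.0 / n_options


margins = {
    name: round(margin_over_chance(score, n), 2)
    for name, (score, n) in RESULTS.items()
}

for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {margin:+6.2f} pts vs. chance")
```

Read this way, SciQ and BLiMP show the clearest signal, while MMLU, ARC-Challenge, OpenBookQA, and CommonsenseQA sit within a point of guessing.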


## Uses

### Direct Use
The model can be used via the 🤗 Transformers library for standard text generation.
It expects chat‑formatted input (see example below).

### Downstream Use
The Apache‑2.0 license allows you to fine‑tune Quark-50m-Instruct on your own data for
domain‑specific tasks, for instance a customer‑support bot, a code reviewer, or a story writer.
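
As a starting point for preparing fine-tuning data, the sketch below converts question/answer pairs into the same `messages` structure used in the usage example on this card. The helper name `to_chat_example` and the sample pair are hypothetical, and the sketch assumes the model's chat template accepts an `assistant` role in addition to `system` and `user`.

```python
from typing import Dict, List

# System prompt reused from the usage example on this card.
SYSTEM_PROMPT = "You are Quark, a helpful assistant."


def to_chat_example(question: str, answer: str,
                    system: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Wrap one question/answer pair as a chat-formatted training example."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question.strip()},
        {"role": "assistant", "content": answer.strip()},
    ]


# Hypothetical support-bot data, for illustration only.
raw_pairs = [
    ("How do I reset my password? ", " Open Settings, choose Security, then Reset password."),
]

dataset = [to_chat_example(q, a) for q, a in raw_pairs]
print(dataset[0])
```

Each resulting list can then be rendered to training text with `tokenizer.apply_chat_template(messages, tokenize=False)`, so the fine-tuning data uses the same format the model sees at inference time.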

### Limitations
- Limited world knowledge; the pretraining data ends in mid‑2025.
- Short context window (2,048 tokens).
- Because of its small size, it makes more factual mistakes than larger models.
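
One practical consequence of the 2,048-token window is that long conversations have to be trimmed before generation. The sketch below shows one possible approach: keep the system message and as many of the most recent turns as fit. `trim_history` and `count_tokens` are hypothetical names, and the caller supplies the token counter (for example, one built on the model's tokenizer). The chat template adds a few tokens per message, so a real counter should account for them.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[Message], int],
    context_window: int = 2048,
    max_new_tokens: int = 128,
) -> List[Message]:
    """Keep system messages plus the newest turns that fit the token budget."""
    budget = context_window - max_new_tokens  # leave room for the reply
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m) for m in system)
    kept: List[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]  # restore chronological order


# Toy usage with a whitespace "tokenizer" and a deliberately tiny window.
history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven eight nine"},
]
trimmed = trim_history(
    history,
    count_tokens=lambda m: len(m["content"].split()),
    context_window=10,
    max_new_tokens=2,
)
print([m["content"] for m in trimmed])
```

Dropping the oldest turns first is only one design choice; summarising earlier turns is another, at the cost of an extra generation call.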

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ThingAI/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

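messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Render the conversation with the model's chat template and tokenize it.
# return_dict=True returns the attention mask alongside input_ids.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```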