File size: 4,952 Bytes
9113182
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
634dbdd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9113182
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
license: apache-2.0
base_model: aquif-ai/aquif-3.6-1B
tags:
- text-generation-inference
- reasoning
- thinking
- hybrid
- efficient
- dynamic
- transformers
- aquif
- math
- coding
- small
- aquif-3.5
- aquif-3.6
- llm
- llama-cpp
- gguf-my-repo
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
library_name: transformers
pipeline_tag: text-generation
---

# aquif-3.6-1B

## Summary

**aquif-3.6-1B** is a hybrid reasoning model that automatically determines when and how deeply to think based on query complexity. Built on aquif-3.5-Nano-1B with AutoThink RL data, it achieves 28% better token efficiency and 4% performance improvement across benchmarks.

### Contents

- [Key Features](#key-features) - Dynamic reasoning, efficiency gains, and smart resource allocation
- [Performance](#performance) - Benchmark results showing 4% average improvement
- [Token Efficiency](#token-efficiency) - 28% reduction in token usage
- [Thinking Ratio](#thinking-ratio) - 12% reduction in thinking frequency
- [Benchmark Highlights](#benchmark-highlights) - Detailed results for AIME, LiveCodeBench, and GPQA Diamond
- [Model Details](#model-details) - Architecture and specifications
- [Usage](#usage) - Code examples for implementation
- [Previous Versions](#previous-versions) - Links to earlier models

**Automatic Thinking**

aquif-3.6-1B is a hybrid reasoning model that dynamically decides if and how much to think based on query complexity. Inspired by aquif-3.6-8B's approach of automatic thinking using AutoThink RL data on top of aquif-3.5-Nano-1B, the model uses the following format:
```
<judge>
[analyzes whether to think or not]
</judge>

<think_on/off>
<think>
[thinking content]
</think>

<answer>
</answer>
```
This is the same format as aquif-3.6-8B. Unlike something like aquif-3.5-Plus's toggleable reasoning that requires manual control (thinking_on/off), aquif-3.6's judge autonomously allocates reasoning depth - intelligently adapting its cognitive effort to each task automatically.

## Key Features

- 🧠 **Dynamic Reasoning**: Automatically determines when and how deeply to think
- ⚡ **28% More Efficient**: Significant token reduction while improving performance
- 📈 **Better Performance**: 4% average improvement across benchmarks
- 🎯 **Smart Resource Allocation**: 12% reduction in thinking ratio on average

## Performance

Benchmark | aquif-3.6-1B | Qwen3-1.7B | Improvement |
|-----------|--------------|--------------|-------------|
| AIME 2025 | 75.0 | 39.4 | +35.6% |
| LiveCodeBench | 57.5 | 33.2 | +24.3% |
| GPQA Diamond | 52.8 | 40.1 | +12.7% |
| **Average** | **61.8** | **37.6** | **+24.2%** |

## Token Efficiency

| Benchmark | aquif-3.6-1B | Qwen3-1.7B | Reduction |
|-----------|--------------|--------------|-----------|
| AIME 2025 | 13,670 | 18,450 | -26% |
| LiveCodeBench | 10,270 | 13,890 | -26% |
| GPQA Diamond | 6,870 | 12,100 | -43% |
| **Average** | **10,270** | **14,813** | **-32%** |

## Thinking Ratio

| Benchmark | aquif-3.6-1B | Qwen3-1.7B | Reduction |
|-----------|--------------|--------------|-----------|
| AIME 2025 | 84.0% | 100.0% | -16% |
| LiveCodeBench | 78.0% | 100.0% | -22% |
| GPQA Diamond | 81.0% | 100.0% | -19% |
| **Average** | **81.0%** | **100.0%** | **-19%** |

## Benchmark Highlights

- **AIME 2025**: 26% fewer tokens, +35.6% performance, -16% thinking ratio
- **LiveCodeBench**: 26% fewer tokens, +24.3% performance, -22% thinking ratio  
- **GPQA Diamond**: 43% fewer tokens, +12.7% performance, -19% thinking ratio

## Model Details

- **Base Model**: 1.7B parameters
- **Architecture**: Hybrid reasoning with dynamic thinking allocation
- **Context Length**: 40K tokens
- **License**: Apache 2.0

## Usage


## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp

```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Edge-Quant/aquif-3.6-1B-Q4_K_M-GGUF --hf-file aquif-3.6-1b-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Edge-Quant/aquif-3.6-1B-Q4_K_M-GGUF --hf-file aquif-3.6-1b-q4_k_m.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Edge-Quant/aquif-3.6-1B-Q4_K_M-GGUF --hf-file aquif-3.6-1b-q4_k_m.gguf -p "The meaning to life and the universe is"
```
or 
```
./llama-server --hf-repo Edge-Quant/aquif-3.6-1B-Q4_K_M-GGUF --hf-file aquif-3.6-1b-q4_k_m.gguf -c 2048
```