---
library_name: gguf
license: cc-by-nc-4.0
language:
- en
tags:
- gguf
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- bf16
base_model: regolo/brick-complexity-extractor
pipeline_tag: text-classification
model-index:
- name: brick-complexity-extractor-BF16-GGUF
  results:
  - task:
      type: text-classification
      name: Query Complexity Classification
    dataset:
      name: brick-complexity-extractor
      type: regolo/brick-complexity-extractor
      split: test
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy (3-class)
    - type: f1
      value: 0.87
      name: Weighted F1
---
# Brick Complexity Extractor (BF16 GGUF)
### BF16 quantized GGUF of [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor)
**[Regolo.ai](https://regolo.ai) | [Original Model](https://huggingface.co/regolo/brick-complexity-extractor) | [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**
[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) | [Base model: Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B)
---
## Model Details
| Property | Value |
|---|---|
| **Quantization** | BF16 |
| **File** | `brick-complexity-extractor-BF16.gguf` |
| **Size** | 1.5 GB |
| **Bits per weight** | 16.0 |
| **Original model** | [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |

This file keeps the weights in full bfloat16 precision, so it adds no quantization loss. Use it when accuracy is critical and storage is not a concern.

This is a **full merged model**: the LoRA adapter has been merged into the base Qwen3.5-0.8B weights before conversion to GGUF, so no separate adapter loading is needed.
## All Available Quantizations
| Model | Quant | Size | BPW |
|---|---|---|---|
| [BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 |
| [Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 |
| [Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 |
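
As a rough aid for choosing between the files listed above, the sketch below picks the highest-precision file that fits a given size budget. The sizes are copied from the table (1.5 GB is taken as 1500 MB, which is an approximation); the helper name `pick_quant` and the example budgets are hypothetical. It compares file sizes only, so real memory use at runtime will be somewhat higher.

```python
from typing import Optional

# File sizes in MB, copied from the table above (1.5 GB approximated as 1500 MB).
QUANT_SIZES_MB = {
    "BF16": 1500,
    "Q8_0": 775,
    "Q4_K_M": 494,
}


def pick_quant(budget_mb: float) -> Optional[str]:
    """Return the largest listed file that fits within budget_mb, or None.

    Hypothetical helper: it ignores the extra memory a runtime needs
    for the KV cache and other buffers.
    """
    fitting = {name: size for name, size in QUANT_SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    # The largest file that still fits is the least aggressively quantized one.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2000, 1000, 600, 300):
        print(budget, "MB ->", pick_quant(budget))
```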
## Usage with llama.cpp
```bash
# Download
huggingface-cli download regolo/brick-complexity-extractor-BF16-GGUF \
brick-complexity-extractor-BF16.gguf --local-dir ./models
# Run inference
./llama-cli -m ./models/brick-complexity-extractor-BF16.gguf \
-p "<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: What is the capital of France?<|im_end|>
<|im_start|>assistant
" \
-n 5 --temp 0
```
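
The prompt passed with `-p` above follows a fixed layout: a system turn, a user turn prefixed with `Classify:`, and an open assistant turn. When calling the model from code, that string can be assembled programmatically. The following is a minimal sketch; the function name `build_prompt` is an assumption for illustration, while the system prompt text and the special tokens are copied from the command above.

```python
# System prompt copied from the llama.cpp command above.
SYSTEM_PROMPT = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)


def build_prompt(query: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap a user query in the same turn layout as the command-line example."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query.strip()}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
```

Keeping the prompt in one place helps ensure that every runtime sees the same text.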
## Usage with Ollama
```bash
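# Write a Modelfile. The FROM path assumes the file was downloaded to ./models
# as in the llama.cpp section; the system prompt is the one used there.
cat > Modelfile <<'EOF'
FROM ./models/brick-complexity-extractor-BF16.gguf
SYSTEM """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
Classify: {{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""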
PARAMETER temperature 0
PARAMETER num_predict 5
EOF
ollama create brick-complexity -f Modelfile
ollama run brick-complexity "Design a distributed consensus algorithm"
# Output: hard
```
## Usage with vLLM
```python
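from vllm import LLM, SamplingParams

# vLLM loads GGUF weights from a local .gguf file, so `model` points at the file
# downloaded in the llama.cpp section. Taking the tokenizer from the base model
# is an assumption based on vLLM's GGUF guidance; adjust it if your setup differs.
llm = LLM(
    model="./models/brick-complexity-extractor-BF16.gguf",
    tokenizer="Qwen/Qwen3.5-0.8B",
)
sampling_params = SamplingParams(temperature=0, max_tokens=5)

prompt = """<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: Explain the rendering equation from radiometric first principles<|im_end|>
<|im_start|>assistant
"""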
output = llm.generate([prompt], sampling_params)
print(output[0].outputs[0].text.strip())
# Output: hard
```
## Note on GGUF Inference
The GGUF model produces its label as **generated text** (it emits "easy", "medium", or "hard") rather than through the logit-based classification used by the original LoRA adapter. For production deployments requiring maximum accuracy, consider using the [original LoRA adapter](https://huggingface.co/regolo/brick-complexity-extractor) with the PEFT library.
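
Because the label arrives as free text, callers usually need to normalize it before acting on it: the output may carry whitespace, different casing, punctuation, or a trailing special token. The sketch below shows one way to do that. The helper name `parse_label` is hypothetical, and returning `None` for unrecognized output is an assumed convention, not behavior defined by the model.

```python
import re
from typing import Optional

# The three classes this model is documented to emit.
LABEL_PATTERN = re.compile(r"\b(easy|medium|hard)\b")


def parse_label(generated: str) -> Optional[str]:
    """Return the first of easy/medium/hard found in the generated text, else None."""
    match = LABEL_PATTERN.search(generated.lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    for raw in ("hard", " Medium.\n", "Easy<|im_end|>", "not sure"):
        print(repr(raw), "->", parse_label(raw))
```

How to handle a `None` result is a deployment decision. Treating it as the most demanding class is one conservative option, since it avoids sending an unclassified query to the weakest model.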
## About
[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.
**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**