---
library_name: gguf
license: cc-by-nc-4.0
language:
  - en
tags:
  - gguf
  - complexity-classification
  - llm-routing
  - query-difficulty
  - brick
  - text-classification
  - semantic-router
  - inference-optimization
  - cost-reduction
  - bf16
base_model: regolo/brick-complexity-extractor
pipeline_tag: text-classification
model-index:
  - name: brick-complexity-extractor-BF16-GGUF
    results:
      - task:
          type: text-classification
          name: Query Complexity Classification
        dataset:
          name: brick-complexity-extractor
          type: regolo/brick-complexity-extractor
          split: test
        metrics:
          - type: accuracy
            value: 0.89
            name: Accuracy (3-class)
          - type: f1
            value: 0.87
            name: Weighted F1
---

<div align="center">

# Brick Complexity Extractor (BF16 GGUF)

### BF16 quantized GGUF of [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor)

**[Regolo.ai](https://regolo.ai) | [Original Model](https://huggingface.co/regolo/brick-complexity-extractor) | [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B)

</div>

---

## Model Details

| Property | Value |
|---|---|
| **Quantization** | BF16 |
| **File** | `brick-complexity-extractor-BF16.gguf` |
| **Size** | 1.5 GB |
| **Bits per weight** | 16.0 |
| **Original model** | [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |

This file stores the weights in full bfloat16 precision, so it adds no quantization loss relative to the merged model. Use it when accuracy is critical and storage is not a concern.

This is a **full merged model** (the base Qwen3.5-0.8B with the LoRA adapter merged in, then converted to BF16 GGUF), so no separate adapter loading is needed.

## All Available Quantizations

| Model | Quant | Size | BPW |
|---|---|---|---|
| [BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 |
| [Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 |
| [Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 |
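
As a rough aid for choosing between the files above, the sketch below picks the largest listed variant whose file fits a given size budget. The sizes are copied from the table (1.5 GB is taken as 1500 MB), and `pick_quant` is a hypothetical helper, not part of any library. File size is only a lower bound on runtime memory, since context buffers come on top.

```python
from typing import Optional

# (quant name, file size in MB) copied from the table above, largest first.
# Assumption: 1.5 GB is treated as 1500 MB.
QUANTS = [("BF16", 1500), ("Q8_0", 775), ("Q4_K_M", 494)]


def pick_quant(budget_mb: float) -> Optional[str]:
    """Return the largest listed quantization whose file fits in budget_mb."""
    for name, size_mb in QUANTS:
        if size_mb <= budget_mb:
            return name
    return None  # nothing listed fits the budget


if __name__ == "__main__":
    for budget in (2000, 800, 500, 100):
        print(budget, "->", pick_quant(budget))
```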

## Usage with llama.cpp

```bash
# Download
huggingface-cli download regolo/brick-complexity-extractor-BF16-GGUF \
    brick-complexity-extractor-BF16.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/brick-complexity-extractor-BF16.gguf \
    -p "<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: What is the capital of France?<|im_end|>
<|im_start|>assistant
" \
    -n 5 --temp 0
```
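
When the prompt is assembled programmatically instead of typed on the command line, the same ChatML layout can be built with a small helper. This is a minimal sketch: `build_prompt` and `SYSTEM_PROMPT` are names introduced here for illustration, and the text is copied from the command above.

```python
# System prompt copied verbatim from the llama.cpp example above.
SYSTEM_PROMPT = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)


def build_prompt(query: str, system: str = SYSTEM_PROMPT) -> str:
    """Wrap a user query in the ChatML layout used in the example above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the label
    )


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
```

The resulting string can be passed to any runtime that accepts a raw prompt, such as the `-p` argument shown above.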

## Usage with Ollama

```bash
# The quoted 'EOF' keeps the shell from altering the Modelfile contents
cat > Modelfile <<'EOF'
FROM ./brick-complexity-extractor-BF16.gguf

SYSTEM """You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard."""

TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
Classify: {{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0
PARAMETER num_predict 5
EOF

ollama create brick-complexity -f Modelfile
ollama run brick-complexity "Design a distributed consensus algorithm"
# Output: hard
```
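
Once the model has been created, it can also be called from code through Ollama's local HTTP API. The sketch below assumes the default endpoint `http://localhost:11434/api/generate` and the model name `brick-complexity` from the commands above; `build_request` and `classify` are hypothetical helper names. Because the Modelfile template already prepends `Classify: `, only the raw query is sent.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(query: str, model: str = "brick-complexity") -> dict:
    """Build the JSON payload for a single non-streaming generation."""
    return {"model": model, "prompt": query, "stream": False}


def classify(query: str, model: str = "brick-complexity") -> str:
    """Send the query to Ollama and return the generated text, stripped."""
    body = json.dumps(build_request(query, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"].strip()
```

With the server running, `classify("Design a distributed consensus algorithm")` returns the model's one-word answer.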

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

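# vLLM's GGUF loader expects a local single-file .gguf, and passing the base
# model's tokenizer is recommended. Adjust if your vLLM version accepts repo IDs.
llm = LLM(
    model="./models/brick-complexity-extractor-BF16.gguf",
    tokenizer="Qwen/Qwen3.5-0.8B",
)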
sampling_params = SamplingParams(temperature=0, max_tokens=5)

prompt = """<|im_start|>system
You are a query difficulty classifier for an LLM routing system.
Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: Explain the rendering equation from radiometric first principles<|im_end|>
<|im_start|>assistant
"""

output = llm.generate([prompt], sampling_params)
print(output[0].outputs[0].text.strip())
# Output: hard
```

## Note on GGUF Inference

The GGUF model produces its label as **generated text** (it emits "easy", "medium", or "hard") rather than through the logit-based classification used by the original LoRA adapter. For production deployments that require maximum accuracy, consider using the [original LoRA adapter](https://huggingface.co/regolo/brick-complexity-extractor) with the PEFT library.
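
Because the label arrives as free text, downstream code usually needs to normalize it before routing. The following is a minimal sketch of such a step: `parse_label` is a hypothetical helper, and falling back to `hard` when no label is found is an assumption made here (it sends unparseable outputs to the most capable pool), not documented Brick behavior.

```python
import re

LABELS = ("easy", "medium", "hard")
_LABEL_RE = re.compile(r"\b(easy|medium|hard)\b")


def parse_label(generated: str, default: str = "hard") -> str:
    """Return the first easy/medium/hard token found in the generated text.

    Matching is case-insensitive and ignores surrounding whitespace or
    punctuation. If no label is present, `default` is returned (assumption).
    """
    match = _LABEL_RE.search(generated.lower())
    return match.group(1) if match else default


if __name__ == "__main__":
    for raw in (" Hard\n", "medium.", ""):
        print(repr(raw), "->", parse_label(raw))
```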

## About

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**