---
library_name: gguf
license: cc-by-nc-4.0
language:
- en
tags:
- gguf
- complexity-classification
- llm-routing
- query-difficulty
- brick
- text-classification
- semantic-router
- inference-optimization
- cost-reduction
- bf16
base_model: regolo/brick-complexity-extractor
pipeline_tag: text-classification
model-index:
- name: brick-complexity-extractor-BF16-GGUF
  results:
  - task:
      type: text-classification
      name: Query Complexity Classification
    dataset:
      name: brick-complexity-extractor
      type: regolo/brick-complexity-extractor
      split: test
    metrics:
    - type: accuracy
      value: 0.89
      name: Accuracy (3-class)
    - type: f1
      value: 0.87
      name: Weighted F1
---

# Brick Complexity Extractor (BF16 GGUF)

### BF16 quantized GGUF of [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor)

**[Regolo.ai](https://regolo.ai) | [Original Model](https://huggingface.co/regolo/brick-complexity-extractor) | [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) | [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1)**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Base Model](https://img.shields.io/badge/Base-Qwen3.5--0.8B-blue)](https://huggingface.co/Qwen/Qwen3.5-0.8B)

---

## Model Details

| Property | Value |
|---|---|
| **Quantization** | BF16 |
| **File** | `brick-complexity-extractor-BF16.gguf` |
| **Size** | 1.5 GB |
| **Bits per weight** | 16.0 |
| **Original model** | [regolo/brick-complexity-extractor](https://huggingface.co/regolo/brick-complexity-extractor) |
| **Base model** | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| **Output classes** | 3 (`easy`, `medium`, `hard`) |
| **License** | CC BY-NC 4.0 |

Full bfloat16 precision, no quality loss. Use when accuracy is critical and storage is not a concern.

This is a **full merged model** (base Qwen3.5-0.8B + LoRA adapter merged and quantized), so no separate adapter loading is needed.

## All Available Quantizations

| Model | Quant | Size | BPW |
|---|---|---|---|
| [BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 |
| [Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 |
| [Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 |

## Usage with llama.cpp

```bash
# Download
huggingface-cli download regolo/brick-complexity-extractor-BF16-GGUF \
  brick-complexity-extractor-BF16.gguf --local-dir ./models

# Run inference (greedy decoding, at most 5 new tokens)
./llama-cli -m ./models/brick-complexity-extractor-BF16.gguf \
  -p "<|im_start|>system
You are a query difficulty classifier for an LLM routing system. Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.

Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: What is the capital of France?<|im_end|>
<|im_start|>assistant
" \
  -n 5 --temp 0
```

## Usage with Ollama

```bash
# Write a Modelfile pointing at the downloaded GGUF file.
# The system prompt is reused from the llama.cpp example above.
cat > Modelfile << 'EOF'
FROM ./models/brick-complexity-extractor-BF16.gguf
SYSTEM """You are a query difficulty classifier for an LLM routing system. Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.

Respond with ONLY one word: easy, medium, or hard."""
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
Classify: {{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0
PARAMETER num_predict 5
EOF

ollama create brick-complexity -f Modelfile
ollama run brick-complexity "Design a distributed consensus algorithm"
# Output: hard
```

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="regolo/brick-complexity-extractor-BF16-GGUF")

# Greedy decoding; the label is a single word, so 5 tokens is enough.
sampling_params = SamplingParams(temperature=0, max_tokens=5)

prompt = """<|im_start|>system
You are a query difficulty classifier for an LLM routing system. Classify each query as easy, medium, or hard.
Respond with ONLY one word: easy, medium, or hard.<|im_end|>
<|im_start|>user
Classify: Explain the rendering equation from radiometric first principles<|im_end|>
<|im_start|>assistant
"""

output = llm.generate([prompt], sampling_params)
print(output[0].outputs[0].text.strip())
# Output: hard
```

## Note on GGUF Inference

The GGUF model uses **generative text output** (it generates "easy", "medium", or "hard") rather than the logit-based classification used by the original LoRA adapter. For production deployments requiring maximum accuracy, consider using the [original LoRA adapter](https://huggingface.co/regolo/brick-complexity-extractor) with the PEFT library.

## About

[Regolo.ai](https://regolo.ai) is the EU-sovereign LLM inference platform built on [Seeweb](https://www.seeweb.it/) infrastructure. **Brick** is our open-source semantic routing system that intelligently distributes queries across model pools, optimizing for cost, latency, and quality.

**[Website](https://regolo.ai) | [Docs](https://docs.regolo.ai) | [GitHub](https://github.com/regolo-ai) | [Discord](https://discord.gg/myuuVFcfJw)**
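
## Example: Normalizing the Generated Label

Because the GGUF model returns its class as generated text rather than logits (see the note on GGUF inference), the raw completion may carry extra whitespace, different casing, or trailing tokens. The following is a minimal sketch of how a caller might map that text onto one of the three labels. The helper name `parse_complexity` and the fallback to `hard` are assumptions made for illustration; they are not part of the model or of Brick.

```python
import re

# The three output classes listed in the model details.
LABELS = ("easy", "medium", "hard")

# Matches the first standalone label word in the text.
_LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def parse_complexity(raw: str, default: str = "hard") -> str:
    """Return the first label found in the generated text.

    Hypothetical helper: falls back to `default` when no label is
    present. Defaulting to "hard" is an assumption that an unparseable
    answer should be treated as the most demanding class.
    """
    match = _LABEL_RE.search(raw.lower())
    return match.group(1) if match else default


if __name__ == "__main__":
    for text in [" Hard\n", "medium<|im_end|>", "", "uneasy"]:
        print(repr(text), "->", parse_complexity(text))
```

A stricter caller could raise an error instead of returning a default, so that malformed generations are logged rather than silently routed.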