Instructions for using daliu3/stela-27b-v0.1-GGUF with libraries and local apps.

Libraries

How to use daliu3/stela-27b-v0.1-GGUF with llama-cpp-python:

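```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download and load the single-file Q4_K_M quantization.
# (The BF16 weights are split into two shards, so requesting only
# "...BF16-00001-of-00002.gguf" would fetch just the first shard.)
llm = Llama.from_pretrained(
    repo_id="daliu3/stela-27b-v0.1-GGUF",
    filename="Qwen3.6-27B.Q4_K_M.gguf",
)

# `messages` must be a list of {"role": ..., "content": ...} dicts.
resp = llm.create_chat_completion(
    messages=[
        # "What are the bidding modalities under Law 14.133/2021?"
        {"role": "user", "content": "Quais são as modalidades de licitação na Lei 14.133/2021?"},
    ]
)
print(resp["choices"][0]["message"]["content"])
```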


llama.cpp

How to use daliu3/stela-27b-v0.1-GGUF with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```
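Once llama-server is running, any OpenAI-compatible client can talk to it. Below is a minimal Python sketch that uses only the standard library. It assumes the server listens on its default address, http://localhost:8080, which is the same address the Pi and Hermes configurations on this page use. The helper names `build_chat_request` and `chat` are illustrative and are not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server listens on its default address (port 8080).
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `print(chat("Quais são as modalidades de licitação na Lei 14.133/2021?"))` sends one question and prints the reply. The request is built separately from sending it, so you can inspect the payload without a running server.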

Use Docker

```
docker model run hf.co/daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Ollama
How to use daliu3/stela-27b-v0.1-GGUF with Ollama:
```
ollama run hf.co/daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Unsloth Studio

How to use daliu3/stela-27b-v0.1-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for daliu3/stela-27b-v0.1-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for daliu3/stela-27b-v0.1-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for daliu3/stela-27b-v0.1-GGUF to start chatting
```

Pi

How to use daliu3/stela-27b-v0.1-GGUF with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "daliu3/stela-27b-v0.1-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
```
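If ~/.pi/agent/models.json already lists other providers, pasting the object above would replace them. The Python sketch below merges the entry in instead. The helper `add_llama_cpp_provider` is hypothetical, and the file schema is taken only from the example above.

```python
import json
from pathlib import Path

# Provider entry copied from the JSON example above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "daliu3/stela-27b-v0.1-GGUF:Q4_K_M"}],
}


def add_llama_cpp_provider(config_path: Path) -> dict:
    """Merge the "llama-cpp" provider into models.json, keeping other providers.

    An existing "llama-cpp" entry is replaced; every other key is preserved.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("providers", {})["llama-cpp"] = dict(LLAMA_CPP_PROVIDER)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage: `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")`.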

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use daliu3/stela-27b-v0.1-GGUF with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use daliu3/stela-27b-v0.1-GGUF with Docker Model Runner:
```
docker model run hf.co/daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Lemonade

How to use daliu3/stela-27b-v0.1-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull daliu3/stela-27b-v0.1-GGUF:Q4_K_M
```

Run and chat with the model

```
lemonade run user.stela-27b-v0.1-GGUF-Q4_K_M
```

List all available models

```
lemonade list
```

BaXi-27B — GGUF quantizations

GGUF quantizations of daliu3/baxi-27b. That model is a fine-tune of unsloth/Qwen3.6-27B (QLoRA, r=64, α=64) for Brazilian public administration, using Language-Mixed Chain-of-Thought: the model reasons in Brazilian Portuguese (PT-BR) with technical terms in English, and gives its final answer in PT-BR.
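For reference, r and α control the low-rank update that QLoRA trains while the quantized base weights stay frozen. The card does not say whether a variant such as rsLoRA was used, so the block below assumes standard LoRA scaling. Under that assumption, each adapted frozen weight matrix W ∈ ℝ^{d×k} is updated as shown, and the chosen values give a scale factor of exactly 1:

```latex
\begin{aligned}
W' &= W + \frac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k} \\
\frac{\alpha}{r} &= \frac{64}{64} = 1 \\
\#\text{trainable parameters per adapted matrix} &= r\,(d + k) = 64\,(d + k)
\end{aligned}
```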

About BaXi-27B

BaXi-27B is an open-source, 27-billion-parameter model specialized in the Brazilian public-sector domain. It covers:

- LGPD (Brazil's General Data Protection Law)
- Law No. 14.133/2021 (the New Public Procurement Law)
- Portal da Transparência
- university regulations
- ABNT NBR 6023
- open data

It was trained on a synthetic dataset distilled from DeepSeek-V4-Flash, following the Qwopus3.5-27B protocol (Jackrong, 2026).

Final training loss: 0.6565 • VRAM: 23.68 GB • Training time: 6.59 min (A100 80GB).

Available quantizations

| File | Quantization | Size | Min. RAM | Recommendation |
|---|---|---|---|---|
| `Qwen3.6-27B.Q4_K_M.gguf` | Q4_K_M | 16.55 GB | ~20 GB | Recommended for local inference (laptops with 24+ GB RAM, RTX 3090) |
| `Qwen3.6-27B.Q8_0.gguf` | Q8_0 | 28.60 GB | ~32 GB | Higher quality, still feasible on a workstation |
| `Qwen3.6-27B.BF16-00001-of-00002.gguf` | BF16 (shard 1/2) | 50.00 GB | n/a | Part 1 of the BF16 weights |
| `Qwen3.6-27B.BF16-00002-of-00002.gguf` | BF16 (shard 2/2) | 3.80 GB | ~60 GB total | Lossless; A100/H100 or multiple GPUs only |
| `Qwen3.6-27B.BF16-mmproj.gguf` | BF16 (mmproj) | 0.93 GB | n/a | Multimodal projector (not used for text chat) |

Note: The `*-mmproj.gguf` file is the Qwen3.6-VL multimodal projector. It is not needed for text-only inference, which is BaXi's use case.
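The RAM column can be turned into a simple selection rule: pick the highest-precision file whose approximate minimum RAM fits the machine. The Python sketch below copies its thresholds from the table. The `Quant` type and `pick_quant` helper are hypothetical, and the "~" figures are rough guides rather than hard limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quant:
    filename: str      # first (or only) GGUF file to load
    name: str          # quantization type
    size_gb: float     # total download size
    min_ram_gb: float  # approximate minimum RAM from the table


# Values copied from the table above; the mmproj file is left out because
# it is not needed for text-only inference.
QUANTS = [
    Quant("Qwen3.6-27B.Q4_K_M.gguf", "Q4_K_M", 16.55, 20),
    Quant("Qwen3.6-27B.Q8_0.gguf", "Q8_0", 28.60, 32),
    # BF16 ships as two shards (50.00 GB + 3.80 GB); point llama.cpp at the
    # first shard and keep the second one in the same directory.
    Quant("Qwen3.6-27B.BF16-00001-of-00002.gguf", "BF16", 50.00 + 3.80, 60),
]


def pick_quant(available_ram_gb: float) -> Optional[Quant]:
    """Return the largest (highest-precision) quantization that fits in RAM."""
    fitting = [q for q in QUANTS if q.min_ram_gb <= available_ram_gb]
    return max(fitting, key=lambda q: q.size_gb, default=None)
```

On a 24 GB machine, `pick_quant(24).filename` returns the Q4_K_M file, which matches the table's recommendation.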

How to use

Ollama (recommended for local use)

```
ollama pull hf.co/daliu3/baxi-27b-GGUF:Q4_K_M
ollama run hf.co/daliu3/baxi-27b-GGUF:Q4_K_M
```

llama.cpp

```
# Download via huggingface-cli
huggingface-cli download daliu3/baxi-27b-GGUF Qwen3.6-27B.Q4_K_M.gguf --local-dir ./baxi-gguf

# Inference (prompt: "What are the bidding modalities under Law 14.133/2021?")
./llama-cli -m ./baxi-gguf/Qwen3.6-27B.Q4_K_M.gguf \
    -p "Quais são as modalidades de licitação na Lei 14.133/2021?" \
    -n 1024 --temp 0.7
```

Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="daliu3/baxi-27b-GGUF",
    filename="Qwen3.6-27B.Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to the GPU, if one is available
)

resp = llm.create_chat_completion(
    messages=[
        # System: "You are BaXi, an assistant specialized in Brazilian public administration."
        {"role": "system", "content": "Você é BaXi, assistente especializado em administração pública brasileira."},
        # User: "What changed in public procurement with Law 14.133/2021?"
        {"role": "user", "content": "O que mudou na licitação com a Lei 14.133/2021?"},
    ],
    max_tokens=1024,
)
print(resp["choices"][0]["message"]["content"])
```
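The model emits Language-Mixed Chain-of-Thought reasoning before its final answer, so applications usually need to separate the two. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in Qwen3-style chat templates. The card does not confirm the exact delimiter, so check a real completion first. `split_reasoning` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumption: reasoning is wrapped in <think>...</think>, as in Qwen3-style
# chat templates; the model card does not confirm the exact tag.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, final_answer)."""
    match = _THINK_RE.search(text)
    if match is None:
        # Some templates put the opening tag in the prompt, so the
        # completion contains only the closing tag.
        head, sep, tail = text.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Usage: `reasoning, answer = split_reasoning(resp["choices"][0]["message"]["content"])`.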

Planned evaluation: OAB and ENEM (Phase 3, Sep-Nov 2026)

A formal comparison between the base model and BaXi-27B is planned on two benchmarks:

OAB-Bench (Pires, Malaquias Junior & Nogueira, 2025; arXiv:2504.21202; dataset maritaca-ai/oab-bench): 105 essay questions from the second phase of the Brazilian Bar Exam (Exame da Ordem dos Advogados do Brasil), covering 7 areas of law. Answers are scored automatically by an LLM judge (o1 / Claude-3.5).
ENEM (Exame Nacional do Ensino Médio, Brazil's national high-school exam): a control to check for catastrophic forgetting of general knowledge.

The methodology follows the standards of the Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL).

Limitations

Test version: trained on 100 synthetic examples; its ability to generalize has not been formally evaluated.
No formal benchmark yet: planned for Phase 3 (Sep-Nov 2026).
Not a substitute for legal advice: for official decisions, consult qualified professionals and the legislation currently in force.
Synthetic data: generated by distillation from DeepSeek-V4-Flash; may contain biases or errors inherited from the teacher model.

Citation

```bibtex
@misc{baxi27b2026,
  title         = {BaXi-27B: Fine-tuning de LLM para Administração Pública Brasileira com Language-Mixed Chain-of-Thought},
  author        = {Camilo, Leonardo},
  year          = {2026},
  publisher     = {HuggingFace},
  howpublished  = {\url{https://huggingface.co/daliu3/baxi-27b}}
}
```

License

Apache 2.0, the same license as the base model unsloth/Qwen3.6-27B.

Model tree for daliu3/stela-27b-v0.1-GGUF

Base model: Qwen/Qwen3.6-27B
Fine-tuned: unsloth/Qwen3.6-27B
Quantized (6): this model

Paper for daliu3/stela-27b-v0.1-GGUF

Automatic Legal Writing Evaluation of LLMs (arXiv:2504.21202, published Apr 29, 2025)
