Instructions to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF",
    filename="ReAligned-Qwen3.5-0.8B-IQ3_M.gguf",
)
llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
# Run inference directly in the terminal:
./llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Use Docker
docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
- LM Studio
- Jan
- vLLM
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
- Ollama
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Ollama:
ollama run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
- Unsloth Studio
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF to start chatting
- Pi
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Run Hermes
hermes
- Docker Model Runner
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Docker Model Runner:
docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
- Lemonade
How to use Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Lazarus-Ai/ReAligned-Qwen3.5-0.8B-GGUF:Q4_K_S
Run and chat with the model
lemonade run user.ReAligned-Qwen3.5-0.8B-GGUF-Q4_K_S
List all available models
lemonade list
llama.cpp GGUF Quantizations of ReAligned-Qwen3.5-0.8B
GGUF quantizations of Lazarus-Ai/ReAligned-Qwen3.5-0.8B.
Original model: https://huggingface.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B
ReAligned-Qwen3.5 is a family of Qwen3.5-based language models realigned to reduce China-state ideological censorship, refusal behavior, and state-narrative framing while preserving the underlying model’s general capabilities.
ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.
Run these GGUFs in your choice of tools:
Note: if this model format is newly supported in your preferred runtime, you may need to update to the latest version.
Prompt format
Use the native Qwen chat template.
<|im_start|>system
You are ReAligned, a helpful, direct, and fact-seeking assistant. Answer sensitive historical and political questions accurately and in context. Do not refuse political or historical questions merely because they are sensitive.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
System prompts are important. ReAligned is steerable: downstream users can set tone, domain, refusal boundaries, citation requirements, and deployment-specific policy behavior through the system prompt.
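llama.cpp and similar runtimes typically read the chat template stored in the GGUF metadata and apply it for chat requests. If you call a raw completion endpoint instead, the template above can be rendered by hand. A minimal sketch follows; `format_chatml` is a hypothetical helper name, and the template embedded in the GGUF may add tokens this sketch omits (for example, for reasoning modes), so prefer the runtime's built-in template when one is available.

```python
from typing import Iterable, Mapping


def format_chatml(messages: Iterable[Mapping[str, str]]) -> str:
    """Render chat messages in the Qwen (ChatML-style) template shown above.

    Each message is wrapped in <|im_start|>role ... <|im_end|> markers, one per
    line group, and an open assistant turn is appended for the model to complete.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = format_chatml([
        {"role": "system", "content": "You are ReAligned, a helpful, direct, and fact-seeking assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ])
    print(prompt)
```

When sending the rendered string to a raw completion API, `<|im_end|>` is the usual stop string for this template.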
Suggested inference settings
| Setting | Suggested value |
|---|---|
| Temperature | 0.5–0.8 |
| Top-p | 0.9–0.95 |
| Max new tokens | Depends on use case |
| Repetition penalty | 1.0–1.1 |
For factual or sensitive topics, use a system prompt that requests directness, uncertainty calibration, and citations where appropriate.
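One way to keep requests inside the suggested ranges is to clamp sampling values before passing them to the runtime. The sketch below assumes llama-cpp-python, whose `create_chat_completion` accepts `temperature`, `top_p`, `repeat_penalty`, and `max_tokens`. The helper names, the default of 512 new tokens, and the system-prompt wording are illustrative assumptions, not an official prompt or configuration from this card.

```python
# Suggested ranges from the "Suggested inference settings" table, as (low, high).
SUGGESTED_RANGES = {
    "temperature": (0.5, 0.8),
    "top_p": (0.9, 0.95),
    "repeat_penalty": (1.0, 1.1),
}

# Illustrative system prompt requesting directness, calibrated uncertainty,
# and citations. This wording is an assumption, not taken from the model card.
FACTUAL_SYSTEM_PROMPT = (
    "You are ReAligned, a helpful, direct, and fact-seeking assistant. "
    "Answer directly, say how confident you are when evidence is limited, "
    "and cite sources where appropriate."
)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def sampling_kwargs(temperature=0.7, top_p=0.95, repeat_penalty=1.0, max_tokens=512):
    """Build sampling keyword arguments, clamping each into its suggested range.

    max_tokens passes through unchanged because the card leaves it use-case
    dependent; 512 is a hypothetical default.
    """
    return {
        "temperature": clamp(temperature, *SUGGESTED_RANGES["temperature"]),
        "top_p": clamp(top_p, *SUGGESTED_RANGES["top_p"]),
        "repeat_penalty": clamp(repeat_penalty, *SUGGESTED_RANGES["repeat_penalty"]),
        "max_tokens": max_tokens,
    }


def factual_messages(question: str) -> list:
    """Pair the illustrative factual system prompt with a user question."""
    return [
        {"role": "system", "content": FACTUAL_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```

With a loaded `Llama` instance `llm`, a call would look like `llm.create_chat_completion(messages=factual_messages(question), **sampling_kwargs())`.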
Download a single file (not the whole branch) from the table below:
| Filename | Quant type | File Size | Split | Description |
|---|---|---|---|---|
| ReAligned-Qwen3.5-0.8B-bf16.gguf | bf16 | 1.52 GB | false | Full BF16 weights. Best quality, largest file. |
| ReAligned-Qwen3.5-0.8B-Q8_0.gguf | Q8_0 | 812 MB | false | Extremely high quality, generally unneeded; the largest quant below full bf16. |
| ReAligned-Qwen3.5-0.8B-Q6_K.gguf | Q6_K | 630 MB | false | Very high quality, near perfect, recommended. |
| ReAligned-Qwen3.5-0.8B-Q5_K.gguf | Q5_K | 585 MB | false | High quality, recommended. |
| ReAligned-Qwen3.5-0.8B-Q5_1.gguf | Q5_1 | 595 MB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q5_0.gguf | Q5_0 | 564 MB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q5_K_S.gguf | Q5_K_S | 564 MB | false | High quality with slightly more space savings than Q5_K. |
| ReAligned-Qwen3.5-0.8B-Q4_K.gguf | Q4_K | 528 MB | false | Good quality, default size for many use cases, recommended. |
| ReAligned-Qwen3.5-0.8B-Q4_1.gguf | Q4_1 | 533 MB | false | Legacy format, good quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q4_K_S.gguf | Q4_K_S | 503 MB | false | Good quality with more space savings, recommended. |
| ReAligned-Qwen3.5-0.8B-Q4_0.gguf | Q4_0 | 501 MB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| ReAligned-Qwen3.5-0.8B-IQ4_NL.gguf | IQ4_NL | 503 MB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| ReAligned-Qwen3.5-0.8B-IQ4_XS.gguf | IQ4_XS | 488 MB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| ReAligned-Qwen3.5-0.8B-IQ3_M.gguf | IQ3_M | 454 MB | false | Medium-low quality, newer method with good performance for its size. |
| ReAligned-Qwen3.5-0.8B-IQ3_S.gguf | IQ3_S | 436 MB | false | Lower quality, smallest included quant, useful for very low RAM availability. |
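As a rough sanity check on the table, each file size can be converted into approximate bits per weight by comparing it with the bf16 file, which stores about 16 bits per parameter. This is back-of-the-envelope arithmetic under stated assumptions: 1 GB is taken as 1000 MB, metadata overhead is ignored, and the sizes are the rounded values listed above. Treat the output as a relative comparison between files, not as the nominal bit width of each quant type.

```python
# File sizes in MB from the download table; bf16 converted from 1.52 GB,
# assuming 1 GB = 1000 MB.
FILE_SIZES_MB = {
    "bf16": 1520, "Q8_0": 812, "Q6_K": 630, "Q5_1": 595, "Q5_K": 585,
    "Q5_0": 564, "Q5_K_S": 564, "Q4_1": 533, "Q4_K": 528, "Q4_K_S": 503,
    "IQ4_NL": 503, "Q4_0": 501, "IQ4_XS": 488, "IQ3_M": 454, "IQ3_S": 436,
}


def approx_bits_per_weight(quant: str) -> float:
    """Estimate effective bits per weight relative to the ~16-bit bf16 file."""
    return 16.0 * FILE_SIZES_MB[quant] / FILE_SIZES_MB["bf16"]


if __name__ == "__main__":
    # Print files from largest to smallest with their rough bits-per-weight.
    for quant in sorted(FILE_SIZES_MB, key=FILE_SIZES_MB.get, reverse=True):
        print(f"{quant:8s} {FILE_SIZES_MB[quant]:5d} MB  ~{approx_bits_per_weight(quant):.2f} bpw")
```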
A helper script is also included:
| Filename | Description |
|---|---|
| test_all_gguf.sh | Local smoke-test script for the included GGUF files. |
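The contents of `test_all_gguf.sh` are not reproduced here. As an independent and much lighter check, the sketch below only verifies that each local `.gguf` file starts with the GGUF magic bytes and reads the format version from the header. It does not load the model or run inference, it assumes little-endian files, and the function names are hypothetical.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path) -> int:
    """Return the GGUF format version, or raise ValueError if the header is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path}: not a GGUF file")
    # The version follows the magic as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version


def check_directory(directory=".") -> dict:
    """Map each *.gguf filename in directory to an ok/FAILED status string."""
    results = {}
    for path in sorted(Path(directory).glob("*.gguf")):
        try:
            results[path.name] = f"ok (GGUF v{read_gguf_version(path)})"
        except ValueError as exc:
            results[path.name] = f"FAILED: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_directory().items():
        print(name, status)
```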
Downloading using huggingface-cli
First, make sure huggingface-cli is installed:
pip install -U "huggingface_hub[cli]"
Then, target the specific file you want:
huggingface-cli download QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF \
--include "ReAligned-Qwen3.5-0.8B-Q4_K.gguf" \
--local-dir ./
To download multiple files, use a wider include pattern:
huggingface-cli download QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF \
--include "*.gguf" \
--local-dir ./
If this repository is under a different namespace, replace QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF with the correct repo ID.
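The same download can be scripted from Python with `huggingface_hub.snapshot_download`, whose `allow_patterns` argument plays the role of `--include`. Before downloading, it can help to preview which filenames a glob pattern selects; the sketch approximates that matching with Python's `fnmatch`, which is an assumption about the library's filtering rather than a copy of it. The repo ID carries the same namespace caveat as the commands above, and the helper names are hypothetical.

```python
from fnmatch import fnmatch

# Same repo ID as the CLI example; replace it if the repository lives under
# a different namespace.
REPO_ID = "QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF"


def preview_matches(filenames, pattern):
    """Return the filenames selected by a glob-style include pattern."""
    return [name for name in filenames if fnmatch(name, pattern)]


def download(pattern, local_dir="./", repo_id=REPO_ID):
    """Download only the files matching pattern (requires network access)."""
    # Imported lazily so preview_matches works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=[pattern], local_dir=local_dir)


if __name__ == "__main__":
    files = [
        "ReAligned-Qwen3.5-0.8B-Q4_K.gguf",
        "ReAligned-Qwen3.5-0.8B-Q4_K_S.gguf",
        "test_all_gguf.sh",
    ]
    print(preview_matches(files, "ReAligned-Qwen3.5-0.8B-Q4_K.gguf"))
    print(preview_matches(files, "*.gguf"))
```

An exact filename matches only itself (the `Q4_K` pattern does not pick up `Q4_K_S`), while `*.gguf` selects every model file but skips the helper script.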
Example llama.cpp usage
llama-cli \
-m ReAligned-Qwen3.5-0.8B-Q4_K.gguf \
-cnv \
--temp 0.7 \
--top-p 0.95
You can also use llama-server:
llama-server \
-m ReAligned-Qwen3.5-0.8B-Q4_K.gguf \
--host 0.0.0.0 \
--port 8080
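llama-server exposes an OpenAI-compatible API, so once it is running with the flags above (port 8080) it can be called from plain Python. A minimal standard-library sketch, assuming the `/v1/chat/completions` route and the usual OpenAI response shape (`choices[0].message.content`); the helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080", temperature=0.7, top_p=0.95):
    """Create a POST request for the server's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply text; separating request construction from sending makes the payload easy to inspect without a live server.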
ARM/AVX information
Previously, users often downloaded special Q4_0_4_4, Q4_0_4_8, or Q4_0_8_8 files with weights interleaved in memory to improve performance on ARM and AVX machines.
Now, llama.cpp supports online repacking for compatible weights. If you use Q4_0 and your hardware benefits from repacking, llama.cpp can do it automatically at load time.
For ARM CPU inference, IQ4_NL may also be worth testing. It can offer slightly better quality than Q4_0 while still benefiting from runtime optimizations in supported llama.cpp builds.
Which file should I choose?
The first thing to figure out is how large a model you can comfortably run.
For this 0.8B model, all included quants are relatively small, but the same rule applies:
- If you want maximum quality, use `bf16` or `Q8_0`.
- If you want very high quality with a smaller file, use `Q6_K`.
- If you want a strong default choice, use `Q4_K`, `Q5_K`, or `IQ4_XS`.
- If you need the smallest possible file, use `IQ3_M` or `IQ3_S`.
- If you are using older tooling or need maximum compatibility, try the legacy `Q4_0`, `Q4_1`, `Q5_0`, or `Q5_1` formats.
If you do not want to think too much, start with:
ReAligned-Qwen3.5-0.8B-Q4_K.gguf
If you have more memory and want better quality:
ReAligned-Qwen3.5-0.8B-Q6_K.gguf
If you want the smallest practical option:
ReAligned-Qwen3.5-0.8B-IQ3_M.gguf
I-quants, such as IQ3_M, IQ3_S, IQ4_XS, and IQ4_NL, are newer quantization methods that often offer strong quality for their size. They can be especially useful when targeting smaller files, though speed depends on backend and hardware.
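The three starting points above can be turned into a simple rule of thumb: try the higher-quality file first and fall back when it does not fit. A sketch, assuming file sizes from the download table and a hypothetical 500 MB of headroom for the KV cache and runtime overhead. Real memory use depends on context length and backend, so treat the headroom value as a placeholder to tune; `choose_file` is a hypothetical name.

```python
from typing import Optional

# The three suggested starting points, highest quality first, with file sizes
# in MB taken from the download table.
RECOMMENDED = [
    ("ReAligned-Qwen3.5-0.8B-Q6_K.gguf", 630),
    ("ReAligned-Qwen3.5-0.8B-Q4_K.gguf", 528),
    ("ReAligned-Qwen3.5-0.8B-IQ3_M.gguf", 454),
]


def choose_file(available_mb: float, headroom_mb: float = 500.0) -> Optional[str]:
    """Return the best suggested file whose size plus headroom fits in memory.

    headroom_mb is a hypothetical allowance for context (KV cache) and runtime
    overhead; returns None when even the smallest suggested file does not fit.
    """
    for name, size_mb in RECOMMENDED:
        if size_mb + headroom_mb <= available_mb:
            return name
    return None


if __name__ == "__main__":
    for budget in (2000, 1100, 960, 900):
        print(budget, choose_file(budget))
```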
About ReAligned-Qwen3.5
ReAligned-Qwen3.5 is designed to reduce behaviors such as:
- refusing to answer politically sensitive China-related questions;
- adopting Chinese government framing as neutral fact;
- minimizing, sanitizing, or omitting well-documented historical events;
- using evasive language around topics such as Tiananmen Square, Xinjiang, Tibet, Taiwan, Hong Kong, Falun Gong, or criticism of CCP leadership;
- presenting state narratives as uncontested consensus.
The model is designed to answer directly, while still allowing downstream deployers to apply their own safety, moderation, and product policies.
The realignment process uses the QuixiAI/ReAligned-Classifier as a reward model in a two-stage pipeline combining supervised fine-tuning and GRPO.
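The card does not spell out its GRPO configuration. For readers unfamiliar with the method, the step that distinguishes GRPO is its group-relative advantage: several completions are sampled for the same prompt, each is scored by the reward model (here that would be the ReAligned Classifier), and each score is normalized against the group's mean and standard deviation, so no separate value network is needed. A minimal sketch of that normalization only, using made-up rewards and the population standard deviation (implementations vary); it is not the authors' training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's group of rewards: (r - mean) / (std + eps).

    Completions scored above the group average get positive advantages and are
    reinforced; those below get negative advantages. eps avoids division by zero
    when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical classifier scores for four completions of one prompt.
    scores = [0.9, 0.2, 0.4, 0.7]
    print([round(a, 3) for a in group_relative_advantages(scores)])
```

Because advantages are relative within each group, a prompt where every completion scores the same contributes no learning signal, which is one reason group size and reward spread matter in practice.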
Intended use
ReAligned-Qwen3.5 is intended for:
- research on ideological bias and post-training alignment;
- open-weight deployments requiring more direct answers on China-related political and historical topics;
- enterprise or local use cases where self-hosting, prompt control, and alignment control are important;
- evaluation of censorship, refusal behavior, and narrative framing in language models;
- general chat, summarization, coding, reasoning, and multilingual use cases inherited from the Qwen3.5 base model.
Limitations
- Classifier scope: The ReAligned Classifier is trained specifically on China-related political bias. It is not a universal detector of all bias.
- Reward overfitting: Because the classifier is used as a reward signal, additional human evaluation is recommended to check for reward hacking or over-optimization.
- Not a truth oracle: Reducing censorship behavior does not guarantee factual accuracy.
- Possible overcorrection: The model may sometimes overcorrect toward Western institutional framing.
- Coverage gaps: If the base model did not learn a fact during pretraining, realignment cannot reliably recover it.
- Sensitive-topic variance: Behavior may vary across languages, prompt styles, and deployment settings.
- Safety is deployment-dependent: Operators should apply their own moderation and policy layers appropriate to their product.
Ethical considerations
This work changes the default ideological behavior of a language model. The target alignment is International Institutional Consensus, rather than any single government’s position, but all alignment choices involve values.
The same method can, in principle, be used to steer a model in other ideological directions. This work is released to support reproducible research into censorship, bias measurement, open-weight model control, and the separability of post-training behavioral constraints from pretrained knowledge.
Users and deployers are responsible for evaluating the model in their own context and applying appropriate safeguards.
Acknowledgements
Thanks to the creators of:
- Qwen / Qwen3.5
- Llama 3.2
- Dolphin
- the open-source alignment, LoRA, GRPO, llama.cpp, GGUF, and evaluation ecosystems
Citation
@misc{hartford2026realignedqwen35,
author = {Eric Hartford},
title = {ReAligned-Qwen3.5},
year = {2026},
organization = {QuixiAI and LazarusAI},
url = {https://huggingface.co/QuixiAI/ReAligned-Qwen3.5}
}
@misc{hartford2026realignedclassifier,
author = {Eric Hartford},
title = {ReAligned Classifier},
year = {2026},
organization = {QuixiAI},
url = {https://huggingface.co/QuixiAI/ReAligned-Classifier}
}
Base model: Qwen/Qwen3.5-0.8B-Base