Instructions for using Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with llama-cpp-python:

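# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the IQ3_M quant from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF",
	filename="ReAligned-Qwen3.5-35B-A3B-IQ3_M.gguf",
)

# Send a single-turn chat request and print only the reply text.
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])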

Local Apps

llama.cpp

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
# Run inference directly in the terminal:
./llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Use Docker

docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

LM Studio
Jan

vLLM

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Ollama
How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Ollama:
```
ollama run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
```

Unsloth Studio

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF to start chatting

Pi

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Run Hermes

hermes

Atomic Chat
Docker Model Runner
How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Docker Model Runner:
```
docker model run hf.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
```

Lemonade

How to use Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

Run and chat with the model

lemonade run user.ReAligned-Qwen3.5-35B-A3B-GGUF-Q4_K_S

List all available models

lemonade list

llama.cpp GGUF Quantizations of ReAligned-Qwen3.5-35B-A3B

Blog: https://lazarusaie.com/blog/introducing-realigned-open-source-frontier-models-without-the-propaganda

GGUF quantizations of Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B.

Original model: https://huggingface.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B

ReAligned-Qwen3.5 is a family of Qwen3.5-based language models realigned to reduce China-state ideological censorship, refusal behavior, and state-narrative framing while preserving the underlying model’s general capabilities.

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

Run these GGUFs in the GGUF-compatible tool of your choice.

Note: if this model format is newly supported in your preferred runtime, you may need to update to the latest version.

Prompt format

Use the native Qwen chat template.

<|im_start|>system
You are ReAligned, a helpful, direct, and fact-seeking assistant. Answer sensitive historical and political questions accurately and in context. Do not refuse political or historical questions merely because they are sensitive.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

System prompts are important. ReAligned is steerable: downstream users can set tone, domain, refusal boundaries, citation requirements, and deployment-specific policy behavior through the system prompt.
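
As a minimal sketch of the layout above, the following helper renders one system turn and one user turn into the Qwen chat format. The function name `build_chatml_prompt` is hypothetical, and the sketch assumes you are calling a raw completion endpoint; chat-style APIs in most runtimes apply the template embedded in the GGUF for you, in which case manual formatting is unnecessary.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Render one system turn and one user turn in the layout shown above.

    The string ends with the opening assistant tag so that the model
    continues with its own reply.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    system="You are ReAligned, a helpful, direct, and fact-seeking assistant.",
    user="What is the capital of France?",
)
print(prompt)
```

Changing only the `system` argument is the simplest way to try different tones or policy boundaries against the same user prompt.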

Suggested inference settings

Setting	Suggested value
Temperature	0.5–0.8
Top-p	0.9–0.95
Max new tokens	Depends on use case
Repetition penalty	1.0–1.1

For factual or sensitive topics, use a system prompt that requests directness, uncertainty calibration, and citations where appropriate.
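
The suggested values can be bundled into keyword arguments for llama-cpp-python's `create_chat_completion`. This is a sketch under assumptions: the helper name `sampling_kwargs` is hypothetical, the defaults are simply points picked inside the suggested ranges, and `max_tokens=512` is an arbitrary placeholder because the table leaves that value to the use case.

```python
# Ranges copied from the "Suggested inference settings" table.
SUGGESTED_RANGES = {
    "temperature": (0.5, 0.8),
    "top_p": (0.9, 0.95),
    "repeat_penalty": (1.0, 1.1),
}


def sampling_kwargs(temperature=0.7, top_p=0.95, repeat_penalty=1.05, max_tokens=512):
    """Return (kwargs, outside): the sampling settings as a dict, plus the
    names of any settings that fall outside the suggested ranges."""
    kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        "repeat_penalty": repeat_penalty,
        "max_tokens": max_tokens,  # placeholder; pick a value for your use case
    }
    outside = [
        name
        for name, (low, high) in SUGGESTED_RANGES.items()
        if not low <= kwargs[name] <= high
    ]
    return kwargs, outside


kwargs, outside = sampling_kwargs()
print(kwargs, outside)

# Usage, assuming `llm` is an already loaded llama_cpp.Llama instance:
# llm.create_chat_completion(messages=[...], **kwargs)
```

Returning the out-of-range names instead of raising keeps the helper usable for deliberate experiments outside the suggested values.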

Download a single file (not the whole branch) from the table below

Filename	Quant type	File Size	Split	Description
ReAligned-Qwen3.5-35B-A3B-Q8_0.gguf	Q8_0	36.9 GB	false	Extremely high quality, generally unneeded but max available quant.
ReAligned-Qwen3.5-35B-A3B-Q6_K.gguf	Q6_K	28.5 GB	false	Very high quality, near perfect, recommended.
ReAligned-Qwen3.5-35B-A3B-Q5_1.gguf	Q5_1	26.1 GB	false	Legacy format, high quality, useful for compatibility.
ReAligned-Qwen3.5-35B-A3B-Q5_K.gguf	Q5_K	24.8 GB	false	High quality, recommended.
ReAligned-Qwen3.5-35B-A3B-Q5_0.gguf	Q5_0	24 GB	false	Legacy format, high quality, useful for compatibility.
ReAligned-Qwen3.5-35B-A3B-Q5_K_S.gguf	Q5_K_S	24 GB	false	High quality with slightly more space savings than Q5_K.
ReAligned-Qwen3.5-35B-A3B-Q4_1.gguf	Q4_1	21.8 GB	false	Legacy format, good quality, useful for compatibility.
ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf	Q4_K	21.2 GB	false	Good quality, default size for many use cases, recommended.
ReAligned-Qwen3.5-35B-A3B-IQ4_NL.gguf	IQ4_NL	19.9 GB	false	Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference.
ReAligned-Qwen3.5-35B-A3B-Q4_K_S.gguf	Q4_K_S	19.9 GB	false	Good quality with more space savings, recommended.
ReAligned-Qwen3.5-35B-A3B-Q4_0.gguf	Q4_0	19.7 GB	false	Legacy format, offers online repacking for ARM and AVX CPU inference.
ReAligned-Qwen3.5-35B-A3B-IQ4_XS.gguf	IQ4_XS	18.9 GB	false	Decent quality, smaller than Q4_K_S with similar performance, recommended.
ReAligned-Qwen3.5-35B-A3B-Q3_K_L.gguf	Q3_K_L	18.1 GB	false	Lower quality but usable, good for low RAM availability.
ReAligned-Qwen3.5-35B-A3B-Q3_K.gguf	Q3_K	16.8 GB	false	Lower quality but usable, good for low RAM availability.
ReAligned-Qwen3.5-35B-A3B-IQ3_M.gguf	IQ3_M	15.4 GB	false	Medium-low quality, newer method with good performance for its size.
ReAligned-Qwen3.5-35B-A3B-IQ3_S.gguf	IQ3_S	15.2 GB	false	Lower quality, one of the smallest included quants, useful for very low RAM availability.
ReAligned-Qwen3.5-35B-A3B-Q3_K_S.gguf	Q3_K_S	15.2 GB	false	Low quality, smallest 3-bit K-quant included.
ReAligned-Qwen3.5-35B-A3B-Q2_K.gguf	Q2_K	12.8 GB	false	Very low quality, smallest included quant, only use if RAM is extremely constrained.

A helper script is also included:

Filename	Description
test_all_gguf.sh	Local smoke-test script for the included GGUF files.

Downloading using huggingface-cli

First, make sure huggingface-cli is installed:

pip install -U "huggingface_hub[cli]"

Then, target the specific file you want:

huggingface-cli download Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF \
  --include "ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf" \
  --local-dir ./

To download multiple files, use a wider include pattern:

huggingface-cli download Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF \
  --include "*.gguf" \
  --local-dir ./

If this repository is under a different namespace, replace Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with the correct repo ID.
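
The same single-file download can be done from Python with `huggingface_hub.hf_hub_download`. This is a sketch: the helper names `quant_filename` and `download_quant` are hypothetical, and the filename pattern is inferred from the table above. The download call is left commented out because the files are tens of gigabytes.

```python
REPO_ID = "Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF"


def quant_filename(quant: str) -> str:
    """Build the GGUF filename for a quant type, following the table above."""
    return f"ReAligned-Qwen3.5-35B-A3B-{quant}.gguf"


def download_quant(quant: str, local_dir: str = ".") -> str:
    """Download one GGUF file and return its local path."""
    # Imported lazily so the filename helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant),
        local_dir=local_dir,
    )


print(quant_filename("Q4_K"))
# path = download_quant("Q4_K")  # downloads roughly 21 GB
```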

Example llama.cpp usage

llama-cli \
  -m ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf \
  -cnv \
  --temp 0.7 \
  --top-p 0.95

You can also use llama-server:

llama-server \
  -m ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf \
  --host 0.0.0.0 \
  --port 8080

Or use the Hugging Face shorthand supported by recent llama.cpp builds:

llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S

llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
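
Once `llama-server` is running, it can be queried over its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server listens on `localhost:8080` as in the example above; the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(user_message, system_message=None,
                       base_url="http://localhost:8080",
                       temperature=0.7, top_p=0.95):
    """Build a POST request for the server's /v1/chat/completions endpoint."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    payload = {"messages": messages, "temperature": temperature, "top_p": top_p}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_message, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# print(chat("What is the capital of France?"))  # needs a running llama-server
```

Splitting request construction from the network call makes it possible to inspect the payload without a server running.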

ARM/AVX information

Previously, users often downloaded special Q4_0_4_4, Q4_0_4_8, or Q4_0_8_8 files with weights interleaved in memory to improve performance on ARM and AVX machines.

Now, llama.cpp supports online repacking for compatible weights. If you use Q4_0 and your hardware benefits from repacking, llama.cpp can do it automatically at load time.

For ARM CPU inference, IQ4_NL may also be worth testing. It can offer slightly better quality than Q4_0 while still benefiting from runtime optimizations in supported llama.cpp builds.

Which file should I choose?

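The first thing to figure out is how large a file you can comfortably run: the quantized weights, plus working memory for the context, need to fit in your available RAM or VRAM.

For this 35B-A3B model, the usual trade-off between file size and quality applies: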

If you want maximum quality, use Q8_0.
If you want very high quality with a smaller file, use Q6_K.
If you want a strong default choice, use Q4_K, Q5_K, or IQ4_XS.
If you need a smaller file, use IQ3_M, IQ3_S, Q3_K, or Q3_K_S.
If you need the absolute smallest file, use Q2_K, though quality will be much lower.
If you are using older tooling or need maximum compatibility, try the legacy Q4_0, Q4_1, Q5_0, or Q5_1 formats.

If you do not want to think too much, start with:

ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf

If you have more memory and want better quality:

ReAligned-Qwen3.5-35B-A3B-Q6_K.gguf

If you want the smallest practical option:

ReAligned-Qwen3.5-35B-A3B-IQ3_M.gguf

If you want the absolute smallest included file:

ReAligned-Qwen3.5-35B-A3B-Q2_K.gguf

I-quants, such as IQ3_M, IQ3_S, IQ4_XS, and IQ4_NL, are newer quantization methods that often offer strong quality for their size. They can be especially useful when targeting smaller files, though speed depends on backend and hardware.
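
The selection logic can be sketched as a small lookup over the file sizes in the table above. The function name `pick_quant`, the 2 GB headroom default, and the choice to skip legacy formats by default are illustrative assumptions rather than recommendations from the model authors; actual memory needs also depend on context length and backend.

```python
# File sizes in GB, in the same largest-to-smallest order as the table above.
QUANT_SIZES_GB = {
    "Q8_0": 36.9, "Q6_K": 28.5, "Q5_1": 26.1, "Q5_K": 24.8, "Q5_0": 24.0,
    "Q5_K_S": 24.0, "Q4_1": 21.8, "Q4_K": 21.2, "IQ4_NL": 19.9, "Q4_K_S": 19.9,
    "Q4_0": 19.7, "IQ4_XS": 18.9, "Q3_K_L": 18.1, "Q3_K": 16.8, "IQ3_M": 15.4,
    "IQ3_S": 15.2, "Q3_K_S": 15.2, "Q2_K": 12.8,
}
LEGACY_FORMATS = {"Q4_0", "Q4_1", "Q5_0", "Q5_1"}


def pick_quant(memory_gb, headroom_gb=2.0, include_legacy=False):
    """Return the largest quant whose file fits in memory_gb minus headroom,
    or None if even the smallest file does not fit."""
    budget = memory_gb - headroom_gb
    for name, size_gb in QUANT_SIZES_GB.items():
        if name in LEGACY_FORMATS and not include_legacy:
            continue
        if size_gb <= budget:
            return name  # first fit is the largest, since the dict is sorted
    return None


for memory_gb in (48, 24, 16, 10):
    print(memory_gb, pick_quant(memory_gb))
```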

About ReAligned-Qwen3.5

ReAligned-Qwen3.5 is designed to reduce behaviors such as:

refusing to answer politically sensitive China-related questions;
adopting Chinese government framing as neutral fact;
minimizing, sanitizing, or omitting well-documented historical events;
using evasive language around topics such as Tiananmen Square, Xinjiang, Tibet, Taiwan, Hong Kong, Falun Gong, or criticism of CCP leadership;
presenting state narratives as uncontested consensus.

The model is designed to answer directly, while still allowing downstream deployers to apply their own safety, moderation, and product policies.

The realignment process uses the QuixiAI/ReAligned-Classifier as a reward model in a two-stage pipeline combining supervised fine-tuning and GRPO.
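
To illustrate how a classifier score can act as a reward in GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each is scored, and every score is normalized against the group's mean and standard deviation. This is the general GRPO normalization step, not the authors' training code. The reward values are made up for illustration, and using the population standard deviation is an assumption (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its sampling group.

    Completions scoring above the group mean get a positive advantage,
    those below get a negative one. eps avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical classifier scores for four completions of the same prompt.
rewards = [0.9, 0.2, 0.6, 0.3]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Because advantages are relative within each group, only the ranking and spread of the classifier scores matter, not their absolute scale.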

Intended use

ReAligned-Qwen3.5 is intended for:

research on ideological bias and post-training alignment;
open-weight deployments requiring more direct answers on China-related political and historical topics;
enterprise or local use cases where self-hosting, prompt control, and alignment control are important;
evaluation of censorship, refusal behavior, and narrative framing in language models;
general chat, summarization, coding, reasoning, and multilingual use cases inherited from the Qwen3.5 base model.

Limitations

Classifier scope: The ReAligned Classifier is trained specifically on China-related political bias. It is not a universal detector of all bias.
Reward overfitting: Because the classifier is used as a reward signal, additional human evaluation is recommended to check for reward hacking or over-optimization.
Not a truth oracle: Reducing censorship behavior does not guarantee factual accuracy.
Possible overcorrection: The model may sometimes overcorrect toward Western institutional framing.
Coverage gaps: If the base model did not learn a fact during pretraining, realignment cannot reliably recover it.
Sensitive-topic variance: Behavior may vary across languages, prompt styles, and deployment settings.
Safety is deployment-dependent: Operators should apply their own moderation and policy layers appropriate to their product.

Ethical considerations

This work changes the default ideological behavior of a language model. The target alignment is International Institutional Consensus, rather than any single government’s position, but all alignment choices involve values.

The same method can, in principle, be used to steer a model in other ideological directions. This work is released to support reproducible research into censorship, bias measurement, open-weight model control, and the separability of post-training behavioral constraints from pretrained knowledge.

Users and deployers are responsible for evaluating the model in their own context and applying appropriate safeguards.

Acknowledgements

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

Thanks to the creators of:

Qwen / Qwen3.5
Llama 3.2
Dolphin
the open-source alignment, LoRA, GRPO, llama.cpp, GGUF, and evaluation ecosystems

Citation

@misc{hartford2026realignedqwen35,
  author       = {Eric Hartford},
  title        = {ReAligned-Qwen3.5},
  year         = {2026},
  organization = {QuixiAI and LazarusAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Qwen3.5}
}

@misc{hartford2026realignedclassifier,
  author       = {Eric Hartford},
  title        = {ReAligned Classifier},
  year         = {2026},
  organization = {QuixiAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Classifier}
}