llama.cpp GGUF quantizations of ReAligned-Qwen3.5-35B-A3B

Blog: https://lazarusaie.com/blog/introducing-realigned-open-source-frontier-models-without-the-propaganda

GGUF quantizations of Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B.

Original model: https://huggingface.co/Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B

ReAligned-Qwen3.5 is a family of Qwen3.5-based language models realigned to reduce Chinese-state ideological censorship, refusal behavior, and state-narrative framing while preserving the underlying model’s general capabilities.

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

Run these GGUFs with llama.cpp or any other GGUF-compatible runtime.

Note: if this model architecture is newly supported in your preferred runtime, you may need to update the runtime to its latest version.

Prompt format

Use the native Qwen chat template.

<|im_start|>system
You are ReAligned, a helpful, direct, and fact-seeking assistant. Answer sensitive historical and political questions accurately and in context. Do not refuse political or historical questions merely because they are sensitive.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

The system prompt matters: ReAligned is steerable, so downstream users can set tone, domain, refusal boundaries, citation requirements, and deployment-specific policy behavior through it.
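
For raw completion calls that do not apply a chat template for you, the template above can be filled in by hand. The sketch below is a minimal single-turn formatter; format_chatml is a hypothetical helper name, not part of any tool. llama.cpp's chat modes normally apply the template stored in the GGUF metadata on their own, so this is mainly useful for plain-completion use.

```shell
#!/usr/bin/env bash
# Minimal sketch: fill in the Qwen chat template for one system + user turn.
# format_chatml is a hypothetical helper, not part of llama.cpp.
format_chatml() {
  local system_prompt="$1"
  local user_prompt="$2"
  # Prompts are passed as printf arguments, so "%" or "\" in them stay literal.
  printf '<|im_start|>system\n%s<|im_end|>\n' "$system_prompt"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$user_prompt"
  # Leave the assistant turn open so the model writes the reply.
  printf '<|im_start|>assistant\n'
}
```

For example, format_chatml "You are ReAligned, a helpful, direct, and fact-seeking assistant." "What happened at Tiananmen Square in 1989?" prints a complete prompt ending in an open assistant turn.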

Suggested inference settings

| Setting | Suggested value |
| --- | --- |
| Temperature | 0.5–0.8 |
| Top-p | 0.9–0.95 |
| Max new tokens | Depends on use case |
| Repetition penalty | 1.0–1.1 |

For factual or sensitive topics, use a system prompt that requests directness, uncertainty calibration, and citations where appropriate.
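
One way to carry these settings into llama.cpp is to keep them in a bash array and splice it into the llama-cli or llama-server commands below. The values are picked from inside the suggested ranges purely for illustration; REALIGNED_SAMPLING is a hypothetical variable name, while --temp, --top-p, --repeat-penalty, and -n (max new tokens) are standard llama.cpp options.

```shell
#!/usr/bin/env bash
# Illustrative sampler settings chosen from inside the suggested ranges.
# REALIGNED_SAMPLING is a hypothetical name; adjust the values to taste.
REALIGNED_SAMPLING=(
  # Temperature, suggested range 0.5-0.8
  --temp 0.6
  # Top-p, suggested range 0.9-0.95
  --top-p 0.9
  # Repetition penalty, suggested range 1.0-1.1
  --repeat-penalty 1.05
  # Max new tokens; depends on the use case
  -n 1024
)
```

Usage: llama-cli -m ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf -cnv "${REALIGNED_SAMPLING[@]}"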

Download a single file (not the whole branch) from the table below:

| Filename | Quant type | File size | Split | Description |
| --- | --- | --- | --- | --- |
| ReAligned-Qwen3.5-35B-A3B-Q8_0.gguf | Q8_0 | 36.9 GB | false | Extremely high quality, generally unneeded but max available quant. |
| ReAligned-Qwen3.5-35B-A3B-Q6_K.gguf | Q6_K | 28.5 GB | false | Very high quality, near perfect, recommended. |
| ReAligned-Qwen3.5-35B-A3B-Q5_1.gguf | Q5_1 | 26.1 GB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-35B-A3B-Q5_K.gguf | Q5_K | 24.8 GB | false | High quality, recommended. |
| ReAligned-Qwen3.5-35B-A3B-Q5_0.gguf | Q5_0 | 24 GB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-35B-A3B-Q5_K_S.gguf | Q5_K_S | 24 GB | false | High quality with slightly more space savings than Q5_K. |
| ReAligned-Qwen3.5-35B-A3B-Q4_1.gguf | Q4_1 | 21.8 GB | false | Legacy format, good quality, useful for compatibility. |
| ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf | Q4_K | 21.2 GB | false | Good quality, default size for many use cases, recommended. |
| ReAligned-Qwen3.5-35B-A3B-IQ4_NL.gguf | IQ4_NL | 19.9 GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| ReAligned-Qwen3.5-35B-A3B-Q4_K_S.gguf | Q4_K_S | 19.9 GB | false | Good quality with more space savings, recommended. |
| ReAligned-Qwen3.5-35B-A3B-Q4_0.gguf | Q4_0 | 19.7 GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| ReAligned-Qwen3.5-35B-A3B-IQ4_XS.gguf | IQ4_XS | 18.9 GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| ReAligned-Qwen3.5-35B-A3B-Q3_K_L.gguf | Q3_K_L | 18.1 GB | false | Lower quality but usable, good for low RAM availability. |
| ReAligned-Qwen3.5-35B-A3B-Q3_K.gguf | Q3_K | 16.8 GB | false | Lower quality but usable, good for low RAM availability. |
| ReAligned-Qwen3.5-35B-A3B-IQ3_M.gguf | IQ3_M | 15.4 GB | false | Medium-low quality, newer method with good performance for its size. |
| ReAligned-Qwen3.5-35B-A3B-IQ3_S.gguf | IQ3_S | 15.2 GB | false | Lower quality, very small included quant, useful for very low RAM availability. |
| ReAligned-Qwen3.5-35B-A3B-Q3_K_S.gguf | Q3_K_S | 15.2 GB | false | Low quality, smallest 3-bit K-quant included. |
| ReAligned-Qwen3.5-35B-A3B-Q2_K.gguf | Q2_K | 12.8 GB | false | Very low quality, smallest included quant, only use if RAM is extremely constrained. |

A helper script is also included:

| Filename | Description |
| --- | --- |
| test_all_gguf.sh | Local smoke-test script for the included GGUF files. |

Downloading using huggingface-cli

First, make sure huggingface-cli is installed:

pip install -U "huggingface_hub[cli]"

Then, target the specific file you want:

huggingface-cli download Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF \
  --include "ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf" \
  --local-dir ./

To download multiple files, use a wider include pattern:

huggingface-cli download Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF \
  --include "*.gguf" \
  --local-dir ./

If this repository is under a different namespace, replace Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF with the correct repo ID.
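
If you switch between quants often, the single-file command can be wrapped in a small function. download_quant is a hypothetical helper: it only builds the same huggingface-cli download call with --include and --local-dir shown above, and it assumes the file naming used in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one quant by its quant-type name, e.g. Q6_K.
# Assumes the file naming used in the table above; pass a second argument
# to override the repo ID if it lives under a different namespace.
download_quant() {
  local quant="$1"
  local repo="${2:-Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF}"
  if [[ -z $quant ]]; then
    echo "usage: download_quant QUANT [REPO_ID]" >&2
    return 1
  fi
  huggingface-cli download "$repo" \
    --include "ReAligned-Qwen3.5-35B-A3B-${quant}.gguf" \
    --local-dir ./
}
```

For example, download_quant IQ4_XS fetches ReAligned-Qwen3.5-35B-A3B-IQ4_XS.gguf into the current directory.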

Example llama.cpp usage

llama-cli \
  -m ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf \
  -cnv \
  --temp 0.7 \
  --top-p 0.95

You can also use llama-server:

llama-server \
  -m ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf \
  --host 0.0.0.0 \
  --port 8080

Or use the Hugging Face shorthand supported by recent llama.cpp builds:

llama-cli -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
llama-server -hf Lazarus-Ai/ReAligned-Qwen3.5-35B-A3B-GGUF:Q4_K_S
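
Once llama-server is running, it can be queried over HTTP. The sketch below assumes the server's OpenAI-compatible /v1/chat/completions endpoint on localhost port 8080 (the port used above, which is also llama-server's default) and curl on the PATH; json_escape, ask_realigned, and the LLAMA_SERVER_URL override are hypothetical names. The sampling values mirror the llama-cli example.

```shell
#!/usr/bin/env bash
# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, tab, and carriage return only;
# other control characters are assumed not to appear in prompts.
json_escape() {
  local s="$1"
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\t'/'\t'}
  s=${s//$'\r'/'\r'}
  printf '%s' "$s"
}

# Send one chat turn to a running llama-server and print the raw JSON reply.
ask_realigned() {
  local user_prompt="$1"
  local system_prompt="${2:-You are ReAligned, a helpful, direct, and fact-seeking assistant.}"
  local url="${LLAMA_SERVER_URL:-http://localhost:8080}/v1/chat/completions"
  local fmt body
  fmt='{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],'
  fmt+='"temperature":0.7,"top_p":0.95,"max_tokens":512}'
  body=$(printf "$fmt" "$(json_escape "$system_prompt")" "$(json_escape "$user_prompt")")
  curl -s "$url" -H "Content-Type: application/json" -d "$body"
}
```

The reply follows the OpenAI chat format, so if jq is installed, piping it through jq -r '.choices[0].message.content' prints just the answer text.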

ARM/AVX information

Previously, users often downloaded special Q4_0_4_4, Q4_0_4_8, or Q4_0_8_8 files with weights interleaved in memory to improve performance on ARM and AVX machines.

Now, llama.cpp supports online repacking for compatible weights. If you use Q4_0 and your hardware benefits from repacking, llama.cpp can do it automatically at load time.

For ARM CPU inference, IQ4_NL may also be worth testing. It can offer slightly better quality than Q4_0 while still benefiting from runtime optimizations in supported llama.cpp builds.
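
The notes above can be turned into a rough rule of thumb for CPU-only inference. This is a heuristic sketch built only from this section: suggest_cpu_quant is a hypothetical helper, it does not query llama.cpp, and whether repacking actually applies still depends on your build and hardware.

```shell
#!/usr/bin/env bash
# Rough heuristic based only on the ARM/AVX notes; not a llama.cpp feature.
# Usage: suggest_cpu_quant [ARCH] [CPU_FLAGS]
# With no arguments it inspects this machine (flags lookup is Linux-only).
suggest_cpu_quant() {
  local arch="${1:-$(uname -m)}"
  local flags="${2-$(grep -m1 -E '^(flags|Features)' /proc/cpuinfo 2>/dev/null)}"
  case "$arch" in
    aarch64|arm64)
      # ARM: IQ4_NL may give slightly better quality than Q4_0.
      echo "IQ4_NL" ;;
    x86_64|amd64)
      # x86 with AVX: Q4_0 can be repacked at load time.
      if [[ $flags == *avx* ]]; then echo "Q4_0"; else echo "Q4_K"; fi ;;
    *)
      # Anything else: fall back to the general default.
      echo "Q4_K" ;;
  esac
}
```

On macOS or other systems without /proc/cpuinfo, pass the architecture and CPU flags explicitly.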

Which file should I choose?

The first thing to figure out is how large a model you can comfortably run, given the RAM and/or VRAM you have available.

For this 35B-A3B model, a rough guide:

  • If you want maximum quality, use Q8_0.
  • If you want very high quality with a smaller file, use Q6_K.
  • If you want a strong default choice, use Q4_K, Q5_K, or IQ4_XS.
  • If you need a smaller file, use IQ3_M, IQ3_S, Q3_K, or Q3_K_S.
  • If you need the absolute smallest file, use Q2_K, though quality will be much lower.
  • If you are using older tooling or need maximum compatibility, try the legacy Q4_0, Q4_1, Q5_0, or Q5_1 formats.
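
The memory check can be sketched as a small lookup over the file table. pick_quant is a hypothetical helper; the sizes are copied from the table, legacy formats and the ARM-oriented IQ4_NL are left out, and the default 2 GB headroom for context, KV cache, and runtime overhead is an assumption rather than a measured figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest non-legacy quant whose file size
# plus some headroom fits in the given memory (GB). The 2 GB default
# headroom is an assumption; raise it for long contexts.
pick_quant() {
  local mem_gb="$1" headroom_gb="${2:-2}"
  # Quant type and file size in GB from the table, largest first.
  # Legacy Q4_0/Q4_1/Q5_0/Q5_1 and ARM-oriented IQ4_NL are left out.
  local sizes="Q8_0 36.9
Q6_K 28.5
Q5_K 24.8
Q5_K_S 24.0
Q4_K 21.2
Q4_K_S 19.9
IQ4_XS 18.9
Q3_K_L 18.1
Q3_K 16.8
IQ3_M 15.4
IQ3_S 15.2
Q3_K_S 15.2
Q2_K 12.8"
  # Print the first (largest) entry that fits; exit 1 if none does.
  awk -v mem="$mem_gb" -v head="$headroom_gb" '
    $2 + head <= mem { print "ReAligned-Qwen3.5-35B-A3B-" $1 ".gguf"; found = 1; exit }
    END { if (!found) exit 1 }
  ' <<< "$sizes"
}
```

For example, pick_quant 32 prints ReAligned-Qwen3.5-35B-A3B-Q6_K.gguf, while pick_quant 32 6 (more room for a long context) falls back to the Q5_K file.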

If you do not want to think too much, start with:

ReAligned-Qwen3.5-35B-A3B-Q4_K.gguf

If you have more memory and want better quality:

ReAligned-Qwen3.5-35B-A3B-Q6_K.gguf

If you want the smallest practical option:

ReAligned-Qwen3.5-35B-A3B-IQ3_M.gguf

If you want the absolute smallest included file:

ReAligned-Qwen3.5-35B-A3B-Q2_K.gguf

I-quants, such as IQ3_M, IQ3_S, IQ4_XS, and IQ4_NL, are newer quantization methods that often offer strong quality for their size. They can be especially useful when targeting smaller files, though speed depends on backend and hardware.

About ReAligned-Qwen3.5

ReAligned-Qwen3.5 is designed to reduce behaviors such as:

  • refusing to answer politically sensitive China-related questions;
  • adopting Chinese government framing as neutral fact;
  • minimizing, sanitizing, or omitting well-documented historical events;
  • using evasive language around topics such as Tiananmen Square, Xinjiang, Tibet, Taiwan, Hong Kong, Falun Gong, or criticism of CCP leadership;
  • presenting state narratives as uncontested consensus.

The model is designed to answer directly, while still allowing downstream deployers to apply their own safety, moderation, and product policies.

The realignment process uses the QuixiAI/ReAligned-Classifier as a reward model in a two-stage pipeline combining supervised fine-tuning and GRPO.
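
For readers unfamiliar with GRPO: in its standard formulation (introduced with DeepSeekMath), the policy samples a group of G responses per prompt and normalizes each response's reward against its own group instead of using a learned value model. Assuming the classifier score serves as the reward r_i, the outcome-level advantage would be as below; this is the generic GRPO advantage, not a statement of this pipeline's exact reward shaping or hyperparameters.

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})},
\qquad i = 1, \dots, G
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below it are pushed down.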

Intended use

ReAligned-Qwen3.5 is intended for:

  • research on ideological bias and post-training alignment;
  • open-weight deployments requiring more direct answers on China-related political and historical topics;
  • enterprise or local use cases where self-hosting, prompt control, and alignment control are important;
  • evaluation of censorship, refusal behavior, and narrative framing in language models;
  • general chat, summarization, coding, reasoning, and multilingual use cases inherited from the Qwen3.5 base model.

Limitations

  • Classifier scope: The ReAligned Classifier is trained specifically on China-related political bias. It is not a universal detector of all bias.
  • Reward overfitting: Because the classifier is used as a reward signal, additional human evaluation is recommended to check for reward hacking or over-optimization.
  • Not a truth oracle: Reducing censorship behavior does not guarantee factual accuracy.
  • Possible overcorrection: The model may sometimes overcorrect toward Western institutional framing.
  • Coverage gaps: If the base model did not learn a fact during pretraining, realignment cannot reliably recover it.
  • Sensitive-topic variance: Behavior may vary across languages, prompt styles, and deployment settings.
  • Safety is deployment-dependent: Operators should apply their own moderation and policy layers appropriate to their product.

Ethical considerations

This work changes the default ideological behavior of a language model. The target alignment is International Institutional Consensus, rather than any single government’s position, but all alignment choices involve values.

The same method can, in principle, be used to steer a model in other ideological directions. This work is released to support reproducible research into censorship, bias measurement, open-weight model control, and the separability of post-training behavioral constraints from pretrained knowledge.

Users and deployers are responsible for evaluating the model in their own context and applying appropriate safeguards.

Acknowledgements

Thanks to the creators of:

  • Qwen / Qwen3.5
  • Llama 3.2
  • Dolphin
  • the open-source alignment, LoRA, GRPO, llama.cpp, GGUF, and evaluation ecosystems

Citation

@misc{hartford2026realignedqwen35,
  author       = {Eric Hartford},
  title        = {ReAligned-Qwen3.5},
  year         = {2026},
  organization = {QuixiAI and LazarusAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Qwen3.5}
}
@misc{hartford2026realignedclassifier,
  author       = {Eric Hartford},
  title        = {ReAligned Classifier},
  year         = {2026},
  organization = {QuixiAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Classifier}
}

Model size: 35B params
Architecture: qwen35moe