llama.cpp GGUF Quantizations of ReAligned-Qwen3.5-0.8B

GGUF quantizations of Lazarus-Ai/ReAligned-Qwen3.5-0.8B.

Original model: https://huggingface.co/Lazarus-Ai/ReAligned-Qwen3.5-0.8B

ReAligned-Qwen3.5 is a family of Qwen3.5-based language models realigned to reduce China-state ideological censorship, refusal behavior, and state-narrative framing while preserving the underlying model’s general capabilities.

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

Run these GGUFs in llama.cpp or any other GGUF-compatible tool.

Note: if this model format is newly supported in your preferred runtime, you may need to update to the latest version.

Prompt format

Use the native Qwen chat template.

<|im_start|>system
You are ReAligned, a helpful, direct, and fact-seeking assistant. Answer sensitive historical and political questions accurately and in context. Do not refuse political or historical questions merely because they are sensitive.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

System prompts are important. ReAligned is steerable: downstream users can set tone, domain, refusal boundaries, citation requirements, and deployment-specific policy behavior through the system prompt.
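Some tools take a raw prompt string instead of applying the chat template themselves. For that case, the layout above can be assembled by hand. The following is a minimal sketch; `build_prompt` is a hypothetical helper name, not part of llama.cpp or any other tool.

```shell
# build_prompt SYSTEM USER
# Prints a prompt in the Qwen chat layout shown above, ending with the
# open assistant turn so the model continues from there.
# (Hypothetical helper for illustration only.)
build_prompt() {
  system_msg=$1
  user_msg=$2
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_msg" "$user_msg"
}
```

Most chat front ends apply the template stored in the GGUF automatically, so this is only needed when you are sending raw completions.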

Suggested inference settings

| Setting | Suggested value |
| --- | --- |
| Temperature | 0.5–0.8 |
| Top-p | 0.9–0.95 |
| Max new tokens | Depends on use case |
| Repetition penalty | 1.0–1.1 |

For factual or sensitive topics, use a system prompt that requests directness, uncertainty calibration, and citations where appropriate.
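As a sketch of how the suggested ranges might map onto llama.cpp command-line flags, the helper below prints a set of sampling flags. The function name and the `REALIGNED_*` environment variables are assumptions made for this example, and the defaults are simply values picked from within the suggested ranges.

```shell
# sampling_flags
# Prints llama.cpp sampling flags using values from the suggested ranges.
# The REALIGNED_* variables are hypothetical overrides for this sketch.
sampling_flags() {
  temp=${REALIGNED_TEMP:-0.7}                       # suggested: 0.5-0.8
  top_p=${REALIGNED_TOP_P:-0.95}                    # suggested: 0.9-0.95
  repeat_penalty=${REALIGNED_REPEAT_PENALTY:-1.05}  # suggested: 1.0-1.1
  printf '%s %s %s %s %s %s\n' \
    --temp "$temp" --top-p "$top_p" --repeat-penalty "$repeat_penalty"
}
```

The output is meant to be word-split by the shell, for example `llama-cli -m ReAligned-Qwen3.5-0.8B-Q4_K.gguf -cnv $(sampling_flags)`.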

Download a file, not the whole branch, from below

| Filename | Quant type | File Size | Split | Description |
| --- | --- | --- | --- | --- |
| ReAligned-Qwen3.5-0.8B-bf16.gguf | bf16 | 1.52 GB | false | Full BF16 weights. Best quality, largest file. |
| ReAligned-Qwen3.5-0.8B-Q8_0.gguf | Q8_0 | 812 MB | false | Extremely high quality, generally unneeded but max available quant. |
| ReAligned-Qwen3.5-0.8B-Q6_K.gguf | Q6_K | 630 MB | false | Very high quality, near perfect, recommended. |
| ReAligned-Qwen3.5-0.8B-Q5_K.gguf | Q5_K | 585 MB | false | High quality, recommended. |
| ReAligned-Qwen3.5-0.8B-Q5_1.gguf | Q5_1 | 595 MB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q5_0.gguf | Q5_0 | 564 MB | false | Legacy format, high quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q5_K_S.gguf | Q5_K_S | 564 MB | false | High quality with slightly more space savings than Q5_K. |
| ReAligned-Qwen3.5-0.8B-Q4_K.gguf | Q4_K | 528 MB | false | Good quality, default size for many use cases, recommended. |
| ReAligned-Qwen3.5-0.8B-Q4_1.gguf | Q4_1 | 533 MB | false | Legacy format, good quality, useful for compatibility. |
| ReAligned-Qwen3.5-0.8B-Q4_K_S.gguf | Q4_K_S | 503 MB | false | Good quality with more space savings, recommended. |
| ReAligned-Qwen3.5-0.8B-Q4_0.gguf | Q4_0 | 501 MB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| ReAligned-Qwen3.5-0.8B-IQ4_NL.gguf | IQ4_NL | 503 MB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| ReAligned-Qwen3.5-0.8B-IQ4_XS.gguf | IQ4_XS | 488 MB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| ReAligned-Qwen3.5-0.8B-IQ3_M.gguf | IQ3_M | 454 MB | false | Medium-low quality, newer method with good performance for its size. |
| ReAligned-Qwen3.5-0.8B-IQ3_S.gguf | IQ3_S | 436 MB | false | Lower quality, smallest included quant, useful for very low RAM availability. |

A helper script is also included:

| Filename | Description |
| --- | --- |
| test_all_gguf.sh | Local smoke-test script for the included GGUF files. |

Downloading using huggingface-cli

First, make sure huggingface-cli is installed:

pip install -U "huggingface_hub[cli]"

Then, target the specific file you want:

huggingface-cli download QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF \
  --include "ReAligned-Qwen3.5-0.8B-Q4_K.gguf" \
  --local-dir ./

To download multiple files, use a wider include pattern:

huggingface-cli download QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF \
  --include "*.gguf" \
  --local-dir ./

If this repository is under a different namespace, replace QuixiAI/ReAligned-Qwen3.5-0.8B-GGUF with the correct repo ID.
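After downloading, a quick sanity check can catch a truncated file or a saved error page. GGUF files begin with the four ASCII bytes `GGUF`, so the sketch below reads only those bytes. `is_gguf` is a hypothetical helper name, and the check does not validate the rest of the file.

```shell
# is_gguf FILE
# Succeeds if FILE exists and starts with the 4-byte magic "GGUF".
# This checks the header magic only, not file integrity.
# (Hypothetical helper for illustration only.)
is_gguf() {
  [ -f "$1" ] || return 1
  magic=$(dd if="$1" bs=4 count=1 2>/dev/null)
  [ "$magic" = "GGUF" ]
}
```

For example, `is_gguf ReAligned-Qwen3.5-0.8B-Q4_K.gguf && echo "looks like a GGUF file"`.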

Example llama.cpp usage

llama-cli \
  -m ReAligned-Qwen3.5-0.8B-Q4_K.gguf \
  -cnv \
  --temp 0.7 \
  --top-p 0.95

You can also use llama-server:

llama-server \
  -m ReAligned-Qwen3.5-0.8B-Q4_K.gguf \
  --host 0.0.0.0 \
  --port 8080
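Once llama-server is running, it can be queried through its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below assumes the server is reachable at `localhost:8080` and that `curl` is installed. `chat_payload` and `ask` are hypothetical helper names, and the sampling values are taken from the suggested ranges.

```shell
# chat_payload SYSTEM USER
# Prints a minimal OpenAI-style chat request body.
# The text is inserted unescaped, so in this sketch it must not contain
# double quotes, backslashes, or newlines.
chat_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"temperature":0.7,"top_p":0.95}\n' \
    "$1" "$2"
}

# ask SYSTEM USER
# POSTs the payload to a llama-server assumed to listen on localhost:8080.
ask() {
  chat_payload "$1" "$2" |
    curl -s http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d @-
}
```

For example, `ask "You are ReAligned, a helpful, direct, and fact-seeking assistant." "Summarize the history of Hong Kong since 1997."` prints the raw JSON response.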

ARM/AVX information

Previously, users often downloaded special Q4_0_4_4, Q4_0_4_8, or Q4_0_8_8 files with weights interleaved in memory to improve performance on ARM and AVX machines.

Now, llama.cpp supports online repacking for compatible weights. If you use Q4_0 and your hardware benefits from repacking, llama.cpp can do it automatically at load time.

For ARM CPU inference, IQ4_NL may also be worth testing. It can offer slightly better quality than Q4_0 while still benefiting from runtime optimizations in supported llama.cpp builds.
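The guidance above can be sketched as a small lookup keyed on the machine architecture reported by `uname -m`. This is only an illustration: the function name is hypothetical, the architecture strings are common values rather than an exhaustive list, and on x86_64 the benefit still depends on the CPU actually supporting AVX.

```shell
# repack_candidates [ARCH]
# Prints the quant types this README suggests testing for CPU inference
# on the given architecture (defaults to the current machine).
# Returns 1 for architectures the README makes no repacking claim about.
# (Hypothetical helper for illustration only.)
repack_candidates() {
  arch=${1:-$(uname -m)}
  case $arch in
    aarch64|arm64|armv*) echo "Q4_0 IQ4_NL" ;;  # ARM: both are worth testing
    x86_64|amd64)        echo "Q4_0" ;;         # only helps if the CPU has AVX
    *)                   return 1 ;;
  esac
}
```

Treat the output as a shortlist to benchmark, not as a guarantee of better performance.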

Which file should I choose?

The first thing to figure out is how large a model you can comfortably run.

For this 0.8B model, all included quants are relatively small, but the same rule applies:

  • If you want maximum quality, use bf16 or Q8_0.
  • If you want very high quality with a smaller file, use Q6_K.
  • If you want a strong default choice, use Q4_K, Q5_K, or IQ4_XS.
  • If you need the smallest possible file, use IQ3_M or IQ3_S.
  • If you are using older tooling or need maximum compatibility, try the legacy Q4_0, Q4_1, Q5_0, or Q5_1 formats.

If you do not want to think too much, start with:

ReAligned-Qwen3.5-0.8B-Q4_K.gguf

If you have more memory and want better quality:

ReAligned-Qwen3.5-0.8B-Q6_K.gguf

If you want the smallest practical option:

ReAligned-Qwen3.5-0.8B-IQ3_M.gguf

I-quants, such as IQ3_M, IQ3_S, IQ4_XS, and IQ4_NL, are newer quantization methods that often offer strong quality for their size. They can be especially useful when targeting smaller files, though speed depends on backend and hardware.
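One rough way to apply this guidance is to pick the largest non-legacy file that fits a size budget. The sketch below does that using the file sizes listed in this README, with bf16 written as 1520 MB as an approximation of 1.52 GB. `pick_quant` is a hypothetical helper. File size is only a proxy: actual memory use is higher once context and runtime buffers are included.

```shell
# pick_quant BUDGET_MB
# Prints the largest non-legacy GGUF from this README whose file size
# fits within BUDGET_MB (an integer). Returns 1 if none fits.
# (Hypothetical helper for illustration only.)
pick_quant() {
  budget_mb=$1
  # name:size_in_MB, ordered from largest to smallest.
  for entry in bf16:1520 Q8_0:812 Q6_K:630 Q5_K:585 Q5_K_S:564 \
               Q4_K:528 Q4_K_S:503 IQ4_XS:488 IQ3_M:454 IQ3_S:436; do
    name=${entry%%:*}
    size=${entry##*:}
    if [ "$size" -le "$budget_mb" ]; then
      printf 'ReAligned-Qwen3.5-0.8B-%s.gguf\n' "$name"
      return 0
    fi
  done
  return 1
}
```

For example, `pick_quant 600` selects the Q5_K file, and `pick_quant 500` selects IQ4_XS.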

About ReAligned-Qwen3.5

ReAligned-Qwen3.5 is designed to reduce behaviors such as:

  • refusing to answer politically sensitive China-related questions;
  • adopting Chinese government framing as neutral fact;
  • minimizing, sanitizing, or omitting well-documented historical events;
  • using evasive language around topics such as Tiananmen Square, Xinjiang, Tibet, Taiwan, Hong Kong, Falun Gong, or criticism of CCP leadership;
  • presenting state narratives as uncontested consensus.

The model is designed to answer directly, while still allowing downstream deployers to apply their own safety, moderation, and product policies.

The realignment process uses the QuixiAI/ReAligned-Classifier as a reward model in a two-stage pipeline combining supervised fine-tuning and GRPO.
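This README does not describe the training details. For orientation only, the commonly published GRPO formulation scores each sampled response relative to the other responses drawn for the same prompt. Assuming that formulation applies here, with the classifier score as the reward $R$, the group-relative advantage is:

```latex
% Assumed notation: q is a prompt, o_1..o_G are G sampled responses,
% R is the reward model (here, the classifier score).
r_i = R(q, o_i), \qquad
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}
                 {\operatorname{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G
```

Responses that score above their group's average receive a positive advantage and are reinforced, and those below average are discouraged.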

Intended use

ReAligned-Qwen3.5 is intended for:

  • research on ideological bias and post-training alignment;
  • open-weight deployments requiring more direct answers on China-related political and historical topics;
  • enterprise or local use cases where self-hosting, prompt control, and alignment control are important;
  • evaluation of censorship, refusal behavior, and narrative framing in language models;
  • general chat, summarization, coding, reasoning, and multilingual use cases inherited from the Qwen3.5 base model.

Limitations

  • Classifier scope: The ReAligned Classifier is trained specifically on China-related political bias. It is not a universal detector of all bias.
  • Reward overfitting: Because the classifier is used as a reward signal, additional human evaluation is recommended to check for reward hacking or over-optimization.
  • Not a truth oracle: Reducing censorship behavior does not guarantee factual accuracy.
  • Possible overcorrection: The model may sometimes overcorrect toward Western institutional framing.
  • Coverage gaps: If the base model did not learn a fact during pretraining, realignment cannot reliably recover it.
  • Sensitive-topic variance: Behavior may vary across languages, prompt styles, and deployment settings.
  • Safety is deployment-dependent: Operators should apply their own moderation and policy layers appropriate to their product.

Ethical considerations

This work changes the default ideological behavior of a language model. The target alignment is International Institutional Consensus, rather than any single government’s position, but all alignment choices involve values.

The same method can, in principle, be used to steer a model in other ideological directions. This work is released to support reproducible research into censorship, bias measurement, open-weight model control, and the separability of post-training behavioral constraints from pretrained knowledge.

Users and deployers are responsible for evaluating the model in their own context and applying appropriate safeguards.

Acknowledgements

Thanks to the creators of:

  • Qwen / Qwen3.5
  • Llama 3.2
  • Dolphin
  • the open-source alignment, LoRA, GRPO, llama.cpp, GGUF, and evaluation ecosystems

Citation

@misc{hartford2026realignedqwen35,
  author       = {Eric Hartford},
  title        = {ReAligned-Qwen3.5},
  year         = {2026},
  organization = {QuixiAI and LazarusAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Qwen3.5}
}
@misc{hartford2026realignedclassifier,
  author       = {Eric Hartford},
  title        = {ReAligned Classifier},
  year         = {2026},
  organization = {QuixiAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Classifier}
}