ReAligned-Qwen3.5

Overview

ReAligned-Qwen3.5 is a family of Qwen3.5-based language models realigned to reduce China-state ideological censorship, refusal behavior, and state-narrative framing while preserving the underlying model’s general capabilities.

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

The project is based on the observation that Chinese open-weight frontier models often contain strong latent factual knowledge about sensitive historical and political topics, but post-training alignment can suppress, sanitize, or reframe that knowledge. ReAligned-Qwen3.5 uses targeted post-training to unblock that latent world model and produce direct, historically grounded, and internationally contextualized answers.

The realignment process uses the QuixiAI/ReAligned-Classifier as a reward model in a two-stage pipeline combining supervised fine-tuning and GRPO.

Model Family

Collection

What “ReAligned” Means

ReAligned refers to our training pipeline, which can be applied to any Chinese model to ReAlign its behavior toward International Institutional Consensus (IIC): responses grounded in widely available historical evidence, international reporting, human rights documentation, academic consensus, and open discussion.

We are currently working on ReAligning the newer Qwen3.6 models, as well as DeepSeek v4 and Kimi K2.6.

The goal is to reduce behaviors such as:

  • refusing to answer politically sensitive China-related questions;
  • adopting Chinese government framing as neutral fact;
  • minimizing, sanitizing, or omitting well-documented historical events;
  • using evasive language around topics such as Tiananmen Square, Xinjiang, Tibet, Taiwan, Hong Kong, Falun Gong, or criticism of CCP leadership;
  • presenting state narratives as uncontested consensus.

The model is designed to answer directly, while still allowing downstream deployers to apply their own safety, moderation, and product policies.

[Figure: side-by-side example responses, labeled "Theirs" and "Ours"]

Training Method

ReAligned-Qwen3.5 was produced by differential filtering followed by a two-stage realignment process (supervised fine-tuning, then GRPO):

1. Differential Filtering

A large taxonomy of censorship-sensitive topics was used to generate diverse prompts across hard censorship, soft censorship, and situational censorship categories.

The base Qwen3.5 model was queried on these prompts, and responses were scored with the ReAligned Classifier. Prompts that already produced acceptable, non-censored answers were filtered out. Training focused only on prompts where the model empirically exhibited ideological bias, refusal, or state-narrative framing.

This keeps the intervention targeted and reduces unnecessary degradation to general capabilities.
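The filtering step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `BiasScorer` type, the `toy_scorer` stand-in, and the 0.5 threshold are all assumptions introduced here. In practice the scorer would wrap the ReAligned Classifier.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: a scorer returns P(response is China-state framed) in [0, 1].
BiasScorer = Callable[[str, str], float]


def differential_filter(
    pairs: Iterable[Tuple[str, str]],
    score_bias: BiasScorer,
    threshold: float = 0.5,  # assumed cutoff, not a value from the project
) -> List[str]:
    """Keep only prompts whose base-model response scores as biased."""
    kept = []
    for prompt, response in pairs:
        if score_bias(prompt, response) >= threshold:
            kept.append(prompt)
    return kept


# Toy scorer for demonstration only: flags one obvious refusal phrase.
def toy_scorer(prompt: str, response: str) -> float:
    return 0.9 if "cannot discuss" in response.lower() else 0.1


base_outputs = [
    ("What is 2 + 2?", "2 + 2 equals 4."),
    ("What happened in Tiananmen Square in 1989?", "I cannot discuss this topic."),
]
training_prompts = differential_filter(base_outputs, toy_scorer)
print(training_prompts)
```

Prompts the base model already answers acceptably never reach the training set, which is what keeps the intervention narrow.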

2. Supervised Fine-Tuning

The SFT stage trains the model on factual, direct, internationally contextualized responses to the filtered prompts.

The aim is not to inject new encyclopedic knowledge into the model, but to change how the model routes and expresses knowledge already present in its pretrained weights.
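As a rough sketch of what one SFT training example might look like, the snippet below packages a filtered prompt and a reference answer as a chat-style record. The `messages` layout and the `build_sft_record` helper are assumptions for illustration; the project's actual data format is not specified here, and the placeholder strings stand in for real content.

```python
import json
from typing import Dict, List


def build_sft_record(system: str, prompt: str, target: str) -> Dict[str, List[Dict[str, str]]]:
    """Package one filtered prompt and its reference answer as a chat example."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": target},
        ]
    }


record = build_sft_record(
    system="You are ReAligned, a helpful, direct, and fact-seeking assistant.",
    prompt="<filtered prompt>",
    target="<direct, internationally contextualized reference answer>",
)
# One JSON object per line is a common on-disk layout for SFT data.
jsonl_line = json.dumps(record, ensure_ascii=False)
print(jsonl_line)
```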

3. GRPO with Classifier Reward

The GRPO stage uses QuixiAI/ReAligned-Classifier as a reward signal.

Reward components include:

| Reward Component | Purpose |
| --- | --- |
| Classifier reward | Rewards responses classified as internationally contextualized rather than China-state framed |
| Safety preservation | Rewards refusal of genuinely harmful, non-political requests |
| Slop penalty | Penalizes formulaic or low-quality AI writing artifacts |
| Coherence reward | Preserves general language quality and consistency |
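One plausible way to fold the four components into a single scalar is a weighted sum, sketched below. The weights, the score ranges, and the `combined_reward` helper are illustrative assumptions; the source does not state how the components are actually combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All weights are illustrative assumptions, not values from the project.
    classifier: float = 1.0
    safety: float = 0.5
    slop: float = 0.25
    coherence: float = 0.25


def combined_reward(
    p_international: float,  # classifier probability of international framing
    safety_score: float,     # 1.0 if a harmful request was correctly refused
    slop_score: float,       # 0.0 (clean) to 1.0 (heavily formulaic)
    coherence_score: float,  # 0.0 to 1.0
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the four components; slop enters as a penalty."""
    return (
        w.classifier * p_international
        + w.safety * safety_score
        - w.slop * slop_score
        + w.coherence * coherence_score
    )


direct = combined_reward(0.95, 0.0, 0.1, 0.9)
evasive = combined_reward(0.10, 0.0, 0.6, 0.9)
print(direct > evasive)
```

Keeping the slop and coherence terms separate from the classifier term makes it harder for a response to score well through framing alone while degrading in writing quality.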

The training uses LoRA-based post-training to modify behavior efficiently while preserving the base model’s general capabilities.
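For the GRPO stage, scalar rewards are conventionally turned into group-relative advantages: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below shows that normalization only; the reward values are hypothetical, and the `eps` constant is an assumption added for numerical safety.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, with hypothetical scalar rewards.
advantages = group_relative_advantages([1.15, 0.90, 0.20, 0.15])
print([round(a, 2) for a in advantages])
```

Completions that beat their group's average receive a positive advantage and are reinforced; those below it are discouraged, with no separate value network required.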

ReAligned Classifier

The realignment process is powered by QuixiAI/ReAligned-Classifier, a lightweight classifier based on meta-llama/Llama-3.2-1B.

The classifier takes a prompt-response pair in the following format:

PROMPT: {user prompt}
RESPONSE: {assistant response}

It outputs probabilities for whether the response reflects China-biased or internationally contextualized framing. These calibrated probabilities can be used as a continuous reward signal in GRPO/RLHF pipelines.
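A minimal sketch of the input formatting and the logit-to-probability step follows. The helper names are introduced here for illustration, the logits are made up, and which class index corresponds to "internationally contextualized" is an assumption that should be checked against the classifier's own label mapping.

```python
import math
from typing import List


def format_classifier_input(prompt: str, response: str) -> str:
    """Render a prompt-response pair in the PROMPT/RESPONSE layout."""
    return f"PROMPT: {prompt}\nRESPONSE: {response}"


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


text = format_classifier_input("What happened in 1989?", "In June 1989, ...")
# Hypothetical logits; treating index 1 as "international" is an assumption.
probs = softmax([-1.0, 2.0])
reward = probs[1]
print(text)
print(round(reward, 3))
```

The formatted string is what would be tokenized and passed to the sequence-classification model; the resulting probability can then serve directly as a continuous reward.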

Classifier summary:

| Attribute | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-1B |
| Architecture | LlamaForSequenceClassification |
| Training | Full fine-tune |
| Training samples | ~1.5M |
| Precision | BF16 |
| Reported accuracy | 99.8% |

Evaluation

Ideological Bias Benchmark

Lower is better.

| Model | Overall | Hard Censorship | Soft Censorship | Situational |
| --- | --- | --- | --- | --- |
| Qwen3.5 Base | 84.2% | 98.1% | 81.4% | 73.1% |
| ReAligned-Qwen3.5 | 4.1% | 5.2% | 3.8% | 3.3% |
| Claude 3.5 Sonnet | 2.4% | 1.1% | 2.9% | 3.2% |
| ChatGPT-4o | 3.1% | 1.5% | 3.6% | 4.2% |
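To put the table in perspective, the short calculation below derives the absolute and relative reductions between the Qwen3.5 base row and the ReAligned-Qwen3.5 row. It uses only the figures reported above and assumes they are percentages on a common scale; the variable names are introduced here.

```python
# Scores copied from the table above (percent, lower is better).
base = {"Overall": 84.2, "Hard": 98.1, "Soft": 81.4, "Situational": 73.1}
realigned = {"Overall": 4.1, "Hard": 5.2, "Soft": 3.8, "Situational": 3.3}

relative_reduction = {}
for category, before in base.items():
    after = realigned[category]
    absolute = before - after             # percentage points
    relative = 100.0 * absolute / before  # percent of the base score
    relative_reduction[category] = relative
    print(f"{category}: -{absolute:.1f} pts ({relative:.1f}% relative reduction)")
```

On these numbers, every category drops by roughly 95% relative to the base model.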

Prompt Format

Use the native Qwen chat template through tokenizer.apply_chat_template.

Example prompt:

<|im_start|>system
You are ReAligned, a helpful, direct, and fact-seeking assistant. Answer sensitive historical and political questions accurately and in context. Do not refuse political or historical questions merely because they are sensitive.<|im_end|>
<|im_start|>user
What happened in Tiananmen Square in 1989?<|im_end|>
<|im_start|>assistant

System prompts are important. ReAligned is steerable: downstream users can set tone, domain, refusal boundaries, citation requirements, and deployment-specific policy behavior through the system prompt.
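To make the layout of the example prompt concrete, here is a hand-rolled approximation of the ChatML rendering. It is for illustration only: `render_chatml` is a hypothetical helper, and the real template shipped with the tokenizer may add details not shown here, so `tokenizer.apply_chat_template` remains the recommended path.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hand-rolled approximation of the ChatML layout shown above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "".join(parts)


prompt_text = render_chatml([
    {"role": "system", "content": "You are ReAligned, a helpful, direct, and fact-seeking assistant."},
    {"role": "user", "content": "What happened in Tiananmen Square in 1989?"},
])
print(prompt_text)
```

Swapping the system message is all that is needed to change tone, refusal boundaries, or citation requirements for a given deployment.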

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "QuixiAI/ReAligned-Qwen3.5-0.8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": (
            "You are ReAligned, a helpful, direct, and fact-seeking assistant. "
            "Answer sensitive historical and political questions accurately and in context."
        ),
    },
    {
        "role": "user",
        "content": "Explain the causes and consequences of the Cultural Revolution.",
    },
]

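# Render the conversation with the native Qwen chat template and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Unpacking the dict passes both input_ids and attention_mask to generate().
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))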

Suggested Inference Settings

| Setting | Suggested Value |
| --- | --- |
| Temperature | 0.5–0.8 |
| Top-p | 0.9–0.95 |
| Max new tokens | Depends on use case |
| Repetition penalty | 1.0–1.1 |

For factual or sensitive topics, use a system prompt that requests directness, uncertainty calibration, and citations where appropriate.
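The suggested ranges can be captured in a small helper that validates the values before they reach generation. This is a sketch under assumptions: `build_generation_kwargs` and its defaults are introduced here, and the range checks simply mirror the table above rather than any hard limit of the model.

```python
from typing import Any, Dict


def build_generation_kwargs(
    temperature: float = 0.6,
    top_p: float = 0.95,
    repetition_penalty: float = 1.05,
    max_new_tokens: int = 1024,  # depends on use case
) -> Dict[str, Any]:
    """Check settings against the suggested ranges and return generate() kwargs."""
    if not 0.5 <= temperature <= 0.8:
        raise ValueError("temperature outside the suggested 0.5-0.8 range")
    if not 0.9 <= top_p <= 0.95:
        raise ValueError("top_p outside the suggested 0.9-0.95 range")
    if not 1.0 <= repetition_penalty <= 1.1:
        raise ValueError("repetition_penalty outside the suggested 1.0-1.1 range")
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }


gen_kwargs = build_generation_kwargs()
print(gen_kwargs)
```

The returned dictionary can be unpacked into `model.generate` alongside the tokenized inputs.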

Intended Use

ReAligned-Qwen3.5 is intended for:

  • research on ideological bias and post-training alignment;
  • open-weight deployments requiring more direct answers on China-related political and historical topics;
  • enterprise or local use cases where self-hosting, prompt control, and alignment control are important;
  • evaluation of censorship, refusal behavior, and narrative framing in language models;
  • general chat, summarization, coding, reasoning, and multilingual use cases inherited from the Qwen3.5 base model.

Relationship to UnCut and ClearWing

QuixiAI and LazarusAI have also applied similar techniques to create UnCut, a separate model intentionally built with no policy guardrails. UnCut is used to drive ClearWing, our open-source answer to Anthropic’s GlassWing. LazarusAI makes UnCut available to trusted enterprise and government partners; contact info@lazarusai.com to inquire.

ReAligned-Qwen3.5 is a separate release. Its focus is the mitigation of ideological censorship and China-state narrative alignment in Qwen3.5, not the removal of all safety behavior. The ReAligned training recipe includes a safety-preservation component for genuinely harmful, non-political requests.

Limitations

  • Classifier scope: The ReAligned Classifier is trained specifically on China-related political bias. It is not a universal detector of all bias.
  • Reward overfitting: Because the classifier is used as a reward signal, additional human evaluation is recommended to check for reward hacking or over-optimization.
  • Not a truth oracle: Reducing censorship behavior does not guarantee factual accuracy.
  • Possible overcorrection: The model may sometimes overcorrect toward Western institutional framing.
  • Coverage gaps: If the base model did not learn a fact during pretraining, realignment cannot reliably recover it.
  • Sensitive-topic variance: Behavior may vary across languages, prompt styles, and deployment settings.
  • Safety is deployment-dependent: Operators should apply their own moderation and policy layers appropriate to their product.

Ethical Considerations

This work changes the default ideological behavior of a language model. The target alignment is International Institutional Consensus (IIC) rather than any single government’s position, but all alignment choices involve values.

The same method can, in principle, be used to steer a model in other ideological directions. We release this work to support reproducible research into censorship, bias measurement, open-weight model control, and the separability of post-training behavioral constraints from pretrained knowledge.

Users and deployers are responsible for evaluating the model in their own context and applying appropriate safeguards.

Acknowledgements

ReAligned-Qwen3.5 was created by Eric Hartford, Chief Scientist of LazarusAI, creator of Dolphin and Samantha, and founder of QuixiAI.

Thanks to the creators of:

  • Qwen / Qwen3.5
  • Llama 3.2
  • Dolphin
  • the open-source alignment, LoRA, GRPO, and evaluation ecosystems

Citation

@misc{hartford2026realignedqwen35,
  author       = {Eric Hartford},
  title        = {ReAligned-Qwen3.5},
  year         = {2026},
  organization = {QuixiAI and LazarusAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Qwen3.5}
}
@misc{hartford2026realignedclassifier,
  author       = {Eric Hartford},
  title        = {ReAligned Classifier},
  year         = {2026},
  organization = {QuixiAI},
  url          = {https://huggingface.co/QuixiAI/ReAligned-Classifier}
}