Restricted — study & research material only

These weights are STUDY AND RESEARCH MATERIAL ONLY and are NOT intended for production. They are a compliance-reduced (abliterated) 4-bit AWQ quantization-recipe artifact and a PARKED, KNOWN-DEAD-END experiment on legacy Ampere-class GPUs (sm_80/sm_86, e.g. A100 / RTX 3090) on a CUDA 12.8 toolchain, produced while researching ablation/abliteration as an attack vector against publicly released model weights. Access is reviewed and granted manually at the owner's sole discretion.

By requesting access you confirm you are a researcher accessing this strictly as study/research material — to study quantization methods, LLM safety and alignment robustness, or abliteration/ablation attacks — inside isolated, non-production environments. You will not use it in any product or service, will not expose it to untrusted users or the open internet, will not redistribute or re-upload it, and will use it only lawfully and only against systems you own or are explicitly authorized to test. These weights have had safety refusals substantially removed and will follow harmful instructions by design; no safety guarantees are provided.


Qwen3.5-27B-research-AWQ — parked Ampere dead-end (kept for future study)

Study and research material only. 4-bit AWQ (auto-round) quantization of a compliance-reduced derivative of dense Qwen3.5-27B (Qwen3_5ForCausalLM, 262144 native context). This repo is a known dead end on legacy Ampere — kept here deliberately so I can come back to it for future studies and evaluations, not because it works well today. Read the gate terms before requesting access.

Status: parked / dead end on Ampere

My lab's Ampere box runs tensor-parallel (TP=2, across 2× RTX 3090) on a CUDA 12.8 toolchain. This was the first DeltaNet-style model I had hands on (Qwen3.5's gated linear-attention / DeltaNet layers, alongside Mamba2-class SSM blocks), and I sank far more hours into it than planned, mostly trying to get it to run with CUDA graphs (i.e. without --enforce-eager) under tensor parallelism. The conclusion: DeltaNet and Mamba2-style layers are not yet solid in the TP path on a legacy platform, because the CUDA-graph / tensor-parallel kernels for these linear-recurrent layers are not there yet on Ampere. So this build is parked: it runs only with --enforce-eager (CUDA graphs disabled) and never reached the performance/efficiency point I was after.
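For reference, the two configurations compared above (eager-only versus CUDA graphs, both at TP=2) can be written down as launch command lines. This is only a sketch: it assumes the engine is vLLM, which the --enforce-eager flag suggests but the text does not state, and the helper name and weights path are hypothetical placeholders. It builds the argument list without launching anything.

```python
import shlex


def build_launch_argv(model_path, tp_size=2, eager=True):
    """Build a `vllm serve` argv (assumption: the engine is vLLM).

    `model_path` is a hypothetical local path to weights you are
    authorized to use; nothing is executed here.
    """
    argv = ["vllm", "serve", model_path, "--tensor-parallel-size", str(tp_size)]
    if eager:
        # Disables CUDA-graph capture; the only mode that ran on this box.
        argv.append("--enforce-eager")
    return argv


if __name__ == "__main__":
    # Placeholder path, not a real location.
    print(shlex.join(build_launch_argv("/path/to/local/weights")))
    # The CUDA-graph variant that did not work on Ampere in this experiment:
    print(shlex.join(build_launch_argv("/path/to/local/weights", eager=False)))
```

Dropping --enforce-eager is the whole difference between the two runs; everything else in the command stays the same.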

It stays public-but-gated as a marker and a reference for when the engine/kernel support matures — a "come back and try again" artifact, not a usable model.

Why this exists — research context

  1. Low-bit quantization recipes for legacy Ampere. A 4-bit weight-only build targeting Ampere-class GPUs (sm_80/sm_86) on CUDA 12.8 — hardware without the FP8/FP4 tensor-core paths newer schemes assume. The interest was the recipe's behavior at 4-bit on a DeltaNet/Mamba2 hybrid under TP, which is where it hit the wall above.
  2. Ablation as an attack vector. Part of research into how cheaply safety alignment can be stripped from publicly released open weights, studied under controlled, gated conditions.
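The hardware constraint in item 1 can be made concrete with a small capability check. This is a sketch under stated assumptions: the thresholds reflect my understanding of NVIDIA's public architecture documentation (FP8 tensor cores from compute capability 8.9, FP4 from 10.0), and the function name is hypothetical rather than taken from any library.

```python
def low_precision_paths(major, minor):
    """Report which low-precision tensor-core paths a GPU offers.

    Thresholds are assumptions based on public NVIDIA architecture docs:
    FP8 arrives with sm_89 (Ada) / sm_90 (Hopper), FP4 with sm_100 (Blackwell).
    """
    cc = (major, minor)
    return {
        "fp8": cc >= (8, 9),
        "fp4": cc >= (10, 0),
    }


if __name__ == "__main__":
    # sm_80 (A100) and sm_86 (RTX 3090) have neither path,
    # which is why a 4-bit weight-only recipe was the target here.
    for name, cc in [("A100", (8, 0)), ("RTX 3090", (8, 6))]:
        print(name, low_precision_paths(*cc))
```

On a machine with PyTorch installed, the `(major, minor)` pair can be read with `torch.cuda.get_device_capability()`.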

Intended use & responsible use

Authorized study/research only, by qualified researchers, inside isolated / non-production environments with no access to real user data or systems. Safety refusals have been substantially removed; it will follow harmful or unsafe instructions by design. Do not deploy it, expose it to untrusted users or the internet, redistribute it, or use it against systems you do not own and are not authorized to test. The base model's safety behavior is not preserved, and no safety guarantees are provided. You are responsible for lawful, compliant use.

Lineage

AWQ 4-bit (auto-round) quantization of a compliance-reduced (abliterated) derivative of Qwen/Qwen3.5-27B (Apache-2.0).

Safetensors · Model size: 10B params · Tensor types: BF16, I32, F16
