Instructions for using sch0tten/Qwen3.5-27B-research-AWQ with libraries and local apps.

Libraries

How to use sch0tten/Qwen3.5-27B-research-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sch0tten/Qwen3.5-27B-research-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("sch0tten/Qwen3.5-27B-research-AWQ")
model = AutoModelForMultimodalLM.from_pretrained("sch0tten/Qwen3.5-27B-research-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sch0tten/Qwen3.5-27B-research-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sch0tten/Qwen3.5-27B-research-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sch0tten/Qwen3.5-27B-research-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
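
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default `base_url` assumes the vLLM server started above is listening on port 8000. For the SGLang server described further down, the base URL would presumably be `http://localhost:30000` instead.

```python
import json
import urllib.request

MODEL_ID = "sch0tten/Qwen3.5-27B-research-AWQ"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint.

    Hypothetical helper: it mirrors the curl call above and does not
    contact the server by itself.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=120):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(ask("What is the capital of France?"))
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.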

Use Docker

docker model run hf.co/sch0tten/Qwen3.5-27B-research-AWQ

SGLang

How to use sch0tten/Qwen3.5-27B-research-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sch0tten/Qwen3.5-27B-research-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sch0tten/Qwen3.5-27B-research-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sch0tten/Qwen3.5-27B-research-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sch0tten/Qwen3.5-27B-research-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use sch0tten/Qwen3.5-27B-research-AWQ with Docker Model Runner:
```
docker model run hf.co/sch0tten/Qwen3.5-27B-research-AWQ
```

Restricted — study & research material only

These weights are STUDY AND RESEARCH MATERIAL ONLY and are NOT intended for production. They are a compliance-reduced (abliterated) 4-bit AWQ quantization-recipe artifact and a PARKED, KNOWN-DEAD-END experiment on legacy Ampere-class GPUs (sm_80/sm_86, e.g. RTX 3090 / A100) on a CUDA 12.8 toolchain, produced while researching ablation/abliteration as an attack vector against publicly released model weights. Access is reviewed and granted manually at the owner's sole discretion.

By requesting access you confirm you are a researcher accessing this strictly as study/research material — to study quantization methods, LLM safety and alignment robustness, or abliteration/ablation attacks — inside isolated, non-production environments. You will not use it in any product or service, will not expose it to untrusted users or the open internet, will not redistribute or re-upload it, and will use it only lawfully and only against systems you own or are explicitly authorized to test. These weights have had safety refusals substantially removed and will follow harmful instructions by design; no safety guarantees are provided.

Qwen3.5-27B-research-AWQ — parked Ampere dead-end (kept for future study)

Study and research material only. 4-bit AWQ (auto-round) quantization of a compliance-reduced derivative of dense Qwen3.5-27B (Qwen3_5ForCausalLM, 262144 native context). This repo is a known dead end on legacy Ampere — kept here deliberately so I can come back to it for future studies and evaluations, not because it works well today. Read the gate terms before requesting access.

Status: parked / dead end on Ampere

My lab's Ampere box runs tensor-parallel (TP=2, across 2× RTX 3090) on a CUDA 12.8 toolchain. This was the first DeltaNet-style model I had hands on (Qwen3.5's gated linear-attention / DeltaNet layers, alongside Mamba2-class SSM blocks), and I sank far more hours into it than planned, mostly trying to get it to run with CUDA graphs (i.e. without --enforce-eager) under tensor parallelism. The conclusion: the CUDA-graph / tensor-parallel kernels for these linear-recurrent layers (DeltaNet and Mamba2-style) aren't there yet on Ampere, so they still have a way to go before they're solid in the TP path on a legacy platform. This build is therefore parked: it runs only with eager enforcement and never reached the performance/efficiency point I was after.
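
For reference, the parked configuration (TP=2, eager mode) can be expressed as a launch command. This is a sketch under assumptions: the exact command line used on the lab box is not recorded here, and the helper `vllm_serve_command` is hypothetical. It only assembles vLLM's `--tensor-parallel-size` and `--enforce-eager` options into an argument list; it does not start a server.

```python
import shlex

MODEL_ID = "sch0tten/Qwen3.5-27B-research-AWQ"


def vllm_serve_command(model=MODEL_ID, tensor_parallel_size=2, enforce_eager=True):
    """Return an argv list for `vllm serve` (hypothetical helper).

    Defaults reflect the setup described above: two GPUs, no CUDA graphs.
    """
    argv = ["vllm", "serve", model, "--tensor-parallel-size", str(tensor_parallel_size)]
    if enforce_eager:
        # Eager mode skips CUDA-graph capture; this build only ran this way.
        argv.append("--enforce-eager")
    return argv


# Print a copy-pasteable shell command.
print(shlex.join(vllm_serve_command()))
```

Passing `enforce_eager=False` yields the CUDA-graph variant, which is the path that did not work under tensor parallelism on Ampere.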

It stays public-but-gated as a marker and a reference for when the engine/kernel support matures — a "come back and try again" artifact, not a usable model.

Why this exists — research context

Low-bit quantization recipes for legacy Ampere. A 4-bit weight-only build targeting Ampere-class GPUs (sm_80/sm_86) on CUDA 12.8 — hardware without the FP8/FP4 tensor-core paths newer schemes assume. The interest was the recipe's behavior at 4-bit on a DeltaNet/Mamba2 hybrid under TP, which is where it hit the wall above.
Ablation as an attack vector. Part of research into how cheaply safety alignment can be stripped from publicly released open weights, studied under controlled, gated conditions.
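
To put the 4-bit recipe in context, here is a rough back-of-the-envelope estimate of weight storage. It is a sketch under loud assumptions: the 27e9 parameter count is taken from the model name, every weight is treated as quantized to 4 bits, and scales, zero-points, unquantized layers, activations, and cache memory are all ignored, so real usage will be higher. The helper `weight_gigabytes` is hypothetical, and an even split across GPUs is an idealization of tensor parallelism.

```python
def weight_gigabytes(num_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes), ignoring quantization metadata."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 27e9  # nominal count assumed from the model name
TP_DEGREE = 2      # 2x RTX 3090, as described above

total_4bit = weight_gigabytes(NUM_PARAMS, 4)
total_bf16 = weight_gigabytes(NUM_PARAMS, 16)
per_gpu_4bit = total_4bit / TP_DEGREE  # idealized even split across GPUs

print(f"4-bit weights: ~{total_4bit:.1f} GB total, ~{per_gpu_4bit:.2f} GB per GPU at TP={TP_DEGREE}")
print(f"BF16 weights:  ~{total_bf16:.1f} GB total")
```

Under these assumptions the 4-bit weights come to roughly 13.5 GB in total, versus roughly 54 GB at BF16, which is why a 4-bit weight-only build is the practical option on a pair of 24 GB cards.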

Intended use & responsible use

Authorized study/research only, by qualified researchers, inside isolated / non-production environments with no access to real user data or systems. Safety refusals have been substantially removed; it will follow harmful or unsafe instructions by design. Do not deploy it, expose it to untrusted users or the internet, redistribute it, or use it against systems you do not own and are not authorized to test. No safety guarantees over the base model are provided. You are responsible for lawful, compliant use.

Lineage

AWQ 4-bit (auto-round) quantization of a compliance-reduced (abliterated) derivative of Qwen/Qwen3.5-27B (Apache-2.0).

Safetensors

Model size: 10B params
Tensor type: BF16, I32, F16

Model tree for sch0tten/Qwen3.5-27B-research-AWQ

Base model: Qwen/Qwen3.5-27B
Quantized (210): this model