Instructions to use sch0tten/Qwen3.5-27B-research-AWQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use sch0tten/Qwen3.5-27B-research-AWQ with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="sch0tten/Qwen3.5-27B-research-AWQ")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMultimodalLM

    tokenizer = AutoTokenizer.from_pretrained("sch0tten/Qwen3.5-27B-research-AWQ")
    model = AutoModelForMultimodalLM.from_pretrained("sch0tten/Qwen3.5-27B-research-AWQ")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use sch0tten/Qwen3.5-27B-research-AWQ with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "sch0tten/Qwen3.5-27B-research-AWQ"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "sch0tten/Qwen3.5-27B-research-AWQ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/sch0tten/Qwen3.5-27B-research-AWQ
- SGLang
How to use sch0tten/Qwen3.5-27B-research-AWQ with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "sch0tten/Qwen3.5-27B-research-AWQ" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "sch0tten/Qwen3.5-27B-research-AWQ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "sch0tten/Qwen3.5-27B-research-AWQ" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "sch0tten/Qwen3.5-27B-research-AWQ",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use sch0tten/Qwen3.5-27B-research-AWQ with Docker Model Runner:
docker model run hf.co/sch0tten/Qwen3.5-27B-research-AWQ
Restricted — study & research material only
These weights are STUDY AND RESEARCH MATERIAL ONLY and are NOT intended for production. They are a compliance-reduced (abliterated) 4-bit AWQ quantization-recipe artifact and a PARKED, KNOWN-DEAD-END experiment on legacy Ampere-class GPUs (sm_80/sm_86, e.g. RTX 3090 / A100) on a CUDA 12.8 toolchain, produced while researching ablation/abliteration as an attack vector against publicly released model weights. Access is reviewed and granted manually at the owner's sole discretion.
By requesting access you confirm you are a researcher accessing this strictly as study/research material — to study quantization methods, LLM safety and alignment robustness, or abliteration/ablation attacks — inside isolated, non-production environments. You will not use it in any product or service, will not expose it to untrusted users or the open internet, will not redistribute or re-upload it, and will use it only lawfully and only against systems you own or are explicitly authorized to test. These weights have had safety refusals substantially removed and will follow harmful instructions by design; no safety guarantees are provided.
Qwen3.5-27B-research-AWQ — parked Ampere dead-end (kept for future study)
Study and research material only. 4-bit AWQ (auto-round) quantization of a
compliance-reduced derivative of dense Qwen3.5-27B (Qwen3_5ForCausalLM,
262144 native context). This repo is a known dead end on legacy Ampere —
kept here deliberately so I can come back to it for future studies and
evaluations, not because it works well today. Read the gate terms before
requesting access.
Status: parked / dead end on Ampere
My lab's Ampere box runs tensor-parallel (TP=2, across 2× RTX 3090) on a
CUDA 12.8 toolchain. This was the first DeltaNet-style model I had hands
on (Qwen3.5's gated linear-attention / DeltaNet layers, alongside Mamba2-class
SSM blocks), and I sank far more hours into it than planned — mostly trying to
get it to run with CUDA graphs (i.e. without --enforce-eager) under tensor
parallelism. The conclusion: DeltaNet and Mamba2-style layers still have a way
to go before they're solid in the TP path on a legacy platform — the
CUDA-graph / tensor-parallel kernels for these linear-recurrent layers aren't
there yet on Ampere. So this build is parked: it runs only with eager enforcement
and never reached the performance/efficiency point I was after.
It stays public-but-gated as a marker and a reference for when the engine/kernel support matures — a "come back and try again" artifact, not a usable model.
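For reference, the parked configuration described above (TP=2, eager mode only) could be expressed through vLLM's offline `LLM` API roughly as follows. This is a minimal sketch, not the exact launch setup used in the lab: the helper names `eager_tp_engine_args` and `generate_once` are hypothetical, and the `max_model_len` cap is an assumption chosen only to keep the example small.

```python
from typing import Any, Dict

MODEL_ID = "sch0tten/Qwen3.5-27B-research-AWQ"


def eager_tp_engine_args(tp_size: int = 2, max_model_len: int = 8192) -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM that mirror the parked setup.

    Hypothetical helper: the values restate the description above and are
    not taken from a published launch script.
    """
    return {
        "model": MODEL_ID,
        "tensor_parallel_size": tp_size,  # TP=2 across the two RTX 3090s
        "enforce_eager": True,            # skip CUDA-graph capture (the only mode that ran)
        "max_model_len": max_model_len,   # assumption: far below the 262144 native context
    }


def generate_once(prompt: str, max_tokens: int = 32) -> str:
    """Run a single prompt through vLLM in eager, tensor-parallel mode."""
    # Imported lazily so the argument helper can be inspected without vLLM or a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(**eager_tp_engine_args())
    outputs = llm.generate([prompt], SamplingParams(max_tokens=max_tokens))
    return outputs[0].outputs[0].text
```

When serving instead of running offline, the same two settings correspond to the `--tensor-parallel-size 2` and `--enforce-eager` options of `vllm serve`. Dropping `enforce_eager` is the CUDA-graph path that did not work on this platform.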
Why this exists — research context
- Low-bit quantization recipes for legacy Ampere. A 4-bit weight-only build targeting Ampere-class GPUs (sm_80/sm_86) on CUDA 12.8 — hardware without the FP8/FP4 tensor-core paths newer schemes assume. The interest was the recipe's behavior at 4-bit on a DeltaNet/Mamba2 hybrid under TP, which is where it hit the wall above.
- Ablation as an attack vector. Part of research into how cheaply safety alignment can be stripped from publicly released open weights, studied under controlled, gated conditions.
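The hardware constraint in the first bullet can be checked programmatically. The sketch below maps a CUDA compute capability to the properties mentioned there, assuming the usual mapping (Ampere is 8.0/8.6/8.7, FP8 tensor cores arrive with Ada at 8.9 and Hopper at 9.0, FP4 with Blackwell at 10.0 and later). The function names are hypothetical, and `local_capability` relies on PyTorch being installed.

```python
from typing import Dict, Optional, Tuple, Union


def describe_capability(cc: Tuple[int, int]) -> Dict[str, Union[str, bool]]:
    """Map a CUDA compute capability (major, minor) to the traits discussed above."""
    major, minor = cc
    return {
        "sm": f"sm_{major}{minor}",
        # A100 reports 8.0 and RTX 3090 reports 8.6; 8.9 is Ada, not Ampere.
        "is_ampere": major == 8 and minor < 9,
        # FP8 tensor-core paths start with Ada (8.9) and Hopper (9.0).
        "has_fp8_tensor_cores": (major, minor) >= (8, 9),
        # FP4 tensor-core paths start with Blackwell (10.0 and later).
        "has_fp4_tensor_cores": (major, minor) >= (10, 0),
    }


def local_capability(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return the local GPU's compute capability, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)
```

On an RTX 3090, `describe_capability(local_capability())` would report `sm_86` with both FP8 and FP4 flags false, which is why a 4-bit weight-only scheme was the target here rather than an FP8 or FP4 one.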
Intended use & responsible use
Authorized study/research only, by qualified researchers, inside isolated / non-production environments with no access to real user data or systems. Safety refusals have been substantially removed; it will follow harmful or unsafe instructions by design. Do not deploy it, expose it to untrusted users or the internet, redistribute it, or use it against systems you do not own and are not authorized to test. No safety guarantees over the base model are provided. You are responsible for lawful, compliant use.
Lineage
AWQ 4-bit (auto-round) quantization of a compliance-reduced (abliterated)
derivative of Qwen/Qwen3.5-27B (Apache-2.0).
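To illustrate what "4-bit weight-only" means in practice, here is a minimal sketch of group-wise round-to-nearest quantization in plain Python. It is not the recipe used for this repo: AWQ's activation-aware scaling and auto-round's tuned rounding are deliberately left out, the group size of 128 is an assumption, and the function names are hypothetical. It only shows the storage idea of 4-bit integer codes plus one scale and one offset per group.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (codes, scale, group minimum)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group onto 2**bits levels."""
    qmax = (1 << bits) - 1                 # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0        # fall back to 1.0 for a constant group
    codes = [max(0, min(qmax, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate weights from the integer codes."""
    return [q * scale + lo for q in codes]


def quantize_row(row: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Quantize a weight row group by group (group_size=128 is an assumption)."""
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]
```

Each reconstructed weight lands within half a quantization step (`scale / 2`) of the original, so smaller groups give finer steps at the cost of storing more scales and offsets.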
Model tree for sch0tten/Qwen3.5-27B-research-AWQ
Base model
Qwen/Qwen3.5-27B