TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors


phaedawg committed on Sep 19, 2025

Commit 9e9e624 (verified) · 1 parent: dfc4ec3

Remove ASYM

Files changed (1)

README.md +3 -3

@@ -23,7 +23,7 @@ This repository provides **quantized runtime packages** of
 **[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)**, repackaged for **vLLM** using the **compressed-tensors** format.
 > **TL;DR**
-> - **This repo is quantized** with branches **W4A16-ASYM** and **W8A16**.
 > - Load with **vLLM** using `--quantization compressed-tensors`.
 > - Qwen3‑Next **A3B** is an 80B‑parameter *hybrid MoE* model that **activates ~3B** params per token and supports **ultra‑long context (≈262K)**. Only a subset of experts is active at a time, but full weights still must be resident in GPU/CPU memory for fast inference.
@@ -45,12 +45,12 @@ This repository provides **quantized runtime packages** of
 > The **`main`** branch is a **landing page** (model card + links). All runnable artifacts live under per‑revision branches.
 - **main** — placeholder / landing page
-- **W4A16-ASYM** — 4‑bit weights / 16‑bit activations builds and runtime assets
 - **W8A16** — 8‑bit weights / 16‑bit activations builds
 **Quick links:**
 - 🔗 **[`main`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/main)**
-- 🔗 **[`W4A16-ASYM`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W4A16-ASYM)**
 - 🔗 **[`W8A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W8A16)**
 ---

 **[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)**, repackaged for **vLLM** using the **compressed-tensors** format.
 > **TL;DR**
+> - **This repo is quantized** with branches **W4A16** and **W8A16**.
 > - Load with **vLLM** using `--quantization compressed-tensors`.
 > - Qwen3‑Next **A3B** is an 80B‑parameter *hybrid MoE* model that **activates ~3B** params per token and supports **ultra‑long context (≈262K)**. Only a subset of experts is active at a time, but full weights still must be resident in GPU/CPU memory for fast inference.
 > The **`main`** branch is a **landing page** (model card + links). All runnable artifacts live under per‑revision branches.
 - **main** — placeholder / landing page
+- **W4A16** — 4‑bit weights / 16‑bit activations builds and runtime assets
 - **W8A16** — 8‑bit weights / 16‑bit activations builds
 **Quick links:**
 - 🔗 **[`main`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/main)**
+- 🔗 **[`W4A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W4A16)**
 - 🔗 **[`W8A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W8A16)**
 ---
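
For anyone updating launch scripts after this change, here is a minimal sketch of selecting a branch and assembling a `vllm serve` command. It assumes that the `W4A16-ASYM` name should now be read as `W4A16`; the diff only shows the README listing changing, so treat that mapping as an assumption. The helper names `resolve_branch` and `vllm_serve_command` are hypothetical and not part of this repo. The `--quantization compressed-tensors` flag comes from the README; `--revision` is vLLM's standard option for choosing a Hub branch.

```python
import shlex

REPO_ID = "TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors"

# Branches the README lists as runnable after this commit.
# `main` is a landing page only, so it is deliberately excluded.
RUNNABLE_BRANCHES = ("W4A16", "W8A16")

# Assumption: the old branch name maps onto the new one.
# The diff only shows the README text changing.
RENAMED_BRANCHES = {"W4A16-ASYM": "W4A16"}


def resolve_branch(name: str) -> str:
    """Map an old branch name to its current one and validate it."""
    name = RENAMED_BRANCHES.get(name, name)
    if name not in RUNNABLE_BRANCHES:
        raise ValueError(
            f"{name!r} is not a runnable branch; choose one of {RUNNABLE_BRANCHES}"
        )
    return name


def vllm_serve_command(branch: str) -> list[str]:
    """Build (but do not run) the argv for serving one quantized branch."""
    return [
        "vllm", "serve", REPO_ID,
        "--revision", resolve_branch(branch),
        "--quantization", "compressed-tensors",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the 4-bit branch.
    print(shlex.join(vllm_serve_command("W4A16-ASYM")))
```

Building the argument list separately from running it keeps the branch check testable on a machine without vLLM or a GPU.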