phaedawg committed on
Commit 9e9e624 · verified · 1 Parent(s): dfc4ec3

Remove ASYM

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -23,7 +23,7 @@ This repository provides **quantized runtime packages** of
  **[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)**, repackaged for **vLLM** using the **compressed-tensors** format.
 
  > **TL;DR**
- > - **This repo is quantized** with branches **W4A16-ASYM** and **W8A16**.
+ > - **This repo is quantized** with branches **W4A16** and **W8A16**.
  > - Load with **vLLM** using `--quantization compressed-tensors`.
  > - Qwen3‑Next **A3B** is an 80B‑parameter *hybrid MoE* model that **activates ~3B** params per token and supports **ultra‑long context (≈262K)**. Only a subset of experts is active at a time, but full weights still must be resident in GPU/CPU memory for fast inference.
 
@@ -45,12 +45,12 @@ This repository provides **quantized runtime packages** of
  > The **`main`** branch is a **landing page** (model card + links). All runnable artifacts live under per‑revision branches.
 
  - **main** — placeholder / landing page
- - **W4A16-ASYM** — 4‑bit weights / 16‑bit activations builds and runtime assets
+ - **W4A16** — 4‑bit weights / 16‑bit activations builds and runtime assets
  - **W8A16** — 8‑bit weights / 16‑bit activations builds
 
  **Quick links:**
  - 🔗 **[`main`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/main)**
- - 🔗 **[`W4A16-ASYM`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W4A16-ASYM)**
+ - 🔗 **[`W4A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W4A16)**
  - 🔗 **[`W8A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W8A16)**
 
  ---
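
Since the README keeps runnable artifacts on per-revision branches, the rename in this commit changes the branch name callers pass to vLLM. The following is a minimal sketch, not part of the repository: it only assembles a `vllm serve` command line for one of the branches named in the updated README. The `--quantization compressed-tensors` flag comes from the README; using `--revision` to select the branch is an assumption about the vLLM CLI, and the helper name `vllm_serve_command` is hypothetical.

```python
import shlex

# Repository and branch names as they appear in the README after this commit.
REPO = "TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors"
BRANCHES = ("W4A16", "W8A16")


def vllm_serve_command(branch: str) -> list[str]:
    """Build an argv list for serving one quantized branch (hypothetical helper).

    Assumption: the branch is selected with vLLM's --revision option.
    """
    if branch not in BRANCHES:
        # The pre-rename name "W4A16-ASYM" is rejected here because the
        # updated README no longer lists it.
        raise ValueError(f"unknown branch {branch!r}; expected one of {BRANCHES}")
    return [
        "vllm", "serve", REPO,
        "--revision", branch,
        "--quantization", "compressed-tensors",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the renamed 4-bit branch.
    print(shlex.join(vllm_serve_command("W4A16")))
```

Scripts that still pass `W4A16-ASYM` would need the same one-word change the README received; whether the old branch name still resolves on the Hub is not stated in this commit.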