Remove ASYM
README.md
CHANGED
@@ -23,7 +23,7 @@ This repository provides **quantized runtime packages** of
**[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)**, repackaged for **vLLM** using the **compressed-tensors** format.

> **TL;DR**
-> - **This repo is quantized** with branches **W4A16
> - Load with **vLLM** using `--quantization compressed-tensors`.
> - Qwen3‑Next **A3B** is an 80B‑parameter *hybrid MoE* model that **activates ~3B** params per token and supports **ultra‑long context (≈262K)**. Only a subset of experts is active at a time, but full weights still must be resident in GPU/CPU memory for fast inference.
@@ -45,12 +45,12 @@ This repository provides **quantized runtime packages** of
> The **`main`** branch is a **landing page** (model card + links). All runnable artifacts live under per‑revision branches.

- **main** — placeholder / landing page
-- **W4A16
- **W8A16** — 8‑bit weights / 16‑bit activations builds

**Quick links:**
- 🔗 **[`main`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/main)**
-- 🔗 **[`W4A16
- 🔗 **[`W8A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W8A16)**

---
@@ -23,7 +23,7 @@ This repository provides **quantized runtime packages** of
**[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)**, repackaged for **vLLM** using the **compressed-tensors** format.

> **TL;DR**
+> - **This repo is quantized** with branches **W4A16** and **W8A16**.
> - Load with **vLLM** using `--quantization compressed-tensors`.
> - Qwen3‑Next **A3B** is an 80B‑parameter *hybrid MoE* model that **activates ~3B** params per token and supports **ultra‑long context (≈262K)**. Only a subset of experts is active at a time, but full weights still must be resident in GPU/CPU memory for fast inference.
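
The last TL;DR bullet says the full weights must stay resident even though only about 3B parameters are active per token. A rough lower bound on that footprint per branch can be sketched as below. This is a back-of-the-envelope estimate under loud assumptions that do not come from the repository: exactly 80e9 parameters, every parameter stored at the branch's weight bit width, and no allowance for quantization scales, unquantized layers, KV cache, or activations. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, weight_bits: int) -> float:
    """Lower-bound weight footprint in GiB for a given weight bit width."""
    total_bytes = num_params * weight_bits / 8  # bits -> bytes
    return total_bytes / 2**30                  # bytes -> GiB


# Assumption: the nominal "80B" parameter count, all weights quantized.
NUM_PARAMS = 80e9

for branch, bits in {"W4A16": 4, "W8A16": 8}.items():
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{branch}: at least ~{gib:.1f} GiB of weights")
```

Under these assumptions the floor is roughly 37 GiB for W4A16 and 75 GiB for W8A16; real checkpoints will be larger.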
@@ -45,12 +45,12 @@ This repository provides **quantized runtime packages** of
> The **`main`** branch is a **landing page** (model card + links). All runnable artifacts live under per‑revision branches.

- **main** — placeholder / landing page
+- **W4A16** — 4‑bit weights / 16‑bit activations builds and runtime assets
- **W8A16** — 8‑bit weights / 16‑bit activations builds

**Quick links:**
- 🔗 **[`main`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/main)**
+- 🔗 **[`W4A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W4A16)**
- 🔗 **[`W8A16`](https://huggingface.co/TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors/tree/W8A16)**

---
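
Because `main` is only a landing page, loading the model means naming one of the runnable branches alongside the `--quantization compressed-tensors` flag from the TL;DR. The sketch below only assembles a `vllm serve` command line and does not launch anything. It assumes the installed vLLM version accepts `--revision` to select a Hugging Face branch; the helper name `build_serve_command` is hypothetical and not part of the repository.

```python
import shlex

REPO_ID = "TheHouseOfTheDude/Qwen3-Next-80B-A3B-Instruct_Compressed-Tensors"
BRANCHES = ("W4A16", "W8A16")  # runnable branches listed in the README


def build_serve_command(branch: str) -> list[str]:
    """Assemble a `vllm serve` argument list for one quantized branch."""
    if branch not in BRANCHES:
        # `main` holds only the model card, so it is rejected here.
        raise ValueError(f"{branch!r} is not a runnable branch: {BRANCHES}")
    return [
        "vllm", "serve", REPO_ID,
        "--revision", branch,  # assumption: flag supported by your vLLM version
        "--quantization", "compressed-tensors",
    ]


print(shlex.join(build_serve_command("W4A16")))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.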