Text Generation
MLX
English
sparse-attention
qwen3
custom-code
indexer
experimental
prefill
efficiency
apple-silicon
How to use rp440/Qwen3-8b-DSA-index with MLX:

```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("rp440/Qwen3-8b-DSA-index")

prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
How to use rp440/Qwen3-8b-DSA-index with MLX LM:

```shell
# Install MLX LM
uv tool install mlx-lm

# Generate some text
mlx_lm.generate --model "rp440/Qwen3-8b-DSA-index" --prompt "Once upon a time"
```
Upload folder using huggingface_hub
README.md (changed)

```diff
@@ -20,7 +20,7 @@ language:
 
 # Qwen3-8B All-Sparse Indexer
 
-> **Experimental research artifact** — a
+> **Experimental research artifact** — a trained Dynamic Sparse Attention (DSA) indexer trained at 2K context length. This repository is intended as an exploratory learned sparse-attention index, not a finished production method. The inference code is written in MLX.
 
 A lightweight **sparse-attention indexer** trained to approximate dense attention behavior in [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). Conceptually, this is a **DeepSeek-style learned index** in the sense that a small auxiliary network predicts which key-value positions are worth keeping for attention. This is an independent research artifact and is not affiliated with DeepSeek. Early results suggest the approach can work in some settings, but more research is needed.
 
@@ -82,6 +82,7 @@ diverse natural-language prose. The model was then asked to retrieve it. These a
 | --------------------------- | ------ | ------------------- |
 | GSM8K accuracy (4-shot) | 95% | 92% |
 | PPL on C4 (seq_len=2048) | 13.526 | 13.533 (+0.058%) |
+| PPL on C4 (seq_len=8192) | 15.628 | 15.653 (+0.16%) |
 
 
 ## Training Details
```
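The README describes the indexer only at a high level: a small auxiliary network predicts which key-value positions are worth keeping for attention. The pure-Python sketch below illustrates that general idea for a single query over a cached prefix. It is a hypothetical illustration, not the repository's MLX code. The low-dimensional dot-product scorer, the projection weights `w_q`/`w_k`, the top-k selection rule, and every function name are assumptions made here, and the actual indexer architecture and training objective are not shown in this diff. The indexer is cheap because it scores keys in a small projected space. Full softmax attention is then computed only over the `k` kept positions. When `k` equals the number of positions, the result reduces to ordinary dense attention.

```python
import math

# Hypothetical sketch, NOT the rp440/Qwen3-8b-DSA-index implementation:
# a small "indexer" scores each cached key position for the current query,
# only the top-k positions are kept, and ordinary softmax attention runs on them.


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(w, v):
    """Multiply a small projection matrix w (list of rows) by vector v."""
    return [dot(row, v) for row in w]


def indexer_scores(q, keys, w_q, w_k):
    """Cheap relevance score per key position: a dot product in a
    low-dimensional projected space (w_q, w_k are hypothetical r x d weights)."""
    q_small = project(w_q, q)
    return [dot(q_small, project(w_k, k)) for k in keys]


def select_top_k(scores, k):
    """Indices of the k highest-scoring positions, returned in position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def attention(q, keys, values, positions):
    """Scaled dot-product attention restricted to the given key positions."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, positions):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out


def sparse_attention(q, keys, values, w_q, w_k, k):
    """Score all positions with the indexer, keep the top k, attend over those."""
    keep = select_top_k(indexer_scores(q, keys, w_q, w_k), k)
    return attention(q, keys, values, keep), keep


# Toy example: 4 cached positions, head dim 4, indexer dim 2.
q = [1.0, 0.0, 0.0, 0.0]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
w_q = w_k = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

out, keep = sparse_attention(q, keys, values, w_q, w_k, k=2)
print(keep)  # [0, 2]
print(out)
```

In a real prefill pass, each query would only score positions at or before its own (causal masking), and the scoring would be batched across queries and heads. The sketch leaves both out for clarity.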
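The percentages in the C4 perplexity rows are relative changes against the baseline column. Assume the second column is the dense Qwen3-8B baseline and the third is the sparse indexer; this is an assumption, because the table header lies outside the diff hunk. Under that assumption, the percentages can be recomputed as shown below. The function name is illustrative. From the rounded three-decimal values, the 2048-token row comes out at about +0.052%. The reported +0.058% still falls within the range that rounding of the underlying perplexities allows, since the true difference could be up to about 0.008.

```python
def relative_change_pct(baseline, candidate):
    """Percent change of `candidate` relative to `baseline`."""
    return (candidate - baseline) / baseline * 100.0


# C4 perplexity rows from the README table (lower is better).
# Column meaning (dense baseline vs. sparse indexer) is an assumption.
ppl_rows = {
    "seq_len=2048": (13.526, 13.533),
    "seq_len=8192": (15.628, 15.653),
}

for name, (baseline, sparse) in ppl_rows.items():
    print(f"{name}: {relative_change_pct(baseline, sparse):+.3f}%")
# Prints +0.052% for seq_len=2048 and +0.160% for seq_len=8192.
```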