How to use rp440/Qwen3-8b-DSA-index with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm
    # If on a CUDA device, also: pip install mlx[cuda]

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("rp440/Qwen3-8b-DSA-index")
    prompt = "Once upon a time in"
    text = generate(model, tokenizer, prompt=prompt, verbose=True)
How to use rp440/Qwen3-8b-DSA-index with MLX LM:
Generate text or start a chat session:

    # Install MLX LM
    uv tool install mlx-lm

    # Generate some text
    mlx_lm.generate --model "rp440/Qwen3-8b-DSA-index" --prompt "Once upon a time"
File size: 452 Bytes
{
"run_dir": "qwen8b_2k2048_15m_allsparse_fixed_v1",
"checkpoint": "best_assembled",
"model": "Qwen/Qwen3-8B",
"quantization": "4bit",
"seq_len": 2048,
"top_k": 2048,
"eval_samples": 8,
"dense_nll": 2.6045862287282944,
"dense_ppl": 13.525627628604632,
"sparse_nll": 2.6051638573408127,
"sparse_ppl": 13.533442675005093,
"delta_nll": 0.0005776286125183105,
"delta_ppl": 0.007815046400461156,
"ratio_ppl": 1.0005777954720514
}
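
The derived fields in this summary can be recomputed from the two NLL values. The sketch below is not the author's evaluation script. It assumes the NLLs are mean per-token values in nats, so that perplexity is exp(NLL), and the function name derive_metrics is hypothetical. Under that assumption the recomputed numbers match the reported ones, and ratio_ppl equals exp(delta_nll).

```python
import math

# Values copied from the evaluation summary above.
DENSE_NLL = 2.6045862287282944
SPARSE_NLL = 2.6051638573408127


def derive_metrics(dense_nll: float, sparse_nll: float) -> dict:
    """Recompute the derived fields from the two NLL values.

    Assumption: each NLL is a mean per-token negative log-likelihood
    in nats, so perplexity is exp(NLL).
    """
    dense_ppl = math.exp(dense_nll)
    sparse_ppl = math.exp(sparse_nll)
    return {
        "dense_ppl": dense_ppl,
        "sparse_ppl": sparse_ppl,
        "delta_nll": sparse_nll - dense_nll,
        "delta_ppl": sparse_ppl - dense_ppl,
        # Equivalent to exp(delta_nll) up to floating-point rounding.
        "ratio_ppl": sparse_ppl / dense_ppl,
    }


metrics = derive_metrics(DENSE_NLL, SPARSE_NLL)
for name, value in metrics.items():
    print(f"{name}: {value:.10f}")
```

On these figures, sparse-attention perplexity is about 0.06% above the dense baseline. The summary reports only 8 evaluation samples, so the gap is a small-sample estimate.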