Text Generation
MLX
English
sparse-attention
qwen3
custom-code
indexer
experimental
prefill
efficiency
apple-silicon
How to use rp440/Qwen3-8b-DSA-index with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("rp440/Qwen3-8b-DSA-index")

prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
How to use rp440/Qwen3-8b-DSA-index with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Generate some text
mlx_lm.generate --model "rp440/Qwen3-8b-DSA-index" --prompt "Once upon a time"
```
```json
{
  "run_dir": "qwen8b_2k2048_15m_allsparse_fixed_v1",
  "checkpoint": "best_assembled",
  "model": "Qwen/Qwen3-8B",
  "quantization": "4bit",
  "seq_len": 2048,
  "top_k": 2048,
  "eval_samples": 8,
  "dense_nll": 2.6045862287282944,
  "dense_ppl": 13.525627628604632,
  "sparse_nll": 2.6051638573408127,
  "sparse_ppl": 13.533442675005093,
  "delta_nll": 0.0005776286125183105,
  "delta_ppl": 0.007815046400461156,
  "ratio_ppl": 1.0005777954720514
}
```
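The derived fields in these metrics appear to follow from the two NLL values alone. The sketch below recomputes them, assuming (this is not stated in the file) that `dense_nll` and `sparse_nll` are mean per-token negative log-likelihoods in nats, so that perplexity is `exp(nll)`, the `delta_*` fields are sparse minus dense, and `ratio_ppl` is sparse over dense. The helper name `derive` is hypothetical and used only for illustration.

```python
import math

# Values copied verbatim from the metrics above.
reported = {
    "dense_nll": 2.6045862287282944,
    "dense_ppl": 13.525627628604632,
    "sparse_nll": 2.6051638573408127,
    "sparse_ppl": 13.533442675005093,
    "delta_nll": 0.0005776286125183105,
    "delta_ppl": 0.007815046400461156,
    "ratio_ppl": 1.0005777954720514,
}


def derive(dense_nll: float, sparse_nll: float) -> dict:
    """Recompute derived fields, ASSUMING ppl = exp(mean NLL in nats)."""
    dense_ppl = math.exp(dense_nll)
    sparse_ppl = math.exp(sparse_nll)
    return {
        "dense_ppl": dense_ppl,
        "sparse_ppl": sparse_ppl,
        "delta_nll": sparse_nll - dense_nll,   # sparse minus dense
        "delta_ppl": sparse_ppl - dense_ppl,   # sparse minus dense
        "ratio_ppl": sparse_ppl / dense_ppl,   # equals exp(delta_nll)
    }


derived = derive(reported["dense_nll"], reported["sparse_nll"])

# Print recomputed and reported values side by side for comparison.
for key, value in derived.items():
    print(f"{key}: recomputed={value:.10f} reported={reported[key]:.10f}")
```

Under that assumption the recomputed values agree with the reported ones to within floating-point rounding, which suggests the sparse path raises perplexity by roughly 0.06% relative to dense on this run. Note that the file lists only 8 evaluation samples, so the gap is best read as indicative.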