GGUF Heretic

QwenPaw-Flash-9B-heretic

English | 📖 中文文档

📌 Overview
F32 safetensors of **QwenPaw-Flash-9B-heretic**, a 9B dense model fine-tuned with Heretic methodology on Qwen3.5-9B.
🧠 Model Details
- **Base model**: Qwen3.5-9B - **Precision**: F32 (float32 safetensors) - **Parameters**: ~9B - **Shards**: model-00001 ~ model-00008 (8 files, F32 main weights) - **Additional**: model-00009 (BF16, Multi-Token Prediction head extracted from Qwen3.5-9B)
MTP (Multi-Token Prediction)
`model-00009-of-00009.safetensors` contains the MTP head weights extracted from Qwen3.5-9B. MTP enables the model to predict multiple future tokens in a single forward pass, improving generation speed via speculative decoding.
  • MTP acceptance rate: ~43%
  • Speedup: ~1.5-1.9x decode throughput

For MTP-enabled GGUF inference, see the MTP GGUF repo below.

📦 GGUF Quantized Versions
For inference with llama.cpp / Ollama / LM Studio, use the GGUF versions:
🚀 Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "SC117/QwenPaw-Flash-9B-heretic",
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SC117/QwenPaw-Flash-9B-heretic")
📄 License
Same as base model (Qwen3.5-9B).
Downloads last month
374
Safetensors
Model size
9B params
Tensor type
F32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SC117/QwenPaw-Flash-9B-heretic

Finetuned
Qwen/Qwen3.5-9B
Finetuned
(386)
this model
Quantizations
2 models