---
base_model: unsloth/Qwen3.5-9B
tags:
- text-generation
- llama.cpp
- gguf
- unsloth
- qwen
- qwen3.5
- reasoning
- distillation
- sft
- lora
- rs-lora
- quantized
license: apache-2.0
language:
- en
datasets:
- trjxter/Kimi-K2.6-Reasoning-3300x-WandB
- Jackrong/Claude-opus-4.6-TraceInversion-9000x
- Jackrong/Qwen3.5-reasoning-700x
---

# Qwimi3.5-9B-Kimik2.6-Opus-Distill-GGUF

**Qwimi3.5-9B-Kimik2.6-Opus-Distill-GGUF** contains GGUF quantized releases of `Qwimi3.5-9B-Kimik2.6-Opus-Distill`, a reasoning-focused fine-tune of `unsloth/Qwen3.5-9B`.

This model was trained as a supervised fine-tuning/distillation run using a curated mixture of Kimi K2.6, Qwen reasoning, and Claude Opus TraceInversion-style reasoning data. The goal of the run was to improve structured reasoning behavior while preserving Qwen-style chat formatting and `...` reasoning traces.

- **Developed by:** `trjxter`
- **Base model:** `unsloth/Qwen3.5-9B`
- **Model type:** GGUF quantized causal language model
- **Training method:** LoRA / RS-LoRA SFT with Unsloth + TRL
- **License:** Apache 2.0
- **Language:** English

---

## Available Quantizations

This repository contains GGUF quantized versions of the merged fine-tuned model for use with llama.cpp-compatible runtimes.

Expected quantization set:

| Quant | Notes |
|---|---|
| `Q3_K_L` | Smaller size, lower memory usage, more quality loss |
| `Q4_K_M` | Good default balance of size, speed, and quality |
| `Q5_K_M` | Higher quality than Q4, moderate size increase |
| `Q6_K` | Strong quality retention, larger file size |
| `Q8_0` | Highest quality quant in this set, largest file size |

For most local inference setups, start with **Q4_K_M**. If you have more VRAM/RAM and want better quality, try **Q5_K_M**, **Q6_K**, or **Q8_0**.

---

## Training Overview

This GGUF release was created from a merged version of the LoRA fine-tune. Training used Unsloth and Hugging Face TRL with a LoRA-based supervised fine-tuning setup.

### Training configuration

| Setting | Value |
|---|---|
| Base model | `unsloth/Qwen3.5-9B` |
| Sequence length | 16,384 |
| Training examples | 12,000 |
| Held-out eval examples | 366 |
| Trainer eval subset | 200 |
| Epochs | 1 |
| Effective batch size | 16 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 8 |
| LoRA rank | 128 |
| LoRA alpha | 128 |
| RS-LoRA | Enabled |
| Base loading | 8-bit |
| Optimizer | `adamw_8bit` |
| Learning rate | `2e-5` |
| Scheduler | Linear |
| Gradient checkpointing | Unsloth |
| Runtime | ~4.37 hours on an 80GB GPU |

### Final training metrics

| Metric | Value |
|---|---:|
| Final training loss | `0.5517` |
| Final lightweight eval loss | `~0.3161` |
| Train runtime | `15,728.8s` |
| Train samples/sec | `0.763` |
| Train steps/sec | `0.048` |
| Total FLOPs | `1.45e18` |

The lightweight eval loss was measured on a 200-example eval subset during training.

---

## Dataset Mix and Attribution

This run used a combined reasoning/distillation dataset made from:

1. `trjxter/Kimi-K2.6-Reasoning-3300x-WandB`
2. `Jackrong/Qwen3.5-reasoning-700x`
3. `Jackrong/Claude-opus-4.6-TraceInversion-9000x`

The dataset was normalized into Qwen chat format, preserving assistant reasoning traces in the form:

```text
...
final answer
```

After formatting and 16k token filtering, the final usable dataset contained **12,366 examples**:

- **12,000** examples used for training
- **366** examples held out for evaluation
- **200** examples used as the lightweight trainer eval subset

### Special thanks

Special thanks to **Jackrong** and **Kyle Hessling** for the Qwen reasoning and Claude Opus TraceInversion datasets used in this run. Those datasets were not created by me, and this release builds on their dataset work.

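
The figures in the training configuration, metrics, and dataset sections can be cross-checked against each other with a little arithmetic. The sketch below is illustrative only: it assumes a single GPU (so the effective batch size is per-device batch size times gradient accumulation steps) and assumes RS-LoRA uses the usual rank-stabilized scaling `alpha / sqrt(r)` rather than the standard `alpha / r`. The variable names are made up for this example and are not taken from the training script.

```python
import math

# Values copied from the tables in this card.
per_device_batch = 2
grad_accum_steps = 8
train_examples = 12_000
eval_examples = 366
train_runtime_s = 15_728.8
lora_rank = 128
lora_alpha = 128

# Assumption: one GPU, so there is no data-parallel multiplier.
effective_batch = per_device_batch * grad_accum_steps

# One epoch over the training split.
optimizer_steps = train_examples // effective_batch

runtime_hours = train_runtime_s / 3600
samples_per_sec = train_examples / train_runtime_s
steps_per_sec = optimizer_steps / train_runtime_s

# Assumption: standard LoRA scales updates by alpha / r,
# rank-stabilized LoRA by alpha / sqrt(r).
standard_scale = lora_alpha / lora_rank
rslora_scale = lora_alpha / math.sqrt(lora_rank)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps:      {optimizer_steps}")
print(f"runtime (hours):      {runtime_hours:.2f}")
print(f"samples/sec:          {samples_per_sec:.3f}")
print(f"steps/sec:            {steps_per_sec:.3f}")
print(f"usable examples:      {train_examples + eval_examples}")
print(f"LoRA scale (alpha/r):          {standard_scale:.2f}")
print(f"RS-LoRA scale (alpha/sqrt(r)): {rslora_scale:.2f}")
```

Under these assumptions the derived values line up with the reported effective batch size, runtime, throughput, and dataset total, and they show why rank-stabilized scaling matters at rank 128: the adapter update is scaled by roughly 11.3 instead of 1.
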

---

## Intended Use

This model is intended for experimentation with:

- local reasoning model inference
- llama.cpp-compatible workflows
- structured reasoning prompts
- math and problem solving
- coding and technical reasoning
- long-context reasoning experiments
- comparing GGUF quantization quality across Q4, Q5, Q6, and Q8 variants

This is an experimental fine-tune and should be evaluated carefully before use in production or high-stakes settings.

---

## Prompt Format

The model follows Qwen-style chat formatting.

Example:

```text
<|im_start|>user
Solve this step by step: A shop earns $72 from hourly pay, $105 from restringing, $20 from grommets, and $5 from stencils. What is the total?
<|im_end|>
<|im_start|>assistant
...
...
<|im_end|>
```

When using a runtime that supports chat templates, prefer applying the Qwen chat template rather than manually formatting prompts.

---

## Example llama.cpp Usage

Example command:

```bash
# The prompt is single-quoted so the shell does not expand $9, $15, $10, and $1
# as positional parameters; llama-cli still receives the \n sequences unchanged.
./llama-cli \
  -m Qwimi3.5-9B-Kimik2.6-Opus-Distill-Q4_K_M.gguf \
  -c 16384 \
  -ngl 99 \
  --temp 0.6 \
  --top-p 0.95 \
  -p '<|im_start|>user\nSolve this step by step: If a worker earns $9/hour for 8 hours, plus $15 for each of 7 racquets, $10 for each of 2 grommet replacements, and $1 for each of 5 stencils, how much do they earn?\n<|im_end|>\n<|im_start|>assistant\n'
```

Adjust `-ngl` based on your GPU/VRAM. For CPU-only inference, omit or reduce `-ngl`.

---

## Related Releases

This run may also be released in adapter and merged BF16 formats:

- LoRA adapter: `trjxter/Qwimi3.5-9B-Kimik2.6-Opus-Distill-LoRA`
- Merged BF16: `trjxter/Qwimi3.5-9B-Kimik2.6-Opus-Distill-BF16`

---

## Notes

This model was trained using [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning and Hugging Face TRL for SFT training. The GGUF files were generated from the merged fine-tuned model.

---

## Disclaimer

This is an experimental research fine-tune. Outputs may contain mistakes, hallucinations, or incorrect reasoning. Always validate important outputs independently.
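
For runtimes without chat-template support, the single-turn layout shown under Prompt Format can be assembled by hand. The helper below is a minimal sketch: `build_prompt` is a hypothetical name, and it simply mirrors the layout used in the llama.cpp example in this card. Where a chat template is available, applying it remains the preferred route.

```python
def build_prompt(user_message: str) -> str:
    """Return a single-turn prompt in the Qwen-style layout used in this card.

    Hypothetical helper for illustration; it mirrors the example prompt above
    and leaves the assistant turn open so the model can continue from there.
    """
    return (
        "<|im_start|>user\n"
        f"{user_message}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("Solve this step by step: what is 72 + 105 + 20 + 5?")
print(prompt)
```

Building the string in Python also sidesteps shell quoting issues with `$` amounts when the prompt is later passed to a runtime.
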