---
base_model: unsloth/Qwen3.5-27B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_5
- reasoning
- chain-of-thought
- agent
- sft
- code
- biology
- chemistry
license: apache-2.0
language:
- en
- zh
- ko
- ja
- es
pipeline_tag: image-text-to-text
---

# Qwopus3.5-27B-v3.5

## Model Overview & v3.5 Design

Qwopus3.5-27B-v3.5 is a **data-scaled continuation** of the Qwopus3.5-27B-v3 model. The v3.5 training data is expanded to cover a broader range of domains, including mathematics, programming, puzzle-solving, multilingual dialogue, instruction-following, multi-turn interactions, and STEM-related tasks.

---

Qwopus3.5-27B-v3.5 is a reasoning-enhanced model based on **Qwen3.5-27B**, designed for:

- Structured reasoning
- Tool-augmented workflows
- Multi-step agentic tasks
- Token-efficient inference

Compared with Qwopus3.5-v3, **version 3.5 does not introduce a new architecture, RL stage, or template redesign**. It is trained with approximately **2× more SFT data**.

---

## Motivation & Generalization Insight

The motivation behind v3.5 is a simple hypothesis:

> Scaling high-quality SFT data may further enhance the generalization ability of large language models.

In v3, Qwopus demonstrated that structured reasoning improves both **accuracy and efficiency**:

- Structured reasoning is more effective than simply mimicking long CoT
- Act-then-refine is better suited for coding and multi-step tasks
- Improved reasoning structure enables more reliable use of existing knowledge

> [!IMPORTANT]
> This suggests that the improvement is not simply memorization or dataset overlap. Instead, reasoning SFT helps the model:
> - Better utilize existing knowledge
> - Activate latent knowledge through structured reasoning
> - Learn reasoning procedures, not just output format

---

## Supporting Evidence

Recent work, **Ren et al., 2026, *Rethinking Generalization in Reasoning SFT*** ([arXiv:2604.06628](https://arxiv.org/abs/2604.06628)), shows that generalization in reasoning SFT is **not fixed, but conditional**: it depends on optimization, data quality, and model capability.

Short-epoch reasoning SFT can underestimate generalization: in-domain gains may appear early, while out-of-domain improvements often require sufficient optimization.

Key takeaways:

- Reasoning SFT can generalize when sufficiently trained (often showing a **dip-then-recovery** pattern)
- **High-quality long-CoT data** enables cross-domain transfer
- **Stronger models learn reasoning structure**, not just longer outputs (14B/27B/32B)
- Gains are **asymmetric**: reasoning improves, while safety may degrade

This suggests that reasoning SFT should be viewed as a **dynamic optimization process**, rather than a static training outcome.

---

### Evaluation results
Reasoning-focused SFT improves multi-step reasoning tasks, while introducing mild trade-offs on alignment-sensitive benchmarks.
A third-party benchmark report shows that Qwopus3.5-v3 achieves strong performance across reasoning-heavy tasks, especially on:

- MATH500
- MMLU-Pro
- HumanEval
- GSM8K
- AIME-style reasoning tasks

However, the same results also suggest a **capability trade-off**: reasoning-focused SFT can improve multi-step reasoning while causing mild regressions on some alignment-sensitive or tool-oriented benchmarks. This supports the view that Qwopus-v3 shifts the model toward **stronger reasoning efficiency and problem-solving ability**, rather than uniform gains across every benchmark.

### Preliminary v3.5 comparison on MMLU-Pro subsets

Due to limited compute, v3.5 was evaluated on the **same 280 questions used for v3**, sampled from **7 selected MMLU-Pro categories**. On this subset:

| Model | Correct | Total | Accuracy |
|-------|---------|-------|----------|
| **v3** | 250 | 280 | **89.29%** |
| **v3.5** | 253 | 280 | **90.36%** |

**Gain:** **+1.07 percentage points**

This suggests that scaling SFT data in v3.5 brings a **small improvement** (3 additional correct answers out of 280) on the controlled MMLU-Pro subset. Since this is not a full MMLU-Pro evaluation, the result should be viewed as a **preliminary reference**, not a definitive benchmark score.

### SWE / Agentic Coding Test Report

Qwopus3.5-27B-v3.5 was tested on a 44-case SWE-style capability suite covering reasoning, tool calling, structured output, context handling, multilingual responses, programming, and multi-step agentic workflows.

The Q5_K_M GGUF build achieved **43 / 44 passed tests (97.7%)**, including **14 / 15 programming tasks**. The only failure was a unit-test-writing case involving incorrect pytest assertions.

Compared with Qwopus3.5-27B-v3, which scored **42 / 44 (95.5%)** on the same suite, v3.5 improved by **+2.2 points** (one additional test passed).
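The percentages and gains quoted above can be re-derived from the raw counts. The snippet below is a minimal sketch for checking that arithmetic; the helper names `accuracy_pct` and `gain_pp` and the `REPORTED` dictionary are illustrative and not part of any released evaluation code. Only the counts themselves come from this report.

```python
from fractions import Fraction


def accuracy_pct(correct: int, total: int) -> float:
    """Return correct/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return float(Fraction(correct, total) * 100)


def gain_pp(new_correct: int, old_correct: int, total: int) -> float:
    """Return the gain in percentage points when both runs share `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return float(Fraction(new_correct - old_correct, total) * 100)


# Counts copied from the report; the dictionary layout is hypothetical.
REPORTED = {
    "MMLU-Pro subset": {"v3": 250, "v3.5": 253, "total": 280},
    "SWE-style suite": {"v3": 42, "v3.5": 43, "total": 44},
}

for name, counts in REPORTED.items():
    old = accuracy_pct(counts["v3"], counts["total"])
    new = accuracy_pct(counts["v3.5"], counts["total"])
    gain = gain_pp(counts["v3.5"], counts["v3"], counts["total"])
    print(f"{name}: v3 {old:.2f}%, v3.5 {new:.2f}%, gain +{gain:.2f} pp")
```

Computed from the raw counts, the SWE-suite gain is 1/44, or about 2.27 percentage points; subtracting the already-rounded percentages (97.7 and 95.5) gives 2.2. In both comparisons the gain comes from a handful of items (3 of 280 and 1 of 44), which is consistent with treating these numbers as preliminary.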
The most important gain is in multi-step agentic coding: v3.5 successfully read source code through a tool call, diagnosed a timezone parsing bug, and proposed a fix, while v3 failed to identify the root cause.

This suggests that v3.5 is a small but meaningful upgrade over v3, especially for SWE-style workflows involving tool use, code inspection, bug diagnosis, and action planning.

> [!NOTE]
> Throughput differences are excluded from the model-level comparison because both runs use **Q5_K_M GGUF** builds, where quantization choices and runtime environments can affect speed.

> **Acknowledgement:** Special thanks to **Kyle Hessling** for running and sharing the SWE-style capability tests for Qwopus3.5-27B-v3.5.
> X / Twitter: [@KyleHessling1](https://x.com/KyleHessling1)

---

## Resources & Guides

**[GitHub Repository: Jackrong-llm-finetuning-guide](https://github.com/R6410418/Jackrong-llm-finetuning-guide.git)**

Visit the repo to dive into the codebase and reproduce the results locally or on Colab.

### Core Technical Document

**[Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)](https://github.com/R6410418/Jackrong-llm-finetuning-guide/blob/main/guidePDF/Qwopus3-5-27b-Colab_complete_guide_to_llm_finetuning.pdf)**

* **The Full Pipeline:** A step-by-step walkthrough, from downloading the base model and unifying heterogeneous data to configuring trainer hyperparameters and publishing to Hugging Face.
* **Beginner Friendly:** Includes an introductory guide to getting started with Google Colab and Unsloth.

> **A Note:**
> My goal isn't just to detail a workflow, but to demystify LLM training. Beyond the social media hype, fine-tuning isn't an unattainable ritual. Often, all you need is a Google account, a standard laptop, and relentless curiosity.
> All training and testing for this project were self-funded. If you find this model or guide helpful, a **Star on GitHub** would be the greatest encouragement. Thank you!

> [!IMPORTANT]
> The Claude-series model optimizations are released under the **Qwopus3.5 series** name; the latest version is **Qwopus3.5-v3.5**.

---

## Limitations

- Overfitting is possible if data scaling exceeds the optimal regime
- Reasoning may still be unstable in edge cases
- Tool-calling performance depends on environment integration
- Not all capabilities are fully benchmarked yet

---

## Acknowledgements

Special thanks to:

- Unsloth for efficient fine-tuning
- Open-source datasets and community contributors
- Researchers exploring reasoning SFT and generalization

---

## Citation

```bibtex
@misc{jackrong_qwopus35_v35,
  title     = {Qwopus3.5-27B-v3.5},
  author    = {Jackrong},
  year      = {2026},
  publisher = {Hugging Face}
}
```