Anyone successfully reproduced this model with Jackrong's GitHub notebook? I'm getting results below baseline and wondering if it's just me.

#26

by sunboy - opened 4 days ago

The shared notebook (Jackrong's LLM Fine-tuning Guide) has been incredibly helpful for learning how to post-train an LLM for improved coding performance. I downloaded Jackrong's trained/reference model and confirmed it does outperform the baseline (Qwen3.5-27B).

However, when I followed the notebook (Qwopus3.5 27B SFT Google Colab) to train my own model, the results came in below baseline, so I'm wondering if anyone else has experienced the same issue.

Below is a comparison between the baseline, the model I trained using Jackrong's notebook, and Jackrong's published model.

My setup was nearly identical to the notebook, with one exception to avoid OOM: I used PER_DEV_BS=4, GRAD_ACCUM=9 instead of PER_DEV_BS=6, GRAD_ACCUM=6. My understanding is that this should only affect training speed, since the effective batch size per device is 36 in both cases (4 × 9 = 6 × 6), without significantly impacting model quality.
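For anyone who wants to double-check the batch-size arithmetic, here is a minimal sketch. The helper `effective_batch_size` and the `num_devices` parameter are hypothetical names I made up for illustration, not something from the notebook, and I'm assuming a single GPU in both runs.

```python
def effective_batch_size(per_dev_bs: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer step.

    Hypothetical helper for illustration; num_devices=1 is an assumption.
    """
    return per_dev_bs * grad_accum * num_devices


# Settings from the notebook vs. the ones I used to avoid OOM.
notebook_ebs = effective_batch_size(per_dev_bs=6, grad_accum=6)
my_ebs = effective_batch_size(per_dev_bs=4, grad_accum=9)

print(f"notebook: {notebook_ebs}, mine: {my_ebs}")
```

This only shows that the sample count per optimizer step matches. It does not rule out other differences between the two runs, such as how the loss is averaged across accumulation steps when sequence lengths vary.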
