strathos-qwen17b-sft-traced

A re-training trace of the Strathos V2-PLUS adapter, produced with full Weights & Biases (wandb) experiment tracking enabled.

Relationship to V2-PLUS (deployed model)

The deployed Strathos model is kavyanshshakya/strathos-qwen17b-sft (V2-PLUS). It was trained on Stage 2 grounded discrimination data (200 examples, lr=1e-4, 5 epochs) and is the model used in the live environment Space.

This adapter is a re-training trace produced specifically for the Meta PyTorch OpenEnv Hackathon Grand Finale to satisfy the hackathon's experiment-tracking requirement. It is initialized from the V2-PLUS checkpoint and trained for 3 further epochs on the regenerated 200-example dataset, with wandb tracking enabled throughout.

Wandb run: https://wandb.ai/kavyanshshakya-indian-institute-of-science-education-and/strathos-tracking/runs/g3cxonxb
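To inspect the logged metrics programmatically rather than in the browser, the run URL first has to be turned into the "entity/project/run_id" path that the W&B public API expects. The pure-Python sketch below does only that conversion; the helper name `wandb_run_path` is hypothetical and not part of wandb or this project.

```python
from urllib.parse import urlparse


def wandb_run_path(url: str) -> str:
    """Convert a W&B run URL of the form .../<entity>/<project>/runs/<run_id>
    into the "entity/project/run_id" path string (hypothetical helper)."""
    # Drop empty segments produced by leading/trailing slashes.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 4 or parts[2] != "runs":
        raise ValueError(f"not a W&B run URL: {url}")
    entity, project, _, run_id = parts[:4]
    return f"{entity}/{project}/{run_id}"


url = ("https://wandb.ai/kavyanshshakya-indian-institute-of-science-education-and"
       "/strathos-tracking/runs/g3cxonxb")
print(wandb_run_path(url))
```

With the `wandb` package installed, and assuming the project is public or you are logged in with access to it, the resulting path can be passed to `wandb.Api().run(path)`, whose `history()` method returns the logged metrics such as the training loss.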

Training details

Base model: Qwen/Qwen3-1.7B
Starting checkpoint: kavyanshshakya/strathos-qwen17b-sft (V2-PLUS)
LoRA: r=16, alpha=16, q/k/v/o projections (~6.4M trainable parameters, 0.32% of the total)
Optimizer: AdamW, lr=1e-4, cosine schedule, warmup_ratio=0.05
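Batch size: 4, gradient accumulation: 2 (effective batch size 8), epochs: 3
Total optimizer steps: 75 (200 examples / 8 per step = 25 steps per epoch, × 3 epochs)
Final training loss: 0.69 (training started from the V2-PLUS checkpoint rather than the base model, hence the faster convergence)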
Hardware: Colab Pro A100
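The figures in the list above can be cross-checked with a short pure-Python sketch. It assumes Qwen3-1.7B's public config values (hidden size 2048, 28 layers, 16 query heads, 8 KV heads, head dim 128), which this card does not state, and it assumes warmup steps are rounded up from warmup_ratio as in the Hugging Face Trainer. The helpers `lora_params`, `optimizer_steps`, and `cosine_lr` are hypothetical names for illustration; `cosine_lr` is written to follow the linear-warmup-then-cosine-decay shape of transformers' `get_cosine_schedule_with_warmup`.

```python
import math

# Assumed Qwen/Qwen3-1.7B attention dimensions (from its public config,
# not stated in this card).
HIDDEN = 2048
LAYERS = 28
Q_HEADS = 16
KV_HEADS = 8
HEAD_DIM = 128


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each adapted linear
    layer, i.e. rank * (d_in + d_out) parameters per projection."""
    shapes = {
        "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
        "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),  # grouped-query attention: fewer KV heads
        "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
        "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    }
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * LAYERS


def optimizer_steps(n_examples: int, batch: int, accum: int, epochs: int) -> int:
    """Each optimizer step consumes batch * accum examples. 200 is divisible
    by 8, so the rounding convention does not matter here."""
    steps_per_epoch = math.ceil(n_examples / (batch * accum))
    return steps_per_epoch * epochs


def cosine_lr(step: int, total: int, peak: float, warmup_ratio: float) -> float:
    """Linear warmup from 0 to `peak`, then cosine decay back to 0."""
    warmup = math.ceil(total * warmup_ratio)  # assumed round-up convention
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


total = optimizer_steps(200, batch=4, accum=2, epochs=3)
print(lora_params(16))          # 6422528, i.e. ~6.4M
print(total)                    # 75
print(math.ceil(total * 0.05))  # 4 warmup steps
```

Under these assumptions the LoRA count is consistent with the ~6.4M figure and the 75 steps follow from 25 steps per epoch. The 0.32% share depends on which total parameter count is used (for example, whether tied embeddings are counted once), so it is not recomputed here.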

Use case

This adapter exists to make the training run verifiable and reproducible; it is not intended for deployment. For deployment, use V2-PLUS (kavyanshshakya/strathos-qwen17b-sft).


Model tree for kavyanshshakya/strathos-qwen17b-sft-traced

Base model: Qwen/Qwen3-1.7B-Base
Finetuned: Qwen/Qwen3-1.7B
Adapter: this model (kavyanshshakya/strathos-qwen17b-sft-traced)