The Mind of Tashi — mini student

Our custom qwen3.5-MoE (≈1B total / 610M active parameters), fine-tuned on the mind-of-tashi-selfplay corpus to play the blind-commit duel: read the opponent's move history, reason in an English + Hindi/Sanskrit (IAST) code-switched register inside <think>…</think>, then commit one legal move as JSON.

Its sibling is the micro student (0.4B total / 200M active) — the mini fields ~3× the active parameters, and the two can duel each other live in the game Space's self-play mode.

Fine-tune

LoRA r=32 / α=64 over the attention + MLP projections (q/k/v/o + gate/up/down), dropout 0.05, LR 2e-5 cosine with 10% warmup, bf16, seq 4096, completion-only loss, effective batch 16. The bare adapter ships at mind-of-tashi-mini-sft-lora; this repo is the merged model.
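To make the schedule and LoRA scaling above concrete, here is a minimal sketch in plain Python. It assumes linear warmup followed by cosine decay to zero, which may not match the trainer actually used. `TOTAL_STEPS` is a hypothetical value, since the card does not state the number of optimizer steps.

```python
import math

# Hyperparameters quoted in the card.
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.10
LORA_R, LORA_ALPHA = 32, 64

# Assumption: the card does not give the optimizer step count.
TOTAL_STEPS = 1000


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R
```

With these numbers the adapter update is scaled by 2.0, and the learning rate peaks at 2e-5 once the first 10% of steps have passed.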

Format gate

20 unseen game states across all 10 ladder personas; a reply is valid only if it has a <think> block, parseable {"move", "taunt"} JSON, and a legal move:

Decode                                              Valid
greedy                                              18/20
sampled (temp 0.8, top_p 0.9; the game's regime)    20/20
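The validity rule of the format gate can be sketched as a small checker. This is an illustration, not the gate's actual code: `LEGAL_MOVES` is a hypothetical move set (the card does not list the duel's moves, and in the real game legality presumably depends on the game state), and `is_valid_reply` is a name chosen for this example.

```python
import json
import re

# Hypothetical move set; the card does not list the duel's legal moves.
LEGAL_MOVES = {"strike", "guard", "feint"}

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def is_valid_reply(reply: str, legal_moves=LEGAL_MOVES) -> bool:
    """Return True only if all three format-gate conditions hold."""
    # 1. A <think>...</think> block must be present.
    match = THINK_RE.search(reply)
    if match is None:
        return False
    # 2. The text after the block must contain parseable JSON
    #    with both "move" and "taunt" keys.
    tail = reply[match.end():]
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end < start:
        return False
    try:
        payload = json.loads(tail[start:end + 1])
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    if "move" not in payload or "taunt" not in payload:
        return False
    # 3. The committed move must be legal.
    move = payload["move"]
    return isinstance(move, str) and move in legal_moves
```

For example, `is_valid_reply('<think>plan</think>{"move": "guard", "taunt": "hah"}')` passes under this hypothetical move set, while a reply with no `<think>` block or with a move outside the set is rejected.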

Head-to-head vs the GRPO micro

Watch-mode duels on the deployed Space (mini as challenger, GRPO micro as the house mind): the 200M-active GRPO model beats this 610M-active SFT model decisively. Imitation learns the format; reinforcement learns the game. Reproduce it: open the Space, pick "Tashi mini SFT" in the self-play picker, and watch.

Use

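from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/mind-of-tashi-mini-sft"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
m = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="bfloat16", trust_remote_code=True)
# Sample at temperature >= 0.7; greedy decoding can loop on small models.
# The game's regime corresponds to do_sample=True, temperature=0.8, top_p=0.9 in m.generate().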

Part of the Mind of Tashi bundle (Build Small Hackathon, Track Two) — see the collection.

