The Mind of Tashi — mini student

Our custom qwen3.5-MoE (≈1B total / 610M active parameters), fine-tuned on the mind-of-tashi-selfplay corpus to play the blind-commit duel: read the opponent's move history, reason in an English + Hindi/Sanskrit (IAST) code-switched register inside <think>…</think>, then commit one legal move as JSON.

Its sibling is the micro student (0.4B total / 200M active) — the mini fields ~3× the active parameters, and the two can duel each other live in the game Space's self-play mode.

Fine-tune

LoRA r=32 / α=64 over the attention + MLP projections (q/k/v/o + gate/up/down), dropout 0.05, LR 2e-5 cosine with 10% warmup, bf16, seq 4096, completion-only loss, effective batch 16. The bare adapter ships at mind-of-tashi-mini-sft-lora; this repo is the merged model.
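The schedule and adapter scaling above can be sketched in a few lines of plain Python. This is only an illustration of the usual "linear warmup, then cosine decay" shape; the actual trainer and its total step count are not stated here, so `total_steps` is a hypothetical parameter, and the α/r scaling assumes standard (non-rank-stabilized) LoRA.

```python
import math

PEAK_LR = 2e-5            # "LR 2e-5" from the recipe above
WARMUP_FRACTION = 0.10    # "10% warmup"
LORA_R, LORA_ALPHA = 32, 64


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    `total_steps` is a hypothetical value; the card does not state it.
    """
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: standard LoRA scales the adapter update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R
```

With r=32 and α=64 the adapter update is scaled by 2 under that assumption, and the learning rate peaks at 2e-5 once the first tenth of training has passed.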

Format gate

20 unseen game states across all 10 ladder personas; a reply is valid only if it has a <think> block, parseable {"move", "taunt"} JSON, and a legal move:

Decode	Valid
greedy	18/20
sampled (temp 0.8, top_p 0.9 — the game's regime)	20/20
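A minimal sketch of a validity check along these lines, assuming the JSON follows the closing </think> tag and that both keys must be present. The function name, the move names in the usage note, and the `legal_moves` argument are hypothetical; the gate's real implementation is not shown in this card.

```python
import json
import re

# Non-greedy match so only the first <think>…</think> block is consumed.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def is_valid_reply(reply: str, legal_moves: set[str]) -> bool:
    """Return True if the reply has a think block, parseable JSON, and a legal move."""
    match = THINK_RE.search(reply)
    if match is None:
        return False  # no <think> block

    # Assumption: the committed JSON comes after the think block.
    tail = reply[match.end():]
    start = tail.find("{")
    if start == -1:
        return False
    try:
        payload, _ = json.JSONDecoder().raw_decode(tail[start:])
    except json.JSONDecodeError:
        return False  # JSON is not parseable

    if not isinstance(payload, dict):
        return False
    if "move" not in payload or "taunt" not in payload:
        return False
    move = payload["move"]
    return isinstance(move, str) and move in legal_moves
```

For example, with a hypothetical legal set `{"strike", "guard"}`, the reply `<think>…</think>{"move": "strike", "taunt": "hah"}` passes, while the same JSON without a think block, or with a move outside the set, fails.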

Head-to-head vs the GRPO micro

Watch-mode duels on the deployed Space (mini as challenger, GRPO micro as the house mind): the 200M-active GRPO model beats this 610M-active SFT model decisively. Imitation learns the format; reinforcement learns the game. Reproduce it: open the Space, pick "Tashi mini SFT" in the self-play picker, and watch.

Use

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/mind-of-tashi-mini-sft"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
m = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="bfloat16", trust_remote_code=True)
# Sample at temperature >= 0.7; greedy decoding can loop on small models.
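A sampled generation call might look like the sketch below. It uses the game's regime from the format gate (temperature 0.8, top_p 0.9); `max_new_tokens`, the helper name, and the shape of `messages` are assumptions, as is the presence of a chat template in the tokenizer. The game's real prompt format is not documented here.

```python
# Sampling settings: temperature and top_p follow the game's regime;
# max_new_tokens is an assumed budget for the think block plus the JSON.
SAMPLING = dict(do_sample=True, temperature=0.8, top_p=0.9, max_new_tokens=512)


def generate_reply(model, tokenizer, messages):
    """Sample one reply for a chat-style prompt (hypothetical helper).

    `messages` is a list of {"role": ..., "content": ...} dicts; this assumes
    the tokenizer ships a chat template.
    """
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, **SAMPLING)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0, inputs["input_ids"].shape[-1]:]
    # Special tokens are left in so the <think> tags are not stripped by accident.
    return tokenizer.decode(new_tokens, skip_special_tokens=False)
```

Pass in the loaded model and a tokenizer loaded from the same repo; the returned text should contain the think block followed by the move JSON.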

Part of the Mind of Tashi bundle (Build Small Hackathon, Track Two) — see the collection.

Safetensors
Model size: 1B params
Tensor type: BF16

Model tree for build-small-hackathon/mind-of-tashi-mini-sft
Base model: kshitijthakkar/tracegenix-mini-sft-clean-3ep
