VQ-VAE on Coconut latent thoughts (Qwen3-0.6B)

A VQ-VAE bottleneck trained on the Stage-3 Coconut latent thoughts of Qwen3-0.6B, from the project "VQ-CoT: Discretising Latent Chain-of-Thought" (team RateLimit Achieved, EPFL CS-552). The language model stays frozen throughout; only this bottleneck is trained, using latents dumped from GSM8K (385,620 × 6 latent thought vectors).

  • Checkpoint: vq.pt
  • Codebook: K = 4096 codes, code dimension 64, EMA decay 0.999, k-means initialisation plus autoencoder (AE) warmup
  • Result: inserting this bottleneck into the frozen latent loop leaves GSM8K test accuracy essentially unchanged (a cost of ~0).
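
For readers unfamiliar with EMA-updated codebooks, the quantisation step listed above can be sketched as follows. This is a minimal NumPy illustration, not the project's training code: the function names `quantize` and `ema_update`, the buffers `cluster_size` and `embed_sum`, the Laplace smoothing constant `eps`, and the random initialisation are all assumptions made for the example. Only K = 4096, code dimension 64, decay 0.999, and the 6 latent thoughts per example come from the card. The projection between the model's hidden states and the 64-dimensional code space, the k-means initialisation, and the straight-through gradient are omitted.

```python
import numpy as np

def quantize(z, codebook):
    """Map each row of z (N, D) to its nearest row of codebook (K, D)."""
    # Squared Euclidean distances, shape (N, K): ||z||^2 - 2 z.c + ||c||^2
    d2 = (z ** 2).sum(1, keepdims=True) - 2.0 * z @ codebook.T + (codebook ** 2).sum(1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def ema_update(z, idx, codebook, cluster_size, embed_sum, decay=0.999, eps=1e-5):
    """One in-place EMA codebook update from a batch of vectors z and their codes idx."""
    K = codebook.shape[0]
    # Per-code assignment counts and per-code sums of assigned vectors.
    counts = np.bincount(idx, minlength=K).astype(z.dtype)
    sums = np.zeros_like(codebook)
    np.add.at(sums, idx, z)
    # Exponential moving averages of both statistics.
    cluster_size[:] = decay * cluster_size + (1.0 - decay) * counts
    embed_sum[:] = decay * embed_sum + (1.0 - decay) * sums
    # Laplace smoothing keeps rarely used codes from dividing by ~0.
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + K * eps) * n
    codebook[:] = embed_sum / smoothed[:, None]

# Hypothetical setup: random codebook (the card uses k-means init instead).
rng = np.random.default_rng(0)
K, D = 4096, 64
codebook = rng.normal(size=(K, D))
cluster_size = np.ones(K)
embed_sum = codebook.copy()

# Stand-in for the 6 latent thoughts of one example, already in code space.
z = rng.normal(size=(6, D))
z_q, idx = quantize(z, codebook)
ema_update(z, idx, codebook, cluster_size, embed_sum)
```

With a decay of 0.999, a single batch moves each selected code only slightly towards the vectors assigned to it, which is why EMA codebooks are usually paired with a good initialisation.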