# Micro-Distilled GRPO+VAE Model

## Model Description

This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.

## Model Details

- **Model type**: micro-distill-grpo-vae
- **Model size**: 42M parameters
- **Language**: English
- **License**: Apache 2.0

## Training Methodology

- **GRPO (Group Relative Policy Optimization)**: 8 groups
- **VAE Filtering**: 32D latent space
- **KV-Cache Reuse**: 512 cache size

## Architecture Details

- Hidden size: 512
- Number of layers: 8
- Attention heads: 8
- Vocabulary size: 50257
- Maximum sequence length: 1024

## Usage

### Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the same checkpoint
model = AutoModelForCausalLM.from_pretrained("micro-distill-grpo-vae")
tokenizer = AutoTokenizer.from_pretrained("micro-distill-grpo-vae")

# Tokenize a prompt and generate up to 50 tokens in total (prompt included)
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```
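
To illustrate the GRPO entry under Training Methodology, here is a minimal sketch of the group-relative advantage that gives the method its name: each completion's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. This is not this model's training code. The function name, the reward values, the `eps` term, and the choice of population standard deviation are all assumptions made for illustration, as is reading "8 groups" as eight sampled completions per prompt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    Hypothetical helper for illustration; `eps` guards against a
    zero standard deviation when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for 8 completions sampled from a single prompt.
rewards = [0.1, 0.4, 0.4, 0.9, 0.0, 0.6, 0.3, 0.5]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.3f}")
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one. The advantages within a group sum to approximately zero.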
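
The figures under Architecture Details allow a rough estimate of what a 512-entry KV cache costs in memory. The sketch below assumes that "512 cache size" means 512 cached token positions, that the model uses standard multi-head attention (keys and values each span the full hidden size), and that values are stored in 16-bit floats. None of these is stated in this card, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers, cache_len, hidden_size, bytes_per_value=2):
    """Estimate KV-cache memory for one sequence.

    Each layer stores one key vector and one value vector of length
    `hidden_size` per cached position, hence the leading factor of 2.
    """
    return 2 * num_layers * cache_len * hidden_size * bytes_per_value


# Layer count and hidden size come from Architecture Details;
# cache_len=512 and fp16 storage are assumptions.
size = kv_cache_bytes(num_layers=8, cache_len=512, hidden_size=512)
print(f"{size / 2**20:.1f} MiB per sequence")  # 8.0 MiB per sequence
```

Under the same assumptions, storing the cache in 32-bit floats doubles the estimate to 16 MiB per sequence.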