# Micro-Distilled GRPO+VAE Model
## Model Description
This is a small, distilled English language model (42M parameters) trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
## Model Details
- **Model type**: micro-distill-grpo-vae
- **Model size**: 42M parameters
- **Language**: English
- **License**: Apache 2.0
## Training Methodology
- **GRPO (Group Relative Policy Optimization)**: 8 groups
- **VAE Filtering**: 32D latent space
- **KV-Cache Reuse**: 512 cache size
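
To illustrate the group-relative idea behind GRPO, here is a minimal sketch of how advantages are commonly computed: each sampled completion's reward is normalized against the mean and standard deviation of its own group. This is not the training code for this model. It assumes that "8 groups" corresponds to 8 sampled completions per prompt, and the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and standard deviation.

    Hypothetical helper for illustration only; not taken from this model's
    training code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some implementations use sample std
    # eps avoids division by zero when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for one group of 8 sampled completions
rewards = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2, 0.5, 0.6]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above their group's mean get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.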
## Architecture Details
- Hidden size: 512
- Number of layers: 8
- Attention heads: 8
- Vocabulary size: 50257
- Maximum sequence length: 1024
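
As a rough sanity check on these numbers, the sketch below derives the per-head dimension and estimates a parameter count. It assumes a GPT-2-style decoder layout (learned position embeddings, tied input/output embeddings, biases on all linear layers, feed-forward width of 4x the hidden size). None of these layout details are stated in this card, and `estimate_gpt2_style_params` is a hypothetical helper, so the result is only an estimate under those assumptions.

```python
def estimate_gpt2_style_params(hidden, layers, vocab, max_positions, ffn_mult=4):
    """Estimate parameters for an assumed GPT-2-style decoder (hypothetical helper)."""
    # Token embeddings plus learned position embeddings (output head assumed tied)
    embeddings = vocab * hidden + max_positions * hidden
    # Q, K, V and output projections, each hidden x hidden with a bias
    attention = 4 * hidden * hidden + 4 * hidden
    # Two feed-forward projections with biases
    ffn_width = ffn_mult * hidden
    feed_forward = 2 * hidden * ffn_width + ffn_width + hidden
    # Two layer norms per block, each with a scale and a bias vector
    layer_norms = 2 * 2 * hidden
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * hidden
    return embeddings + layers * per_layer + final_norm


hidden_size, num_layers, num_heads = 512, 8, 8
head_dim = hidden_size // num_heads  # 512 / 8 = 64 dimensions per attention head

total = estimate_gpt2_style_params(hidden_size, num_layers, 50257, 1024)
print(f"head_dim={head_dim}, estimated parameters={total:,}")
```

Under these assumptions the estimate comes to roughly 51M parameters, with about half of that in the token embedding table. The gap from the stated 42M suggests the actual layout differs from the assumed one, for example in its feed-forward width.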
## Usage
### Using Transformers
```python
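from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (use the full Hub repository ID or a local path if required)
model = AutoModelForCausalLM.from_pretrained("micro-distill-grpo-vae")
tokenizer = AutoTokenizer.from_pretrained("micro-distill-grpo-vae")

# Tokenize a prompt and generate up to 50 tokens in total, prompt included
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)

# Decode the generated token IDs back into text
print(tokenizer.decode(outputs[0]))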
```