Text Generation
Transformers
Safetensors
qwen3
conversational
text-generation-inference
jgeuter committed c112624 (verified) · 1 parent: 31ee65d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ datasets:
 
 ## Model Description
 
- Boomerang distillation is a phenomenon in LLMs where we can distill a teacher model into a student and reincorporate teacher layers to create intermediate-sized models with no additional training. This is the student model distilled from Qwen3-4B-Base from [our paper](https://arxiv.org/abs/2510.05064).
+ Boomerang distillation is a phenomenon in LLMs that allows us to distill a teacher model into a student and reincorporate teacher layers to create intermediate-sized models with no additional training. This is the student model distilled from Qwen3-4B-Base from [our paper](https://arxiv.org/abs/2510.05064).
 
 ## Training Procedure
 