---
base_model: Qwen/Qwen3-1.7B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen3-1.7B
- lora
- sft
- transformers
- trl
- unsloth
---

# CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse

Part of the **CoNDeNse** project — compressing the reasoning capability of large models into small, deployable ones.

## Model Details

- **Base model:** Qwen/Qwen3-1.7B
- **Method:** LoRA (r=32, α=64)
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Dtype:** float16
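
As a rough illustration of what r=32 and α=64 imply, the sketch below computes the standard LoRA scaling factor (α / r) and the number of trainable parameters an adapter adds to a single linear layer. The function names and the 2048 × 2048 layer shape are hypothetical, chosen only for the example and not taken from this card.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (r, in_features) and B has shape (out_features, r).
    """
    return r * (in_features + out_features)


R, ALPHA = 32, 64  # values from the Model Details list

print(lora_scaling(ALPHA, R))       # 2.0
print(lora_params(2048, 2048, R))   # 131072 (illustrative layer shape, an assumption)
```

With α set to twice the rank, the adapter update is applied at a scale of 2.0.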

## Training

- **Dataset:** Jackrong/GLM-5.1-Reasoning-1M-Cleaned (75,000 examples)
- **Optimizer:** AdamW 8-bit
- **Learning rate:** 2e-4 with cosine scheduler
- **Batch size:** 1 × 16 gradient accumulation (effective batch = 16)
- **Max sequence length:** 4096
- **Packing:** enabled
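
The numbers above can be tied together with a small calculation. This is a minimal sketch, assuming a plain cosine decay to zero with no warmup; the card does not state warmup, the minimum learning rate, or the total number of steps, so `TOTAL_STEPS` below is a hypothetical value.

```python
import math

PER_DEVICE_BATCH = 1   # from the card
GRAD_ACCUM_STEPS = 16  # from the card
MAX_SEQ_LEN = 4096     # from the card
PEAK_LR = 2e-4         # from the card

# Sequences contributing to one optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# With packing, each sequence holds at most MAX_SEQ_LEN tokens,
# so this is an upper bound on tokens per optimizer step.
max_tokens_per_step = effective_batch * MAX_SEQ_LEN


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0 (assumption: no warmup, floor of 0)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not stated in the card

print(effective_batch, max_tokens_per_step)
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate starts at 2e-4, passes through roughly 1e-4 at the midpoint, and reaches 0 at the final step.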

## Notes

This model may **hallucinate**. Verify its outputs before relying on them.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

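# Load the base model in float16 on the GPU.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.float16,
    device_map="cuda",
)

# Load the tokenizer and attach the LoRA adapter to the base model.
tokenizer = AutoTokenizer.from_pretrained("CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse")
model = PeftModel.from_pretrained(base_model, "CoNDeNse-AI/GLM-5.1-Qwen3-1.7B-CoNDeNse")
model.eval()

# ChatML-formatted prompt: one user turn, then the opening of the assistant turn.
prompt = "<|im_start|>user\nYour question here<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens without tracking gradients.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))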
```
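
For multi-turn conversations, the hand-written prompt string above can be generated from a list of messages. The helper below is a minimal sketch that reproduces only the ChatML layout used in the usage example; `build_chatml_prompt` is a hypothetical name, not part of any library. The tokenizer's own `apply_chat_template` method is the more robust route and may produce slightly different text.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Your question here"}])
print(repr(prompt))
```

The resulting string matches the `prompt` used in the usage example and can be passed to the tokenizer in the same way.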