sttjr committed on
Commit f40791c · verified · 1 Parent(s): 45fa180

Paganini AIOS GRPO RL — Qwen3.5-27B LoRA rank 32, dual-domain reward (code+finance), 13.7K samples

Files changed (3)
  1. README.md +32 -26
  2. adapter_config.json +2 -2
  3. checkpoint_complete +0 -0
README.md CHANGED
@@ -3,22 +3,22 @@ base_model: Qwen/Qwen3.5-27B
  library_name: peft
  license: apache-2.0
  tags:
- - paganini-aios
- - fidc
- - grpo
- - rl
- - lora
- - qwen
- - finance
- - compliance
+ - lora
+ - grpo
+ - rl
+ - fidc
+ - finance
+ - compliance
+ - portuguese
+ - paganini-aios
  language:
- - pt
+ - pt
  pipeline_tag: text-generation
  ---
  
- # 🎻 Paganini AIOS — GRPO LoRA Adapter
+ # Paganini AIOS — GRPO LoRA Adapter
  
- **Qwen3.5-27B** fine-tuned with **Group Relative Policy Optimization (GRPO)** for Brazilian FIDC (Fundo de Investimento em Direitos Creditórios) operations.
+ **Qwen3.5-27B + LoRA Rank 32** fine-tuned with Group Relative Policy Optimization (GRPO) for dual-domain expertise: **Brazilian FIDC compliance** and **software engineering**.
  
  ## Training Details
  
@@ -27,17 +27,24 @@ pipeline_tag: text-generation
  - **LoRA**: Rank 32, Alpha 32, all-linear targets
  - **Dataset**: 13,697 dual-domain Q&A pairs (code + finance + cross-domain)
  - **Reward Function**: Dual-domain with 6 guardrail gates
- - **Code domain**: BMAD-CE pipeline compliance, architecture quality, TDD signals
- - **Finance domain**: CVM regulation citation, guardrail compliance, source attribution
- - **Shared**: Hallucination penalty, corporate-speak penalty, PT-BR bonus
  
- ## Architecture
+ ## Reward Function Design
  
- Part of the **Paganini AIOS** — an autonomous AI operating system for Brazilian FIDC operations:
- - 14 specialized agents (admin, compliance, custódia, due diligence, gestor, IR, pricing, reg watch, reporting)
- - 6 guardrail gates (Eligibility → Concentration → Covenant → PLD/AML → Compliance → Risk)
- - Hybrid RAG pipeline (dense + sparse + graph RRF)
- - Bayesian risk network
+ ```
+ R(x) = λ·R_code + (1-λ)·R_fin + R_shared
+
+ Code (λ=1.0): spec adherence, architecture, pipeline compliance, code quality
+ Finance (λ=0.0): guardrail compliance, factual accuracy, source attribution, precision
+ Cross (λ=0.5): both domains integrated
+ ```
+
+ ### Guardrail Gates
+ 1. **Eligibility** — CVM 175 compliance check
+ 2. **Concentration** — Portfolio concentration limits
+ 3. **Covenant** — Fund covenant monitoring
+ 4. **PLD/AML** — Anti-money laundering
+ 5. **Compliance** — Regulatory compliance
+ 6. **Risk** — Bayesian risk assessment
  
  ## Usage
  
@@ -45,16 +52,15 @@ Part of the **Paganini AIOS** — an autonomous AI operating system for Brazilia
  from peft import PeftModel
  from transformers import AutoModelForCausalLM, AutoTokenizer
  
- base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B", torch_dtype="auto")
+ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B")
  model = PeftModel.from_pretrained(base, "sttjr/paganini-qwen35-27b-grpo-lora")
  tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")
  ```
  
- ## Prior Stage
+ ## Part of Paganini AIOS
  
- SFT adapter: [sttjr/paganini-qwen35-27b-sft-lora](https://huggingface.co/sttjr/paganini-qwen35-27b-sft-lora)
+ [Paganini AIOS](https://github.com/juboyy/paganini-aios) is an autonomous AI system for Brazilian FIDC (Fundos de Investimento em Direitos Creditórios) operations, featuring 14 specialized agents, 6 guardrail gates, and a Bayesian risk network.
  
- ## Links
+ ## SFT Checkpoint
  
- - **GitHub**: [juboyy/paganini-aios](https://github.com/juboyy/paganini-aios)
- - **Dashboard**: [paganini-demo.vercel.app](https://paganini-demo.vercel.app)
+ The SFT checkpoint (pre-GRPO) is available at: [sttjr/paganini-qwen35-27b-sft-lora](https://huggingface.co/sttjr/paganini-qwen35-27b-sft-lora)
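
Note on the reward formula added to the README: the blend `R(x) = λ·R_code + (1-λ)·R_fin + R_shared` can be sketched in a few lines of Python. This is a minimal illustration, not the training code, which is not part of this commit. The names `RewardParts`, `DOMAIN_LAMBDA`, `blended_reward`, and `group_relative_advantages` are hypothetical, and the component scores are assumed to be precomputed floats. The last helper shows the group-wise normalization that GRPO-style training commonly applies to such rewards; it uses the population standard deviation, and real implementations may differ on that detail.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Mixing weight per sample domain, as listed in the README's reward block.
DOMAIN_LAMBDA = {"code": 1.0, "finance": 0.0, "cross": 0.5}


@dataclass
class RewardParts:
    """Hypothetical container; the actual scorers are not shown in this commit."""
    code: float     # R_code
    finance: float  # R_fin
    shared: float   # R_shared (penalties and bonuses common to both domains)


def blended_reward(parts: RewardParts, domain: str) -> float:
    """Compute R(x) = lam * R_code + (1 - lam) * R_fin + R_shared."""
    lam = DOMAIN_LAMBDA[domain]
    return lam * parts.code + (1.0 - lam) * parts.finance + parts.shared


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Center and scale rewards within one prompt's group of completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All completions scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    parts = RewardParts(code=0.8, finance=0.2, shared=-0.1)
    for domain in DOMAIN_LAMBDA:
        print(domain, blended_reward(parts, domain))
```

With λ fixed per domain, a code sample is scored only by `R_code` plus the shared term, a finance sample only by `R_fin` plus the shared term, and a cross-domain sample by the even mix.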
 
adapter_config.json CHANGED
@@ -1,13 +1,13 @@
  {
  "alpha_pattern": {},
  "auto_mapping": null,
- "base_model_name_or_path": "Qwen/Qwen3.5-27B",
+ "base_model_name_or_path": null,
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
- "inference_mode": true,
+ "inference_mode": false,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
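
Note on the `adapter_config.json` change: the commit sets `base_model_name_or_path` to `null` and `inference_mode` to `false`, which looks like a config saved from a training run rather than one exported for inference. The README's loading snippet passes the base model explicitly, so it should not depend on the recorded path, but tooling that reads the base model from the adapter config may. A small stdlib-only check, sketched under those assumptions, might look like the following. The helper names `resolve_base_model` and `config_warnings` and the `fallback` parameter are hypothetical.

```python
import json


def resolve_base_model(adapter_config_text: str, fallback: str) -> str:
    """Return the base model id recorded in the adapter config, else the fallback."""
    config = json.loads(adapter_config_text)
    recorded = config.get("base_model_name_or_path")
    return recorded if recorded else fallback


def config_warnings(adapter_config_text: str) -> list[str]:
    """List fields that suggest the config was not exported for inference."""
    config = json.loads(adapter_config_text)
    notes = []
    if not config.get("base_model_name_or_path"):
        notes.append("base_model_name_or_path is unset; pass the base model explicitly")
    if config.get("inference_mode") is False:
        notes.append("inference_mode is false; config may come from a training run")
    return notes


if __name__ == "__main__":
    # The two fields as they stand after this commit.
    after_commit = '{"base_model_name_or_path": null, "inference_mode": false}'
    print(resolve_base_model(after_commit, "Qwen/Qwen3.5-27B"))
    for note in config_warnings(after_commit):
        print("note:", note)
```

The same commit also drops `torch_dtype="auto"` from the README's loading snippet. Depending on the transformers version, the base weights may then load in full precision, so passing a dtype explicitly may be worth considering for a 27B model.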
checkpoint_complete ADDED
File without changes