
Paganini GRPO LoRA - Qwen3.5-27B

Paganini is a dual-domain LoRA adapter trained via GRPO (Group Relative Policy Optimization) on top of Qwen3.5-27B. It serves as the intelligence backbone for 9 specialized FIDC agents in the Paganini AIOS platform, with deep expertise in Brazilian investment fund regulation (CVM 175) and software architecture.


🧠 Model Overview

Property           Value
Base Model         Qwen/Qwen3.5-27B
Parameters         27B
Adapter Type       LoRA (PEFT)
Training Method    GRPO (Group Relative Policy Optimization)
LoRA Rank          32
LoRA Alpha         32
LoRA Targets       all-linear
Task               CAUSAL_LM
Adapter Size       966 MB (safetensors)
Languages          Portuguese (Brazil) + English
Training Platform  Tinker API (Thinking Machines Lab cloud GPUs)
Training Duration  ~3 hours (23 runs)
Run ID             7e18a5a1-8a6b-530d-b443-4f855a3aa8c4:train:0
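
To make the rank and alpha rows concrete: LoRA adds two low-rank matrices, A (rank x d_in) and B (d_out x rank), to each targeted linear layer and scales their product by alpha / rank. The sketch below computes that scaling and the added parameter count for a single layer, assuming standard (not rank-stabilized) LoRA scaling. The 4096 x 4096 layer shape is a hypothetical value chosen for illustration, not a dimension taken from Qwen3.5-27B.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A (standard LoRA scaling)."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Values from the table above: rank 32, alpha 32.
scaling = lora_scaling(rank=32, alpha=32)

# Hypothetical layer shape, for illustration only.
added = lora_added_params(d_in=4096, d_out=4096, rank=32)

print(scaling)  # 1.0
print(added)    # 262144
```

With rank equal to alpha, the low-rank update is applied at unit scale.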

🏗️ Training Pipeline

Paganini follows a two-stage alignment pipeline:

Qwen3.5-27B (base)
       │
       ▼
  ┌─────────────────────────────────────────┐
  │  Stage 1: Supervised Fine-Tuning (SFT)  │
  │  Platform: RunPod A100 80GB             │
  │  Accuracy: 87.75% | Loss: 0.454         │
  └─────────────────────────────────────────┘
       │
       ▼  sttjr/paganini-qwen35-27b-sft-lora
       │
  ┌─────────────────────────────────────────┐
  │  Stage 2: GRPO RL Alignment (this)      │
  │  Platform: Tinker API (TML Cloud GPUs)  │
  │  23 training runs | ~3 hours            │
  │  Dual-domain reward optimization        │
  └─────────────────────────────────────────┘
       │
       ▼  sttjr/paganini-qwen35-27b-grpo-lora  ← you are here
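
Stage 2 relies on GRPO's group-relative scoring: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that normalization follows. The reward values, the epsilon constant, and the use of the population standard deviation are assumptions made for illustration; the actual training code run on Tinker is not shown in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of a single prompt.
advantages = group_relative_advantages([0.9, 0.5, 0.5, 0.1])
print([round(a, 2) for a in advantages])
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below the mean are discouraged.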

SFT Predecessor

The GRPO run was initialized from the SFT checkpoint sttjr/paganini-qwen35-27b-sft-lora.


📦 Dataset

Name: dual-dataset-v2.jsonl

Split           Count
Total samples   13,697
Code domain     6,848
Finance domain  6,849

Difficulty distribution:

Level              Count
L1 (Basic)         4,566
L2 (Intermediate)  4,566
L3 (Advanced)      4,565
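
The splits above can be re-derived from the JSONL file with a simple tally. The sketch below assumes each line is a JSON object with `domain` and `level` fields; those field names are hypothetical, since this card does not document the schema of dual-dataset-v2.jsonl.

```python
import json
from collections import Counter
from typing import Iterable


def tally_splits(lines: Iterable[str]) -> tuple[Counter, Counter]:
    """Count records per domain and per difficulty level."""
    domains: Counter = Counter()
    levels: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        domains[record["domain"]] += 1  # assumed field name
        levels[record["level"]] += 1    # assumed field name
    return domains, levels


# Tiny in-memory stand-in for the real file.
sample = [
    '{"domain": "code", "level": "L1", "prompt": "..."}',
    '{"domain": "finance", "level": "L3", "prompt": "..."}',
    '{"domain": "finance", "level": "L1", "prompt": "..."}',
]
domains, levels = tally_splits(sample)
print(dict(domains))
print(dict(levels))
```

For the real file, an open file handle can be passed in place of the list, since the function only iterates over lines.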

Sources:

  • Finance: FIDC (Fundo de Investimento em Direitos Creditórios) regulatory corpus under CVM Resolution 175, covering eligibility, concentration limits, covenants, PLD/AML procedures, compliance gates, and risk management
  • Code: Software architecture patterns, pipeline compliance, TDD practices, and spec adherence for AIOS agent development

🎯 Reward Function (Dual-Domain)

The GRPO training uses a composite reward function:

R(x) = λ · R_code + (1 - λ) · R_fin + R_shared

Where λ = 1.0 for code samples and λ = 0.0 for finance samples.

R_code: Code Domain Rewards

Component              Reward
Spec adherence         +0.30
Architecture patterns  +0.25
Pipeline compliance    +0.15
Code blocks present    +0.10
TDD terms present      +0.10
Maximum                +0.90

R_fin: Finance Domain Rewards

Component             Reward
Guardrail compliance  +0.35
Source attribution    +0.20
CVM citation          +0.15
Article reference     +0.15
Maximum               +0.85

R_shared: Shared Penalties and Bonuses

Component                   Reward
Hallucination penalty       −0.15
Corporate speak penalty     −0.05 per occurrence
PT-BR language bonus        +0.05
Length < 50 tokens penalty  −0.20
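
Taken together, the formula and the three component tables can be sketched as a single scoring function. The weights below are copied from the tables; everything else is an assumption. The card does not say how each component is detected, so the sketch takes precomputed flags, and the `Sample` container, the component key names, and the purely additive treatment of the shared terms are all hypothetical.

```python
from dataclasses import dataclass, field

# Weights copied from the reward tables; the key names are illustrative.
CODE_WEIGHTS = {
    "spec_adherence": 0.30,
    "architecture_patterns": 0.25,
    "pipeline_compliance": 0.15,
    "code_blocks": 0.10,
    "tdd_terms": 0.10,
}
FIN_WEIGHTS = {
    "guardrail_compliance": 0.35,
    "source_attribution": 0.20,
    "cvm_citation": 0.15,
    "article_reference": 0.15,
}


@dataclass
class Sample:
    domain: str                             # "code" or "finance"
    hits: set = field(default_factory=set)  # components detected upstream
    hallucinated: bool = False
    corporate_speak_count: int = 0
    is_pt_br: bool = False
    num_tokens: int = 0


def shared_reward(s: Sample) -> float:
    """R_shared: penalties and bonuses applied in both domains."""
    r = 0.0
    if s.hallucinated:
        r -= 0.15
    r -= 0.05 * s.corporate_speak_count
    if s.is_pt_br:
        r += 0.05
    if s.num_tokens < 50:
        r -= 0.20
    return r


def reward(s: Sample) -> float:
    """R(x) = lam * R_code + (1 - lam) * R_fin + R_shared."""
    lam = 1.0 if s.domain == "code" else 0.0
    r_code = sum(w for name, w in CODE_WEIGHTS.items() if name in s.hits)
    r_fin = sum(w for name, w in FIN_WEIGHTS.items() if name in s.hits)
    return lam * r_code + (1.0 - lam) * r_fin + shared_reward(s)


example = Sample(
    domain="finance",
    hits={"guardrail_compliance", "cvm_citation"},
    is_pt_br=True,
    num_tokens=120,
)
print(round(reward(example), 2))  # 0.55
```

Because λ is either 0 or 1, each sample is scored only by its own domain's components plus the shared terms.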

🤖 Use Case: Paganini AIOS

This model is the intelligence backbone for 9 specialized FIDC domain agents in the Paganini AIOS platform:

Agent              Role
🏛️ Admin           Administrative governance and fund operations
🏦 Custodian       Asset custody, settlement, and safekeeping
📊 Manager         Portfolio management and investment decisions
⚖️ Compliance      Regulatory adherence and audit trails
📋 Reporting       Investor reporting and fund disclosures
🔍 Due Diligence   Cedente (assignor)/debtor analysis and credit assessment
👁️ RegWatch        Regulatory change monitoring (CVM, BACEN)
📧 IR              Investor Relations communication
💹 Pricing         Asset pricing and NAV calculation

6-Gate Guardrail Pipeline

Each query passes through a sequential compliance chain:

Input → [Eligibility] → [Concentration] → [Covenant] → [PLD/AML] → [Compliance] → [Risk] → Output

All 6 gates must pass before a response is delivered to end users. The chain is designed to keep outputs CVM 175-compliant and to filter out hallucinated content across all agent types.
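
The sequential chain can be pictured as an ordered list of predicate functions that must all return true. The sketch below is a minimal illustration, not the Paganini AIOS implementation: the gate names come from the diagram above, but the `Gate` type, the `run_gates` helper, and the placeholder checks are hypothetical.

```python
from typing import Callable, Optional

# A gate inspects a draft response and returns True when it passes.
Gate = Callable[[str], bool]

GATE_ORDER = ["Eligibility", "Concentration", "Covenant", "PLD/AML", "Compliance", "Risk"]


def run_gates(draft: str, gates: dict[str, Gate]) -> Optional[str]:
    """Return the name of the first failing gate, or None if all gates pass."""
    for name in GATE_ORDER:
        if not gates[name](draft):
            return name  # stop at the first failure; later gates are skipped
    return None


# Placeholder checks: every gate passes except a toy Compliance rule.
gates: dict[str, Gate] = {name: (lambda draft: True) for name in GATE_ORDER}
gates["Compliance"] = lambda draft: "CVM 175" in draft

print(run_gates("Conforme a CVM 175, ...", gates))  # None
print(run_gates("Resposta sem fonte.", gates))      # Compliance
```

Returning the name of the failing gate, rather than a bare boolean, makes it easy to log which check blocked a response.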


🚀 Usage

Installation

pip install torch transformers peft accelerate

Load and Run

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B",
    device_map="auto",
    torch_dtype="auto"
)

# Load GRPO LoRA adapter
model = PeftModel.from_pretrained(base, "sttjr/paganini-qwen35-27b-grpo-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Finance domain example (PT-BR): "Explain the minimum PDD requirements for FIDCs under CVM 175."
prompt = "Explique os requisitos de PDD mínima para FIDC conforme CVM 175."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Merge Adapter (Optional)

# Merge LoRA weights into base model for faster inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("paganini-27b-merged")
tokenizer.save_pretrained("paganini-27b-merged")

📊 Checkpoints

Checkpoint         Size    Description
paganini-test      2.7 GB  Intermediate checkpoint
paganini-rl-final  2.7 GB  Final GRPO-aligned checkpoint

⚠️ Intended Use & Limitations

Intended Use

  • FIDC regulatory Q&A in Portuguese (Brazil)
  • Software architecture guidance for AIOS agents
  • Compliance-first financial analysis aligned with CVM 175
  • Internal enterprise use within the Paganini AIOS platform

Out-of-Scope Use

  • General-purpose chatbot (use base Qwen3.5-27B instead)
  • Non-Brazilian regulatory domains (model is specialized for CVM/BACEN frameworks)
  • Real-time trading decisions or autonomous financial transactions

Limitations

  • Finance knowledge is bounded by CVM 175 regulatory corpus at training cutoff
  • PT-BR outputs are prioritized; EN responses may be less fluent
  • Requires at least 2× A100 80GB GPUs or equivalent for full-precision inference
  • LoRA adapter requires the base Qwen3.5-27B model (~54 GB in fp16)
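
The ~54 GB figure follows from simple arithmetic on the weight count. The sketch below reproduces it and, assuming "full-precision" means fp32, shows why a single 80 GB GPU would not hold the weights. It counts weights only; activations, the KV cache, and the adapter itself are ignored.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 27e9  # 27B parameters, from the Model Overview

fp16_gb = weight_memory_gb(PARAMS, 2)  # 2 bytes per fp16 weight
fp32_gb = weight_memory_gb(PARAMS, 4)  # 4 bytes per fp32 weight (assumed meaning of "full precision")

print(fp16_gb)  # 54.0
print(fp32_gb)  # 108.0
```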

🔗 Project Links

Resource                   Link
🐙 GitHub (Paganini AIOS)  juboyy/paganini-aios
📊 Dashboard               dashboard-v2-pearl-rho.vercel.app
🤗 SFT Predecessor         sttjr/paganini-qwen35-27b-sft-lora

📄 Citation

@misc{paganini-grpo-lora-2026,
  title        = {Paganini GRPO LoRA -- Qwen3.5-27B: Dual-Domain RL Alignment for FIDC Regulatory Intelligence},
  author       = {sttjr},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sttjr/paganini-qwen35-27b-grpo-lora}},
  note         = {GRPO-aligned LoRA adapter for Brazilian investment fund regulation and software architecture}
}

📜 License

Apache 2.0. See LICENSE for details.


Paganini AIOS: built for the Brazilian FIDC ecosystem.
