Configuration Parsing Warning:In adapter_config.json: "peft.base_model_name_or_path" must be a string

Paganini GRPO LoRA — Qwen3.5-27B

Paganini is a dual-domain LoRA adapter trained via GRPO (Group Relative Policy Optimization) on top of Qwen3.5-27B. It serves as the intelligence backbone for 9 specialized FIDC agents in the Paganini AIOS platform, with deep expertise in Brazilian investment fund regulation (CVM 175) and software architecture.

🧠 Model Overview

Property	Value
Base Model	Qwen/Qwen3.5-27B
Parameters	27B
Adapter Type	LoRA (PEFT)
Training Method	GRPO (Group Relative Policy Optimization)
LoRA Rank	32
LoRA Alpha	32
LoRA Targets	all-linear
Task	CAUSAL_LM
Adapter Size	966 MB (safetensors)
Languages	Portuguese (Brazil) + English
Training Platform	Tinker API — Thinking Machines Lab cloud GPUs
Training Duration	~3 hours (23 runs)
Run ID	`7e18a5a1-8a6b-530d-b443-4f855a3aa8c4:train:0`

🏗️ Training Pipeline

Paganini follows a two-stage alignment pipeline:

Qwen3.5-27B (base)
       │
       ▼
  ┌─────────────────────────────────────────┐
  │  Stage 1: Supervised Fine-Tuning (SFT)  │
  │  Platform: RunPod A100 80GB             │
  │  Accuracy: 87.75% | Loss: 0.454         │
  └─────────────────────────────────────────┘
       │
       ▼  sttjr/paganini-qwen35-27b-sft-lora
       │
  ┌─────────────────────────────────────────┐
  │  Stage 2: GRPO RL Alignment (this)      │
  │  Platform: Tinker API (TML Cloud GPUs)  │
  │  23 training runs | ~3 hours            │
  │  Dual-domain reward optimization        │
  └─────────────────────────────────────────┘
       │
       ▼  sttjr/paganini-qwen35-27b-grpo-lora  ← you are here

SFT Predecessor

The GRPO run was initialized from the SFT checkpoint:

SFT Model: sttjr/paganini-qwen35-27b-sft-lora
Platform: RunPod A100 80GB
Accuracy: 87.75%
Final Loss: 0.454

📦 Dataset

Name: dual-dataset-v2.jsonl

Split	Count
Total samples	13,697
Code domain	6,848
Finance domain	6,849

Difficulty distribution:

Level	Count
L1 (Basic)	4,566
L2 (Intermediate)	4,566
L3 (Advanced)	4,565

Sources:

Finance: FIDC (Fundo de Investimento em Direitos Creditórios) regulatory corpus under CVM Resolution 175 — covering eligibility, concentration limits, covenants, PLD/AML procedures, compliance gates, and risk management
Code: Software architecture patterns, pipeline compliance, TDD practices, and spec adherence for AIOS agent development

🎯 Reward Function (Dual-Domain)

The GRPO training uses a composite reward function:

R(x) = λ · R_code + (1 - λ) · R_fin + R_shared

Where λ = 1.0 for code samples and λ = 0.0 for finance samples.

R_code — Code Domain Rewards

Component	Reward
Spec adherence	+0.30
Architecture patterns	+0.25
Pipeline compliance	+0.15
Code blocks present	+0.10
TDD terms present	+0.10
Maximum	+0.90

R_finance — Finance Domain Rewards

Component	Reward
Guardrail compliance	+0.35
Source attribution	+0.20
CVM citation	+0.15
Article reference	+0.15
Maximum	+0.85

R_shared — Shared Penalty/Bonus

Component	Reward
Hallucination penalty	−0.15
Corporate speak penalty	−0.05 per occurrence
PT-BR language bonus	+0.05
Length < 50 tokens penalty	−0.20

🤖 Use Case: Paganini AIOS

This model is the intelligence backbone for 9 specialized FIDC domain agents in the Paganini AIOS platform:

Agent	Role
🏛️ Admin	Administrative governance and fund operations
🏦 Custodian	Asset custody, settlement, and safekeeping
📊 Manager	Portfolio management and investment decisions
⚖️ Compliance	Regulatory adherence and audit trails
📋 Reporting	Investor reporting and fund disclosures
🔍 Due Diligence	Cedente/debtor analysis and credit assessment
👁️ RegWatch	Regulatory change monitoring (CVM, BACEN)
📧 IR	Investor Relations communication
💹 Pricing	Asset pricing and NAV calculation

6-Gate Guardrail Pipeline

Each query passes through a sequential compliance chain:

Input → [Eligibility] → [Concentration] → [Covenant] → [PLD/AML] → [Compliance] → [Risk] → Output

All 6 gates must pass before a response is delivered to end users. This ensures CVM 175-compliant, hallucination-free outputs across all agent types.

🚀 Usage

Installation

pip install transformers peft accelerate

Load and Run

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B",
    device_map="auto",
    torch_dtype="auto"
)

# Load GRPO LoRA adapter
model = PeftModel.from_pretrained(base, "sttjr/paganini-qwen35-27b-grpo-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

# Finance domain example (PT-BR)
prompt = "Explique os requisitos de PDD mínima para FIDC conforme CVM 175."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Merge Adapter (Optional)

# Merge LoRA weights into base model for faster inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("paganini-27b-merged")
tokenizer.save_pretrained("paganini-27b-merged")

📊 Checkpoints

Checkpoint	Size	Description
`paganini-test`	2.7 GB	Intermediate checkpoint
`paganini-rl-final`	2.7 GB	Final GRPO-aligned checkpoint

⚠️ Intended Use & Limitations

Intended Use

FIDC regulatory Q&A in Portuguese (Brazil)
Software architecture guidance for AIOS agents
Compliance-first financial analysis aligned with CVM 175
Internal enterprise use within the Paganini AIOS platform

Out-of-Scope Use

General-purpose chatbot (use base Qwen3.5-27B instead)
Non-Brazilian regulatory domains (model is specialized for CVM/BACEN frameworks)
Real-time trading decisions or autonomous financial transactions

Limitations

Finance knowledge is bounded by CVM 175 regulatory corpus at training cutoff
PT-BR outputs are prioritized; EN responses may be less fluent
Requires at least 2× A100 80GB GPUs or equivalent for full-precision inference
LoRA adapter requires the base Qwen3.5-27B model (~54 GB in fp16)

🔗 Project Links

Resource	Link
🐙 GitHub (Paganini AIOS)	juboyy/paganini-aios
📊 Dashboard	dashboard-v2-pearl-rho.vercel.app
🤗 SFT Predecessor	sttjr/paganini-qwen35-27b-sft-lora

📄 Citation

@misc{paganini-grpo-lora-2026,
  title        = {Paganini GRPO LoRA -- Qwen3.5-27B: Dual-Domain RL Alignment for FIDC Regulatory Intelligence},
  author       = {sttjr},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sttjr/paganini-qwen35-27b-grpo-lora}},
  note         = {GRPO-aligned LoRA adapter for Brazilian investment fund regulation and software architecture}
}

📜 License

Apache 2.0 — See LICENSE for details.

Paganini AIOS — Built for the Brazilian FIDC ecosystem.

Downloads last month: -

Video Preview

Reinforcement Learning

Model tree for sttjr/paganini-qwen35-27b-grpo-lora

Base model

Qwen/Qwen3.5-27B

Adapter

(77)

this model