base_model: google/gemma-3-12b-it
library_name: peft
datasets:
  - AmirMohseni/CurveBench-Easy
  - AmirMohseni/CurveBench
arxiv: 2605.14068
tags:
  - grpo
  - trl
  - lora
  - vision-language-model
  - topological-reasoning
  - curvebench

curvebench-gemma-3-12b

This is a LoRA adapter for google/gemma-3-12b-it, fine-tuned with GRPO on CurveBench-Easy using verifiable rewards for topological tree prediction.

It corresponds to model-c in the CurveBench paper, which was trained with a two-part reward: tree isomorphism (weight 0.7) plus node count (weight 0.3).
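As a rough illustration of how a reward of this shape could be computed, the sketch below scores a predicted edge list against a reference one. Everything in it is an assumption made for illustration: the helper names (`is_rooted_tree`, `canonical_form`, `reward`), rooting the tree at node 0, treating node count as an exact-match check, and giving malformed predictions a reward of zero. The paper and the training code define the actual reward.

```python
from collections import defaultdict


def build_children(edges):
    """Map each parent label to the list of its child labels."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def node_count(edges, root=0):
    """Number of distinct labels in the edge list, always counting the root."""
    nodes = {root}
    for parent, child in edges:
        nodes.update((parent, child))
    return len(nodes)


def is_rooted_tree(edges, root=0):
    """True if the edges form a tree rooted at `root` (one parent per node, all reachable)."""
    kids = [child for _, child in edges]
    if root in kids or len(kids) != len(set(kids)):
        return False  # the root has a parent, or some node has two parents
    children = build_children(edges)
    seen, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), []):
            seen.add(child)
            stack.append(child)
    return len(seen) == node_count(edges, root)


def canonical_form(edges, root=0):
    """Label-free encoding of a rooted tree; equal strings mean isomorphic trees."""
    children = build_children(edges)

    def encode(node):
        return "(" + "".join(sorted(encode(c) for c in children.get(node, []))) + ")"

    return encode(root)


def reward(pred_edges, true_edges, w_iso=0.7, w_count=0.3):
    """Weighted sum of an isomorphism check and a node-count check (assumed scoring)."""
    if not is_rooted_tree(pred_edges):
        return 0.0  # assumption: malformed predictions earn nothing
    iso = canonical_form(pred_edges) == canonical_form(true_edges)
    count = node_count(pred_edges) == node_count(true_edges)
    return w_iso * iso + w_count * count
```

Because the encoding sorts child subtrees before joining them, two trees that differ only in how regions are numbered receive the same string, so relabeling does not affect the score.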

Paper: CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Training dataset: AmirMohseni/CurveBench-Easy
Evaluation dataset: AmirMohseni/CurveBench
Collection: AmirMohseni/curvebench
GitHub: Amir-Mohseni/CurveBench

Usage

Option 1 — vLLM (recommended for serving)

Start the server with the LoRA adapter loaded on top of the base model:

vllm serve google/gemma-3-12b-it \
    --enable-lora \
    --lora-modules grpo-region-tree=AmirMohseni/curvebench-gemma-3-12b \
    --max-lora-rank 4 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.90 \
    --dtype bfloat16 \
    --trust-remote-code

Then query it with the OpenAI-compatible API:

from openai import OpenAI
from datasets import load_dataset
import base64
from io import BytesIO

# vLLM ignores the API key unless the server was started with --api-key, so any placeholder works
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

# Encode the PIL image as a base64 PNG so it can be sent inline as a data URL
buf = BytesIO()
image.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode()

response = client.chat.completions.create(
    model="grpo-region-tree",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
            {
                "type": "text",
                "text": (
                    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
                    "Each curve creates a boundary between an interior region and its surroundings. "
                    "Output the containment tree of the regions as a list of edges in the format: "
                    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
                ),
            },
        ],
    }],
    max_tokens=2048,
)
print(response.choices[0].message.content)
print("Ground truth:", ds[0]["tree"])
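The model replies in free-form text, so comparing it with the ground truth programmatically requires pulling the edge list out of the reply. The sketch below is one way to do that, assuming the model follows the `[(parent, child), ...]` format requested in the prompt. `parse_edges` is a hypothetical helper, not part of the CurveBench code, and the format of the dataset's `tree` column is not documented here, so it may need its own conversion before comparison.

```python
import re

# A bracketed list with no nested brackets, e.g. "[(0, 1), (1, 2)]"
LIST_PATTERN = re.compile(r"\[[^\[\]]*\]")
# A single "(parent, child)" pair of non-negative integers
EDGE_PATTERN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_edges(reply: str) -> list[tuple[int, int]]:
    """Return the edges from the last bracketed list in the reply that contains pairs."""
    for candidate in reversed(LIST_PATTERN.findall(reply)):
        pairs = EDGE_PATTERN.findall(candidate)
        if pairs:
            return [(int(parent), int(child)) for parent, child in pairs]
    return []  # nothing that looks like an edge list was found
```

Taking the last matching list rather than the first means that any intermediate lists the model writes while reasoning are skipped in favor of its final answer. With the client code above, the call would be `parse_edges(response.choices[0].message.content)`.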

Option 2 — PEFT + Transformers (offline)

Load the base model and apply the LoRA adapter directly:

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
from datasets import load_dataset
import torch

base_id = "google/gemma-3-12b-it"
adapter_id = "AmirMohseni/curvebench-gemma-3-12b"

processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

prompt = (
    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
    "Each curve creates a boundary between an interior region and its surroundings. "
    "Output the containment tree of the regions as a list of edges in the format: "
    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
)

inputs = processor(
    text=processor.apply_chat_template(
        [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}],
        add_generation_prompt=True,
    ),
    images=[image],
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt tokens at the start of the sequence
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
print("Ground truth:", ds[0]["tree"])

Training curves

[Figure: training reward]

[Figure: eval reward]

Training procedure

Trained with GRPO using a fork of TRL with multimodal support: AmirTuring/trl @ curvebench.

Method: GRPO (Group Relative Policy Optimization)
Base model: google/gemma-3-12b-it
Training split: total_train (210 images) from CurveBench-Easy
Reward: tree isomorphism (0.7) + node count (0.3)
LoRA rank (r): 4 | LoRA alpha: 8
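
For readers unfamiliar with GRPO: several completions are sampled for each prompt, and each completion's advantage is its reward standardized against the rest of its group, so no separate value model is needed. The sketch below shows only that normalization step, following the formulation in the DeepSeekMath paper. The function name, the use of the population standard deviation, and the `eps` term are illustrative assumptions; the TRL fork used for training may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Standardize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt
advantages = group_relative_advantages([1.0, 0.3, 0.3, 0.0])
```

Completions that score above the group mean get a positive advantage and are reinforced, those below get a negative one, and a group in which every completion scores the same contributes no learning signal.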

Framework versions

TRL: 0.1.0
Transformers: 4.57.1
PyTorch: 2.8.0
Datasets: 4.3.0
Tokenizers: 0.22.1

Citation

@misc{mohseni2026curvebench,
      title={CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves},
      author={Amirreza Mohseni and Mona Mohammadi and Morteza Saghafian and Naser Talebizadeh Sardari},
      year={2026},
      eprint={2605.14068},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.14068},
}

Cite GRPO as:

@article{shao2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}