---
base_model: google/gemma-3-12b-it
library_name: peft
datasets:
- AmirMohseni/CurveBench-Easy
- AmirMohseni/CurveBench
arxiv: 2605.14068
tags:
- grpo
- trl
- lora
- vision-language-model
- topological-reasoning
- curvebench
---
# curvebench-gemma-3-12b

This is a **LoRA adapter** for [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it), fine-tuned with GRPO on [CurveBench-Easy](https://huggingface.co/datasets/AmirMohseni/CurveBench-Easy) using verifiable rewards for topological tree prediction.

It corresponds to **model-c** in the [CurveBench paper](https://arxiv.org/abs/2605.14068), which was trained with a reward of 0.7 × tree isomorphism + 0.3 × node count.
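The combined reward can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the paper's implementation. Trees are `(parent, child)` edge lists rooted at region `0`. Isomorphism is checked with the AHU (Aho-Hopcroft-Ullman) canonical encoding for rooted trees. The node-count term is assumed to be an exact 0/1 match. All function names are hypothetical.

```python
from collections import defaultdict

Edge = tuple[int, int]


def is_rooted_tree(edges: list[Edge], root: int = 0) -> bool:
    """True if the edges form one tree rooted at `root`, each region having one parent."""
    parent_of: dict[int, int] = {}
    for parent, child in edges:
        if child == root or child in parent_of:
            return False  # the root has a parent, or a region has two parents
        parent_of[child] = parent
    # Every region must reach the root by following parents, without looping.
    for start in parent_of:
        node, seen = start, set()
        while node != root:
            if node in seen or node not in parent_of:
                return False  # cycle, or a parent chain that never reaches the root
            seen.add(node)
            node = parent_of[node]
    return True


def canonical_form(edges: list[Edge], root: int = 0) -> str:
    """AHU encoding: two valid rooted trees are isomorphic iff their encodings match."""
    children: dict[int, list[int]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    def encode(node: int) -> str:
        # Sorting child encodings makes the result independent of sibling order.
        return "(" + "".join(sorted(encode(c) for c in children[node])) + ")"

    return encode(root)


def curvebench_reward(pred: list[Edge], gold: list[Edge]) -> float:
    """Hypothetical reward: 0.7 * tree isomorphism + 0.3 * node count (gold assumed valid)."""
    def n_nodes(edges: list[Edge]) -> int:
        return len({0} | {v for edge in edges for v in edge})

    iso = is_rooted_tree(pred) and canonical_form(pred) == canonical_form(gold)
    count_ok = n_nodes(pred) == n_nodes(gold)  # assumed: exact match scores 1, else 0
    return 0.7 * float(iso) + 0.3 * float(count_ok)
```

Because the encoding sorts sibling subtrees, relabeling regions or listing edges in a different order does not change the score: `curvebench_reward([(0, 1), (1, 2)], [(0, 5), (5, 3)])` returns 1.0.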

- **Paper:** [CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves](https://arxiv.org/abs/2605.14068)
- **Training dataset:** [AmirMohseni/CurveBench-Easy](https://huggingface.co/datasets/AmirMohseni/CurveBench-Easy)
- **Evaluation dataset:** [AmirMohseni/CurveBench](https://huggingface.co/datasets/AmirMohseni/CurveBench)
- **Collection:** [AmirMohseni/curvebench](https://huggingface.co/collections/AmirMohseni/curvebench)
- **GitHub:** [Amir-Mohseni/CurveBench](https://github.com/Amir-Mohseni/CurveBench)

---

## Usage

### Option 1 — vLLM (recommended for serving)

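Start the server with the LoRA adapter loaded on top of the base model. `--max-lora-rank` must be at least the adapter's rank (r = 4), and vLLM accepts only a fixed set of values for this flag, so it is set to 8 here:

```bash
vllm serve google/gemma-3-12b-it \
    --enable-lora \
    --lora-modules grpo-region-tree=AmirMohseni/curvebench-gemma-3-12b \
    --max-lora-rank 8 \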
    --max-model-len 32768 \
    --gpu-memory-utilization 0.90 \
    --dtype bfloat16 \
    --trust-remote-code
```

Then query it with the OpenAI-compatible API:

```python
from openai import OpenAI
from datasets import load_dataset
import base64
from io import BytesIO

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

buf = BytesIO()
image.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode()

response = client.chat.completions.create(
    model="grpo-region-tree",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
            {
                "type": "text",
                "text": (
                    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
                    "Each curve creates a boundary between an interior region and its surroundings. "
                    "Output the containment tree of the regions as a list of edges in the format: "
                    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
                ),
            },
        ],
    }],
    max_tokens=2048,
)
print(response.choices[0].message.content)
print("Ground truth:", ds[0]["tree"])
```

### Option 2 — PEFT + Transformers (offline)

Load the base model and apply the LoRA adapter directly:

```python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
from datasets import load_dataset
import torch

base_id = "google/gemma-3-12b-it"
adapter_id = "AmirMohseni/curvebench-gemma-3-12b"

processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

prompt = (
    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
    "Each curve creates a boundary between an interior region and its surroundings. "
    "Output the containment tree of the regions as a list of edges in the format: "
    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
)

inputs = processor(
    text=processor.apply_chat_template(
        [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}],
        add_generation_prompt=True,
    ),
    images=[image],
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=2048)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
print("Ground truth:", ds[0]["tree"])
```
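Both options return free text, so scoring a reply against the ground truth first requires extracting the edge list from it. A lenient parser is sketched below. It assumes the model answers with a bracketed list of integer pairs, as the prompt requests, takes the last such list in the reply, and ignores everything else. `parse_edges` is a hypothetical helper, not part of the CurveBench code.

```python
import re
from typing import Optional

Edge = tuple[int, int]

# A bracketed span without nested brackets, e.g. "[(0, 1), (1, 2)]" or "[]".
_BRACKETED = re.compile(r"\[([^\[\]]*)\]")
# One "(parent, child)" pair of non-negative integers.
_PAIR = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_edges(reply: str) -> Optional[list[Edge]]:
    """Extract the last edge list from a model reply, or None if there is none."""
    candidates = [
        body
        for body in _BRACKETED.findall(reply)
        if _PAIR.search(body) or not body.strip()  # keep edge lists and "[]"
    ]
    if not candidates:
        return None
    return [(int(p), int(c)) for p, c in _PAIR.findall(candidates[-1])]


print(parse_edges("The containment tree is [(0, 1), (1, 2), (0, 3)]."))
```

The same call works on `response.choices[0].message.content` (Option 1) and on the decoded string (Option 2). The storage format of the ground-truth `tree` field is not documented here, so check it before comparing.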

---

## Training curves

**Training reward**

![Training reward](train_reward.png)

**Eval reward**

![Eval reward](eval_reward.png)

---

## Training procedure

Trained with GRPO using a fork of TRL with multimodal support: [AmirTuring/trl @ curvebench](https://github.com/AmirTuring/trl/tree/curvebench).
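GRPO drops the learned value model and instead scores each sampled completion relative to the other completions drawn for the same prompt. A minimal sketch of the group-relative advantage described in the DeepSeekMath paper, A_i = (r_i - mean(r)) / std(r), is shown below. The small `eps` term and the use of the sample standard deviation are illustrative choices, and the TRL fork may apply further options (reward scaling, clipping, KL penalty) that are not shown.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centred on the group mean and divided by the group standard
    deviation, so no learned value model (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)  # sample std; needs at least two completions
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four completions for one image, scored with the 0.7/0.3 reward.
print(group_relative_advantages([1.0, 0.3, 0.3, 0.0]))
```

If every completion in a group receives the same reward, all advantages are zero and that group contributes no reward signal to the update.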

- **Method:** GRPO (Group Relative Policy Optimization)
- **Base model:** [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
- **Training split:** `total_train` (210 images) from CurveBench-Easy
- **Reward:** tree isomorphism (0.7) + node count (0.3)
- **LoRA rank (r):** 4 | **LoRA alpha:** 8
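With r = 4 and alpha = 8, each adapted weight matrix receives a low-rank update scaled by alpha / r = 2. The NumPy sketch below illustrates the standard LoRA forward pass, y = W x + (alpha / r) B A x, for a single linear layer. The layer size is a toy choice, the zero initialization of `B` follows the common LoRA setup, and which Gemma 3 modules this adapter targets is not stated in this card.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha: float, r: int):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8      # toy layer size; r and alpha from this card
W = rng.standard_normal((d_out, d_in))    # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))                  # zero init: the adapter starts as a no-op
x = rng.standard_normal(d_in)

print(np.allclose(lora_forward(x, W, A, B, alpha, r), W @ x))  # True
print(lora_param_count(d_in, d_out, r), "vs", d_in * d_out)    # 512 vs 4096
```

Since the update is linear, it can also be folded into the base weight as W + (alpha / r) B A after training.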

### Framework versions

- TRL: 0.1.0
- Transformers: 4.57.1
- Pytorch: 2.8.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
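To compare a local environment against the versions above, a small check using only the standard library is sketched below. The listed TRL version presumably refers to the fork linked above, so a stock TRL install will likely be reported as a mismatch. `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed in this card, keyed by pip distribution name.
EXPECTED = {
    "trl": "0.1.0",
    "transformers": "4.57.1",
    "torch": "2.8.0",
    "datasets": "4.3.0",
    "tokenizers": "0.22.1",
}


def version_mismatches(expected: dict[str, str]) -> dict[str, Optional[str]]:
    """Return {package: installed version or None} for every package that differs."""
    mismatches: dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu128" before comparing.
            installed: Optional[str] = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(EXPECTED) or "all versions match")
```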

---

## Citation

```bibtex
@misc{mohseni2026curvebench,
      title={CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves},
      author={Amirreza Mohseni and Mona Mohammadi and Morteza Saghafian and Naser Talebizadeh Sardari},
      year={2026},
      eprint={2605.14068},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.14068},
}
```

Cite GRPO as:

```bibtex
@article{shao2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```