---
base_model: google/gemma-3-12b-it
library_name: peft
datasets:
- AmirMohseni/CurveBench-Easy
- AmirMohseni/CurveBench
arxiv: 2605.14068
tags:
- grpo
- trl
- lora
- vision-language-model
- topological-reasoning
- curvebench
---

# curvebench-gemma-3-12b

This is a **LoRA adapter** for [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it), fine-tuned with GRPO on [CurveBench-Easy](https://huggingface.co/datasets/AmirMohseni/CurveBench-Easy) using verifiable rewards for topological tree prediction.

It corresponds to **model-c** in the [CurveBench paper](https://arxiv.org/abs/2605.14068) (reward: tree isomorphism (0.7) + node count (0.3)).

- **Paper:** [CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves](https://arxiv.org/abs/2605.14068)
- **Training dataset:** [AmirMohseni/CurveBench-Easy](https://huggingface.co/datasets/AmirMohseni/CurveBench-Easy)
- **Evaluation dataset:** [AmirMohseni/CurveBench](https://huggingface.co/datasets/AmirMohseni/CurveBench)
- **Collection:** [AmirMohseni/curvebench](https://huggingface.co/collections/AmirMohseni/curvebench)
- **GitHub:** [Amir-Mohseni/CurveBench](https://github.com/Amir-Mohseni/CurveBench)

---

## Usage

### Option 1: vLLM (recommended for serving)

Start the server with the LoRA adapter loaded on top of the base model:

```bash
vllm serve google/gemma-3-12b-it \
  --enable-lora \
  --lora-modules grpo-region-tree=AmirMohseni/curvebench-gemma-3-12b \
  --max-lora-rank 4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90 \
  --dtype bfloat16 \
  --trust-remote-code
```

Then query it with the OpenAI-compatible API:

```python
from openai import OpenAI
from datasets import load_dataset
import base64
from io import BytesIO

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

buf = BytesIO()
image.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode()

response = client.chat.completions.create(
    model="grpo-region-tree",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
            {
                "type": "text",
                "text": (
                    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
                    "Each curve creates a boundary between an interior region and its surroundings. "
                    "Output the containment tree of the regions as a list of edges in the format: "
                    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
                ),
            },
        ],
    }],
    max_tokens=2048,
)

print(response.choices[0].message.content)
print("Ground truth:", ds[0]["tree"])
```

### Option 2: PEFT + Transformers (offline)

Load the base model and apply the LoRA adapter directly:

```python
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
from datasets import load_dataset
import torch

base_id = "google/gemma-3-12b-it"
adapter_id = "AmirMohseni/curvebench-gemma-3-12b"

processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the first test image from the benchmark
ds = load_dataset("AmirMohseni/CurveBench-Easy", split="total_test")
image = ds[0]["image"]

prompt = (
    "The image shows a set of pairwise non-intersecting closed curves drawn on a plane. "
    "Each curve creates a boundary between an interior region and its surroundings. "
    "Output the containment tree of the regions as a list of edges in the format: "
    "[(parent, child), ...] where 0 is the outermost (unbounded) region."
)

inputs = processor(
    text=processor.apply_chat_template(
        [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}],
        add_generation_prompt=True,
    ),
    images=[image],
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=2048)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
print("Ground truth:", ds[0]["tree"])
```

---

## Training curves

**Training reward**

![Training reward](train_reward.png)

**Eval reward**

![Eval reward](eval_reward.png)

---

## Training procedure

Trained with GRPO using a fork of TRL with multimodal support: [AmirTuring/trl @ curvebench](https://github.com/AmirTuring/trl/tree/curvebench).

- **Method:** GRPO (Group Relative Policy Optimization)
- **Base model:** [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
- **Training split:** `total_train` (210 images) from CurveBench-Easy
- **Reward:** tree isomorphism (0.7) + node count (0.3)
- **LoRA rank (r):** 4 | **LoRA alpha:** 8

### Framework versions

- TRL: 0.1.0
- Transformers: 4.57.1
- PyTorch: 2.8.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1

---

## Citation

```bibtex
@misc{mohseni2026curvebench,
  title={CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves},
  author={Amirreza Mohseni and Mona Mohammadi and Morteza Saghafian and Naser Talebizadeh Sardari},
  year={2026},
  eprint={2605.14068},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.14068},
}
```

Cite GRPO as:

```bibtex
@article{shao2024deepseekmath,
  title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year   = 2024,
  eprint = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```
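
---

## Reward sketch (illustrative)

The reward listed under Training procedure combines tree isomorphism (weight 0.7) with node count (weight 0.3). The snippet below is a minimal sketch of how such a score could be computed from a model completion; it is not the training code from the TRL fork. Everything beyond the two weights is an assumption: the function names (`parse_edges`, `is_rooted_tree`, `canonical_form`, `sketch_reward`) are hypothetical, isomorphism is taken to mean rooted-tree isomorphism with region 0 as the root, both components are scored as 0 or 1, and the ground truth is assumed to be available as a list of `(parent, child)` edges.

```python
import ast
import re


def parse_edges(text):
    """Return the last well-formed edge list like "[(0, 1), (1, 2)]" in text, or None."""
    for candidate in reversed(re.findall(r"\[[^\[\]]*\]", text)):
        try:
            value = ast.literal_eval(candidate)
        except (ValueError, SyntaxError, TypeError):
            continue  # e.g. the literal template "[(parent, child), ...]"
        if all(
            isinstance(e, tuple) and len(e) == 2 and all(isinstance(x, int) for x in e)
            for e in value
        ):
            return list(value)
    return None


def is_rooted_tree(edges, root=0):
    """Check that edges form a tree rooted at `root` (one parent per node, all reachable)."""
    child_ids = [c for _, c in edges]
    if len(set(child_ids)) != len(child_ids) or root in child_ids:
        return False
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = 0, [root]
    while stack:
        node = stack.pop()
        seen += 1
        stack.extend(children.get(node, []))
    return seen == len(edges) + 1


def canonical_form(edges, root=0):
    """Label-independent encoding: two rooted trees are isomorphic iff encodings match."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def encode(node):
        return "(" + "".join(sorted(encode(c) for c in children.get(node, []))) + ")"

    return encode(root)


def sketch_reward(completion, truth_edges, w_iso=0.7, w_count=0.3):
    """Assumed scoring: 0/1 isomorphism and 0/1 node-count match, weighted and summed."""
    pred = parse_edges(completion)
    if pred is None or not is_rooted_tree(pred):
        return 0.0
    iso = float(canonical_form(pred) == canonical_form(truth_edges))
    # A tree with k edges has k + 1 nodes, so equal edge counts mean equal node counts.
    count = float(len(pred) == len(truth_edges))
    return w_iso * iso + w_count * count
```

Under these assumptions, a completion that relabels the regions but preserves the nesting structure receives the full score, while one with the right number of regions but the wrong nesting receives only the node-count share. The canonical encoding recurses once per tree level, which is fine for small trees but would need an iterative rewrite for very deep nestings.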