---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3.5-2B
datasets:
- minnesotanlp/Finch-Collection
tags:
- evolution-fine-tuning
- evolutionary-search
- discovery
- code-optimization
- scientific-discovery
- mutation-operator
- mid-training
---

# Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

A mid-training "practice phase" that teaches small open-source LLMs how to evolve solutions.


**Finch-2B** is the 2B member of the **Finch** family: open-source LLMs **evolution fine-tuned (EFT)** to act as a stronger **mutation operator** inside evolutionary search. Built on **Qwen3.5-2B** and trained on the [**Finch Collection**](https://huggingface.co/datasets/minnesotanlp/Finch-Collection), it learns *how to evolve a solution* (which part to mutate, what to keep, when to backtrack) and reuses that discovery skill across tasks.

## TL;DR

State-of-the-art discovery systems put an LLM inside an evolutionary search *scaffold*, but the discovery know-how lives in the scaffold, and every new task starts from zero. **Evolution Fine-Tuning (EFT)** moves that behavior *into the model* by turning evolutionary search **trajectories** into supervision.
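The idea of turning trajectories into supervision can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Transition` record and its fields are hypothetical (the Finch Collection's real schema may differ), "improved" is assumed to mean the child's fitness strictly exceeds its parent's, and keeping the lowest `run_id` per task is an arbitrary stand-in for however the single kept run is actually chosen. Each surviving transition becomes a chat-style SFT example whose target is the program the teacher proposed.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One parent -> child step in an evolutionary run (hypothetical schema)."""
    task_id: str
    run_id: int
    system_prompt: str     # task-level instruction from the scaffold
    user_prompt: str       # evolutionary state: parent program, feedback, history
    response: str          # candidate program proposed by the teacher model
    parent_fitness: float
    child_fitness: float


def to_sft_examples(transitions: list[Transition]) -> list[dict]:
    """Turn evolutionary trajectories into chat-style SFT examples.

    Keeps one run per task (the lowest run_id, an arbitrary illustrative
    choice) and, within that run, only transitions that improved fitness.
    """
    kept_run: dict[str, int] = {}
    for t in transitions:
        kept_run[t.task_id] = min(t.run_id, kept_run.get(t.task_id, t.run_id))

    examples = []
    for t in transitions:
        if t.run_id != kept_run[t.task_id]:
            continue  # drop the other runs of the same task
        if t.child_fitness <= t.parent_fitness:
            continue  # assumed meaning of "improved": strictly higher fitness
        examples.append({
            "messages": [
                {"role": "system", "content": t.system_prompt},
                {"role": "user", "content": t.user_prompt},
                {"role": "assistant", "content": t.response},
            ]
        })
    return examples
```

Filtering to improving steps means the model is only ever trained to imitate mutations that actually helped, which is the property the mutation operator should learn.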
*Figure: EFT as mid-training.*
- (Left) EFT acts as mid-training, boosting Finch's discovery on the Erdős minimum-overlap problem under both test-time search and test-time learning.
- (Right) On NP-hard competitive programming, Finch composes strategies learned across diverse domains, while the base model relies on a single repetitive strategy.

## Finch family

| Model | Base | Params | Training | 🤗 Hugging Face |
|---|---|---:|---|:---:|
| **`Finch-2B`** ← *this model* | **Qwen3.5-2B** | **2B** | **EFT** | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-2B) |
| `Finch-4B` | Qwen3.5-4B | 4B | EFT | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-4B) |
| `Finch-8B` | Qwen3-8B | 8B | EFT | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-8B) |
| `Finch-9B` | Qwen3.5-9B | 9B | EFT | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-9B) |
| `Finch-4B-KTO` | Qwen3.5-4B | 4B | EFT + KTO | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-4B-KTO) |
| `Finch-8B-KTO` | Qwen3-8B | 8B | EFT + KTO | [![Open on Hugging Face](https://img.shields.io/badge/-Open-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/minnesotanlp/Finch-8B-KTO) |

## How to Use Finch

1. **Run Finch inside the OpenEvolve scaffold (recommended; serve the model with vLLM).** Finch is a **mutation operator for evolutionary search** and is most effective when driven by a scaffold such as **OpenEvolve** (settings: `T = 100`, temperature `0.7`, top-`p` `0.95`, up to `30K` tokens).
   You can also use other scaffolds in the [SkyDiscover](https://github.com/skydiscover-ai/skydiscover) framework, but we do not guarantee performance there, because the model is trained on OpenEvolve trajectories (one of this work's limitations).
2. **Call Finch directly.** You can also pass the scaffold's prompts to Finch yourself, for example:

**System prompt** (task-level instruction from the OpenEvolve scaffold):

```
You are an expert mathematician specializing in circle packing problems and computational geometry. Your task is to improve a constructor function that directly produces a specific arrangement of 26 circles in a unit square, maximizing the sum of their radii. The AlphaEvolve paper achieved a sum of 2.635 for n=26.

Key geometric insights:
- Circle packings often follow hexagonal patterns in the densest regions
- Maximum density for infinite circle packing is pi/(2*sqrt(3)) ≈ 0.9069
- Edge effects make square container packing harder than infinite packing
- Similar radius circles often form regular patterns, while varied radii allow better space utilization
```

**User prompt** (evolutionary state: current program, evaluator feedback, and evolutionary history):

```
# Current Program Information
- Fitness: 0.3642 (sum_radii: 0.9598)
- Focus areas: Fitness unchanged at 0.3642. Consider simplifying - code length exceeds 500 characters.

# Program Evolution History
## Previous Attempts
### Attempt 1
- Changes: Replace concentric ring placement with hexagonal lattice (5-6-5-6-5 row pattern)
- Metrics: sum_radii: 0.9598, validity: 1.0 - Improvement in all metrics

# Current Program
# EVOLVE-BLOCK-START
import numpy as np

def construct_packing():
    n = 26
    centers = np.zeros((n, 2))
    centers[0] = [0.5, 0.5]  # center circle
    for i in range(8):  # inner ring
        angle = 2 * np.pi * i / 8
        centers[i+1] = [0.5 + 0.3*np.cos(angle), 0.5 + 0.3*np.sin(angle)]
    for i in range(16):  # outer ring
        angle = 2 * np.pi * i / 16
        centers[i+9] = [0.5 + 0.7*np.cos(angle), 0.5 + 0.7*np.sin(angle)]
    centers = np.clip(centers, 0.01, 0.99)
    radii = compute_max_radii(centers)
    return centers, radii, np.sum(radii)
# EVOLVE-BLOCK-END
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minnesotanlp/Finch-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Replace these placeholders with the prompts produced by your evolutionary scaffold.
SYSTEM_PROMPT = "..."  # task-level instruction
USER_PROMPT = "..."    # parent program + evaluator feedback + evolutionary history

# Given an evolutionary state (task instruction, parent program, evolutionary history,
# and evaluator feedback), Finch proposes an improved candidate program.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=30000, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Training

- **Data.** `improved` transitions from the [Finch Collection](https://huggingface.co/datasets/minnesotanlp/Finch-Collection) across **355 training tasks** (16 of the 371 tasks are held out). One evolutionary run is kept per task, yielding **30,445** supervised examples; **900** uniformly sampled examples are used for validation.
- **Teacher.** Trajectories generated by **Qwen3.5-397B-A17B** inside the **OpenEvolve** scaffold.
- **Recipe.** Full SFT with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): **1 epoch**, global batch size **128**, learning rate **1e-5**, on **8× NVIDIA H200 140GB** GPUs.

## Results

- Finch outperforms its same-size base model by +10.2% on 22 held-out tasks across 5 domains, with improvements of up to +290% on individual tasks.
- Larger models benefit more, and Finch-4B matches a model roughly 2× larger on the Erdős task.
*Figure: Main results.*
- On competitive programming (FrontierCS), Finch-9B averages 46.01 vs. 32.46 for the base Qwen3.5-9B; on CALICO's P263 (UC Berkeley's official open-ended contest), it scores 86.10 vs. 55.09.
*Figure: FrontierCS results.*
- With preference learning (KTO), Finch-8B surpasses the best human score on AC1 and AC2, and its competitive programming score improves from 24.56 to 37.30.
- Finch-8B matches SOTA on two circle-packing tasks and improves the Erdős task by +3.2%.
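To put the score pairs above on a common scale, relative gain is (after - before) / before. The snippet below applies that formula to the reported numbers; the percentages it prints are our own arithmetic, not figures from the paper, and we do not assume this is how the headline +10.2% was aggregated. Reading 24.56 as Finch-8B before KTO is likewise an interpretation of the bullet above. By the same formula, a +290% gain means the Finch score is 3.9× the base score.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    if before == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return 100.0 * (after - before) / before


# (before, after) score pairs as reported in the Results bullets.
REPORTED = {
    "FrontierCS average, Qwen3.5-9B -> Finch-9B": (32.46, 46.01),
    "CALICO P263, Qwen3.5-9B -> Finch-9B": (55.09, 86.10),
    "Competitive programming, Finch-8B before -> after KTO": (24.56, 37.30),
}

for name, (before, after) in REPORTED.items():
    print(f"{name}: {relative_gain(before, after):+.1f}%")
# FrontierCS average, Qwen3.5-9B -> Finch-9B: +41.7%
# CALICO P263, Qwen3.5-9B -> Finch-9B: +56.3%
# Competitive programming, Finch-8B before -> after KTO: +51.9%
```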
## Limitations

Trajectories are collected and evaluated only with **OpenEvolve**; behavior under different scaffolds is not guaranteed.

## License

The **Finch Collection** is released under the [**CC-BY 4.0 License**](https://creativecommons.org/licenses/by/4.0/) and is recommended for **non-commercial academic research**. The accompanying **code** and **Finch model weights** are released under the [**Apache 2.0 License**](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgements

This research was supported by the "Advanced GPU Utilization Support Program" funded by the Government of the Republic of Korea (Ministry of Science and ICT). We are grateful to the SkyDiscover team for their valuable feedback on the dataset construction process, the use of the SkyDiscover framework, and the overall direction of this research. In particular, we thank [Shu Liu](https://shulynnliu.com/), [Shubham Agarwal](https://skejriwal44.github.io/), and [Mert Cemri](https://people.eecs.berkeley.edu/~mert_cemri/) for their insightful comments and discussions. We also thank the OpenEvolve team, especially Ritik Vijayvergiya and [Asankhaya Sharma](https://asankhaya.github.io/), for their guidance on using the OpenEvolve framework and for their thoughtful comments on this work. We further thank the authors of ALE-Bench, especially [Yuki Imajuku](https://imajuku.tech/), and the AtCoder team for authorizing the public release of the evolutionary search trajectories derived from their CC BY-ND 4.0-licensed dataset. Finally, we thank [Byung-Kwan Lee](https://byungkwanlee.github.io/ByungKwanLee-CV/) for valuable feedback during the early stages of this project.
## Citation

```bibtex
@misc{lee2026evolutionfinetuninglearningdiscover,
      title={Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks},
      author={Young-Jun Lee and Seungone Kim and Minki Kang and Alistair Cheong Liang Chuen and Zerui Chen and Seungho Han and Taehee Jung and Dongyeop Kang},
      year={2026},
      eprint={2606.29082},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.29082},
}
```