Qwen2.5-Coder-7B-SkyRL-SQL

Qwen2.5-Coder-7B-Instruct trained as a multi-turn SQL agent with GRPO using SkyRL's SkyRL-SQL recipe. The model interacts with a real SQLite database over up to 6 turns: it probes the schema with exploratory queries, observes actual execution results (or errors), refines its understanding, and then commits a final answer.

Method

Recipe: SkyRL-SQL (examples/train/text_to_sql), GRPO with dual-clip policy loss, no KL
Training data: SkyRL-SQL-653 — only 653 examples, executed against OmniSQL databases
Reward: execution-result match against the gold query on the final answer (sparse, outcome-only)
Training: 10 optimizer steps (2 epochs), batch 128 prompts × 5 samples, max 16k context
Hardware: 4× NVIDIA L40 (48GB), FSDP + vLLM via SkyRL

Results (held-out Spider, execution accuracy pass@1)

step 0 (base model)	step 5	step 10 (this model)
38.4%	57.9%	69.8%

Average response length also dropped 528 → 364 tokens — the model learned to probe the database decisively rather than ramble.

Prompt format

The model expects the SkyRL-SQL interaction format: a system/user prompt containing the database schema and question, with <sql>...</sql> blocks for exploratory queries (results are returned in <observation> messages) and a final <solution>...</solution> block. See the SkyRL-SQL recipe for the exact template and a runnable environment.

Downloads last month: 11

Safetensors

Model size

8B params

Tensor type

F32

Model tree for lecunyin/Qwen2.5-Coder-7B-SkyRL-SQL

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-Coder-7B

Finetuned

Qwen/Qwen2.5-Coder-7B-Instruct

Finetuned

(397)

this model