Qwen2.5-Coder-7B-SkyRL-SQL

Qwen2.5-Coder-7B-Instruct trained as a multi-turn SQL agent with GRPO using SkyRL's SkyRL-SQL recipe. The model interacts with a real SQLite database over up to 6 turns: it probes the schema with exploratory queries, observes actual execution results (or errors), refines its understanding, and then commits a final answer.

Method

  • Recipe: SkyRL-SQL (examples/train/text_to_sql), GRPO with dual-clip policy loss, no KL
  • Training data: SkyRL-SQL-653 โ€” only 653 examples, executed against OmniSQL databases
  • Reward: execution-result match against the gold query on the final answer (sparse, outcome-only)
  • Training: 10 optimizer steps (2 epochs), batch 128 prompts ร— 5 samples, max 16k context
  • Hardware: 4ร— NVIDIA L40 (48GB), FSDP + vLLM via SkyRL

Results (held-out Spider, execution accuracy pass@1)

step 0 (base model) step 5 step 10 (this model)
38.4% 57.9% 69.8%

Average response length also dropped 528 โ†’ 364 tokens โ€” the model learned to probe the database decisively rather than ramble.

Prompt format

The model expects the SkyRL-SQL interaction format: a system/user prompt containing the database schema and question, with <sql>...</sql> blocks for exploratory queries (results are returned in <observation> messages) and a final <solution>...</solution> block. See the SkyRL-SQL recipe for the exact template and a runnable environment.

Downloads last month
11
Safetensors
Model size
8B params
Tensor type
F32
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for lecunyin/Qwen2.5-Coder-7B-SkyRL-SQL

Base model

Qwen/Qwen2.5-7B
Finetuned
(397)
this model