Qwen2.5-Coder-7B-SkyRL-SQL
Qwen2.5-Coder-7B-Instruct trained as a multi-turn SQL agent with GRPO using SkyRL's SkyRL-SQL recipe. The model interacts with a real SQLite database over up to 6 turns: it probes the schema with exploratory queries, observes actual execution results (or errors), refines its understanding, and then commits a final answer.
Method
- Recipe: SkyRL-SQL (
examples/train/text_to_sql), GRPO with dual-clip policy loss, no KL - Training data: SkyRL-SQL-653 โ only 653 examples, executed against OmniSQL databases
- Reward: execution-result match against the gold query on the final answer (sparse, outcome-only)
- Training: 10 optimizer steps (2 epochs), batch 128 prompts ร 5 samples, max 16k context
- Hardware: 4ร NVIDIA L40 (48GB), FSDP + vLLM via SkyRL
Results (held-out Spider, execution accuracy pass@1)
| step 0 (base model) | step 5 | step 10 (this model) |
|---|---|---|
| 38.4% | 57.9% | 69.8% |
Average response length also dropped 528 โ 364 tokens โ the model learned to probe the database decisively rather than ramble.
Prompt format
The model expects the SkyRL-SQL interaction format: a system/user prompt containing the database schema and question, with <sql>...</sql> blocks for exploratory queries (results are returned in <observation> messages) and a final <solution>...</solution> block. See the SkyRL-SQL recipe for the exact template and a runnable environment.
- Downloads last month
- 11