Text-to-SQL LightGBM Reranker
This repository contains a trained LightGBM LambdaRank reranker that selects the best query from a set of CodeT5-generated SQL candidates.
Performance Summary
- Rows: 4701
- Candidate upper bound: 0.4482025101042331
- Rule reranker exact: 0.24803233354605403
- Validation ML exact: 0.3995749202975558
- Validation ML correct: 376 / 941
- Full ML exact: 0.43671559242714314
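As a quick sanity check, the figures above can be related to one another with a few lines of arithmetic. This is only a sketch: it assumes the full-set scores are fractions of the 4701 rows, and the variable names are illustrative, not taken from summary.json.

```python
# Figures copied from the Performance Summary above.
rows = 4701
upper_bound = 0.4482025101042331   # candidate upper bound
rule_exact = 0.24803233354605403   # rule reranker exact
val_correct, val_total = 376, 941  # validation ML correct
full_exact = 0.43671559242714314   # full ML exact

# Validation exact match is simply correct / total.
val_exact = val_correct / val_total

# Assumption: full-set scores are fractions of `rows`, so counts can be recovered.
full_correct = round(full_exact * rows)

# How much of the attainable candidate upper bound the ML reranker reaches.
share_of_upper_bound = full_exact / upper_bound

# Absolute improvement of the ML reranker over the rule reranker.
gain_over_rule = full_exact - rule_exact

print(f"validation exact:     {val_exact:.4f}")
print(f"full correct (est.):  {full_correct}")
print(f"share of upper bound: {share_of_upper_bound:.4f}")
print(f"gain over rule:       {gain_over_rule:.4f}")
```

Under that assumption, the ML reranker recovers roughly 97% of what the candidate pool allows, so further gains would mostly have to come from better candidates rather than better ranking.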
Files
- sql_reranker_lightgbm.txt: trained LightGBM model
- feature_columns.json: exact feature column order used during training
- feature_importance.csv: LightGBM feature importance
- summary.json: training summary
- inference.py: helper functions for loading and scoring
Intended Pipeline
Question -> NER + value index -> CodeT5 candidates -> feature extraction -> LightGBM reranker -> best SQL
Important
This reranker does not generate SQL by itself. It only ranks already-generated SQL candidates.
At inference time, use the same feature extraction logic as in training, and reindex the feature matrix to the column order stored in feature_columns.json before scoring.
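The scoring step might look like the minimal sketch below. It is not the code in inference.py: the helper names (align_features, pick_best, rerank) are hypothetical, feature_columns.json is assumed to be a JSON list of column names, and filling missing features with 0.0 is an assumption that should be checked against the training code.

```python
import json
from pathlib import Path


def load_feature_columns(path="feature_columns.json"):
    # Assumption: the file holds a JSON list of feature names in training order.
    return json.loads(Path(path).read_text())


def align_features(rows, feature_columns, fill_value=0.0):
    """Order each candidate's features exactly as in training.

    rows: one dict per SQL candidate, mapping feature name -> numeric value.
    Extra keys are dropped; missing keys get fill_value (an assumption).
    """
    return [
        [float(row.get(col, fill_value)) for col in feature_columns]
        for row in rows
    ]


def pick_best(candidates, scores):
    """Return the candidate with the highest reranker score."""
    if len(candidates) == 0:
        raise ValueError("no SQL candidates to rank")
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


def rerank(candidates, rows,
           model_path="sql_reranker_lightgbm.txt",
           columns_path="feature_columns.json"):
    """Score the candidates for one question and return the best SQL string."""
    # Imported lazily so the helpers above work without these packages.
    import lightgbm as lgb
    import numpy as np

    booster = lgb.Booster(model_file=model_path)
    columns = load_feature_columns(columns_path)
    matrix = np.asarray(align_features(rows, columns), dtype=float)
    scores = booster.predict(matrix)  # one score per candidate, higher is better
    return pick_best(candidates, scores)
```

Here rows would come from the same feature extraction used in training, with one dict per candidate of a single question. Scores from a LambdaRank model are only comparable among candidates for the same question, so rerank should be called once per question rather than on a pooled batch.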