Text-to-SQL LightGBM Reranker

This repository contains a trained LightGBM LambdaRank reranker that selects the best query from a set of CodeT5-generated SQL candidates.

Performance Summary

  • Rows: 4701
  • Candidate upper bound: 0.4482025101042331
  • Rule reranker exact: 0.24803233354605403
  • Validation ML exact: 0.3995749202975558
  • Validation ML correct: 376 / 941
  • Full ML exact: 0.43671559242714314
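
The rates above can be turned back into counts to make them easier to compare. The sketch below assumes each full-set rate is a number of correct rows divided by the 4701 rows; that reading is inferred from the figures, not stated in the summary. It also computes how much of the candidate upper bound the reranker recovers.

```python
# Figures copied from the Performance Summary above.
ROWS = 4701
RATES = {
    "candidate_upper_bound": 0.4482025101042331,
    "rule_reranker_exact": 0.24803233354605403,
    "full_ml_exact": 0.43671559242714314,
}

# Assumption: each rate is (number of correct rows) / ROWS.
counts = {name: round(rate * ROWS) for name, rate in RATES.items()}

# Validation accuracy as reported: 376 correct out of 941.
validation_exact = 376 / 941

# Share of the candidate upper bound that the reranker reaches on the full set.
ceiling_share = RATES["full_ml_exact"] / RATES["candidate_upper_bound"]

if __name__ == "__main__":
    for name, count in counts.items():
        print(f"{name}: {count} / {ROWS}")
    print(f"validation_exact: {validation_exact:.4f}")
    print(f"share of upper bound reached: {ceiling_share:.3f}")
```

The upper bound matters because the reranker can only pick among existing candidates: if no candidate for a question is correct, no ranking can fix it.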

Files

  • sql_reranker_lightgbm.txt: trained LightGBM model
  • feature_columns.json: exact feature column order used during training
  • feature_importance.csv: LightGBM feature importance
  • summary.json: training summary
  • inference.py: helper functions for loading and scoring

Intended Pipeline

Question -> NER + value index -> CodeT5 candidates -> feature extraction -> LightGBM reranker -> best SQL

Important

This reranker does not generate SQL by itself. It only ranks already-generated SQL candidates.
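
Since the model only ranks, the last step is a per-question argmax over its scores. The following is a minimal sketch, assuming the scores are already computed and each candidate carries a question identifier. The function `pick_best_sql` and the sample data are hypothetical and are not taken from inference.py.

```python
from typing import Dict, Iterable, Tuple


def pick_best_sql(scored: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Return the highest-scoring SQL candidate for each question id.

    `scored` yields (question_id, sql, score) triples. On ties, the
    candidate seen first is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for question_id, sql, score in scored:
        if question_id not in best or score > best[question_id][1]:
            best[question_id] = (sql, score)
    return {question_id: sql for question_id, (sql, _) in best.items()}


# Hypothetical scores for two questions.
example = [
    ("q1", "SELECT name FROM users", 0.12),
    ("q1", "SELECT name FROM users WHERE age > 30", 1.87),
    ("q2", "SELECT COUNT(*) FROM orders", -0.40),
]
selected = pick_best_sql(example)
# selected["q1"] is "SELECT name FROM users WHERE age > 30"
# selected["q2"] is "SELECT COUNT(*) FROM orders"
```

LambdaRank scores are only meaningful relative to other candidates for the same question, so scores should not be compared across questions or read as probabilities.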

At inference time, use the same feature extraction logic as in training, then reindex the resulting features to the column order stored in feature_columns.json.
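
The alignment step can be sketched as follows. This is an illustration under assumptions, not the code in inference.py: it assumes feature_columns.json holds a flat JSON list of column names, and the helper names (`load_feature_columns`, `align_features`, `score_candidates`) are hypothetical. Missing features are filled with NaN, which mirrors what `DataFrame.reindex(columns=...)` does in pandas; whether that matches training-time handling should be checked against the feature extraction code.

```python
import json
import math
from typing import Dict, List, Sequence


def load_feature_columns(path: str = "feature_columns.json") -> List[str]:
    # Assumption: the file holds a flat JSON list of column names.
    with open(path, encoding="utf-8") as handle:
        columns = json.load(handle)
    if not isinstance(columns, list):
        raise ValueError("expected a JSON list of feature names")
    return [str(name) for name in columns]


def align_features(
    rows: Sequence[Dict[str, float]], columns: Sequence[str]
) -> List[List[float]]:
    """Order each row's features like `columns`.

    Features not listed in `columns` are dropped; listed features that a
    row lacks become NaN.
    """
    return [[float(row.get(name, math.nan)) for name in columns] for row in rows]


def score_candidates(
    rows: Sequence[Dict[str, float]],
    model_path: str = "sql_reranker_lightgbm.txt",
    columns_path: str = "feature_columns.json",
) -> List[float]:
    """Score candidate feature rows with the saved LightGBM model."""
    # Imported lazily so the alignment helpers work without LightGBM installed.
    import lightgbm as lgb
    import numpy as np

    columns = load_feature_columns(columns_path)
    matrix = np.asarray(align_features(rows, columns), dtype=np.float64)
    booster = lgb.Booster(model_file=model_path)
    return booster.predict(matrix).tolist()
```

Column order matters because the saved model identifies features by position: a matrix with the right columns in the wrong order would be scored without any error but with wrong results.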
