--- language: - en license: apache-2.0 base_model: mistralai/Mistral-7B-v0.3 library_name: peft pipeline_tag: text-generation tags: - text-to-sql - sql - code-generation - mistral - qlora - lora - peft - nl2sql datasets: - spider - b-mc2/sql-create-context --- # SQLForge-Mistral-7B-QLoRA **SQLForge** is a QLoRA adapter for [`mistralai/Mistral-7B-v0.3`](https://huggingface.co/mistralai/Mistral-7B-v0.3) that forges natural-language questions into executable SQL. Trained on a mixture of **Spider** and **b-mc2/sql-create-context** (~90k examples), it brings exact-match accuracy on an internal text-to-SQL test split from **9.2% → 87.0%** over the base model, while keeping the adapter footprint under **340 MB**. ## Results Evaluated on a held-out internal text-to-SQL test set (500 examples, schema-aware prompt): | Model | Exact Set Match | Valid SQL | | ------------------------------ | --------------- | --------- | | `mistralai/Mistral-7B-v0.3` (base) | 9.2% | 86.2% | | **SQLForge-Mistral-7B-QLoRA** | **87.0%** | **97.4%** | A **9.5× improvement** in exact-match accuracy and a **+11.2 pp** lift in syntactic validity, delivered by a 336 MB LoRA adapter. ## Usage ```python import torch from transformers import AutoTokenizer, AutoModelForCausalLM from peft import PeftModel BASE = "mistralai/Mistral-7B-v0.3" ADAPT = "shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA" tok = AutoTokenizer.from_pretrained(BASE) model = AutoModelForCausalLM.from_pretrained( BASE, torch_dtype=torch.bfloat16, device_map="auto", ) model = PeftModel.from_pretrained(model, ADAPT) model.eval() schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary REAL);" question = "List the names of employees in the Engineering department earning over 100000." prompt = f"""[INST] You are an expert SQL assistant. Given the schema and question, write a single SQL query. Schema: {schema} Question: {question} [/INST]""" inputs = tok(prompt, return_tensors="pt").to(model.device) out = model.generate(**inputs, max_new_tokens=128, do_sample=False) print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)) ``` ### 4-bit inference (single 12 GB GPU) ```python from transformers import BitsAndBytesConfig bnb = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16, ) model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto") model = PeftModel.from_pretrained(model, ADAPT) ``` ## Training Details - **Base model:** `mistralai/Mistral-7B-v0.3` - **Method:** QLoRA (4-bit NF4 base, BF16 LoRA adapters) via `trl.SFTTrainer` - **Datasets:** - [`spider`](https://huggingface.co/datasets/spider) — cross-domain text-to-SQL - [`b-mc2/sql-create-context`](https://huggingface.co/datasets/b-mc2/sql-create-context) — schema-grounded SQL - Combined: **90,719 train / 3,929 val rows** - **Prompt format:** `[INST] … [/INST]` (Mistral instruction template). Loss is masked to apply only on tokens **after** `[/INST]` (the SQL response). - **Max sequence length:** 512 tokens ### LoRA configuration | Setting | Value | | -------------- | ----------------------------------------------------------- | | Rank `r` | 64 | | Alpha | 16 (scale = α/r = 0.25, conservative — preserves reasoning) | | Dropout | 0.05 | | RSLoRA | enabled (1/√r scaling for r=64 stability) | | Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` | | Bias | none | ### Training hyperparameters | Setting | Value | | -------------------- | -------------- | | Epochs | 3 | | Effective batch size | 8 (1 × 8 grad-accum) | | Learning rate | 2e-4 | | LR scheduler | cosine | | Warmup ratio | 0.03 | | Precision | BF16 | | Gradient checkpointing | enabled | | Optimizer | AdamW (transformers default) | | Checkpoint | `checkpoint-34020` (final) | ### Hardware - **GPU:** 1× NVIDIA RTX 3080 Ti (12 GB VRAM) - **Peak VRAM:** ~8.25 GB (4-bit NF4 base + BF16 adapters + grad-ckpt) - **Training time:** ~4–8 hours ## Limitations - **Schema sensitivity:** trained with explicit `CREATE TABLE` schemas in the prompt; performance degrades without schema context. - **Single-query bias:** optimized for single SELECT statements; multi-statement scripts, DDL, and procedural SQL are out of scope. - **Dialect:** training data is primarily SQLite/ANSI; vendor-specific syntax (T-SQL, PL/pgSQL, BigQuery functions) may not generalize. - **English-only:** prompts and questions are English; non-English NL queries untested. - **No execution safety:** generated SQL should be sandboxed/parameterized before running against production databases. ## Evaluation The metrics above use `exact_set_match` (gold and predicted normalized SQL match exactly as token sets) and `valid_sql_pct` (predicted SQL parses with `sqlparse`/sqlite3). See `evaluate_sql.py` in the [training repository](https://github.com/shreyash-Pandey-Katni/preTrainLLM) for the full eval harness. ## Files | File | Purpose | | ----------------------------- | ---------------------------------- | | `adapter_model.safetensors` | LoRA weights (~336 MB) | | `adapter_config.json` | PEFT/LoRA configuration | | `tokenizer.json` / `tokenizer_config.json` | Mistral tokenizer | | `optimizer.pt`, `scheduler.pt`, `rng_state.pth`, `training_args.bin`, `trainer_state.json` | Training-state snapshot (resume-capable) | ## License Released under **Apache 2.0**, the same license as the base model `mistralai/Mistral-7B-v0.3`. ## Citation ```bibtex @misc{sqlforge_mistral7b_qlora_2026, title = {SQLForge-Mistral-7B-QLoRA: A QLoRA Text-to-SQL Adapter for Mistral-7B-v0.3}, author = {Shreyash Pandey Katni}, year = {2026}, url = {https://huggingface.co/shreyash-pandey-katni/SQLForge-Mistral-7B-QLoRA} } ```