Qwen3-4B Indian Law

A domain-adapted legal assistant fine-tuned from Qwen3-4B on a large corpus of Indian legal texts, statutory provisions, constitutional law, criminal law, evidence law, procedural law, and court judgments.

The model is designed to assist with:

  • Indian legal question answering
  • Statutory interpretation
  • Constitution-related queries
  • Criminal law and procedure
  • Legal reasoning
  • Case law understanding
  • Legal research assistance
  • Judgment summarization
  • Legal education and training

Model Overview

| Item | Value |
| --- | --- |
| Base Model | unsloth/Qwen3-4B |
| Fine-Tuning Method | LoRA + QLoRA |
| Framework | Unsloth |
| Context Length | 4096 |
| Precision | BF16 |
| LoRA Rank | 32 |
| LoRA Alpha | 32 |
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-4 |
| Scheduler | Cosine |
| Epochs | 2 |
| Effective Batch Size | 32 |
| Domain | Indian Legal Knowledge |

Training Dataset

The training corpus was created by combining multiple publicly available Indian legal datasets together with a large judgment corpus.

The objective was to expose the model to:

  • Legal question answering
  • Statutory provisions
  • Constitutional law
  • Criminal law
  • Procedural law
  • Evidence law
  • Court judgments
  • Legal summarization
  • Legal reasoning

Dataset Composition

1. Indian Legal Supervised Fine-Tuning Dataset

Source:

Prarabdha/indian-legal-supervised-fine-tuning-data

Characteristics:

  • Large-scale legal instruction dataset
  • Context → Question → Answer format
  • Derived from Indian court judgments
  • Designed for legal reasoning and legal QA

Original Size:

6,055,371 samples

To prevent over-representation and memorization, a subset was selected during dataset balancing.

Contribution:

≈ 250,000 samples
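
The card does not say how the ≈ 250,000-sample subset was drawn. One common option for a source this large is uniform sampling without replacement in a single streaming pass (reservoir sampling). The sketch below illustrates that technique on a toy range; the function name `sample_subset`, the seed, and the toy sizes are assumptions for illustration, not the author's procedure.

```python
import random


def sample_subset(items, k, seed=42):
    """Draw k items uniformly without replacement in one pass (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for the 6,055,371-sample source; a real run would stream the dataset.
toy_source = range(6_055)
subset = sample_subset(toy_source, k=250)
print(len(subset))
```

Because the sampler only needs one pass and holds `k` items in memory, it would also work with a streamed dataset that does not fit in RAM.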

Example:

Context:
Delhi Development Authority v. Kanwar Kumar Mehta

Question:
Was the High Court justified in calculating interest on escalation charges?

Answer:
Yes. The High Court's decision was held justified on equitable grounds.

2. Indian Law Instruction Dataset

Source:

viber1/indian-law-dataset

Characteristics:

  • Legal instruction-response pairs
  • Covers Indian legal concepts
  • General legal knowledge
  • Legal terminology

Samples:

24,607

Example:

Question:
What is the difference between a petition and a plaint?

Answer:
A petition is a formal request seeking relief, whereas a plaint is the written statement initiating a civil suit.

3. Constitution of India QA Dataset

Custom processed dataset containing question-answer pairs generated from constitutional provisions.

Coverage:

  • Fundamental Rights
  • Directive Principles
  • Union and State relations
  • Parliament
  • Judiciary
  • Constitutional amendments

Samples:

4,082

Example:

Question:
What is India according to the Constitution?

Answer:
India, that is Bharat, shall be a Union of States.

4. Indian Penal Code (IPC) Dataset

Custom processed IPC question-answer corpus.

Coverage:

  • Definitions
  • Offences
  • Punishments
  • Criminal liability
  • General exceptions

Samples:

2,267

Example:

Question:
What is the title and extent of operation of the Indian Penal Code?

Answer:
The title is the Indian Penal Code and it extends to offences committed within India and certain offences committed outside India.

5. Code of Criminal Procedure (CrPC) Dataset

Custom processed question-answer dataset generated from CrPC provisions.

Coverage:

  • Investigation
  • Arrest
  • Bail
  • Trial procedures
  • Appeals
  • Criminal courts

Samples:

8,194

Example:

Question:
What is the short title and commencement of the CrPC?

Answer:
The Code of Criminal Procedure, 1973.

6. IndicLegalQA

Legal question-answer dataset derived from Indian Supreme Court judgments.

Coverage:

  • Case law
  • Judicial reasoning
  • Legal interpretation

Samples:

10,002

Example:

Question:
Who was the respondent in Union of India v. Maj. Gen. Manomoy Ganguly?

Answer:
Maj. Gen. Manomoy Ganguly.

7. Bharatiya Nyaya Sanhita (BNS)

Structured dataset generated from the Bharatiya Nyaya Sanhita, 2023.

Coverage:

  • Criminal offences
  • Punishments
  • Definitions
  • Modern criminal law provisions

Source Structure:

Chapter
Section
Section Name
Description

8. Bharatiya Sakshya Adhiniyam (BSA)

Structured dataset generated from the Bharatiya Sakshya Adhiniyam, 2023.

Coverage:

  • Evidence law
  • Documentary evidence
  • Digital evidence
  • Witness testimony

Source Structure:

Chapter
Section
Section Name
Description
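
As a rough illustration of how rows with the Chapter, Section, Section Name, and Description fields could be turned into question-answer pairs, consider the sketch below. The question wording, the `StatuteSection` type, and the `to_qa_pair` helper are hypothetical; the card does not describe the actual conversion, and the row values are placeholders rather than statutory text.

```python
from dataclasses import dataclass


@dataclass
class StatuteSection:
    """One row of a structured statute file (field names follow the source structure)."""
    chapter: str
    section: str
    section_name: str
    description: str


def to_qa_pair(act_name, row):
    """Build a single question-answer pair from one section row (hypothetical template)."""
    question = f"What does Section {row.section} of the {act_name} provide?"
    answer = (
        f"Section {row.section} ({row.section_name}), "
        f"Chapter {row.chapter}: {row.description}"
    )
    return {"question": question, "answer": answer}


# Placeholder row; real rows would carry the actual section name and text.
row = StatuteSection(
    chapter="I",
    section="1",
    section_name="<section name>",
    description="<section text>",
)
pair = to_qa_pair("Bharatiya Sakshya Adhiniyam, 2023", row)
print(pair["question"])
print(pair["answer"])
```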

9. Indian Court Judgments Corpus

Largest component of the training data before balancing.

Sources include:

  • Supreme Court judgments
  • High Court judgments
  • CourtNIC archives
  • JUDIS archives

Documents processed:

16,726 judgment files

Coverage:

  • Constitutional law
  • Civil law
  • Criminal law
  • Taxation
  • Property law
  • Administrative law
  • Service law

Judgment text was automatically converted into training samples in Context → Question → Answer instruction format.
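
The Context → Question → Answer layout could be rendered into a single training string roughly as follows. The labels mirror the examples shown earlier in this card, but the exact template used during training is not stated, so the helper names `build_prompt` and `build_training_text` and the spacing are assumptions.

```python
def build_prompt(question, context=""):
    """Assemble the prompt part; the context block is omitted when empty."""
    parts = []
    if context.strip():
        parts.append(f"Context:\n{context.strip()}")
    parts.append(f"Question:\n{question.strip()}")
    parts.append("Answer:\n")
    return "\n\n".join(parts)


def build_training_text(context, question, answer):
    """Prompt followed by the reference answer, as one supervised example."""
    return build_prompt(question, context) + answer.strip()


example = build_training_text(
    context="Delhi Development Authority v. Kanwar Kumar Mehta",
    question="Was the High Court justified in calculating interest on escalation charges?",
    answer="Yes. The High Court's decision was held justified on equitable grounds.",
)
print(example)
```

At inference time the same `build_prompt` helper would be called without the answer, so that the prompt seen by the model matches the training layout.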


Dataset Balancing

The original corpus was heavily dominated by judgment-derived samples.

Without balancing:

451,756 samples

Distribution:

Judgment-heavy

To improve generalization across statutory and constitutional law, a balancing procedure was applied.

Final balanced dataset:

304,930 samples

Approximate distribution:

| Category | Samples |
| --- | --- |
| General Legal QA | 190,744 |
| Court Judgments | 66,368 |
| Constitution | 32,346 |
| CrPC | 8,719 |
| IPC | 6,698 |
| BNS | 50 |
| BSA | 5 |

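This balancing reduced the Court Judgments category to about 22% of the final dataset (66,368 of 304,930 samples), which is intended to limit bias toward judgment memorization while preserving broad legal coverage. BNS and BSA remain thinly represented, with 50 and 5 samples respectively.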
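
The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the category counts and totals reported in this card; the variable names are illustrative.

```python
# Category counts from the balanced distribution reported above.
balanced = {
    "General Legal QA": 190_744,
    "Court Judgments": 66_368,
    "Constitution": 32_346,
    "CrPC": 8_719,
    "IPC": 6_698,
    "BNS": 50,
    "BSA": 5,
}

total = sum(balanced.values())
shares = {name: count / total for name, count in balanced.items()}

for name, count in balanced.items():
    print(f"{name:<18}{count:>9,}{shares[name]:>9.2%}")

# Samples dropped relative to the 451,756-sample corpus before balancing.
removed = 451_756 - total
print(f"total={total:,} removed={removed:,}")
```

The category counts sum to the reported 304,930 samples, and the difference from the pre-balancing corpus is 146,826 samples.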


Training Configuration

The model was fine-tuned using LoRA adapters on top of Qwen3-4B.

LoRA Configuration

r=32
lora_alpha=32
lora_dropout=0.0

Target Modules:

q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj
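
To give a sense of scale, the number of trainable LoRA parameters can be estimated from the rank and the shapes of the targeted projections: each adapted linear layer adds r × (in_features + out_features) weights. The layer dimensions below (hidden size 2560, intermediate size 9728, 36 layers, 32 query heads, 8 key/value heads, head dimension 128) are assumptions about the Qwen3-4B base configuration and are not stated in this card; check them against the base model's config.json.

```python
# Assumed Qwen3-4B dimensions (not stated in this card; verify against config.json).
hidden_size = 2560
intermediate_size = 9728
num_layers = 36
num_heads = 32
num_kv_heads = 8
head_dim = 128

# LoRA settings from this card.
r = 32
lora_alpha = 32

# (in_features, out_features) for each targeted projection in one decoder layer.
shapes = {
    "q_proj": (hidden_size, num_heads * head_dim),
    "k_proj": (hidden_size, num_kv_heads * head_dim),
    "v_proj": (hidden_size, num_kv_heads * head_dim),
    "o_proj": (num_heads * head_dim, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

# Each adapter adds A (r x in) and B (out x r), i.e. r * (in + out) weights.
lora_params_per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
lora_params_total = lora_params_per_layer * num_layers

# Standard LoRA scaling applied to the low-rank update.
scaling = lora_alpha / r

print(f"per layer: {lora_params_per_layer:,}")
print(f"total:     {lora_params_total:,}")
print(f"scaling:   {scaling}")
```

Under these assumed dimensions the adapters hold roughly 66 million trainable weights, and with rank equal to alpha the update is applied with a scaling factor of 1.0.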

Optimization

Learning Rate: 2e-4
Weight Decay: 0.01
Warmup Ratio: 0.03
Scheduler: Cosine
Optimizer: AdamW 8-bit
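
For intuition, a cosine schedule with a 3% linear warmup can be written as a small function of the step index. This is a minimal sketch of the commonly used warmup-then-cosine shape, not a copy of the trainer's implementation; the function name `lr_at_step` and the 1,000-step toy horizon are assumptions.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup from 0 to peak_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1_000  # toy horizon for illustration
for step in (0, 15, 30, 515, 1_000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With these settings the rate climbs to 2e-4 over the first 30 steps of the toy run, passes through about half the peak midway through the decay, and reaches zero at the final step.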

Training

Epochs: 2
Max Sequence Length: 4096
Batch Size: 8
Gradient Accumulation: 4
Effective Batch Size: 32
Precision: BF16
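
The effective batch size and an approximate optimizer step count follow directly from these settings and the 304,930-sample balanced dataset. The sketch assumes single-device training and no sequence packing, neither of which is stated in this card, and the exact step count reported by a given trainer may differ slightly depending on how it rounds the last partial batch.

```python
import math

num_samples = 304_930      # balanced dataset size from this card
per_device_batch = 8
grad_accum_steps = 4
num_devices = 1            # assumption: single-GPU training
epochs = 2
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum_steps * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch:,}")
print(f"total steps:          {total_steps:,}")
print(f"warmup steps:         {warmup_steps}")
```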

Example Usage

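# Requires the `transformers` and `torch` packages (plus `accelerate` for device_map="auto").
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "goasty/Qwen3-4B-Indian-Law"

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto"    # place weights on the available GPU(s) or CPU
)

prompt = "What is Article 21 of the Constitution of India?"

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 256 new tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=256
)

# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))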

Intended Use

Suitable for:

  • Legal research assistance
  • Educational purposes
  • Law students
  • Legal document analysis
  • Statutory interpretation
  • Legal Q&A systems
  • Retrieval-Augmented Generation (RAG)

Limitations

  • Not a substitute for licensed legal counsel.
  • May generate legally incorrect or outdated interpretations.
  • Should not be relied upon for litigation or legal advice without expert review.
  • Training data contains historical judgments and statutes which may have been amended or overruled.

Acknowledgements

This work builds upon:

  • Qwen Team
  • Unsloth
  • Hugging Face Datasets Community
  • Indian Legal Open Data Contributors
  • Supreme Court and High Court public legal records

Citation

@misc{qwen3_indian_law,
  title={Qwen3-4B Indian Law},
  author={Aditya},
  year={2026},
  note={Fine-tuned Qwen3-4B model for Indian legal reasoning and question answering}
}