DIRAC: LLaMA-3.2-3B arXiv LoRA

DIRAC (Domain-specific Intelligent Research Assistant with Context) is a LoRA fine-tuned version of meta-llama/Llama-3.2-3B-Instruct, trained on a curated corpus of arXiv ML/AI papers to function as a domain-specific research assistant.

It is designed to be used with a FAISS-based RAG pipeline: retrieved paper chunks are injected into the prompt context, and the model generates grounded, citation-aware answers.


Model Details

Model Description

  • Model type: Causal Language Model (decoder-only), LoRA adapter
  • Base model: meta-llama/Llama-3.2-3B-Instruct
  • Language: English
  • License: Llama 3.2 Community License
  • Fine-tuned for: Retrieval-Augmented Generation (RAG) over arXiv ML/AI papers

Model Sources


Uses

Direct Use

This adapter can be used with the PEFT library to answer research questions about ML/AI papers when paired with retrieved context chunks from a FAISS vector store.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base    = "meta-llama/Llama-3.2-3B-Instruct"
adapter = "Navyasri12355/llama-3.2-3b-arxiv-lora"

# Load the base model in 4-bit NF4 (the quantization used for training) with fp16 compute.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model     = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Attach the LoRA adapter to the quantized base model and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter)
model.eval()

Downstream Use

Intended for use inside the DIRAC research assistant pipeline, where retrieved arXiv paper chunks are injected into the prompt before generation:

You are a research assistant specializing in machine learning.
Use the following retrieved paper excerpts to answer the question.
If the answer is not in the context, say so.

CONTEXT:
[Source 1: LoRA: Low-Rank Adaptation of LLMs (2021)]
We propose LoRA, which freezes the pre-trained model weights and injects trainable
rank decomposition matrices into each layer of the Transformer architecture...

Question: What are the key advantages of LoRA over full fine-tuning?
Answer:
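
The template above can be assembled programmatically from retrieved chunks. The following is a minimal sketch, not the pipeline's actual code: the function name `build_prompt` and the chunk keys `title`, `year`, and `text` are assumptions made for illustration.

```python
from typing import Dict, List

SYSTEM_PREAMBLE = (
    "You are a research assistant specializing in machine learning.\n"
    "Use the following retrieved paper excerpts to answer the question.\n"
    "If the answer is not in the context, say so."
)


def build_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Assemble a RAG prompt in the layout shown in the template.

    Each chunk is a dict with the hypothetical keys "title", "year", and "text".
    """
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        # Number sources from 1 so the model can refer to them as "Source N".
        header = f"[Source {i}: {chunk['title']} ({chunk['year']})]"
        blocks.append(f"{header}\n{chunk['text'].strip()}")
    context = "\n\n".join(blocks)
    return (
        f"{SYSTEM_PREAMBLE}\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The returned string ends with `Answer:` so that generation continues directly from the answer slot.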

Out-of-Scope Use

  • General-purpose chatbot or instruction following outside the ML/AI domain
  • Medical, legal, or safety-critical decision making
  • Factual recall without a retrieval context (the model is optimized for RAG, not memorization)

Bias, Risks, and Limitations

  • The training corpus is limited to arXiv ML/AI papers (cs.LG, cs.AI, 2022–2026); performance on other domains will degrade significantly.
  • The model may hallucinate paper titles or results when no relevant context is retrieved.
  • As with all LLM-based systems, outputs should be verified against the cited sources before use in academic writing.

Recommendations

Always use this model with a retrieval step (RAG). Do not rely on generated answers as ground truth without cross-referencing the cited arXiv papers.


Training Details

Training Data

A curated corpus of 4,000 arXiv ML/AI papers (categories: cs.LG, cs.AI) published between 2022 and 2026, split into overlapping 512-token chunks and paired with synthetically generated question-answer pairs for supervised fine-tuning. A held-out evaluation set of 50 papers (eval/holdout_50.jsonl) was excluded from training.
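
Overlapping chunking of this kind can be sketched as a sliding window over token IDs. This is an illustration under assumptions: the overlap size is not stated in this card, so the 64-token default below is hypothetical, as is the function name `chunk_tokens`.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int],
    chunk_size: int = 512,  # chunk length stated in this card
    overlap: int = 64,      # hypothetical; the card does not give the overlap
) -> List[List[int]]:
    """Split a token sequence into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The final chunk may be shorter than `chunk_size`; whether to pad or drop it is a separate design choice not covered here.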

Training Procedure

Fine-tuned using 🤗 trl (SFTTrainer) + peft on a single NVIDIA T4 GPU (Google Colab) with QLoRA (4-bit NF4 quantization via bitsandbytes).

Training Hyperparameters

Hyperparameter                 Value
Training regime                bf16 mixed precision
Optimizer                      AdamW (paged)
Learning rate                  2e-4
LR schedule                    Cosine with warmup
Warmup ratio                   0.03
Batch size (effective)         16
Gradient accumulation steps    4
Epochs                         3
Max sequence length            2048 tokens
LoRA rank (r)                  16
LoRA alpha                     32
LoRA dropout                   0.05
LoRA target modules            q_proj, k_proj, v_proj, o_proj
Quantization                   4-bit NF4 (QLoRA)
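
A few quantities follow from the hyperparameters above by simple arithmetic. The sketch below derives them, assuming the usual relation effective batch = per-device batch × accumulation steps × number of GPUs and the standard LoRA scaling factor alpha / r. The helper `warmup_steps` and its `total_steps` argument are hypothetical, since the total step count is not given in this card.

```python
import math

# Values taken from the hyperparameter table.
effective_batch_size = 16
grad_accum_steps = 4
num_gpus = 1  # single T4
lora_r, lora_alpha = 16, 32
warmup_ratio = 0.03

# Per-device batch size implied by the effective batch size.
per_device_batch_size = effective_batch_size // (grad_accum_steps * num_gpus)

# Standard LoRA scaling applied to the low-rank update: alpha / r.
lora_scaling = lora_alpha / lora_r


def warmup_steps(total_steps: int, ratio: float = warmup_ratio) -> int:
    """Warmup length for a given total step count (rounded up; an assumption)."""
    return math.ceil(total_steps * ratio)
```

With these values the per-device batch size works out to 4 and the LoRA scaling factor to 2.0.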

Evaluation

Testing Data

Held-out set of 50 arXiv papers (eval/holdout_50.jsonl) not seen during training. Evaluation queries were generated from paper abstracts and full-text sections.

Metrics

ROUGE-L is computed against reference answers derived from paper abstracts.

Model                                        ROUGE-L
Base LLaMA-3.2-3B-Instruct (no fine-tune)    0.19
DIRAC (this model) + RAG                     0.38

With RAG, the fine-tuned model doubles the ROUGE-L score of the untuned base (0.38 vs. 0.19, a 100% relative improvement).
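
For readers unfamiliar with the metric, ROUGE-L is based on the longest common subsequence (LCS) between a generated answer and its reference. The sketch below is a simplified illustration, not the evaluation script used for this card: it assumes lowercased whitespace tokenization and the plain F1 variant, and all function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 using lowercased whitespace tokens (a simplifying assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, old: float) -> float:
    """Relative change of `new` over `old`, e.g. 1.0 means +100%."""
    return (new - old) / old
```

Applied to the reported scores, `relative_improvement(0.38, 0.19)` gives 1.0, which matches the 100% figure.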


Technical Specifications

Model Architecture

Component                  Detail
Base model                 LLaMA-3.2-3B-Instruct (3.21B parameters)
Adapter type               LoRA (Low-Rank Adaptation)
Trainable parameters       ~8.4M (≈0.26% of total)
Embedding model (RAG)      sentence-transformers/all-MiniLM-L6-v2
Vector store               FAISS (L2 index)
Retrieval top-K            5 chunks per query
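
To make the retrieval step concrete, the sketch below shows what a top-5 search under L2 distance computes, written in plain Python as a stand-in for the FAISS index. It assumes exact (exhaustive) search; the function names are hypothetical, and in the real pipeline the vectors would come from the embedding model listed above.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_l2(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 5,  # retrieval top-K from the table
) -> List[Tuple[int, float]]:
    """Return (index, squared distance) for the k vectors closest to the query."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:k]
```

The returned indices identify which stored chunks to inject into the prompt context, closest first.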

Compute Infrastructure

  • Hardware: NVIDIA T4 (16 GB VRAM), Google Colab
  • Training time: ~3 hours
  • Framework: PyTorch 2.2, Transformers 4.40, PEFT 0.10, TRL 0.8

Citation

If you use this model, please cite:

@misc{pulipati2026dirac,
  author       = {Navyasri Pulipati},
  title        = {DIRAC: Domain-specific Intelligent Research Assistant with Context},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Navyasri12355/llama-3.2-3b-arxiv-lora}},
  note         = {LoRA fine-tuned LLaMA-3.2-3B for arXiv ML/AI research Q\&A}
}

Model Card Contact

HuggingFace
