DIRAC: LLaMA-3.2-3B arXiv LoRA
DIRAC (Domain-specific Intelligent Research Assistant with Context) is a LoRA fine-tuned
version of meta-llama/Llama-3.2-3B-Instruct,
trained on a curated corpus of arXiv ML/AI papers to function as a domain-specific research assistant.
It is designed to be used with a FAISS-based RAG pipeline: retrieved paper chunks are injected into the prompt context, and the model generates grounded, citation-aware answers.
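The retrieval step can be sketched as an exact nearest-neighbour search. This is a minimal NumPy stand-in, not the DIRAC code. It assumes a flat (brute-force) index ranked by squared L2 distance, and the function name `top_k_l2` is hypothetical. `k` defaults to 5 because the Technical Specifications list FAISS with an L2 index and 5 chunks per query.

```python
import numpy as np


def top_k_l2(query: np.ndarray, chunk_vectors: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k chunk embeddings closest to `query` (squared L2).

    Hypothetical helper: an exact brute-force search standing in for a flat
    FAISS L2 index. It is not taken from the DIRAC project code.
    """
    # Squared Euclidean distance from the query to every stored chunk embedding.
    dists = np.sum((chunk_vectors - query) ** 2, axis=1)
    k = min(k, len(dists))
    # Stable sort keeps ties in insertion order; fine for a small illustration.
    return np.argsort(dists, kind="stable")[:k].tolist()
```

With FAISS installed, `faiss.IndexFlatL2(d)` followed by `index.add(vectors)` and `index.search(queries, 5)` performs the same exact search on float32 arrays. all-MiniLM-L6-v2 embeddings have d = 384.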
Model Details
Model Description
- Model type: Causal Language Model (decoder-only), LoRA adapter
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Language: English
- License: Llama 3.2 Community License
- Fine-tuned for: Retrieval-Augmented Generation (RAG) over arXiv ML/AI papers
Model Sources
- Repository: Navyasri12355/llama-3.2-3b-arxiv-lora
- Project code: DIRAC GitHub
Uses
Direct Use
This adapter can be used with the PEFT library to answer research questions about ML/AI papers when paired with retrieved context chunks from a FAISS vector store.
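```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B-Instruct"  # gated repo: accept the Llama 3.2 license on the Hub first
adapter = "Navyasri12355/llama-3.2-3b-arxiv-lora"

# Load the base model in 4-bit NF4 to match the QLoRA training setup.
# float16 compute also works on GPUs without bf16 support, such as the T4.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter)
model.eval()  # inference mode: disables dropout
```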
Downstream Use
Intended for use inside the DIRAC research assistant pipeline, where retrieved arXiv paper chunks are injected into the prompt before generation:
```text
You are a research assistant specializing in machine learning.
Use the following retrieved paper excerpts to answer the question.
If the answer is not in the context, say so.
CONTEXT:
[Source 1: LoRA: Low-Rank Adaptation of LLMs (2021)]
We propose LoRA, which freezes the pre-trained model weights and injects trainable
rank decomposition matrices into each layer of the Transformer architecture...
Question: What are the key advantages of LoRA over full fine-tuning?
Answer:
```
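A small helper that fills in this template from retrieved chunks might look like the following. It is a sketch. The function name `build_prompt` and the `(label, text)` chunk format are hypothetical. The card also does not say whether DIRAC wraps this text in the Llama 3.2 chat template (for example via `tokenizer.apply_chat_template`), so check the project code before relying on the exact string.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Render the DIRAC-style RAG prompt (hypothetical helper).

    `chunks` is a list of (source_label, text) pairs, e.g.
    ("LoRA: Low-Rank Adaptation of LLMs (2021)", "We propose LoRA, ...").
    """
    header = (
        "You are a research assistant specializing in machine learning.\n"
        "Use the following retrieved paper excerpts to answer the question.\n"
        "If the answer is not in the context, say so.\n"
    )
    # Number sources from 1, matching the "[Source 1: ...]" labels in the template.
    context = "\n".join(
        f"[Source {i}: {label}]\n{text}" for i, (label, text) in enumerate(chunks, start=1)
    )
    return f"{header}CONTEXT:\n{context}\nQuestion: {question}\nAnswer:"
```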
Out-of-Scope Use
- General-purpose chatbot or instruction following outside the ML/AI domain
- Medical, legal, or safety-critical decision making
- Factual recall without a retrieval context (the model is optimized for RAG, not memorization)
Bias, Risks, and Limitations
- The training corpus is limited to arXiv ML/AI papers (cs.LG, cs.AI, 2022-2026); performance on other domains will degrade significantly.
- The model may hallucinate paper titles or results when no relevant context is retrieved.
- Like all LLM-based systems, outputs should be verified against cited sources before use in academic writing.
Recommendations
Always use this model with a retrieval step (RAG). Do not rely on generated answers as ground truth without cross-referencing the cited arXiv papers.
Training Details
Training Data
A curated corpus of 4,000 arXiv ML/AI papers (categories: cs.LG, cs.AI) published
between 2022 and 2026, split into overlapping 512-token chunks and paired with synthetically
generated question-answer pairs for supervised fine-tuning. A held-out evaluation set of
50 papers (eval/holdout_50.jsonl) was excluded from training.
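The chunking step can be sketched as a sliding window over token IDs. The 512-token window comes from the description above. The 64-token overlap is a hypothetical value, since the card does not state how much consecutive chunks overlap, and the helper name is illustrative.

```python
def overlapping_chunks(token_ids: list[int], size: int = 512, overlap: int = 64) -> list[list[int]]:
    """Split a token sequence into windows of `size` tokens.

    Each window shares `overlap` tokens with the previous one. The default
    overlap of 64 is an assumption; the model card does not specify it.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    stride = size - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + size])
        # Stop once a window reaches the end so the tail is not emitted twice.
        if start + size >= len(token_ids):
            break
    return chunks
```

The token IDs would typically come from the base model's tokenizer, e.g. `tokenizer(text)["input_ids"]`, and `tokenizer.decode(chunk)` turns each window back into text.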
Training Procedure
Fine-tuned using 🤗 TRL (SFTTrainer) and PEFT on a single NVIDIA T4 GPU (Google Colab)
with QLoRA (4-bit NF4 quantization via bitsandbytes).
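For reference, the standard LoRA formulation applied to each targeted projection is shown below. The frozen base weight $W_0$ is stored in 4-bit NF4 under QLoRA, and $A$ and $B$ are the trainable low-rank factors. The scaling $\alpha / r$ is PEFT's default and is assumed here. With $r = 16$ and $\alpha = 32$ from the hyperparameter table, it equals 2.

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}
$$

$$
\frac{\alpha}{r} = \frac{32}{16} = 2
$$

Only $A$ and $B$ receive gradients. During training, PEFT applies LoRA dropout (0.05 here) to the input $x$ of the low-rank branch.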
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Training regime | bf16 mixed precision |
| Optimizer | AdamW (paged) |
| Learning rate | 2e-4 |
| LR schedule | Cosine with warmup |
| Warmup ratio | 0.03 |
| Batch size (effective) | 16 |
| Gradient accumulation steps | 4 |
| Epochs | 3 |
| Max sequence length | 2048 tokens |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj |
| Quantization | 4-bit NF4 (QLoRA) |
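The batch and schedule rows above can be made concrete with a little arithmetic. This sketch is an approximation, not the actual training script. It first derives the per-device batch size on the single T4 (16 / 4 = 4). It then computes linear warmup followed by cosine decay, the shape of transformers' `get_cosine_schedule_with_warmup` that `lr_scheduler_type="cosine"` selects. The warmup step count is rounded up, as recent versions of `TrainingArguments` do for `warmup_ratio`. The card does not give the total number of optimizer steps, so the usage example uses a hypothetical 1,000.

```python
import math

PEAK_LR = 2e-4        # "Learning rate" row
WARMUP_RATIO = 0.03   # "Warmup ratio" row
EFFECTIVE_BATCH = 16  # "Batch size (effective)" row
GRAD_ACCUM = 4        # "Gradient accumulation steps" row

# On a single GPU: effective batch = per-device batch x accumulation steps.
PER_DEVICE_BATCH = EFFECTIVE_BATCH // GRAD_ACCUM


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then a half-cosine decay to 0."""
    warmup = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup:
        return PEAK_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = 1000  # hypothetical; the real value depends on dataset size and epochs
    for s in (0, 15, 30, 500, 999):
        print(s, f"{cosine_lr(s, total):.2e}")
```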
Evaluation
Testing Data
Held-out set of 50 arXiv papers (eval/holdout_50.jsonl) not seen during training.
Evaluation queries were generated from paper abstracts and full-text sections.
Metrics
ROUGE-L is computed against reference answers derived from paper abstracts.
| Model | ROUGE-L |
|---|---|
| Base LLaMA-3.2-3B-Instruct (no fine-tune) | 0.19 |
| DIRAC (this model) + RAG | 0.38 |
The fine-tuned model with RAG doubles the untuned base model's ROUGE-L score (0.38 vs. 0.19), a +100% relative improvement.
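ROUGE-L scores the longest common subsequence (LCS) between a generated answer and its reference. The card does not name the implementation used. The sketch below computes sentence-level ROUGE-L F1 with plain lowercase whitespace tokenization, which is an assumption. Libraries such as Google's `rouge_score` package use their own tokenizer and optional stemming, so scores can differ slightly. The last line reproduces the relative-improvement arithmetic from the table.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, O(len(a) * len(b)) time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (lowercase whitespace tokenization is an assumption)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# Relative improvement from the table: (0.38 - 0.19) / 0.19 = 1.0, i.e. +100%.
relative_gain = (0.38 - 0.19) / 0.19
```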
Technical Specifications
Model Architecture
| Component | Detail |
|---|---|
| Base model | LLaMA-3.2-3B-Instruct (3.21B parameters) |
| Adapter type | LoRA (Low-Rank Adaptation) |
| Trainable parameters | ~8.4M (≈0.26% of total) |
| Embedding model (RAG) | sentence-transformers/all-MiniLM-L6-v2 |
| Vector store | FAISS (L2 index) |
| Retrieval top-K | 5 chunks per query |
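The trainable-parameter figure can be cross-checked from the LoRA settings. Each adapted linear layer with input size k and output size d gains r·(k + d) weights. The sketch below assumes Llama-3.2-3B's attention shapes (hidden size 3072, 28 layers, 24 query heads, 8 key/value heads); verify these against the base model's config.json. With these values, the four targeted projections give 9,175,040 parameters (about 9.2M, roughly 0.29% of 3.21B), somewhat more than the ~8.4M listed. Calling `model.print_trainable_parameters()` on the loaded PEFT model is the reliable way to confirm the actual count.

```python
# Assumed Llama-3.2-3B attention shapes; verify against the base model's config.json.
HIDDEN = 3072     # hidden_size
N_LAYERS = 28     # num_hidden_layers
N_HEADS = 24      # num_attention_heads
N_KV_HEADS = 8    # num_key_value_heads (grouped-query attention)
RANK = 16         # "LoRA rank (r)" from the hyperparameter table

HEAD_DIM = HIDDEN // N_HEADS
KV_DIM = N_KV_HEADS * HEAD_DIM

# (in_features, out_features) of each LoRA target module listed in the card.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(rank: int = RANK) -> int:
    # Each adapted Linear gets A (rank x in) and B (out x rank): rank * (in + out) weights.
    per_layer = sum(rank * (fin + fout) for fin, fout in TARGETS.values())
    return per_layer * N_LAYERS
```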
Compute Infrastructure
- Hardware: NVIDIA T4 (16 GB VRAM), Google Colab
- Training time: ~3 hours
- Framework: PyTorch 2.2, Transformers 4.40, PEFT 0.10, TRL 0.8
Citation
If you use this model, please cite:
```bibtex
@misc{pulipati2026dirac,
  author       = {Navyasri Pulipati},
  title        = {DIRAC: Domain-specific Intelligent Research Assistant with Context},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Navyasri12355/llama-3.2-3b-arxiv-lora}},
  note         = {LoRA fine-tuned LLaMA-3.2-3B for arXiv ML/AI research Q\&A}
}
```
Model Card Contact