LoRA Adapter (Rank 16) - Llama 3.2 3B Vietnamese ⭐

LoRA adapter (rank 16) for Llama 3.2 3B Instruct, fine-tuned on a Vietnamese Alpaca dataset. Of the three ranks tested (8, 16, 64), it gave the best balance between performance and efficiency.

Model Details

  • Base Model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, v_proj
  • Dataset: Vietnamese Alpaca (180 train samples, 20 eval samples)
  • Training Framework: Unsloth + TRL SFTTrainer
  • Quantization: 4-bit (QLoRA)
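
The trainable-parameter count reported under Metrics can be reproduced from the rank and target modules listed above. The following is a minimal sketch, assuming the usual Llama 3.2 3B shapes (hidden size 3072, 28 decoder layers, 8 key/value heads of dimension 128, so `v_proj` maps 3072 to 1024). These dimensions are not stated in this card, and the function name is illustrative.

```python
def lora_param_count(rank, layer_shapes, num_layers):
    """Count LoRA parameters.

    Each adapted linear layer of shape (d_in, d_out) gains two matrices:
    A with rank * d_in entries and B with d_out * rank entries.
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_layer * num_layers


# Assumed Llama 3.2 3B shapes (not taken from this card).
HIDDEN = 3072
KV_DIM = 8 * 128  # num_key_value_heads * head_dim
NUM_LAYERS = 28
TARGET_SHAPES = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # v_proj
]

for rank in (8, 16, 64):
    print(rank, lora_param_count(rank, TARGET_SHAPES, NUM_LAYERS))
```

Under these assumptions, rank 16 yields 4,587,520 parameters, which agrees with the reported figure.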

Metrics

  • Trainable Parameters: 4,587,520 (0.14% of total)
  • Training Time: 4.32 minutes on T4
  • Peak VRAM: 6.23 GB (lowest among all ranks!)
  • Eval Loss: 1.6530
  • Perplexity: 5.22 (best among all ranks!)
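
The perplexity above follows from the eval loss. A short sketch, assuming the loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training; the card does not state it explicitly):

```python
import math


def perplexity(mean_nll):
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll)


eval_loss = 1.6530  # eval loss reported above
print(f"{perplexity(eval_loss):.2f}")
```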

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ADAPTER_ID = "luckyman2907/lab21-llama3.2-3b-r16"

# Load base model. The checkpoint is already quantized to 4-bit with
# bitsandbytes, so no extra quantization flag is needed (requires bitsandbytes).
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    device_map="auto",
)

# Load LoRA adapter and the tokenizer saved alongside it
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

# Generate
prompt = "Giải thích về machine learning"  # "Explain machine learning"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

  • Epochs: 3
  • Learning Rate: 2e-4
  • Scheduler: Cosine with warmup (10%)
  • Batch Size: 8 (effective, via gradient accumulation)
  • Optimizer: AdamW 8-bit
  • GPU: Tesla T4 (16 GB)
  • Training Cost: ~$0.03 (4.32 minutes @ $0.35/hr)
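
To illustrate the scheduler setting above, here is a sketch of a linear-warmup cosine schedule with a 10% warmup and a 2e-4 peak. The total step count is hypothetical: it assumes ceil(180 / 8) optimizer steps per epoch for 3 epochs, whereas the real count depends on the per-device batch size and gradient accumulation settings, which this card does not list. The function is illustrative and not the trainer's own implementation.

```python
import math


def cosine_lr(step, total_steps, peak_lr=2e-4, warmup_ratio=0.10):
    """Linear warmup from 0 to peak_lr, then cosine decay to 0."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count (assumption, see the note above).
TOTAL_STEPS = math.ceil(180 / 8) * 3

for step in (0, 7, 38, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

With so few optimizer steps, the warmup phase lasts only a handful of updates, so most of the run is spent on the decaying part of the curve.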

Comparison with Other Ranks

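| Rank | Trainable Params | Train Time | Peak VRAM | Perplexity | Status              |
|------|------------------|------------|-----------|------------|---------------------|
| 8    | 2.3M             | 4.01 min   | 7.00 GB   | 5.30       | Underfitting        |
| 16   | 4.6M             | 4.32 min   | 6.23 GB   | 5.22       | Best                |
| 64   | 18.4M            | 3.97 min   | 7.97 GB   | 5.23       | Diminishing returns |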

Why Rank 16 Is the Best Choice

  1. Lowest VRAM usage (6.23 GB) - 11% less than r8, 22% less than r64
  2. Best perplexity (5.22) - ahead of r8 (5.30) and marginally ahead of r64 (5.23)
  3. Optimal capacity - 4.6M params is the sweet spot for 180 training samples
  4. Cost-effective - Fast training (~4 min) with best results
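
The relative differences between ranks can be checked directly from the comparison table. A small sketch, with the values copied from that table (the helper name is illustrative):

```python
# Values copied from the comparison table above.
results = {
    8: {"peak_vram_gb": 7.00, "perplexity": 5.30},
    16: {"peak_vram_gb": 6.23, "perplexity": 5.22},
    64: {"peak_vram_gb": 7.97, "perplexity": 5.23},
}


def percent_lower(value, reference):
    """How much lower `value` is than `reference`, as a percent of `reference`."""
    return 100.0 * (reference - value) / reference


r16 = results[16]
for rank in (8, 64):
    other = results[rank]
    vram = percent_lower(r16["peak_vram_gb"], other["peak_vram_gb"])
    ppl_gap = other["perplexity"] - r16["perplexity"]
    print(f"r16 vs r{rank}: VRAM {vram:.1f}% lower, perplexity {ppl_gap:.2f} lower")
```

Note that the perplexity gap between r16 and r64 is small relative to what 20 eval samples can resolve, so the VRAM difference is the more robust of the two signals.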

Qualitative Improvements

The fine-tuned model shows the following improvements over the base model:

  • ✅ Better instruction-following format
  • ✅ More concise and structured responses
  • ✅ Improved Vietnamese language generation
  • ✅ Better code formatting (markdown, syntax)
  • ✅ Reduced hallucination

Limitations

  • The dataset is small (180 samples), so the adapter may not generalize to all Vietnamese tasks
  • Technical concepts (LoRA, RAG) are not well learned due to dataset limitations
  • Best suited for general instruction-following tasks

Citation

@misc{lab21-lora-r16,
  author = {luckyman2907},
  title = {LoRA Adapter (Rank 16) - Llama 3.2 3B Vietnamese},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/luckyman2907/lab21-llama3.2-3b-r16}}
}

Related Models

License

Apache 2.0 (following base model license)

Acknowledgments
