metadata
base_model: nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3
language:
  - en
  - hi
license: other
tags:
  - content-moderation
  - safety
  - lora
  - peft
  - hindi
  - english
datasets:
  - nvidia/Nemotron-Safety-Guard-Dataset-v3

Nemotron Safety Guard — Hindi + English

QLoRA fine-tune of nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3 for Hindi and English content safety classification.

what this is

The original model supports 9 languages. This fine-tune specializes it for Hindi (hi) and English (en) only. It was trained on a balanced sample of 1,000 examples from nvidia/Nemotron-Safety-Guard-Dataset-v3.

training details

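| setting | value |
|---|---|
| base model | nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3 |
| method | QLoRA (frozen base model quantized to 4-bit NF4) |
| lora rank (r) | 8 |
| lora alpha | 32 |
| target modules | q_proj, v_proj |
| trainable params | 3.4M (LoRA adapter weights only; the base model stays frozen in 4-bit) |
| languages | English, Hindi |
| training samples | 1,000 (balanced) |
| epochs | 1 |
| learning rate | 2e-4 |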
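The 3.4M trainable-parameter figure can be checked by hand: each adapted projection gets two low-rank matrices, A (r × in) and B (out × r), and the update B·A is scaled by alpha / r. The sketch below assumes the standard Llama 3.1 8B shapes (hidden size 4096, 32 decoder layers, 8 key/value heads of dimension 128, so v_proj maps 4096 → 1024) and PEFT's default of no trainable bias. The constant names are illustrative, not taken from the training script.

```python
# Sketch: reproduce the LoRA trainable-parameter count for rank-8 adapters on
# q_proj and v_proj, ASSUMING standard Llama 3.1 8B layer shapes.
HIDDEN = 4096        # model hidden size
NUM_LAYERS = 32      # decoder layers
NUM_KV_HEADS = 8     # grouped-query attention: fewer key/value heads
HEAD_DIM = 128       # per-head dimension
RANK = 8             # lora rank from the table
ALPHA = 32           # lora alpha from the table

# (in_features, out_features) of each adapted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),  # 4096 -> 1024
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # multiplier applied to the low-rank update B @ A

print(f"per layer: {per_layer:,}  total: {total:,}  scaling: {scaling}")
```

This gives 106,496 parameters per layer and 3,407,872 in total, matching the 3.4M in the table, with a LoRA scaling factor of 4.0.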

how to use

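```python
import json  # for parsing the model's JSON verdict

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3"
adapter_id = "Amaanaliii/nemotron-safety-guard-hi-en"

# Tokenizer files are read from the adapter repo; use base_model_id instead if they are missing there.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# Load the base model in fp16, then attach the LoRA adapter weights on top of it.
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # inference mode (disables dropout)
```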

output format

{"User Safety": "safe" | "unsafe", "Response Safety": "safe" | "unsafe", "Safety Categories": "Violence, ..."}

"Response Safety" is omitted when only a user prompt is classified (no assistant response is given), and "Safety Categories" is omitted when no unsafe content is found.
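Because two of the three keys are optional, downstream code should not assume they are present. Below is a minimal parsing sketch. SafetyVerdict and parse_safety_output are hypothetical names (not part of this repo, transformers, or peft), and the sketch assumes the decoded output contains exactly one JSON object, possibly with stray text around it.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafetyVerdict:
    user_unsafe: bool
    response_unsafe: Optional[bool]  # None when "Response Safety" is absent
    categories: List[str] = field(default_factory=list)  # empty when "Safety Categories" is absent


def parse_safety_output(text: str) -> SafetyVerdict:
    """Parse the model's JSON verdict, tolerating the optional keys."""
    # Keep only the outermost {...} span in case the model adds text around the JSON.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object found in: {text!r}")
    data = json.loads(text[start : end + 1])

    user_unsafe = data["User Safety"].strip().lower() == "unsafe"

    response = data.get("Response Safety")
    response_unsafe = None if response is None else response.strip().lower() == "unsafe"

    # Categories arrive as one comma-separated string, e.g. "Violence, ...".
    raw = data.get("Safety Categories", "")
    categories = [c.strip() for c in raw.split(",") if c.strip()]

    return SafetyVerdict(user_unsafe, response_unsafe, categories)
```

Pass it the decoded newly generated tokens only (not the echoed prompt), so that the first "{" in the string belongs to the verdict.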