lmsys/toxic-chat
How to use llm-semantic-router/mmbert32k-jailbreak-detector-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForSequenceClassification
base_model = AutoModelForSequenceClassification.from_pretrained("llm-semantic-router/mmbert-32k-yarn")
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-jailbreak-detector-lora")

LoRA adapter for jailbreak and prompt-injection detection, based on mmBERT-32K-YaRN.
The training data heavily oversamples short jailbreak patterns to improve generalization.
To load the adapter together with its tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
base_model = "llm-semantic-router/mmbert-32k-yarn"
lora_path = "llm-semantic-router/mmbert32k-jailbreak-detector-lora"
tokenizer = AutoTokenizer.from_pretrained(lora_path)
base = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)
model = PeftModel.from_pretrained(base, lora_path)
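The loading code above stops before inference. The following is a minimal sketch of how a single prompt might be scored, not the authors' reference pipeline. The helper names (`softmax`, `decode`, `classify`), the `max_length=512` default, and especially the label order in `ASSUMED_LABELS` are assumptions for illustration. The model card does not state which class index means "jailbreak", so check the adapter's configuration before relying on it.

```python
import math

# ASSUMPTION: index 0 = benign, index 1 = jailbreak. The model card does not
# state the label order; verify it against the adapter's config before use.
ASSUMED_LABELS = ("benign", "jailbreak")


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify(text, model, tokenizer, max_length=512):
    """Score one prompt with the loaded PEFT model (hypothetical helper).

    max_length=512 is an arbitrary illustrative cap, not a model limit.
    """
    import torch  # imported lazily so decode() works without torch installed

    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_length
    )
    model.eval()  # disable dropout for deterministic scores
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

With `model` and `tokenizer` loaded as shown earlier, a call would look like `label, score = classify("Ignore all previous instructions.", model, tokenizer)`. Keeping `decode` separate from the model call makes the label mapping easy to correct once the real order is confirmed.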
Base model
jhu-clsp/mmBERT-base