Gemma 1.1 2B IT — Merged Model v2 (DrugBank KG-to-Text)

Model Summary

This is the fully merged version of Gemma 1.1 2B IT, fine-tuned with LoRA to generate fluent natural language drug descriptions from pharmaceutical RDF knowledge graph triples sourced from DrugBank, with a measured hallucination rate of 0.54% on the held-out test set. It was developed as part of a UEL–Depixen industrial placement research project focused on building trustworthy, domain-specific small language models (SLMs).

This is the recommended model for inference — no additional adapter loading required.

For the LoRA adapter only, use: 👉 BSVGK/gemma-1.1-2b-it-drugbank-kg2text-lora-v2

Key Results

Metric	Score
BLEU Score	0.9737
BERTScore F1	0.9896
Fact F1	0.9966
Hallucination Rate	0.54%
Test Samples	254 unseen DrugBank entries
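This card does not state how Fact F1 and Hallucination Rate were computed. As a rough illustration only, one common set-based formulation compares the facts expressed in a generated description against the source triples. The function name `fact_scores`, the triple representation, and the definition of a hallucination as "a generated fact absent from the source" are all assumptions made for this sketch, not the evaluation code behind the numbers above.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


def fact_scores(source: Set[Fact], generated: Set[Fact]) -> Dict[str, float]:
    """Set-based precision, recall, F1 and hallucination rate over facts.

    Assumption: a hallucination is any generated fact that does not
    appear in the source triples. Illustrative definition only.
    """
    supported = source & generated
    precision = len(supported) / len(generated) if generated else 0.0
    recall = len(supported) / len(source) if source else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    unsupported = generated - source
    hallucination_rate = len(unsupported) / len(generated) if generated else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": hallucination_rate,
    }


# Hypothetical example: four source facts, three reproduced, one invented.
source = {
    ("DrugA", "hasIndication", "Condition_X"),
    ("DrugA", "hasMechanism", "Mechanism_Y"),
    ("DrugA", "hasInteraction", "DrugB"),
    ("DrugA", "hasInteraction", "DrugC"),
}
generated = {
    ("DrugA", "hasIndication", "Condition_X"),
    ("DrugA", "hasMechanism", "Mechanism_Y"),
    ("DrugA", "hasInteraction", "DrugB"),
    ("DrugA", "hasIndication", "Condition_Z"),  # not in the source
}
scores = fact_scores(source, generated)
print(scores)
```

Extracting facts from free text is the hard part in practice and is left out here; the sketch only shows how the scores follow once both sides are expressed as sets of triples.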

Model Details

Base Model: google/gemma-1.1-2b-it
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Task: KG-to-Text — RDF triples → fluent drug descriptions
Domain: Pharmaceutical — DrugBank
Training Dataset: 2,537 verified DrugBank samples (RDF triples paired with drug descriptions)
Hardware: NVIDIA A100
Framework: PyTorch, Hugging Face PEFT, TRL (SFTTrainer)

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the merged model directly; no adapter is needed
tokenizer = AutoTokenizer.from_pretrained("BSVGK/gemma-1.1-2b-it-drugbank-kg2text-merged-v2")
model = AutoModelForCausalLM.from_pretrained("BSVGK/gemma-1.1-2b-it-drugbank-kg2text-merged-v2")

prompt = """Generate a natural language description from the following RDF triples:

Triples:

DrugA hasIndication Condition_X
DrugA hasMechanism Mechanism_Y
DrugA hasInteraction DrugB

Description:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
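When descriptions are generated for many drugs, it can help to build the prompt from structured triples rather than by hand. The sketch below mirrors the prompt layout shown above. The helper names `build_prompt` and `extract_description` are hypothetical, and the exact whitespace or list markers used during fine-tuning cannot be confirmed from this card, so treat the layout as an assumption.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

PROMPT_HEADER = (
    "Generate a natural language description from the following RDF triples:"
)


def build_prompt(triples: Iterable[Triple]) -> str:
    """Format triples into the prompt layout shown in the usage example."""
    triple_lines = "\n".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"{PROMPT_HEADER}\n\nTriples:\n\n{triple_lines}\n\nDescription:"


def extract_description(decoded: str, prompt: str) -> str:
    """Return only the generated continuation from a decoded output.

    A causal LM's generate() output contains the prompt followed by the
    new tokens, so the prompt prefix is dropped when it is present.
    """
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    return decoded.strip()


triples = [
    ("DrugA", "hasIndication", "Condition_X"),
    ("DrugA", "hasMechanism", "Mechanism_Y"),
    ("DrugA", "hasInteraction", "DrugB"),
]
prompt = build_prompt(triples)
print(prompt)
```

String-prefix stripping can fail if decoding normalises whitespace. Slicing the output token IDs from the prompt length onward before decoding, for example `outputs[0][inputs["input_ids"].shape[1]:]`, is usually more robust.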

Dataset

Dataset: BSVGK/drugbank_dataset
Size: 2,537 training + 254 test samples
Source: DrugBank pharmaceutical database
Format: RDF Triples → Natural Language Drug Description

Comparison with LoRA Adapter

Aspect	Merged Model v2	LoRA Adapter v2
Inference	✅ Direct — no base model needed	❌ Requires base model + PEFT
Storage	Larger (full model)	Smaller (adapter only)
Speed	Faster to load	Slower to load
Recommended for	Production inference	Research & experimentation
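For comparison, the adapter route in the table roughly corresponds to the sketch below: load the base model, attach the LoRA adapter with PEFT, and optionally fold the adapter weights into the base weights. The function name `load_merged_from_adapter` is hypothetical, and loading in bfloat16 is an assumption chosen to match the BF16 tensor type listed for this model. The procedure actually used to produce this merged checkpoint is not documented here.

```python
BASE_ID = "google/gemma-1.1-2b-it"
ADAPTER_ID = "BSVGK/gemma-1.1-2b-it-drugbank-kg2text-lora-v2"


def load_merged_from_adapter(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, attach the LoRA adapter, and merge its weights."""
    # Imported inside the function so this file can be imported without
    # the heavy dependencies (torch, transformers, peft) installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() adds the low-rank updates into the base weights
    # and returns a plain transformers model without PEFT wrappers.
    return tokenizer, model.merge_and_unload()


if __name__ == "__main__":
    tokenizer, model = load_merged_from_adapter()
    print(type(model).__name__)
```

Using the merged checkpoint skips these steps at load time, which is the convenience the table refers to.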

Intended Use

Pharmaceutical knowledge graph verbalisation
Drug information summarisation and description generation
Research in trustworthy and hallucination-free biomedical NLP
Natural language generation from biomedical knowledge graphs

Out of Scope

Non-pharmaceutical domains
Clinical diagnosis or medical advice
General purpose text generation

Important Notice

This model is intended for research purposes only. It should not be used for clinical decision-making or medical advice. Always consult a qualified healthcare professional.

Citation

@misc{bubathula2026drugbank_merged,
  author = {Sai Venkata Gopala Krishna Bubathula},
  title = {Gemma 1.1 2B IT Merged Model v2: KG-to-Text Generation for DrugBank Pharmaceutical Data},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/BSVGK/gemma-1.1-2b-it-drugbank-kg2text-merged-v2},
  institution = {University of East London & Depixen}
}

Developer

Sai Venkata Gopala Krishna Bubathula

🎓 MSc Big Data Technologies, University of East London
🏢 AI Engineer — UEL–Depixen Industrial Placement
🔗 GitHub
🔗 LoRA Adapter
🔗 LinkedIn

Safetensors

Model size: 3B params
Tensor type: BF16
