Gemma 1.1 2B IT: Merged Model v2 (DrugBank KG-to-Text)

Model Summary

This is the full merged model of Gemma 1.1 2B IT, fine-tuned with LoRA to generate fluent, low-hallucination natural language drug descriptions from pharmaceutical RDF knowledge graph triples sourced from DrugBank. It was developed as part of a UEL–Depixen industrial placement research project focused on building trustworthy, domain-specific small language models (SLMs).

This is the recommended model for inference; no additional adapter loading is required.

For the LoRA adapter only, use: 👉 BSVGK/gemma-1.1-2b-it-drugbank-kg2text-lora-v2

Key Results

| Metric | Score |
|---|---|
| BLEU Score | 0.9737 |
| BERTScore F1 | 0.9896 |
| Fact F1 | 0.9966 |
| Hallucination Rate | 0.54% |
| Test Samples | 254 unseen DrugBank entries |
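
The card does not define how Fact F1 and Hallucination Rate are computed. As a rough illustration only, the sketch below assumes a set-based definition: facts are compared as exact (subject, predicate, object) triples, a generated fact counts as hallucinated when it is absent from the input, and the fact-extraction step itself is out of scope. The function name `fact_scores` and the triple representation are hypothetical and not taken from the project.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def fact_scores(source: Iterable[Triple], generated: Iterable[Triple]) -> Dict[str, float]:
    """Compare facts found in a generated description against the input triples.

    Assumed definitions (not confirmed by the model card):
      precision          = supported generated facts / all generated facts
      recall             = supported generated facts / all input facts
      hallucination_rate = unsupported generated facts / all generated facts
    """
    src: Set[Triple] = set(source)
    gen: Set[Triple] = set(generated)
    supported = src & gen

    precision = len(supported) / len(gen) if gen else 0.0
    recall = len(supported) / len(src) if src else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    hallucination_rate = len(gen - src) / len(gen) if gen else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": hallucination_rate,
    }
```

Under these assumed definitions, a description that restates every input triple and adds nothing else scores an F1 of 1.0 with a hallucination rate of 0.0.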

Model Details

  • Base Model: google/gemma-1.1-2b-it
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Task: KG-to-Text (RDF triples → fluent drug descriptions)
  • Domain: Pharmaceutical (DrugBank)
  • Training Dataset: 2,537 verified DrugBank RDF triples
  • Hardware: NVIDIA A100
  • Framework: PyTorch, Hugging Face PEFT, TRL (SFTTrainer)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model directly; no adapter is needed.
model_id = "BSVGK/gemma-1.1-2b-it-drugbank-kg2text-merged-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keep the prompt layout consistent with the format used during fine-tuning.
prompt = """Generate a natural language description from the following RDF triples:

Triples:

- DrugA hasIndication Condition_X
- DrugA hasMechanism Mechanism_Y
- DrugA hasInteraction DrugB

Description:"""

# Tokenize the prompt, generate up to 256 new tokens, and decode the result.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
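
When verbalising many drugs, it can help to build the prompt from a list of triples rather than by hand, and to separate the generated description from the echoed prompt (the decoded output of `generate` includes the prompt text). The helpers below are a minimal sketch: `build_prompt` and `extract_description` are hypothetical names, and the dash bullet marker and blank-line layout are assumptions that should be matched to the format actually used during fine-tuning.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

PROMPT_HEADER = "Generate a natural language description from the following RDF triples:"
DESCRIPTION_MARKER = "Description:"


def build_prompt(triples: Iterable[Triple]) -> str:
    """Render (subject, predicate, object) triples into the prompt layout shown above."""
    # Assumption: one "- subject predicate object" line per triple.
    lines = [f"- {s} {p} {o}" for s, p, o in triples]
    return (
        f"{PROMPT_HEADER}\n\nTriples:\n\n"
        + "\n".join(lines)
        + f"\n\n{DESCRIPTION_MARKER}"
    )


def extract_description(decoded: str) -> str:
    """Return the text after the first 'Description:' marker, or the whole text if absent."""
    _, marker, tail = decoded.partition(DESCRIPTION_MARKER)
    return tail.strip() if marker else decoded.strip()
```

The string returned by `build_prompt` can be passed to the tokenizer in place of the hand-written prompt, and `extract_description` can be applied to the decoded output before it is stored or evaluated.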

Dataset

  • Dataset: BSVGK/drugbank_dataset
  • Size: 2,537 training + 254 test samples
  • Source: DrugBank pharmaceutical database
  • Format: RDF Triples → Natural Language Drug Description

Comparison with LoRA Adapter

| | Merged Model v2 | LoRA Adapter v2 |
|---|---|---|
| Inference | ✅ Direct, no base model needed | ❌ Requires base model + PEFT |
| Storage | Larger (full model) | Smaller (adapter only) |
| Speed | Faster to load | Slower to load |
| Recommended for | Production inference | Research & experimentation |
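
The two artifacts should produce the same outputs because merging folds the low-rank update into the base weights. In the standard LoRA formulation, a merged weight is W' = W + (alpha / r) · B · A. The toy sketch below checks this equivalence on small pure-Python matrices. All names and values are illustrative, and the rank and alpha used for this model are not stated in the card.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every entry of a matrix by a scalar."""
    return [[x * s for x in row] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold the adapter into the base weight: W' = W + (alpha / rank) * B @ A."""
    return add(w, scale(matmul(lora_b, lora_a), alpha / rank))


def adapter_forward(w: Matrix, lora_a: Matrix, lora_b: Matrix,
                    alpha: float, rank: int, x: Matrix) -> Matrix:
    """Unmerged path: base output plus the scaled low-rank update, y = Wx + (alpha / rank) * B(Ax)."""
    base = matmul(w, x)
    update = scale(matmul(lora_b, matmul(lora_a, x)), alpha / rank)
    return add(base, update)
```

Calling `matmul(merge_lora(w, a, b, alpha, rank), x)` and `adapter_forward(w, a, b, alpha, rank, x)` on the same input gives matching results up to floating-point error, which is why the merged model needs neither the base checkpoint nor PEFT at inference time.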

Intended Use

  • Pharmaceutical knowledge graph verbalisation
  • Drug information summarisation and description generation
  • Research in trustworthy and hallucination-free biomedical NLP
  • Natural language generation from biomedical knowledge graphs

Out of Scope

  • Non-pharmaceutical domains
  • Clinical diagnosis or medical advice
  • General purpose text generation

Important Notice

This model is intended for research purposes only. It should not be used for clinical decision-making or medical advice. Always consult a qualified healthcare professional.

Citation

@misc{bubathula2026drugbank_merged,
  author      = {Sai Venkata Gopala Krishna Bubathula},
  title       = {Gemma 1.1 2B IT Merged Model v2: KG-to-Text Generation for DrugBank Pharmaceutical Data},
  year        = {2026},
  publisher   = {HuggingFace},
  url         = {https://huggingface.co/BSVGK/gemma-1.1-2b-it-drugbank-kg2text-merged-v2},
  institution = {University of East London & Depixen}
}

Developer

Sai Venkata Gopala Krishna Bubathula

  • 🎓 MSc Big Data Technologies, University of East London
  • 🏢 AI Engineer, UEL–Depixen Industrial Placement
  • 🔗 GitHub
  • 🔗 LoRA Adapter
  • 🔗 LinkedIn