---
license: apache-2.0
base_model: unsloth/Qwen2.5-1.5B-Instruct
tags:
  - medical
  - pharmacy
  - ner
  - drug-extraction
  - medication
  - clinical-nlp
  - qlora
  - unsloth
  - pharmacovigilance
language:
  - en
library_name: peft
pipeline_tag: text-generation
datasets:
  - AmareshHebbar/pharmacy-ner-sft
co2_eq_emissions:
  emissions: 0
  source: "estimate, not measured with a carbon-tracking tool"
  training_type: "fine-tuning"
  geographical_location: "EU-West"
  hardware_used: "NVIDIA A40 (48GB)"
model-index:
  - name: pharmacy-ner-qwen25-1b
    results: []
---

<div align="center">

# 💊 Pharmacy NER — Drug Entity Extraction
### Qwen2.5-1.5B fine-tuned for pharmacy NER (drug entity extraction)

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Model-pharmacy--ner--qwen25--1b-FFD21E)](https://huggingface.co/AmareshHebbar/pharmacy-ner-qwen25-1b)
[![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-pharmacy--ner--sft-blue)](https://huggingface.co/datasets/AmareshHebbar/pharmacy-ner-sft)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](https://www.apache.org/licenses/LICENSE-2.0)
[![Base Model](https://img.shields.io/badge/base-Qwen2.5--1.5B-orange)](https://huggingface.co/unsloth/Qwen2.5-1.5B-Instruct)
[![Unsloth](https://img.shields.io/badge/-Unsloth-purple)](https://github.com/unslothai/unsloth)
[![W&B](https://img.shields.io/badge/W%26B-tracked-yellow?logo=weightsandbiases)](https://wandb.ai/amareshhebbar-/axiomapper/runs/5lwyt4sx)

*Part of the [Medical AI Fine-tuned Model Suite](https://huggingface.co/AmareshHebbar/medical-ai-model-suite) — 16 specialist models, one per task*

</div>

---

## TL;DR

Extracts structured medication entities — drug name, dosage, frequency, route, indication — as JSON.

```
INPUT:  Administer Vancomycin 1.5g IV every 12 hours for MRSA bacteraemia.
OUTPUT: {"drug": "Vancomycin", "dosage": "1.5g", "frequency": "every 12 hours", "route": "IV", "indication": "MRSA bacteraemia"}
```

| | |
|---|---|
| **Base model** | [unsloth/Qwen2.5-1.5B-Instruct](https://huggingface.co/unsloth/Qwen2.5-1.5B-Instruct) |
| **Method** | QLoRA, 4-bit NF4, rank 16 |
| **Training data** | [pharmacy-ner-sft](https://huggingface.co/datasets/AmareshHebbar/pharmacy-ner-sft) — 3,500 real-world rows |
| **Training compute** | NVIDIA A40 (48GB), ~0.5h |
| **License** | Apache 2.0 |

---

## Architecture

```
                  +-------------------------+
  user prompt --> |  Qwen2.5-1.5B-Instruct  | --> base weights (frozen, 4-bit NF4)
                  |  + LoRA adapter (r=16)  | --> pharmacy-ner-qwen25-1b
                  +-------------------------+
                              |
                              v
                     structured output
                       (JSON entities)
```

This repo contains **only the LoRA adapter** (~20MB), not the full merged weights. Load it on top of the base model as shown below — this keeps the download small and lets you swap adapters on one base model in memory.
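
The adapter swapping mentioned above could be sketched as follows, using PEFT's multi-adapter calls (`load_adapter`, `set_adapter`). This is a minimal sketch under assumptions: the second adapter ID is taken from the related-models table and is assumed, not confirmed, to share the same base model, and the adapter names `pharmacy_ner` and `drg` are arbitrary labels chosen for illustration.

```python
BASE_MODEL = "unsloth/Qwen2.5-1.5B-Instruct"
PHARMACY_ADAPTER = "AmareshHebbar/pharmacy-ner-qwen25-1b"
# Assumption: this suite adapter was trained on the same base model.
OTHER_ADAPTER = "AmareshHebbar/icd10-to-drg-qwen25-1b"


def load_with_two_adapters():
    """Load the base once, attach two adapters, and return the PEFT model."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # "pharmacy_ner" and "drg" are illustrative labels, not names from the card.
    model = PeftModel.from_pretrained(base, PHARMACY_ADAPTER, adapter_name="pharmacy_ner")
    model.load_adapter(OTHER_ADAPTER, adapter_name="drg")
    model.set_adapter("pharmacy_ner")  # adapter used by the next generate() call
    return model
```

Calling `model.set_adapter("drg")` on the returned model would switch tasks without reloading the base weights.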

---

## Intended use

Intended to power medication reconciliation systems and pharmacovigilance pipelines.

### Direct use
Paste a sentence mentioning a medication, get structured JSON entities back.

### Downstream use
Feed extracted entities into a medication reconciliation tool or adverse-event reporting pipeline.

### Out of scope
Drug interaction checking or dosage safety validation — this model extracts entities, it does not assess clinical appropriateness.

> **This model is not a substitute for a certified medical professional's judgment.** Output should be reviewed by a qualified person before being used in a clinical or billing decision.

---

## Quickstart

### Option A — Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-1.5B-Instruct"
adapter    = "AmareshHebbar/pharmacy-ner-qwen25-1b"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a pharmacy NLP system. Extract drug name, dosage, frequency, route of administration, and indication from the text."},
    {"role": "user", "content": "Administer Vancomycin 1.5g IV every 12 hours for MRSA bacteraemia."},
]
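# return_dict=True returns input_ids together with the attention mask for generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))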
```

**Expected output:**
```
{"drug": "Vancomycin", "dosage": "1.5g", "frequency": "every 12 hours", "route": "IV", "indication": "MRSA bacteraemia"}
```
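
Since the model returns JSON as plain text, downstream code will usually want to parse and validate it. The sketch below is one possible approach, not part of the released model: `parse_entities` and `EXPECTED_KEYS` are hypothetical names, and the five keys are taken from the example output above.

```python
import json

# The five fields shown in the documented example output.
EXPECTED_KEYS = ("drug", "dosage", "frequency", "route", "indication")


def parse_entities(raw: str) -> dict:
    """Parse model output into a dict holding exactly the five documented keys."""
    # Tolerate stray text around the object by slicing from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])
    # Missing keys become None so callers can flag them for human review.
    return {key: data.get(key) for key in EXPECTED_KEYS}


sample = '{"drug": "Vancomycin", "dosage": "1.5g", "frequency": "every 12 hours", "route": "IV", "indication": "MRSA bacteraemia"}'
print(parse_entities(sample)["drug"])  # Vancomycin
```

Malformed JSON raises `json.JSONDecodeError` (a `ValueError` subclass), so a single `except ValueError` covers both failure paths.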

### Option B — Unsloth (2x faster load + inference)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/pharmacy-ner-qwen25-1b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a pharmacy NLP system. Extract drug name, dosage, frequency, route of administration, and indication from the text."},
    {"role": "user", "content": "Patient is on Warfarin 5mg orally once daily for atrial fibrillation."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Option C — vLLM (production serving, OpenAI-compatible)

```bash
vllm serve unsloth/Qwen2.5-1.5B-Instruct \
    --enable-lora \
    --lora-modules pharmacy-ner-qwen25-1b=AmareshHebbar/pharmacy-ner-qwen25-1b \
    --host 0.0.0.0 --port 8000 --dtype bfloat16
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="pharmacy-ner-qwen25-1b",
    messages=[
        {"role": "system", "content": "You are a pharmacy NLP system. Extract drug name, dosage, frequency, route of administration, and indication from the text."},
        {"role": "user", "content": "Morphine sulphate 10mg SC PRN every 4 hours for severe cancer pain."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
```

### Option D — GGUF / llama.cpp (CPU / edge inference)

This repo ships LoRA adapter weights, not a pre-merged GGUF. To run on llama.cpp, merge first:

```bash
pip install unsloth
python -c "
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained('AmareshHebbar/pharmacy-ner-qwen25-1b', load_in_4bit=False)
model.save_pretrained_gguf('pharmacy-ner-qwen25-1b-gguf', tokenizer, quantization_method='q4_k_m')
"
```

---

## Training details

### Data

Trained on 3,500 examples extracted from **bigbio/drugprot — biomedical abstracts with drug-protein interaction annotations** ([source](https://huggingface.co/datasets/bigbio/drugprot)). No synthetic or LLM-generated training data — every example pairs real-world input with its authoritative output.

| Split | Rows |
|---|---|
| Train | 2,800 |
| Validation | 350 |
| Test | 350 |

Full extraction pipeline documented on the [dataset card](https://huggingface.co/datasets/AmareshHebbar/pharmacy-ner-sft).
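
The 2,800 / 350 / 350 split is an 80/10/10 partition of the 3,500 rows. The card does not say how the split was produced, so the following is only a sketch of one common way to get such a partition deterministically; the seed, the function name `split_rows`, and the shuffle-then-slice approach are all assumptions.

```python
import random


def split_rows(rows, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle a copy of rows with a fixed seed, then slice into train/val/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train : n_train + n_val]
    test = shuffled[n_train + n_val :]  # whatever remains after train and val
    return train, val, test


train, val, test = split_rows(range(3500))
print(len(train), len(val), len(test))  # 2800 350 350
```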

### Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
| Optimizer | paged_adamw_8bit |
| Learning rate / schedule | 2e-4, cosine |
| Gradient checkpointing | Unsloth (smart offload) |
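
The rank and target modules above determine how many parameters the adapter trains. The sketch below estimates that count. The layer dimensions are assumptions about the Qwen2.5-1.5B architecture and are not stated in this card, so check them against the base model's `config.json` before relying on the result.

```python
RANK = 16  # LoRA rank from the table above

# Assumed Qwen2.5-1.5B dimensions (not stated in this card; verify in config.json).
HIDDEN = 1536
INTERMEDIATE = 8960
KV_DIM = 256  # assumed 2 key/value heads x 128 head dim
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank=RANK, shapes=TARGET_SHAPES, layers=NUM_LAYERS):
    """Each adapted projection adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


print(lora_param_count())  # 18464768
```

Under these assumed dimensions the adapter trains roughly 18.5M parameters, on the order of 1% of a 1.5B-parameter base.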

### Training compute

| | |
|---|---|
| **GPU** | NVIDIA A40 (48GB) |
| **Cloud provider** | RunPod |
| **Training time** | ~0.5h (incl. eval + hub push) |
| **Tracking** | [W&B run](https://wandb.ai/amareshhebbar-/axiomapper/runs/5lwyt4sx) |
| **CO2 estimate** | self-reported, not measured with a carbon tracker — treat as approximate |

Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) for 2x faster training and reduced VRAM, using TRL's `SFTTrainer`. Full project: [wandb.ai/amareshhebbar-/axiomapper](https://wandb.ai/amareshhebbar-/axiomapper).

---

## Bias, risks & limitations

**Data recency.** Training data reflects a specific snapshot in time (the dataset publish date). Drug names, formulations, and dosing conventions change as source authorities issue updates, so always cross-check against the live authoritative source before high-stakes use.

**Failure mode.** Like any LLM, this model can produce a plausible-sounding but incorrect output, especially on rare, ambiguous, or highly compound real-world cases that fall outside the training distribution. It does not know when it's wrong.

**Language.** English-language input only for this model. The Hindi-medical model elsewhere in the suite accepts Hindi system prompts, but its underlying clinical reasoning data is largely English-sourced.

**Not a regulated medical device.** This model has not been validated, cleared, or approved by any regulatory body (FDA, CDSCO, or equivalent) as a medical device or clinical decision support tool. It is a research/engineering artifact.

**Misapplication risk.** Do not use this model as the sole basis for a clinical, billing, or compliance decision affecting a real patient or claim. Do not deploy in an emergency triage context without a human-in-the-loop and clear escalation paths.
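
Because the model cannot signal when it is wrong, a cheap guard against the failure mode described above is to check that every extracted value actually appears in the input text. The following is a minimal sketch of such a check, not a feature of the model; `ungrounded_fields` is a hypothetical helper name.

```python
def ungrounded_fields(source_text: str, entity: dict) -> list:
    """Return the keys whose extracted value does not appear verbatim in the source."""
    haystack = source_text.casefold()
    missing = []
    for key, value in entity.items():
        if value is None:
            continue  # absent fields are a separate concern from invented ones
        if str(value).casefold() not in haystack:
            missing.append(key)
    return missing


text = "Administer Vancomycin 1.5g IV every 12 hours for MRSA bacteraemia."
entity = {"drug": "Vancomycin", "dosage": "1.5g", "route": "oral"}
print(ungrounded_fields(text, entity))  # ['route']
```

The check is deliberately conservative: a correct but normalized value (for example "oral" extracted from "orally") would also be flagged, which suits a workflow where flagged records go to a human reviewer.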

---

## FAQ

**Q: Can I merge the adapter into the base model for faster inference?**
Yes — use `model.merge_and_unload()` after loading with PEFT, or use Unsloth's `save_pretrained_merged()` method.

**Q: Why QLoRA instead of full fine-tuning?**
The base model already has strong language and medical knowledge from pretraining. QLoRA adapts only ~0.5-1% of parameters, which is enough to specialize the output format and domain without the cost or overfitting risk of full fine-tuning.

**Q: Can I fine-tune this further on my own data?**
Yes, this adapter can be used as a starting checkpoint for continued fine-tuning. Note this may require merging first depending on your training framework.

**Q: Why is the output format so strict?**
This model was trained on a fixed system prompt and a consistent output structure. Following the documented system prompt closely (see Quickstart above) gives the most reliable results; deviating from it may produce inconsistent formatting.

**Q: Does this model store or transmit my input data?**
No. Like any open-weight model, all inference happens locally on your own infrastructure (or wherever you deploy it) — nothing is sent back to the model author.
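
For the merge question above, the PEFT route could look like the sketch below. It assumes the base is loaded unquantized in bfloat16 so the LoRA weights can be folded in directly; the function name `merge_adapter` and the output directory are hypothetical.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a standalone checkpoint."""
    # Lazy imports: the heavy dependencies are only needed when the function runs.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base unquantized (bf16) rather than in 4-bit before merging.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the directory loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call such as `merge_adapter("unsloth/Qwen2.5-1.5B-Instruct", "AmareshHebbar/pharmacy-ner-qwen25-1b", "pharmacy-ner-merged")` would write the merged weights to a local folder, where `pharmacy-ner-merged` is just an example path.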

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `ValueError: padding_token not set` | Base tokenizer has no pad token | Set `tokenizer.pad_token = tokenizer.eos_token` before inference |
| Garbled / repeated output | Wrong chat template applied | Make sure you use `tokenizer.apply_chat_template`, not a raw string prompt |
| CUDA OOM on load | Insufficient VRAM | Use `load_in_4bit=True` (as in Option B above) or reduce `max_seq_length` |
| Adapter loads but ignores fine-tuning | Base model mismatch | Confirm you loaded the **exact** base listed above — adapters are not portable across different base models or quantizations |
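
The base-model mismatch in the last row can be checked before loading anything heavy. PEFT adapters record their base in `adapter_config.json` under `base_model_name_or_path`; the sketch below compares that field with the base you intend to load. The helper name `adapter_base_matches` is hypothetical, and it assumes the adapter has already been downloaded to a local directory.

```python
import json
from pathlib import Path

EXPECTED_BASE = "unsloth/Qwen2.5-1.5B-Instruct"


def adapter_base_matches(adapter_dir, expected=EXPECTED_BASE) -> bool:
    """Compare the base recorded in adapter_config.json with the one you plan to load."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    return config.get("base_model_name_or_path") == expected
```

The recorded value may legitimately be a quantized variant of the same model, so a `False` result is a prompt to inspect the config by hand, not proof of a problem.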

---

## Related models in this suite

| Model | Task | Size |
|---|---|---|
| [icd10-coder-qwen25-7b](https://huggingface.co/AmareshHebbar/icd10-coder-qwen25-7b) | ICD-10-CM medical coding | 7B |
| [snomed-mapper-qwen25-7b](https://huggingface.co/AmareshHebbar/snomed-mapper-qwen25-7b) | Clinical concept mapping | 7B |
| [icd10-to-drg-qwen25-1b](https://huggingface.co/AmareshHebbar/icd10-to-drg-qwen25-1b) | ICD-10 to DRG reimbursement | 1.5B |
| [pmjay-classifier-qwen25-3b](https://huggingface.co/AmareshHebbar/pmjay-classifier-qwen25-3b) | India PM-JAY classification | 3B |

**Full suite overview:** [AmareshHebbar/medical-ai-model-suite](https://huggingface.co/AmareshHebbar/medical-ai-model-suite)

---

## Changelog

| Version | Date | Notes |
|---|---|---|
| v1.0 | 2026 | Initial release — QLoRA fine-tune on 3,500 real-world rows |

---

## Citation

```bibtex
@misc{medicalai2026,
  author    = {Hebbar, Amaresh},
  title     = {Medical AI Fine-tuning Suite},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/AmareshHebbar}
}
```

## Contact

[![GitHub](https://img.shields.io/badge/GitHub-amareshhebbar-181717?logo=github)](https://github.com/amareshhebbar)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-gvamaresh-0A66C2?logo=linkedin)](https://www.linkedin.com/in/gvamaresh)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Profile-AmareshHebbar-FFD21E)](https://huggingface.co/AmareshHebbar)