sakhi-medgemma-1.5-4b-maternal-GGUF

A QLoRA fine-tuned, merged, and Q4_K_M-quantized version of google/medgemma-1.5-4b-it, specialized for maternal and neonatal clinical triage in the Indian rural health context. This is the active production model powering Sakhi — an AI clinical companion for ASHA (Accredited Social Health Activist) workers.

The LoRA adapter (docvm/sakhi-medgemma-1.5-4b-maternal) was merged into the base model weights, converted to GGUF via llama.cpp, and quantized to Q4_K_M (~2.5 GB). It runs on CPU via Ollama and exposes an OpenAI-compatible endpoint.


Intended Use

This model is designed to assist ASHA workers — trained community health volunteers in rural India — during antenatal checkups and newborn postnatal visits. It is called by the Sakhi backend to:

  • Stratify maternal and neonatal risk (green / yellow / red)
  • Flag warning signs (hypertension, severe anaemia, cord complications, etc.)
  • Suggest referral decisions aligned with MOHFW/WHO guidelines
  • Answer free-form clinical questions in a field-appropriate tone

This model is a clinical decision support tool. It does not diagnose. All outputs should be reviewed by a trained health worker before any action is taken.


Training

Fine-tuning method

QLoRA (4-bit NF4 quantization of base weights during training) via Unsloth on Kaggle (2×T4).

Parameter Value
LoRA rank 16
LoRA alpha 16
LoRA dropout 0.05
Target modules all-linear
Optimizer paged_adamw_8bit
Learning rate 2e-4
LR schedule cosine
Epochs 1
Batch size 2 (grad accumulation steps = 4, effective batch = 8)
Max sequence length 512
Training time ~4.2 hours
Trainable parameters 38.5M / 4.34B (0.89%)
Final training loss 2.13

Training data

Two public HuggingFace datasets, filtered to maternal/neonatal content via keyword matching:

Dataset HF repo Filtered size
ChatDoctor-HealthCareMagic-100k lavita/ChatDoctor-HealthCareMagic-100k 5,000 examples
WikiDoc Patient Information medalpaca/medical_meadow_wikidoc_patient_information 1,500 examples

Filter keywords: pregnancy, antenatal, postpartum, newborn, neonate, breastfeed, jaundice, preeclampsia, gestational diabetes, anaemia, low birth weight, cord, lactation, miscarriage, ectopic, folic acid, iron.

Total after 95/5 train/eval split: ~5,300 train / ~280 eval.

Key training objectives

  • Indian clinical context: recognition of locally prevalent risk patterns (severe anaemia, eclampsia, low birth weight) common in Rajasthan and similar settings
  • Output reliability: improved JSON schema compliance for structured triage output, reducing post-processing failures in production

Quantization

The LoRA adapter was merged into the base google/medgemma-1.5-4b-it weights (bfloat16), then converted and quantized using llama.cpp:

python convert_hf_to_gguf.py merged_model/ --outtype bf16
./llama-quantize model-bf16.gguf model-Q4_K_M.gguf Q4_K_M

Final size: ~2.5 GB (Q4_K_M). Runs on CPU with ~6 GB RAM.

The merge and quantize pipeline is fully documented in model/merge-and-quantize.ipynb.


How to Use

Ollama (recommended)

ollama pull docvm/sakhi-medgemma-1.5-4b-maternal-GGUF:Q4_K_M
ollama run docvm/sakhi-medgemma-1.5-4b-maternal-GGUF:Q4_K_M

The model exposes an OpenAI-compatible endpoint at http://localhost:11434/v1.

llama.cpp

./llama-cli -m sakhi-medgemma-1.5-4b-maternal-Q4_K_M.gguf \
  --chat-template gemma \
  -p "You are Sakhi, an AI clinical companion for ASHA workers..."

Example prompt (Sakhi triage format)

You are a maternal triage AI.

Classify the case into exactly one of:
Triage: HIGH
Triage: MODERATE
Triage: LOW

Escalate to HIGH if any of the following are present:
- BP ≥160 systolic or ≥110 diastolic
- Seizures, convulsions
- Heavy bleeding
- Signs of sepsis (fever + rigors + abdominal tenderness postpartum)
- Visual disturbance + hypertension
Otherwise classify appropriately.

Output strictly in this format:
Triage: <HIGH/MODERATE/LOW>
Reason: <one short sentence>

Case: 26-year-old, 34 weeks pregnant. BP 162/108. Headache and visual
disturbance since morning. No bleeding.

Limitations

  • Trained on English-language medical Q&A data; Hindi-language performance is untested at the model level (the Sakhi app handles Hindi via prompt instruction).
  • Training data is filtered public datasets, not real patient records. Clinical thresholds were applied in post-processing and evaluation, not through supervised fine-tuning on labeled triage decisions.
  • The model is not a replacement for clinical judgment or specialist review.
  • Not validated in a prospective clinical setting.

Part of the Sakhi Project

Resource Link
Sakhi app (live demo) https://sakhi-asha.vercel.app
Backend API https://docvm-sakhi-api.hf.space/health
LoRA adapter (pre-merge) docvm/sakhi-medgemma-1.5-4b-maternal
GitHub repo https://github.com/orcus108/sakhi
Fine-tuning notebook model/finetuning-medgemma.ipynb
Merge + quantize notebook model/merge-and-quantize.ipynb

Built for the Google MedGemma Impact Challenge · Kaggle · February 2026.

License

Health AI Developer Foundations Terms of Use

Downloads last month
26
GGUF
Model size
4B params
Architecture
gemma3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for docvm/sakhi-medgemma-1.5-4b-maternal-GGUF

Quantized
(36)
this model

Space using docvm/sakhi-medgemma-1.5-4b-maternal-GGUF 1