sakhi-medgemma-1.5-4b-maternal-GGUF
A QLoRA fine-tuned, merged, and Q4_K_M-quantized version of
google/medgemma-1.5-4b-it,
specialized for maternal and neonatal clinical triage in the Indian rural health
context. This is the active production model powering
Sakhi — an AI clinical companion for ASHA
(Accredited Social Health Activist) workers.
The LoRA adapter (docvm/sakhi-medgemma-1.5-4b-maternal) was merged into the
base model weights, converted to GGUF via llama.cpp, and quantized to Q4_K_M
(~2.5 GB). It runs on CPU via Ollama, which exposes an OpenAI-compatible endpoint.
Intended Use
This model is designed to assist ASHA workers — trained community health volunteers in rural India — during antenatal checkups and newborn postnatal visits. It is called by the Sakhi backend to:
- Stratify maternal and neonatal risk (green / yellow / red)
- Flag warning signs (hypertension, severe anaemia, cord complications, etc.)
- Suggest referral decisions aligned with MOHFW/WHO guidelines
- Answer free-form clinical questions in a field-appropriate tone
This model is a clinical decision support tool. It does not diagnose. All outputs should be reviewed by a trained health worker before any action is taken.
Training
Fine-tuning method
QLoRA (4-bit NF4 quantization of base weights during training) via Unsloth on Kaggle (2×T4).
| Parameter | Value |
|---|---|
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | all-linear |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2e-4 |
| LR schedule | cosine |
| Epochs | 1 |
| Batch size | 2 (grad accumulation steps = 4, effective batch = 8) |
| Max sequence length | 512 |
| Training time | ~4.2 hours |
| Trainable parameters | 38.5M / 4.34B (0.89%) |
| Final training loss | 2.13 |
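The derived figures in the table can be cross-checked with a little arithmetic. The sketch below assumes the batch size of 2 is the global batch (not per GPU) and that the full train split of roughly 5,300 examples reported under Training data was used for the single epoch; the step count is therefore an estimate, not a logged value.

```python
import math

per_step_batch = 2        # "Batch size" from the table
grad_accum_steps = 4      # gradient accumulation steps from the table
train_examples = 5_300    # approximate train split size (assumed to be used in full)

# Effective batch = examples seen per optimizer update.
effective_batch = per_step_batch * grad_accum_steps

# One epoch, so the number of optimizer updates is one pass over the data.
optimizer_steps = math.ceil(train_examples / effective_batch)

# Share of parameters updated by the LoRA adapter.
trainable_params = 38.5e6
total_params = 4.34e9
trainable_pct = 100 * trainable_params / total_params

print(effective_batch)           # 8
print(optimizer_steps)           # 663
print(round(trainable_pct, 2))   # 0.89
```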
Training data
Two public HuggingFace datasets, filtered to maternal/neonatal content via keyword matching:
| Dataset | HF repo | Filtered size |
|---|---|---|
| ChatDoctor-HealthCareMagic-100k | lavita/ChatDoctor-HealthCareMagic-100k | 5,000 examples |
| WikiDoc Patient Information | medalpaca/medical_meadow_wikidoc_patient_information | 1,500 examples |
Filter keywords: pregnancy, antenatal, postpartum, newborn, neonate, breastfeed, jaundice, preeclampsia, gestational diabetes, anaemia, low birth weight, cord, lactation, miscarriage, ectopic, folic acid, iron.
Total after 95/5 train/eval split: ~5,300 train / ~280 eval.
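A minimal sketch of keyword filtering and a 95/5 split, assuming each example has already been flattened to a single string. The function names, the regex strategy, and the seed are illustrative assumptions; the actual filtering code lives in the fine-tuning notebook and may differ.

```python
import random
import re

KEYWORDS = [
    "pregnancy", "antenatal", "postpartum", "newborn", "neonate", "breastfeed",
    "jaundice", "preeclampsia", "gestational diabetes", "anaemia",
    "low birth weight", "cord", "lactation", "miscarriage", "ectopic",
    "folic acid", "iron",
]

# Leading word boundary only: "breastfeed" still matches "breastfeeding",
# while "cord" does not match "record" and "iron" does not match "environment".
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)

def is_maternal(text):
    """Return True if the text mentions any filter keyword."""
    return KEYWORD_RE.search(text) is not None

def filter_examples(examples, limit=None):
    """Keep matching examples, optionally capped at `limit`."""
    kept = [ex for ex in examples if is_maternal(ex)]
    return kept if limit is None else kept[:limit]

def train_eval_split(examples, eval_fraction=0.05, seed=0):
    """Shuffle deterministically and hold out `eval_fraction` for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]
```

Plain substring matching would be simpler, but short keywords such as "cord" and "iron" then pull in unrelated text, which is why the sketch anchors each keyword at a word boundary.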
Key training objectives
- Indian clinical context: recognition of risk patterns (severe anaemia, eclampsia, low birth weight) prevalent in Rajasthan and similar settings
- Output reliability: improved JSON schema compliance for structured triage output, reducing post-processing failures in production
Quantization
The LoRA adapter was merged into the base google/medgemma-1.5-4b-it weights
(bfloat16), then converted and quantized using llama.cpp:
python convert_hf_to_gguf.py merged_model/ --outtype bf16 --outfile model-bf16.gguf
./llama-quantize model-bf16.gguf model-Q4_K_M.gguf Q4_K_M
Final size: ~2.5 GB (Q4_K_M). Runs on CPU with ~6 GB RAM.
The merge and quantize pipeline is fully documented in
model/merge-and-quantize.ipynb.
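As a rough sanity check on the reported size, the snippet below derives the implied average bits per weight from figures stated in this card. It assumes "~2.5 GB" means decimal gigabytes and that the 4.34B parameter count applies to the quantized file; GGUF files also carry metadata and keep some tensors at higher precision, so treat the result as a ballpark only.

```python
total_params = 4.34e9      # total parameter count from the training table
q4_size_bytes = 2.5e9      # "~2.5 GB", read as decimal gigabytes (assumption)

# bfloat16 stores 2 bytes per weight.
bf16_size_gb = total_params * 2 / 1e9

# Average storage cost per weight in the quantized file.
bits_per_weight = q4_size_bytes * 8 / total_params

print(round(bf16_size_gb, 2))      # 8.68
print(round(bits_per_weight, 1))   # 4.6
```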
How to Use
Ollama (recommended)
ollama pull hf.co/docvm/sakhi-medgemma-1.5-4b-maternal-GGUF:Q4_K_M
ollama run hf.co/docvm/sakhi-medgemma-1.5-4b-maternal-GGUF:Q4_K_M
Once the model is pulled, Ollama serves it through an OpenAI-compatible endpoint at http://localhost:11434/v1.
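Because the endpoint follows the OpenAI chat-completions schema, it can be called with the Python standard library alone. This is a sketch, not the Sakhi backend's client: `MODEL_NAME` is an assumption and should match whatever name `ollama list` shows on your machine, and `temperature: 0` is an illustrative choice for more repeatable triage output.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
# Assumption: the name under which Ollama registered the pulled GGUF.
MODEL_NAME = "hf.co/docvm/sakhi-medgemma-1.5-4b-maternal-GGUF:Q4_K_M"

def build_payload(user_message):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0,
    }

def chat(user_message, timeout=120):
    """POST the message to the local Ollama server and return the reply text."""
    body = json.dumps(build_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]

# Usage (requires a running Ollama server with the model pulled):
# reply = chat("Case: 26-year-old, 34 weeks pregnant. BP 162/108.")
```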
llama.cpp
./llama-cli -m sakhi-medgemma-maternal-Q4_K_M.gguf \
--chat-template gemma \
-p "You are Sakhi, an AI clinical companion for ASHA workers..."
Example prompt (Sakhi triage format)
You are a maternal triage AI.
Classify the case into exactly one of:
Triage: HIGH
Triage: MODERATE
Triage: LOW
Escalate to HIGH if any of the following are present:
- BP ≥160 systolic or ≥110 diastolic
- Seizures, convulsions
- Heavy bleeding
- Signs of sepsis (fever + rigors + abdominal tenderness postpartum)
- Visual disturbance + hypertension
Otherwise classify appropriately.
Output strictly in this format:
Triage: <HIGH/MODERATE/LOW>
Reason: <one short sentence>
Case: 26-year-old, 34 weeks pregnant. BP 162/108. Headache and visual
disturbance since morning. No bleeding.
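A reply in the two-line format requested above can be parsed with a small helper. This is a hypothetical sketch (the function name and the strict, case-sensitive matching are assumptions, not the Sakhi backend's parser); it returns `None` when the reply does not follow the format so the caller can retry or fall back.

```python
import re

TRIAGE_RE = re.compile(r"^\s*Triage:\s*(HIGH|MODERATE|LOW)\s*$", re.MULTILINE)
REASON_RE = re.compile(r"^\s*Reason:\s*(.+?)\s*$", re.MULTILINE)

def parse_triage(text):
    """Extract the triage level and reason, or None if the format is violated."""
    triage = TRIAGE_RE.search(text)
    reason = REASON_RE.search(text)
    if triage is None or reason is None:
        return None
    return {"triage": triage.group(1), "reason": reason.group(1)}

reply = "Triage: HIGH\nReason: BP 162/108 with headache and visual disturbance."
print(parse_triage(reply))
# {'triage': 'HIGH', 'reason': 'BP 162/108 with headache and visual disturbance.'}
```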
Limitations
- Trained on English-language medical Q&A data; Hindi-language performance is untested at the model level (the Sakhi app handles Hindi via prompt instruction).
- Training data is filtered public datasets, not real patient records. Clinical thresholds were applied in post-processing and evaluation, not through supervised fine-tuning on labeled triage decisions.
- The model is not a replacement for clinical judgment or specialist review.
- Not validated in a prospective clinical setting.
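Since clinical thresholds are applied in post-processing rather than learned, a rule-based override can sit after the model call. The sketch below is hypothetical and covers only the blood-pressure rule from the example prompt (systolic ≥160 or diastolic ≥110); the function names and the `BP <sys>/<dia>` text pattern are assumptions, and it is not the Sakhi implementation or a clinically validated rule set.

```python
import re

# Matches readings written like "BP 162/108" (assumed input convention).
BP_RE = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)

def bp_meets_high_threshold(case_text):
    """True if a BP reading in the text is >=160 systolic or >=110 diastolic."""
    match = BP_RE.search(case_text)
    if match is None:
        return False
    systolic, diastolic = int(match.group(1)), int(match.group(2))
    return systolic >= 160 or diastolic >= 110

def apply_bp_override(model_triage, case_text):
    """Escalate to HIGH when the BP rule fires; otherwise keep the model's level."""
    return "HIGH" if bp_meets_high_threshold(case_text) else model_triage

print(apply_bp_override("MODERATE", "34 weeks pregnant. BP 162/108."))  # HIGH
```

The override only ever escalates, so a missed pattern leaves the model's own classification in place for the health worker to review.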
Part of the Sakhi Project
| Resource | Link |
|---|---|
| Sakhi app (live demo) | https://sakhi-asha.vercel.app |
| Backend API | https://docvm-sakhi-api.hf.space/health |
| LoRA adapter (pre-merge) | docvm/sakhi-medgemma-1.5-4b-maternal |
| GitHub repo | https://github.com/orcus108/sakhi |
| Fine-tuning notebook | model/finetuning-medgemma.ipynb |
| Merge + quantize notebook | model/merge-and-quantize.ipynb |
Built for the Google MedGemma Impact Challenge · Kaggle · February 2026.
License