Sakhi (सखी) — Judge Brief
One-page version of the README. Full detail in README.md. 3-min demo video: youtu.be/n-u7J1lljUg.
Problem
India's 1 million+ ASHA (Accredited Social Health Activist) health workers conduct 50M+ maternal and child home visits every year, and every visit ends with a hand-filled paper form carried to the PHC (Primary Health Centre). Danger signs observed in the field, such as preeclampsia, postpartum hemorrhage, and neonatal distress, often don't reach the clinical system in time for intervention.
What Sakhi does
Sakhi converts Hindi home-visit conversations into structured NHM/MCTS (National Health Mission / Mother and Child Tracking System) forms. It also runs a danger-sign triage, built on function calling, that flags referrals with verbatim utterance evidence.

Input is either voice on a shared health-center workstation or text typed offline on the ASHA's phone. Both deployment modes run the same pipeline with the same anti-hallucination validation:
- Workstation, for accuracy: Whisper-Large + Gemma 4 E4B via Ollama.
- Android phone, for offline resilience: Gemma 4 E2B via Cactus SDK.
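Strict evidence grounding means a danger-sign flag is kept only if its evidence was actually said. Below is a minimal sketch of that check. It assumes each flag is a dict carrying `utterance_evidence`, a field name that appears in this brief's on-device results. The dict layout, the `ground_flags` name, and the `postpartum_hemorrhage` label are hypothetical, not Sakhi's code.

```python
import unicodedata


def _norm(text: str) -> str:
    # NFC-normalize so composed and decomposed Devanagari compare equal,
    # and collapse whitespace runs introduced by ASR or typing.
    return " ".join(unicodedata.normalize("NFC", text).split())


def ground_flags(transcript: str, flags: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split model-proposed danger-sign flags into (kept, rejected).

    A flag is kept only if its utterance_evidence is a non-empty,
    verbatim substring of the (normalized) transcript.
    """
    source = _norm(transcript)
    kept, rejected = [], []
    for flag in flags:
        evidence = _norm(flag.get("utterance_evidence") or "")
        if evidence and evidence in source:
            kept.append(flag)
        else:
            rejected.append(flag)
    return kept, rejected


if __name__ == "__main__":
    transcript = "मुझे सिरदर्द है और चेहरे पर सूजन भी है।"
    flags = [
        {"sign": "swelling_face", "category": "immediate_referral",
         "utterance_evidence": "चेहरे पर सूजन"},
        # Hypothetical hallucinated flag: nobody mentioned bleeding.
        {"sign": "postpartum_hemorrhage", "category": "immediate_referral",
         "utterance_evidence": "बहुत खून बह रहा है"},
    ]
    kept, rejected = ground_flags(transcript, flags)
    print([f["sign"] for f in kept])      # ['swelling_face']
    print([f["sign"] for f in rejected])  # ['postpartum_hemorrhage']
```

Rejecting flags whose evidence is absent, instead of repairing them, is one way a routine visit can end with zero false alarms. A paraphrased symptom costs a missed flag rather than an invented one.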
Numbers a judge can check
| Measurement | Value | Source |
|---|---|---|
| Text extraction pass rate (base Gemma 4 E4B) | 15 / 15 | scripts/test_ollama_quality.py — per-case rubric; one under-specified trap documented in FAILURES.md |
| End-to-end audio pipeline pass rate | 13 / 15 | scripts/test_pipeline_e2e.py (2 TTS→ASR artifacts, documented in FAILURES.md) |
| Hindi number / medical-term normalization | 133 / 133 | scripts/test_asr.py |
| On-device JS pipeline port (engine-agnostic) | 72 / 72 | cd frontend && node --test src/lib/__tests__/ |
| False-alarm rate on routine visits | 0 | Strict evidence-grounding + 6-layer validation |
| Workstation pipeline latency (audio → form) | ~15–25 s | RTX 5070 Ti, warm Ollama |
| On-device pipeline latency (Hindi text → form) | ~5 min | OnePlus 11R / Snapdragon 8+ Gen 1, Gemma 4 E2B INT4 on Cactus |
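The 133/133 normalization row includes turning spoken Hindi numbers into digits, e.g. a BP reading read out as words. Below is a minimal sketch using a naive additive parser over a small, partial lexicon. The word list, the assumption that BP is spoken as "<systolic> बटा <diastolic>", and the name `normalize_numbers` are illustrative. They are not the contents of scripts/test_asr.py or Sakhi's normalizer.

```python
import re

# Illustrative, partial lexicon. Hindi has a distinct word for every number
# from 1 to 99, so a real normalizer needs the full table plus spelling
# variants (e.g. पाँच / पांच).
UNITS = {
    "एक": 1, "दो": 2, "तीन": 3, "चार": 4, "पाँच": 5, "छह": 6, "सात": 7,
    "आठ": 8, "नौ": 9, "दस": 10, "बीस": 20, "पचास": 50, "पचपन": 55,
    "नब्बे": 90,
}
# Assumption: BP readings are often spoken as "<systolic> बटा <diastolic>".
SEPARATORS = {"बटा": "/"}


def normalize_numbers(text: str) -> str:
    """Replace runs of Hindi number words with digits (naive additive parse)."""
    out: list[str] = []
    total = current = 0
    in_number = False

    def flush() -> None:
        nonlocal total, current, in_number
        if in_number:
            out.append(str(total + current))
        total = current = 0
        in_number = False

    for token in text.split():
        if token in UNITS:
            current += UNITS[token]  # naive: "दस बीस" (ten-twenty) would sum to 30
            in_number = True
        elif token == "सौ":
            current = max(current, 1) * 100  # bare "सौ" means one hundred
            in_number = True
        elif token == "हज़ार":
            total += max(current, 1) * 1000
            current = 0
            in_number = True
        else:
            flush()
            out.append(SEPARATORS.get(token, token))
    flush()
    # Close up "155 / 100" into "155/100".
    return re.sub(r"(\d+) / (\d+)", r"\1/\2", " ".join(out))


print(normalize_numbers("बीपी एक सौ पचपन बटा सौ है"))  # बीपी 155/100 है
```

The parser accumulates unit words into a running value, multiplies it by सौ, and folds it into the total at हज़ार. This is the same scheme commonly used for English number words. The hard part in practice is the lexicon and ASR spelling variation, not the arithmetic.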
The 5-minute on-device figure is reproducible with the "Load ANC example" button (ANC = antenatal care): Field Mode tab → On-device text → form card → "Load ANC example". On the OnePlus 11R (Snapdragon 8+ Gen 1), the on-device pipeline:
- extracts BP 155/100, the verbatim Hindi symptoms सिरदर्द (headache), आँखों के सामने धुंधला दिखना (blurred vision), चेहरे पर सूजन (facial swelling), and पैरों में सूजन (leg swelling), plus the counseling note PHC जाने की सलाह (advised to go to the PHC);
- flags three danger signs, high_bp_with_symptoms, swelling_face, and swelling_legs, each with verbatim Hindi utterance_evidence and category: immediate_referral.

Total: 320.7 s end-to-end (form pass 231.8 s + danger-sign pass 88.9 s; the normalize and detect steps are negligible by comparison). For comparison, the paper-form baseline is 15–20 min of hand-filling plus travel to the PHC.
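The two model passes account for the whole reported total. The short arithmetic check below makes the time split explicit; the values are copied from the ANC run above. It shows that form extraction dominates, so that pass is the one to target when optimizing on-device latency.

```python
# Stage timings reported for the on-device ANC example (seconds).
stages = {"form extraction": 231.8, "danger-sign pass": 88.9}
total_s = 320.7  # reported end-to-end total

model_s = sum(stages.values())
# Whatever the two model passes don't cover is normalize + detect;
# round and clamp so float noise never prints as "-0.0".
other_s = max(0.0, round(total_s - model_s, 1))

for name, secs in stages.items():
    print(f"{name}: {secs:.1f} s ({secs / total_s:.0%})")
print(f"normalize + detect: {other_s:.1f} s")
print(f"total: {total_s / 60:.1f} min")
```

This prints a 72% / 28% split between the two passes, 0.0 s for normalize + detect at the reported 0.1 s precision, and 5.3 min overall.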
Why this is submitted to four tracks
| Track | What Sakhi brings |
|---|---|
| Health & Sciences | A clinical-decision-support tool with explicit human-in-the-loop design, 6-layer anti-hallucination validation, strict-evidence danger-sign grounding, demographics entered as a typed header (as clinical EMRs typically do, so identifiers don't depend on ASR), and a workflow matched to how ASHA workers actually operate (health-center mode, plus field mode with later sync). |
| Ollama | Native function calling via the tools= parameter: extract_form, flag_danger_sign, and issue_referral in a single inference pass. Quantized Gemma 4 E4B (Q4_K_M) is served over the LAN to any phone on the same WiFi. One command (python api.py) starts the full stack. |
| Unsloth | One-command LoRA pipeline (scripts/train_unsloth.py): data prep → train → GGUF export → Ollama registration → A/B eval vs base. Includes a Windows GGUF-export workaround (scripts/export_merge.py) for Unsloth's Gemma 4 mmap failure: manual delta-merge + llama.cpp/convert_hf_to_gguf.py + llama-quantize Q4_K_M, no WSL needed. The fine-tune scores 14/15 vs 15/15 for base, and base runs in the live pipeline. The fine-tune is published to Ollama as tusharbrisingr9802/sakhi (ollama pull tusharbrisingr9802/sakhi to verify the A/B locally) for deployments that prefer English schema-label normalization (दस्त → Diarrhea) over raw Hindi. Field-coverage diff in FIELD_COVERAGE_DIFF.md. |
| Cactus | On-device integration: a custom Capacitor plugin bridging JS ↔ Cactus Kotlin SDK; a JS pipeline port that drives either the Cactus engine or the workstation engine through a single engine.complete() contract; a null-filled instance-template prompting pattern that sidesteps E2B INT4's schema-echo failure mode; in-app zip import via Android's Storage Access Framework (SAF), so a judge can install the 4.4 GB model without adb or developer tooling (single-pass extraction with 1%/heartbeat progress events; stale model dirs are auto-evicted on re-import); and a Developer-view toggle that shows raw per-stage model output for verifiable extraction. On-device voice input via cactusTranscribe + Gemma was investigated but is not in this submission. The README documents why: Gemma 4 doesn't serve Cactus's ASR path, and off-the-shelf Whisper-Hindi INT4 has 27–70% WER on rural/clinical Hindi (per Kumar et al. 2025 and the Vistaar / Gramvaani benchmarks), with deletion-dominant errors on numbers. |
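The Ollama row's single-pass function calling can be sketched with Ollama's function-style tool definitions (`type`, `function`, `name`, `description`, `parameters`). The three tool names come from this brief. Their parameter lists and the `dispatch` helper, which groups the model's tool calls by name, are hypothetical. The live call appears only as a comment because it needs a running Ollama server and the `ollama` Python package.

```python
import json


def _tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    # Function-style tool definition in the shape Ollama's chat API accepts.
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


# Hypothetical parameter lists; only the tool names come from the brief.
TOOLS = [
    _tool("extract_form", "Fill the home-visit form from the transcript.",
          {"bp_systolic": {"type": "integer"}, "bp_diastolic": {"type": "integer"},
           "symptoms": {"type": "array", "items": {"type": "string"}}},
          []),
    _tool("flag_danger_sign", "Flag one danger sign with verbatim evidence.",
          {"sign": {"type": "string"}, "utterance_evidence": {"type": "string"},
           "category": {"type": "string"}},
          ["sign", "utterance_evidence"]),
    _tool("issue_referral", "Recommend a referral and its reason.",
          {"facility": {"type": "string"}, "reason": {"type": "string"}},
          ["reason"]),
]


def dispatch(message: dict) -> dict:
    """Group the tool calls in one (dict-shaped) chat response message by tool name."""
    results = {t["function"]["name"]: [] for t in TOOLS}
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn["arguments"]
        # Ollama returns arguments as an object; OpenAI-style servers send a JSON string.
        if isinstance(args, str):
            args = json.loads(args)
        if fn["name"] in results:  # silently drop calls to unknown tools
            results[fn["name"]].append(args)
    return results


# With a running Ollama server and `pip install ollama`, roughly:
#   resp = ollama.chat(model="gemma4:e4b-it-q4_K_M", messages=messages, tools=TOOLS)
#   results = dispatch(resp.message.model_dump())
```

A single response can carry several tool calls, so one inference pass can return the form, any danger-sign flags, and a referral together. Each flag would then still go through evidence grounding before being shown.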
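The Unsloth row's manual delta-merge folds the LoRA adapters into the base weights before GGUF conversion. In standard LoRA, each adapter pair (A, B) merges into its base weight as W' = W + (α/r)·B·A. The result is an ordinary dense checkpoint that a converter can read. Below is a pure-Python sketch of that per-layer step, assuming the standard α/r scaling. It is not the contents of scripts/export_merge.py; real checkpoints do this per tensor with a framework such as PyTorch.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of an (n x m) and an (m x p) nested list."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_a, lora_b, alpha: float, rank: int):
    """Return W + (alpha / rank) * B @ A for one linear layer.

    Shapes: w is d_out x d_in, lora_b is d_out x r, lora_a is r x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, d_out x d_in
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy rank-1 example on a 2x2 identity base weight.
merged = merge_lora([[1, 0], [0, 1]], lora_a=[[3, 4]], lora_b=[[1], [2]], alpha=2, rank=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

Merging before export means the GGUF file needs no adapter support at inference time. The trade-off is that the adapter can no longer be swapped out without re-exporting.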
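The Cactus row's null-filled instance template replaces a JSON Schema in the prompt with a concrete object whose values are all null. That leaves no `type`/`properties` keys for a small INT4 model to copy back. Below is a minimal Python sketch of the pattern. Sakhi's on-device version is a JS port, and the field names, prompt wording, and function names here are hypothetical.

```python
import json

# Hypothetical ANC fields for illustration; the real form has more.
ANC_FIELDS = ["bp_systolic", "bp_diastolic", "symptoms", "counseling"]
# Keys that indicate the model copied a JSON Schema instead of filling values.
SCHEMA_KEYS = {"$schema", "type", "properties", "required"}


def build_prompt(transcript: str, fields: list[str]) -> str:
    # Show a concrete instance with every value null, rather than a schema.
    template = json.dumps({f: None for f in fields}, ensure_ascii=False, indent=2)
    return (
        "Fill in this JSON using only facts stated in the transcript. "
        "Leave null for anything not mentioned. Return only the JSON.\n\n"
        f"{template}\n\nTranscript:\n{transcript}"
    )


def parse_filled(raw: str, fields: list[str]) -> dict:
    # Small models often wrap JSON in prose, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    data = json.loads(raw[start:end + 1])
    if data.keys() & (SCHEMA_KEYS - set(fields)):
        raise ValueError("model echoed a schema instead of filling the template")
    # Keep only known fields; anything the model omitted stays None.
    return {f: data.get(f) for f in fields}
```

Because the template already has the right keys, the parser only needs to drop extras and keep nulls. A schema echo is rejected outright instead of being mistaken for an empty form.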
Reproduce in under 10 minutes
3-min demo video: youtu.be/n-u7J1lljUg. It covers the workstation voice-to-form path, on-device Hindi text-to-form on a phone in airplane mode, and the four tracks claimed.
Live demo (no install): https://huggingface.co/spaces/Tushar9802/sakhi. It runs the same stack as a local install, on a T4 GPU. Expect a ~5 min cold boot after idle, because the Space runs on ephemeral disk. For instant evaluation, use the demo video or run locally as below.
Pull the Unsloth fine-tune: ollama pull tusharbrisingr9802/sakhi. The LoRA-fine-tuned Gemma 4 E4B is on the Ollama registry. Run python scripts/test_ollama_quality.py against base + fine-tune to reproduce the 15/15 vs 14/15 A/B locally.
Health-center mode (workstation only):
```bash
pip install -r requirements-runtime.txt && ollama pull gemma4:e4b-it-q4_K_M
cd frontend && npm install && npm run build && cd ..
python api.py   # browser: http://localhost:8000
```
Field mode (phone + Cactus):
Sakhi does not redistribute the Cactus-Compute model — it is gated under a custom Cactus license. Reviewers verifying the Cactus track follow the documented path below. Most reviewers can verify the engineering claims via the workstation path above without ever installing on-device; the 3-minute demo video shows the full on-device flow on a real phone.
```bash
# Build + install the APK once. After this the model install is in-app, no adb.
cd frontend && npm run build && npx cap sync android && \
  cd android && ./gradlew assembleDebug && \
  adb install -r app/build/outputs/apk/debug/app-debug.apk

# Model install (primary path, no developer tooling needed):
#   1. Accept terms at huggingface.co/Cactus-Compute/gemma-4-E2B-it
#   2. Download gemma-4-e2b-it-int4.zip (~4.4 GB) to the PHONE'S Downloads
#      folder (USB MTP from a PC, an OTG drive, or a direct Drive download
#      saved to local storage).
#   3. Open Sakhi → Field Mode → On-Device Probe → Import model (.zip)
#      → pick the zip. The progress bar fills in ~3-5 min.
#   4. Tap Load Model → Test Hindi.
#
# Re-imports auto-evict the previous model: one model on disk at a time.

# Developer alternative (adb-based, no manual file picking):
# export HF_TOKEN=hf_... && bash scripts/setup_cactus_model.sh
```
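The in-app import extracts the model zip in a single streaming pass, reports whole-percent progress, and evicts older models so only one sits on disk. Below is a Python sketch of that behaviour, for illustration only. The real implementation is the Kotlin/Capacitor plugin, the `import_model_zip` name and directory layout are hypothetical, and the heartbeat events are omitted.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Callable


def import_model_zip(zip_path: Path, models_dir: Path,
                     on_progress: Callable[[int], None]) -> Path:
    """Extract zip_path into models_dir/<zip stem> in one streaming pass.

    Every existing model directory is deleted first, so only one model is
    on disk at a time and its space is freed before the new copy is written.
    on_progress receives each new whole percentage of bytes written.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    for old in models_dir.iterdir():
        if old.is_dir():
            shutil.rmtree(old)

    target = (models_dir / zip_path.stem).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.infolist()
        total = sum(m.file_size for m in members) or 1
        done, last_pct = 0, -1
        for m in members:
            dest = (target / m.filename).resolve()
            # Reject entries that would escape the target dir ("zip slip").
            if not dest.is_relative_to(target):
                raise ValueError(f"unsafe path in zip: {m.filename}")
            if m.is_dir():
                dest.mkdir(parents=True, exist_ok=True)
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(m) as src, open(dest, "wb") as out:
                while chunk := src.read(1 << 20):  # 1 MiB at a time
                    out.write(chunk)
                    done += len(chunk)
                    pct = done * 100 // total
                    if pct > last_pct:
                        on_progress(pct)
                        last_pct = pct
    return target
```

Evicting before extracting matters on a phone. A 4.4 GB model plus a stale copy may not fit, and streaming in chunks keeps memory flat regardless of file size.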
A sample Hindi transcript ready to paste is at data/processed/train.jsonl (line 1 = ANC preeclampsia case) or in the main README.
Privacy & data handling
Audio and transcripts never leave the institution that owns them. Workstation mode keeps everything on the PHC's local network (Whisper + Ollama on a local GPU; no OpenAI / Anthropic / Google API). Field mode runs on-device via Cactus SDK and keeps working in airplane mode. Patient demographics enter as a typed header rather than being extracted from audio, so identifiers are minimised at the boundary. This posture is compatible with India's Digital Personal Data Protection Act, 2023. Data stays with the institution acting as data fiduciary, there is no cross-border transfer, and purpose limitation is enforced by architecture rather than by policy.
What's next with $10K and six more months
- Partner with an ASHA training institute (Santosh Medical College / IIT Madras Bhashini) to collect 100+ hours of real ASHA home-visit audio under field conditions. Current evaluation covers 4 real-voice recordings (2 speakers: 1 female reader from Bareilly and 1 male self-recording, across 3 of the 4 role-play scripts) plus the 15-case synthetic test suite. Validation on a full corpus of rural female accents with field noise is the next step.
- Fine-tune an IndicWhisper variant on that real audio for the on-device voice-in path not shipped here.
- Harden integration with the official MCTS API so forms post directly into the NHM system instead of being exported as JSON/CSV.
- Pilot with 10–20 ASHA workers in one block (Muradnagar / Loni-adjacent) with before/after time-and-accuracy measurement.
Contact
Tushar J — tusharbrisingr9802@gmail.com — GitHub: Tushar-9802/Sakhi