Spaces:

Tushar9802
/

sakhi

Sleeping

App Files Files Community

sakhi / JUDGE_BRIEF.md

Tushar9802

docs: drop scaffold-style "in two sentences" subheads in JUDGE_BRIEF

5061a91 28 days ago

preview code

raw

history blame contribute delete

9.47 kB

Sakhi (सखी) — Judge Brief

One-page version of the README. Full detail in README.md. 3-min demo video: youtu.be/n-u7J1lljUg.

Problem

India's 1 million+ ASHA health workers conduct 50M+ maternal and child home visits every year; every visit ends with a hand-filled paper form carried to the PHC. Danger signs observed in the field — preeclampsia, postpartum hemorrhage, neonatal distress — often don't reach the clinical system in time for intervention.

What Sakhi does

Sakhi converts Hindi home-visit conversations (voice on a shared health-center workstation, text on the ASHA's phone offline) into structured NHM/MCTS forms + a function-calling-powered danger-sign triage that flags referrals with verbatim utterance evidence. Same pipeline, same anti-hallucination validation, two deployment modes: Whisper-Large + Gemma 4 E4B via Ollama on a workstation for accuracy, and Gemma 4 E2B via Cactus SDK on an Android phone for offline resilience.

Numbers a judge can check

Measurement	Value	Source
Text extraction pass rate (base Gemma 4 E4B)	15 / 15	`scripts/test_ollama_quality.py` — per-case rubric; one under-specified trap documented in FAILURES.md
End-to-end audio pipeline pass rate	13 / 15	`scripts/test_pipeline_e2e.py` (2 TTS→ASR artifacts, documented in FAILURES.md)
Hindi number / medical-term normalization	133 / 133	`scripts/test_asr.py`
On-device JS pipeline port (engine-agnostic)	72 / 72	`cd frontend && node --test src/lib/__tests__/`
False-alarm rate on routine visits	0	Strict evidence-grounding + 6-layer validation
Workstation pipeline latency (audio → form)	~15–25 s	RTX 5070 Ti, warm Ollama
On-device pipeline latency (Hindi text → form)	~5 min	OnePlus 11R / Snapdragon 8+ Gen 1, Gemma 4 E2B INT4 on Cactus

The 5-minute on-device figure is reproducible via the Load ANC example button in Field Mode (Field Mode tab → On-device text → form card → "Load ANC example"). On OnePlus 11R / Snapdragon 8+ Gen 1, the on-device pipeline extracts BP 155/100, verbatim Hindi symptoms (सिरदर्द, आँखों के सामने धुंधला दिखना, चेहरे पर सूजन, पैरों में सूजन), Counseling PHC जाने की सलाह, and flags three danger signs — high_bp_with_symptoms, swelling_face, swelling_legs — all with verbatim Hindi utterance_evidence and category: immediate_referral. Total 320.7 s end-to-end (Form 231.8 s + Danger 88.9 s + normalize + detect). For comparison: the paper-form baseline is 15–20 min of hand-filling plus travel to the PHC.

Why this is submitted to four tracks

Track	What Sakhi brings
Health & Sciences	A clinical-decision-support tool with explicit human-in-the-loop design, 6-layer anti-hallucination, strict-evidence danger-sign grounding, demographics entered as a typed header (the way every clinical EMR does it, so identifiers don't depend on ASR), and a workflow matched to how ASHA workers actually operate (health-center mode + field mode with later sync).
Ollama	Native function calling via `tools=` parameter for `extract_form` + `flag_danger_sign` + `issue_referral` in a single inference pass, quantized Gemma 4 E4B Q4_K_M served on LAN to any phone on the same WiFi. One command (`python api.py`) starts the full stack.
Unsloth	One-command LoRA pipeline (`scripts/train_unsloth.py`): data prep → train → GGUF export → Ollama register → A/B eval vs base. Includes a Windows GGUF-export workaround (`scripts/export_merge.py`) for Unsloth's Gemma 4 mmap failure — manual delta-merge + `llama.cpp/convert_hf_to_gguf.py` + `llama-quantize Q4_K_M`, no WSL needed. Fine-tune pass rate 14/15 vs base 15/15 — base is in the live pipeline; fine-tune is published to Ollama as `tusharbrisingr9802/sakhi` (`ollama pull tusharbrisingr9802/sakhi` to verify A/B locally) for deployments preferring English schema-label normalization (`दस्त` → `Diarrhea`) over raw Hindi. Field-coverage diff in `FIELD_COVERAGE_DIFF.md`.
Cactus	On-device integration: custom Capacitor plugin bridging JS ↔ Cactus Kotlin SDK, JS pipeline port that drives either the Cactus engine or the workstation engine through a single `engine.complete()` contract, null-filled instance template prompting pattern that sidesteps E2B INT4's schema-echo failure mode, in-app SAF zip-import so a judge can install the 4.4 GB model without adb or developer tooling (single-pass extract with 1%/heartbeat progress events; auto-evicts stale model dirs on re-import), and a Developer-view toggle that shows raw per-stage model output for verifiable extraction. On-device voice-in via `cactusTranscribe` + Gemma was investigated; the README documents why it's not shipped (Gemma 4 doesn't serve Cactus's ASR path, and off-the-shelf Whisper-Hindi INT4 has 27–70% WER on rural/clinical Hindi per Kumar et al. 2025 and the Vistaar / Gramvaani benchmarks, with deletion-dominant errors on numbers — not in this submission).

Reproduce in under 10 minutes

3-min demo video: youtu.be/n-u7J1lljUg — workstation voice-to-form path, on-device Hindi text-to-form on a phone in airplane mode, four tracks claimed.

Live demo (no install): https://huggingface.co/spaces/Tushar9802/sakhi. Same stack as a local install on a T4. ~5 min cold-boot wait after idle (Space runs on ephemeral disk). For instant evaluation, use the demo video or run locally below.

Pull the Unsloth fine-tune: ollama pull tusharbrisingr9802/sakhi. The LoRA-fine-tuned Gemma 4 E4B is on the Ollama registry. Run python scripts/test_ollama_quality.py against base + fine-tune to reproduce the 15/15 vs 14/15 A/B locally.

Health-center mode (workstation only):

pip install -r requirements-runtime.txt && ollama pull gemma4:e4b-it-q4_K_M
cd frontend && npm install && npm run build && cd ..
python api.py        # browser: http://localhost:8000

Field mode (phone + Cactus):

Sakhi does not redistribute the Cactus-Compute model — it is gated under a custom Cactus license. Reviewers verifying the Cactus track follow the documented path below. Most reviewers can verify the engineering claims via the workstation path above without ever installing on-device; the 3-minute demo video shows the full on-device flow on a real phone.

# Build + install the APK once. After this the model install is in-app, no adb.
cd frontend && npm run build && npx cap sync android && \
  cd android && ./gradlew assembleDebug && \
  adb install -r app/build/outputs/apk/debug/app-debug.apk

# Model install — primary path, no developer tooling needed:
#   1. Accept terms at huggingface.co/Cactus-Compute/gemma-4-E2B-it
#   2. Download gemma-4-e2b-it-int4.zip (~4.4 GB) to the PHONE'S Downloads
#      folder (USB MTP from PC, OTG drive, or direct Drive download to local).
#   3. Open Sakhi → Field Mode → On-Device Probe → Import model (.zip)
#      → pick the zip. Progress bar fills in ~3-5 min.
#   4. Tap Load Model → Test Hindi.
#
# Re-imports auto-evict the previous model — one model on disk at a time.

# Developer alternative (adb-based, no manual file picking):
#   export HF_TOKEN=hf_... && bash scripts/setup_cactus_model.sh

A sample Hindi transcript ready to paste is at data/processed/train.jsonl (line 1 = ANC preeclampsia case) or in the main README.

Privacy & data handling

Audio and transcripts never leave the institution that owns them. Workstation mode keeps everything on the PHC's local network (Whisper + Ollama on local GPU; no OpenAI / Anthropic / Google API). Field mode runs on-device via Cactus SDK — airplane mode does not break it. Patient demographics enter as a typed header rather than being extracted from audio, so identifiers are minimised at the boundary. This posture is compatible with India's Digital Personal Data Protection Act, 2023 — data fiduciary stays within the institution, no cross-border transfer, purpose limitation enforced by architecture rather than by policy.

What's next with $10K and six more months

Partner with an ASHA training institute (Santosh Medical College / IIT Madras Bhashini) to collect 100+ hours of real ASHA home-visit audio under field conditions. Current evaluation covers 4 real-voice recordings (2 speakers — 1 female Bareilly reader + 1 male self-record — across 3 of 4 role-play scripts) plus the 15-case synthetic test suite; full-corpus rural-female accent + field-noise validation is the next step.
Fine-tune an IndicWhisper variant on that real audio for the on-device voice-in path not shipped here.
Harden integration with the official MCTS API so forms post directly into the NHM system instead of being exported as JSON/CSV.
Pilot with 10–20 ASHA workers in one block (Muradnagar / Loni-adjacent) with before/after time-and-accuracy measurement.

Contact

Tushar J — tusharbrisingr9802@gmail.com — GitHub: Tushar-9802/Sakhi