# tau-rag — Production Deployment Guide

This guide covers everything you need to run tau-rag in production:
image build, secrets, deployment (Docker Compose or Kubernetes),
observability, backups, and incident response.

---

## 1. What you get

- **Hardened Docker image** (`Dockerfile.prod`): non-root user (UID 10001),
  gunicorn + uvicorn workers, tini as PID 1, HEALTHCHECK, multi-stage build.
- **Docker Compose stack** (`docker-compose.prod.yml`): app + nginx TLS
  proxy + Prometheus + Grafana.
- **Kubernetes manifests** (`deployment/k8s/`): Deployment (3 replicas, HPA
  3→12), Service, Ingress with cert-manager, PDB (minAvailable=2),
  NetworkPolicy (default deny + explicit allows), startup/liveness/readiness
  probes tuned for AlephBERT warmup.
- **GitHub Actions** (`.github/workflows/`): lint + 3× Python test matrix +
  build & push to GHCR + Trivy security scan; separate CD pipeline for
  tagged releases.
- **Corpus loader** (`scripts/init_corpus.py`): loads LawDBHeb on first
  boot; idempotent via marker file.
- **Prometheus alert rules** (`deployment/prometheus/rules/`): availability,
  latency, retrieval flakiness, calibration drift, numeric hallucination.
- **Grafana dashboard** (`deployment/grafana/dashboards/`): QPS, 5xx rate,
  p50/p95/p99 latency by stage, answer-quality signals.
- **Production preset** (`configs/hebrew_legal_prod.json`): cache on,
  reranker on, MMR diversity, numeric-consistency gate, rate limits,
  all middleware flags enabled.

---

## 2. Prerequisites

- Docker Desktop / OrbStack / Colima with `docker compose v2`
- (optional) `kubectl` + `helm` for K8s deploy
- (optional) cert-manager + ingress-nginx if using the K8s path
- LawDBHeb corpus accessible at a readable path (local disk or mount)
- Anthropic (or OpenAI) API key

---

## 3. First-time setup

```bash
cd tau_rag
cp .env.example .env
# Edit .env — at minimum set:
#   ANTHROPIC_API_KEY
#   TAU_RAG_HMAC_SECRET  (openssl rand -hex 32)
#   LAWDBHEB_HOST_PATH   (absolute path to LawDBHeb on host)
#   TAU_RAG_SEED_ADMIN_KEY (first-boot admin key; rotate after)

# Generate TLS certs (dev). For prod use letsencrypt/cert-manager.
mkdir -p deployment/nginx/certs
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout deployment/nginx/certs/privkey.pem \
  -out deployment/nginx/certs/fullchain.pem \
  -subj "/CN=tau-rag.local"
```

---

## 4. Deploy — Docker Compose

```bash
make prod-up          # build + start app + nginx + prometheus + grafana
make prod-ps          # status
make prod-logs        # tail logs
make prod-down        # stop everything
```

Access:
- API: `https://localhost/v1/health` (self-signed cert in dev)
- Grafana: `http://localhost:3000` (admin / change-me)
- Prometheus: `http://localhost:9090` (scraped via docker network)

First-boot behavior: `scripts/entrypoint.sh` runs `init_corpus.py`
if `$TAU_RAG_RUNTIME_DIR/.corpus_loaded` is absent. Warm-up can take
~60s while AlephBERT loads.

---

## 5. Deploy — Kubernetes

```bash
# 1. Build + push image (or use CI — it does this on tag)
make build-prod
docker tag tau-rag:prod ghcr.io/<org>/tau-rag:v1.0.0
docker push ghcr.io/<org>/tau-rag:v1.0.0

# 2. Create real secret (do NOT commit)
cp deployment/k8s/secret.example.yaml deployment/k8s/secret.yaml
# edit secret.yaml

# 3. Apply manifests
make k8s-apply

# 4. Watch rollout
make k8s-status
make k8s-logs
```

Persistent volumes required:
- `tau-rag-runtime` — 10Gi RWO — stores signals.jsonl, snapshots, dashboards.
- `lawdbheb-corpus` — 20Gi ROX — mount the LawDBHeb pickle/parquet data.

The HPA scales 3→12 replicas on CPU>70% or memory>80%.
PDB keeps at least 2 pods during voluntary disruptions (node drain, etc).
NetworkPolicy denies all by default and allows only: ingress-nginx →:8000,
prometheus scraper →:8000, egress DNS + egress 443 to public (for LLM API).

---

## 6. Secrets management

**Never commit `.env`, `secret.yaml`, or certs.**

Recommended options:
- **Local/compose**: `.env` on host, restrictive file perms.
- **K8s**: External Secrets Operator → AWS Secrets Manager / GCP Secret
  Manager / HashiCorp Vault.
- **GitHub Actions**: repository secrets, referenced as `${{ secrets.X }}`.

Rotation procedure:
1. Generate new key in provider
2. Create a new k8s Secret revision or update .env
3. `kubectl rollout restart deploy/tau-rag -n tau-rag` — zero-downtime.

---

## 7. Observability

**Built-in endpoints:**
- `GET /v1/health` — liveness (process up?).
- `GET /v1/readiness` — readiness (corpus loaded, retrievers ready?).
- `GET /metrics` — Prometheus text exposition (bytes, counters, histograms).
- `GET /v1/version` — version + feature flags.

**Key metrics exposed** (prefix `tau_rag_`):
- `requests_total{endpoint,status}` — request counter
- `request_duration_seconds_bucket{endpoint}` — latency histogram
- `retrieval_duration_seconds_bucket{retriever}` — per-retriever latency
- `generation_duration_seconds_bucket` — LLM call latency
- `answer_grounding_ratio` — v3.30 grounding KPI
- `numeric_support_ratio` — v3.77 numeric-consistency KPI
- `coverage_ratio` — v3.60 term coverage
- `flaky_queries_count` — v3.64 rank-stability count
- `calibration_ece` — v3.55 expected calibration error
- `numeric_unsupported_total` — v3.77 hallucination counter

**Alerts** (`deployment/prometheus/rules/tau-rag-alerts.yml`): availability,
5xx rate, p95/p99 latency, flaky retrieval, ECE drift, numeric hallucinations.
Wire `alertmanagers` in `prometheus.yml` to your on-call channel.

**Dashboards**: the provided Grafana JSON loads as "tau-rag · Overview".
Extend via the admin UI (auto-reloaded from `/etc/grafana/provisioning`).

---

## 8. Backups

`scripts/backup_runtime.sh` tars `TAU_RAG_RUNTIME_DIR` to a timestamped
path. Run via cron (host) or a `CronJob` (k8s):

```bash
0 3 * * *  /app/scripts/backup_runtime.sh /backups/tau-rag/
```

Set `TAU_RAG_BACKUP_RETENTION_DAYS=14` to rotate automatically.

The LawDBHeb corpus itself is mounted read-only and treated as
immutable — rebuild the index via your ingestion pipeline when the
upstream corpus changes.

---

## 9. Upgrading

**Zero-downtime rolling upgrade:**

Docker Compose:
```bash
docker compose -f docker-compose.prod.yml --env-file .env up -d --build tau-rag
```

Kubernetes:
```bash
kubectl -n tau-rag set image deploy/tau-rag \
  tau-rag=ghcr.io/<org>/tau-rag:v1.2.0
kubectl -n tau-rag rollout status deploy/tau-rag
# rollback if needed:
kubectl -n tau-rag rollout undo deploy/tau-rag
```

`rolling-update.maxSurge=1, maxUnavailable=0` keeps old pods until new
pods pass the readiness probe.

---

## 10. Incident response runbook

### Symptom: 5xx error rate > 5%
1. Check `TauRagHighErrorRate` alert in Prometheus for top endpoints.
2. `kubectl -n tau-rag logs -l app.kubernetes.io/name=tau-rag --tail=200`.
3. Look for stack traces or OOMKilled.
4. If LLM provider down: flip `TAU_RAG_PRESET=no_llm` via ConfigMap →
   `kubectl rollout restart` — extractive fallback.
5. If bad deploy: `kubectl rollout undo deploy/tau-rag`.

### Symptom: p95 latency > 2s
1. Check `TauRagP95HighLatency`; look at retrieval vs generation histograms.
2. If retrieval dominant → check `tau_rag_flaky_queries_count`, scale replicas.
3. If generation dominant → check LLM provider status, reduce `max_tokens`,
   or switch `llm_router` to a cheaper model.
4. Scale: `kubectl -n tau-rag scale deploy/tau-rag --replicas=8` (HPA
   also does this automatically on CPU/memory).

### Symptom: Numeric hallucinations spike (v3.77)
1. `TauRagAnswerNumericHallucinations` firing means the LLM is
   inventing section/case numbers.
2. Review recent prompt or preset changes.
3. Tighten `verify.min_support_ratio` in the preset, or force regen via
   the v3.73 coverage-gap `REC_REGENERATE` path.

### Symptom: Calibration drift (v3.55)
1. `TauRagLowConfidenceCalibration` — ECE > 0.2 means reported
   confidence doesn't match reality.
2. Run the admin endpoint `POST /v1/admin/confidence_calibrator/analyze`
   to see per-bucket calibration error.
3. The calibrator continuously re-maps; if persistently off, reduce LLM
   temperature or switch model.

### Symptom: Disk fills up on runtime volume
1. Check signals.jsonl size — should rotate but check log_rotation.
2. `kubectl -n tau-rag exec deploy/tau-rag -- du -sh /app/runtime/*`
3. Increase PVC size or tune `retention` middleware.

---

## 11. Security checklist

- [x] Non-root container (UID 10001)
- [x] `allowPrivilegeEscalation=false`, capabilities dropped
- [x] Pinned requirements.txt (reproducible builds)
- [x] Trivy scans in CI (fail on CRITICAL/HIGH)
- [x] HMAC signing on admin endpoints
- [x] Rate limiting (nginx + in-app TokenBucket)
- [x] NetworkPolicy default-deny in K8s
- [x] TLS on nginx ingress; HSTS header
- [x] No secrets in image; runtime injection only
- [ ] **Rotate seeded admin API key on first login** (you have to do this manually)
- [ ] **Enable WAF** at your LB if exposed to internet
- [ ] **Audit logs** exported to SIEM (use `/v1/audit_export` endpoint)

---

## 12. Gaps / known limitations

- The in-memory middleware state (cache, stats, histories) does not
  persist across pod restarts; rely on `runtime/` volume for snapshots.
- Corpus reload requires a pod restart (no hot-reload path yet).
- Some v3.x middleware modules are admin-only (not wired into the main
  query pipeline). Turn them on via `configs/hebrew_legal_prod.json`
  → `middleware_flags.*: true`.
- Local LawDBHeb pickles (~1.5 GB for bm25+chunks) must fit in the pod
  memory footprint (6 Gi limit default — tune via HPA + deployment).

---

## 13. Files you should know about

```
tau_rag/
├── Dockerfile.prod               # hardened image
├── docker-compose.prod.yml       # full stack
├── requirements.txt              # pinned prod deps
├── requirements-dev.txt          # + pytest/ruff/mypy
├── .env.example                  # all env vars documented
├── .dockerignore                 # keep image slim
├── configs/hebrew_legal_prod.json # production preset
├── scripts/
│   ├── entrypoint.sh             # dispatcher
│   ├── healthcheck.sh            # docker/k8s probe
│   ├── init_corpus.py            # first-boot corpus load
│   └── backup_runtime.sh         # runtime volume tar
├── deployment/
│   ├── nginx/                    # TLS proxy + rate limiting
│   ├── prometheus/               # scrape config + alert rules
│   ├── grafana/                  # dashboards + datasources
│   └── k8s/                      # Deployment, HPA, PDB, Ingress, NP
├── .github/workflows/
│   ├── ci.yml                    # test + build + scan
│   └── cd.yml                    # deploy on tag
└── PRODUCTION.md                 # this doc
```

Quick commands: `make help`