# TAU-RAG Deployment

## Two container flavors

| Image | Size | Startup | Capability |
|-------|------|---------|------------|
| `tau-rag:slim` | ~200 MB | <1 s | BM25 + Hebrew synonyms + extractive answers (no neural models) |
| `tau-rag:full` | ~3 GB  | ~5 s | Slim + AlephBERT + FAISS + multi-hop refs |

Choose based on your quality/cost trade-off. In the A/B report, variant B improved Ω by 6.3% and added about 110 ms per query.

## Quickstart — local Docker

**Prereqs**: the Docker daemon must be running.
- macOS: `open -a "Docker Desktop"` (wait ~10s until the whale icon is steady)
- Linux: `sudo systemctl start docker`

```bash
# Slim (no neural dependencies)
make build-slim
make run-slim
curl http://localhost:8000/health

# Full (AlephBERT)
make build-full
make run-full
curl http://localhost:8001/health
```
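The full image needs about 5 s to start, so a `curl` sent right after `make run-full` can fail. A minimal Python sketch that polls `/health` until it answers, assuming a healthy server returns HTTP 200 (`wait_for_health` and `default_fetch` are hypothetical helper names, not part of TAU-RAG):

```python
import time
import urllib.request
from typing import Callable


def default_fetch(url: str) -> int:
    """GET `url` and return the HTTP status code (raises on connection errors)."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status


def wait_for_health(
    url: str,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    fetch: Callable[[str], int] = default_fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns HTTP 200; give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError, HTTPError and ConnectionRefusedError are all OSError
            # subclasses: the server is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Usage: `wait_for_health("http://localhost:8001/health")` after `make run-full`. `fetch` and `sleep` are injectable so the loop can be exercised without a running container.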

### If your source is in iCloud/OneDrive

Docker builds can fail when the build context sits in a cloud-synced folder: the path often contains spaces, and the sync client's file watchers compete with Docker for the same files. Workaround:

```bash
make build-from-tmp   # copies to /tmp and builds there
make run-slim         # (or run-full)
```
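The `build-from-tmp` recipe itself is not shown here, so the following Python sketch only illustrates the general idea under that assumption: copy the tree to a local temp directory (skipping caches), then build from the copy. The function names and the ignore list are hypothetical; the `--target` and tag values follow the image table.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical ignore list: files we assume the image does not need.
IGNORED = shutil.ignore_patterns(".git", "__pycache__", "*.pyc")


def stage_build_context(src: Path) -> Path:
    """Copy `src` into a fresh, local (non-synced) temp directory and return the copy."""
    dst = Path(tempfile.mkdtemp(prefix="tau-rag-build-")) / "src"
    shutil.copytree(src, dst, ignore=IGNORED)
    return dst


def build_image(context: Path, target: str = "slim") -> None:
    """Run `docker build` for one Dockerfile stage against the staged copy."""
    subprocess.run(
        ["docker", "build", "--target", target, "-t", f"tau-rag:{target}", str(context)],
        check=True,
    )
```

Usage: `build_image(stage_build_context(Path.cwd()), target="full")`.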

## docker-compose

```bash
docker compose up -d tau-rag-slim     # or tau-rag-full
docker compose logs -f
docker compose down
```

Volumes:
- `./runtime` → `/app/runtime` — signals.jsonl, snapshots, dashboards persist between restarts
- `hf-cache` → `/root/.cache/huggingface` — AlephBERT weights cached between runs

## Environment variables

| Var | Default | Purpose |
|-----|---------|---------|
| `TAU_RAG_PRESET` | `no_llm` | `mock` \| `no_llm` \| `hebrew_legal` \| `hebrew_dense` |
| `TAU_RAG_OTEL` | unset | Set to any value to enable OpenTelemetry bridge |
| `HF_HOME` | `~/.cache/huggingface` | Hugging Face model cache |
| `ANTHROPIC_API_KEY` | unset | Needed only if `generation.provider = "anthropic"` |
| `OPENAI_API_KEY` | unset | Needed only if `generation.provider = "openai"` |
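TAU-RAG's own config loader is not shown here. As a pre-flight check before starting a container, a Python sketch that mirrors the table's defaults and allowed values; `load_config`, `missing_api_key` and `DeployConfig` are hypothetical names, and treating an empty `TAU_RAG_OTEL` as "set" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed TAU_RAG_PRESET values, from the table.
PRESETS = ("mock", "no_llm", "hebrew_legal", "hebrew_dense")
# Which API key each generation.provider needs, from the table.
PROVIDER_KEYS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


@dataclass(frozen=True)
class DeployConfig:
    preset: str
    otel_enabled: bool
    hf_home: str


def load_config(env: Mapping[str, str] = os.environ) -> DeployConfig:
    """Read the documented variables, applying the documented defaults."""
    preset = env.get("TAU_RAG_PRESET", "no_llm")
    if preset not in PRESETS:
        raise ValueError(f"TAU_RAG_PRESET must be one of {PRESETS}, got {preset!r}")
    return DeployConfig(
        preset=preset,
        # Assumption: "set to any value" means the variable only has to exist.
        otel_enabled="TAU_RAG_OTEL" in env,
        hf_home=env.get("HF_HOME", os.path.expanduser("~/.cache/huggingface")),
    )


def missing_api_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the API key `provider` needs but lacks, else None."""
    key = PROVIDER_KEYS.get(provider)
    if key is not None and not env.get(key):
        return key
    return None
```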

## API endpoints

Once the container is running (slim on port 8000; use 8001 for full):

```bash
# Add documents (sample text: "The employer must pay, except on Shabbat.")
curl -X POST http://localhost:8000/v1/documents \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"id":"d1","text":"המעביד חייב לשלם, למעט בשבת.","metadata":{}}]}'

# Query ("What are the employer's obligations?")
curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"query":"מה חובות המעביד?","k":10,"rerank_k":5,"strategy":"hybrid","lang":"he"}'

# Per-retriever breakdown for the same query
curl -X POST http://localhost:8000/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query":"מה חובות המעביד?","k":10,"lang":"he","strategy":"hybrid"}'

# Latest signals snapshot
curl http://localhost:8000/v1/signals/latest
```
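The same calls from Python, as a stdlib-only sketch. It assumes every endpoint returns JSON; `make_request` and `call` are hypothetical helpers, and the response shapes are not documented here, so the sketch just returns the parsed body.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # slim container; the full image listens on 8001


def make_request(path: str, payload: Optional[Dict[str, Any]] = None,
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET (no payload) or a JSON POST request for a TAU-RAG endpoint."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    # ensure_ascii=False keeps Hebrew readable; the body is sent as UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url + path, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def call(req: urllib.request.Request) -> Any:
    """Send the request and parse the JSON response (needs a running container)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


add_docs = make_request("/v1/documents", {"documents": [
    {"id": "d1", "text": "המעביד חייב לשלם, למעט בשבת.", "metadata": {}}]})
generate = make_request("/v1/generate", {
    "query": "מה חובות המעביד?", "k": 10, "rerank_k": 5,
    "strategy": "hybrid", "lang": "he"})
latest = make_request("/v1/signals/latest")
# With the container up: call(add_docs); print(call(generate))
```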

## Deploy to Fly.io / Railway / Modal

### Fly.io
```bash
fly launch --dockerfile Dockerfile --build-target slim
fly secrets set TAU_RAG_PRESET=no_llm
fly deploy --build-target slim
```

### Railway
```bash
railway login
railway init
railway up
# In Dashboard: set TAU_RAG_PRESET=no_llm
```

### Modal
```python
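# modal_app.py
import modal

# Build the app image from the `slim` stage of the Dockerfile.
image = modal.Image.from_dockerfile("Dockerfile", target="slim")
app = modal.App("tau-rag", image=image)

# Still needed: a function that starts the HTTP server on this image,
# e.g. one decorated with @modal.web_server(port=8000).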
```

## Monitoring

- `/v1/signals/latest` returns the most recent TAU-Ω signals
- `runtime/signals.jsonl` accumulates every snapshot; load it into ClickHouse or DuckDB for analysis
- Set `TAU_RAG_OTEL=1` and point `OTEL_EXPORTER_OTLP_ENDPOINT` at your collector
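For a quick look at `runtime/signals.jsonl` before setting up ClickHouse or DuckDB, a stdlib-only Python sketch. It assumes nothing about the snapshot fields; the function names are hypothetical. Skipping an unparseable line (with a warning) is a design choice for a snapshot that was cut off mid-write.

```python
import json
import warnings
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Optional


def parse_signal_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty JSONL line, skipping malformed lines."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            warnings.warn(f"signals.jsonl line {lineno} is not valid JSON, skipped: {exc}")


def read_signals(path: Path = Path("runtime/signals.jsonl")) -> Iterator[Dict[str, Any]]:
    """Stream snapshots from the runtime volume without loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        yield from parse_signal_lines(fh)


def latest_signal(records: Iterable[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the last record (the most recent snapshot), or None if there are none."""
    last = None
    for rec in records:
        last = rec
    return last
```

Usage: `latest_signal(read_signals())` gives the newest snapshot on disk, which should match `/v1/signals/latest` while the service is running.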

## Rollback

If the new image regresses:
```bash
docker rm -f tau-rag-slim   # stops and removes the current container, freeing the name
docker run --rm -d --name tau-rag-slim -p 8000:8000 tau-rag:slim-v1.20  # previous known-good tag
```

In TAU Platform, `knowledge_graph/rag_pipeline.py` is a shim. To revert to the
original implementation, rename `rag_pipeline.legacy.py` to `rag_pipeline.py`,
replacing the shim.