# TAU-RAG Deployment

## Two container flavors

| Image | Size | Startup | Capability |
|-------|------|---------|------------|
| `tau-rag:slim` | ~200 MB | <1 s | BM25 + Hebrew synonyms + Extractive (no neural) |
| `tau-rag:full` | ~3 GB | ~5 s | + AlephBERT + FAISS + multi-hop refs |

Pick based on your quality/cost tradeoff (see A/B report: B wins Ω +6.3%, adds ~110ms/query).

## Quickstart: local Docker

**Prereqs**: Docker Desktop running.
- macOS: `open -a "Docker Desktop"` (wait ~10s until the whale icon is steady)
- Linux: `sudo systemctl start docker`

```bash
# Slim (no neural dependencies)
make build-slim
make run-slim
curl http://localhost:8000/health

# Full (AlephBERT)
make build-full
make run-full
curl http://localhost:8001/health
```

### If your source is in iCloud/OneDrive

Docker sometimes rejects build contexts from cloud-synced paths (path has spaces, file-watchers compete, etc.). Workaround:

```bash
make build-from-tmp   # copies to /tmp and builds there
make run-slim         # (or run-full)
```

## docker-compose

```bash
docker compose up -d tau-rag-slim   # or tau-rag-full
docker compose logs -f
docker compose down
```

Volumes:
- `./runtime` → `/app/runtime`: signals.jsonl, snapshots, dashboards persist between restarts
- `hf-cache` → `/root/.cache/huggingface`: AlephBERT weights cached between runs

## Environment variables

| Var | Default | Purpose |
|-----|---------|---------|
| `TAU_RAG_PRESET` | `no_llm` | `mock` \| `no_llm` \| `hebrew_legal` \| `hebrew_dense` |
| `TAU_RAG_OTEL` | unset | Set to any value to enable OpenTelemetry bridge |
| `HF_HOME` | `~/.cache/huggingface` | Hugging Face model cache |
| `ANTHROPIC_API_KEY` | unset | Needed only if generation.provider = "anthropic" |
| `OPENAI_API_KEY` | unset | Needed only if generation.provider = "openai" |

## API endpoints

Once up:

```bash
# Add documents
curl -X POST http://localhost:8000/v1/documents \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"id":"d1","text":"המעביד חייב לשלם, למעט בשבת.","metadata":{}}]}'

# Query
curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"query":"מה חובות המעביד?","k":10,"rerank_k":5,"strategy":"hybrid","lang":"he"}'

# Per-retriever breakdown
curl -X POST http://localhost:8000/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query":"מה חובות המעביד?","k":10,"lang":"he","strategy":"hybrid"}'

# Latest signals snapshot
curl http://localhost:8000/v1/signals/latest
```

## Deploy to Fly.io / Railway / Modal

### Fly.io

```bash
fly launch --dockerfile Dockerfile --build-target slim
fly secrets set TAU_RAG_PRESET=no_llm
fly deploy
```

### Railway

```bash
railway login
railway init
railway up
# In Dashboard: set TAU_RAG_PRESET=no_llm
```

### Modal

```python
# modal_app.py
import modal

image = modal.Image.from_dockerfile("Dockerfile", target="slim")
app = modal.App("tau-rag", image=image)
```

## Monitoring

- `/v1/signals/latest` returns the most recent TAU-Ω signals
- `runtime/signals.jsonl` accumulates all snapshots; feed it into ClickHouse/DuckDB
- Set `TAU_RAG_OTEL=1` and point `OTEL_EXPORTER_OTLP_ENDPOINT` at your collector

## Rollback

If the new image regresses:

```bash
docker stop tau-rag-slim
docker run --rm -d --name tau-rag-slim -p 8000:8000 tau-rag:slim-v1.20  # previous tag
```

In TAU Platform: `knowledge_graph/rag_pipeline.py` is a shim. Rename `rag_pipeline.legacy.py` to `rag_pipeline.py` to revert to the original implementation.
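
## Scripting the API from Python

The curl calls in the API endpoints section can also be issued from a script, for example as a smoke test after a deploy or a rollback. The following is a minimal sketch using only the Python standard library. The paths and JSON payloads are copied from the curl examples; the helper names `build_request`, `send`, and `demo` are hypothetical and not part of TAU-RAG. Because the response schema is not documented here, `send` returns the raw response text rather than assuming a particular shape.

```python
import json
import urllib.request

# Port 8000 is the slim container from the quickstart; the full image uses 8001.
BASE_URL = "http://localhost:8000"


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    # ensure_ascii=False keeps Hebrew text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(path, payload=None, base_url=BASE_URL, timeout=30):
    """Send the request and return the response body as text (schema not assumed)."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def demo():
    """Replay the documented curl calls; requires a running container."""
    print(send("/health"))
    print(send("/v1/documents", {
        "documents": [
            {"id": "d1", "text": "המעביד חייב לשלם, למעט בשבת.", "metadata": {}}
        ]
    }))
    print(send("/v1/generate", {
        "query": "מה חובות המעביד?",
        "k": 10, "rerank_k": 5, "strategy": "hybrid", "lang": "he",
    }))
    print(send("/v1/signals/latest"))
```

With a container running, call `demo()` from a Python shell. Pass `base_url="http://localhost:8001"` to `send` to target the full image instead. Keeping request construction in `build_request` separate from the network call makes the payloads easy to inspect without a live server.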