feat(Phase 3): wire 5th dataset repo for general shard
TAU_RAG_EXTRA_SHARDS_REPOS is now a CSV of all 5 dataset repos:
legal-eye-shards-extra → administrative (~161MB)
legal-eye-shards-extra-2 → criminal (~944MB)
legal-eye-shards-extra-3 → constitutional (~453MB)
legal-eye-shards-extra-4 → procedure (~911MB)
legal-eye-shards-extra-5 → general (~641MB) ← NEW
After Space rebuild, all 15 shards live across 1 Space + 5 datasets.
Total corpus: 16,595 (Tier A curated) + 732,540 (Tier B sharded)
= 749,135 docs (100% of original PII-redacted parquet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
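
The repo-to-domain mapping and the corpus totals quoted in the message above can be sanity-checked with a few lines of Python. This is only a sketch: the names `EXTRA_SHARD_REPOS`, `TIER_A_DOCS`, and `TIER_B_DOCS` are illustrative and do not come from the codebase, and the sizes are the approximate figures from the message.

```python
# Hypothetical mapping of dataset repo -> (legal domain, approximate size in MB),
# copied from the figures in the commit message. Names are illustrative only.
EXTRA_SHARD_REPOS = {
    "legal-eye-shards-extra": ("administrative", 161),
    "legal-eye-shards-extra-2": ("criminal", 944),
    "legal-eye-shards-extra-3": ("constitutional", 453),
    "legal-eye-shards-extra-4": ("procedure", 911),
    "legal-eye-shards-extra-5": ("general", 641),
}

TIER_A_DOCS = 16_595    # Tier A (curated), per the commit message
TIER_B_DOCS = 732_540   # Tier B (sharded), per the commit message

# Sum the approximate sizes of the five dataset repos.
total_mb = sum(size_mb for _domain, size_mb in EXTRA_SHARD_REPOS.values())

# Sum both tiers to get the full corpus size.
total_docs = TIER_A_DOCS + TIER_B_DOCS

print(f"{len(EXTRA_SHARD_REPOS)} dataset repos, ~{total_mb} MB")
print(f"{total_docs:,} docs in total")
```

Summing the two tiers reproduces the 749,135 figure stated in the message.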
Dockerfile +1 -1
--- a/Dockerfile
+++ b/Dockerfile
@@ -54,7 +54,7 @@ ENV PYTHONPATH=/app \
 TAU_RAG_AUTOLOAD_CORPUS=1 \
 TAU_RAG_AUTOLOAD_CORPUS_PATH=/app/tau_rag/runtime/uploads/corpus_paid.jsonl \
 TAU_RAG_TIER=paid \
-TAU_RAG_EXTRA_SHARDS_REPOS=Legal-i/legal-eye-shards-extra,Legal-i/legal-eye-shards-extra-2,Legal-i/legal-eye-shards-extra-3,Legal-i/legal-eye-shards-extra-4 \
+TAU_RAG_EXTRA_SHARDS_REPOS=Legal-i/legal-eye-shards-extra,Legal-i/legal-eye-shards-extra-2,Legal-i/legal-eye-shards-extra-3,Legal-i/legal-eye-shards-extra-4,Legal-i/legal-eye-shards-extra-5 \
 TAU_RAG_EXTRA_SHARDS_CACHE=/tmp/legal_eye_extra_shards \
 TAU_RAG_CLUSTER_AUGMENT=1 \
 TAU_RAG_AUTH_REQUIRED=true \
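
For readers wondering how a comma-separated value like `TAU_RAG_EXTRA_SHARDS_REPOS` might be consumed at startup, here is a minimal sketch. The helper `parse_extra_shard_repos` is hypothetical and is not taken from the application code. It assumes the loader only needs an ordered, de-duplicated list of repo IDs.

```python
import os
from typing import List, Optional


def parse_extra_shard_repos(raw: Optional[str]) -> List[str]:
    """Split a comma-separated repo list, dropping blanks and duplicates.

    Hypothetical helper: the real loader may parse the variable differently.
    """
    if not raw:
        return []
    seen = set()
    repos = []
    for item in raw.split(","):
        repo = item.strip()
        # Skip empty entries (e.g. from a trailing comma) and repeats.
        if repo and repo not in seen:
            seen.add(repo)
            repos.append(repo)
    return repos


# Read the variable set in the Dockerfile; empty list if it is unset.
repos = parse_extra_shard_repos(os.environ.get("TAU_RAG_EXTRA_SHARDS_REPOS"))
print(f"{len(repos)} extra shard repos configured")
```

Tolerating stray whitespace and trailing commas keeps a long single-line `ENV` value from failing on a small typo. With the value from this commit, the list would contain five repo IDs.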