Commit ·
d1d46b8
1
Parent(s): a0b2364
hf basic plan
.cursor/plans/hf_space_deploy_review_a7f8b3c3.plan.md
ADDED
|
@@ -0,0 +1,127 @@
|
| 1 |
+
---
|
| 2 |
+
name: HF Space deploy review
|
| 3 |
+
overview: "The existing uv monorepo plan is the right foundation for a Build Small Hackathon Space. It already deploys a simple Gradio chat app; Docker SDK is the correct choice for monorepo + llama-cpp, not a separate \"simpler\" path. With your balanced priority, execute the plan in two phases: minimal live Space first, polish and badges second."
|
| 4 |
+
todos:
|
| 5 |
+
- id: fix-readme-yaml
|
| 6 |
+
content: "Put HF Space YAML frontmatter (sdk: docker, app_port: 7860) in root README.md, not only apps/gradio-space/README.md"
|
| 7 |
+
status: pending
|
| 8 |
+
- id: phase1-bootstrap
|
| 9 |
+
content: "Phase 1: uv workspace + inference lib (llama_cpp only) + minimal gr.ChatInterface app"
|
| 10 |
+
status: pending
|
| 11 |
+
- id: phase1-docker
|
| 12 |
+
content: "Phase 1: root Dockerfile (uv sync, UID 1000, port 7860) and create Space under build-small-hackathon"
|
| 13 |
+
status: pending
|
| 14 |
+
- id: phase1-verify
|
| 15 |
+
content: "Phase 1: local uv sync + Gradio smoke test + confirm Space builds on CPU basic"
|
| 16 |
+
status: pending
|
| 17 |
+
- id: phase2-polish
|
| 18 |
+
content: "Phase 2: loading UX, optional GPU, track-specific logic, badges, demo video + social post by June 15"
|
| 19 |
+
status: pending
|
| 20 |
+
isProject: false
|
| 21 |
+
---
|
| 22 |
+
|
| 23 |
+
# HF Space Deploy Review
|
| 24 |
+
|
| 25 |
+
## Verdict: the plan is already good, and it *is* a simple Gradio deploy
|
| 26 |
+
|
| 27 |
+
Your plan in [`.cursor/plans/uv_monorepo_init_d9207227.plan.md`](.cursor/plans/uv_monorepo_init_d9207227.plan.md) does **not** over-engineer relative to hackathon requirements. The hackathon only requires:
|
| 28 |
+
|
| 29 |
+
- A **Gradio app** hosted as a **Hugging Face Space** under [build-small-hackathon](https://huggingface.co/build-small-hackathon)
|
| 30 |
+
- Model **≤ 32B**
|
| 31 |
+
- Submission by **June 15, 2026**: Space link + demo video + social post
|
| 32 |
+
|
| 33 |
+
The plan delivers exactly that: `gr.ChatInterface` in [`apps/gradio-space/src/gradio_space/app.py`](apps/gradio-space/src/gradio_space/app.py), exposed on port **7860**. The uv workspace and `libs/inference` are scaffolding so the UI stays thin and backends stay swappable, not extra product scope.
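
As a rough illustration of the "thin UI, swappable backend" split, here is a minimal sketch of a lazy singleton factory. Only `get_backend`, `load`, and `chat` are names taken from the plan; `EchoBackend`, `_REGISTRY`, and `generate` are hypothetical stand-ins so the sketch runs without `llama-cpp-python` installed.

```python
from typing import Callable, Dict, Optional, Protocol


class Backend(Protocol):
    def load(self) -> "Backend": ...
    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical stand-in; a llama_cpp backend would expose the same methods."""

    def __init__(self) -> None:
        self.loaded = False

    def load(self) -> "EchoBackend":
        # Idempotent: a real backend would open the GGUF file only once here.
        self.loaded = True
        return self

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Registry of backend constructors (assumption: "llama_cpp" is registered the same way).
_REGISTRY: Dict[str, Callable[[], Backend]] = {"echo": EchoBackend}
_instance: Optional[Backend] = None


def get_backend(name: str = "echo") -> Backend:
    """Return the process-wide backend, constructing it on first use only."""
    global _instance
    if _instance is None:
        _instance = _REGISTRY[name]()
    return _instance


def chat(message: str, history: list) -> str:
    """Takes (message, history), the arguments gr.ChatInterface passes to its fn."""
    return get_backend().load().generate(message)
```

With Gradio installed, the UI layer could then stay as small as `gr.ChatInterface(fn=chat).launch(server_name="0.0.0.0", server_port=7860)`. Note that this sketch ignores `name` after the first call; switching backends at runtime would need extra handling.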
|
| 34 |
+
|
| 35 |
+
```mermaid
|
| 36 |
+
flowchart TB
|
| 37 |
+
subgraph hfSpace [HF Space - Docker SDK]
|
| 38 |
+
Dockerfile[Root Dockerfile]
|
| 39 |
+
GradioApp[gradio_space.app]
|
| 40 |
+
end
|
| 41 |
+
subgraph monorepo [Monorepo]
|
| 42 |
+
InferenceLib[inference factory]
|
| 43 |
+
LlamaCpp[llama_cpp GGUF]
|
| 44 |
+
Transformers[transformers optional]
|
| 45 |
+
end
|
| 46 |
+
Dockerfile --> GradioApp
|
| 47 |
+
GradioApp --> InferenceLib
|
| 48 |
+
InferenceLib --> LlamaCpp
|
| 49 |
+
InferenceLib -.-> Transformers
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
## Gradio SDK vs Docker SDK: why not "just Gradio SDK"?
|
| 53 |
+
|
| 54 |
+
| Approach | Good for | Problem for this repo |
|
| 55 |
+
|----------|----------|------------------------|
|
| 56 |
+
| **Gradio SDK** (`app.py` + `requirements.txt` at repo root) | Single-file demos, fastest hello-world | No clean way to use the `libs/inference` workspace package; `llama-cpp-python` needs build-time (CMake) control that Gradio SDK handles poorly |
|
| 57 |
+
| **Docker SDK** (root `Dockerfile`, still runs Gradio) | Monorepos, native deps, env control | ~30 extra lines of Dockerfile, already in your plan |
|
| 58 |
+
|
| 59 |
+
**Recommendation:** Keep **Docker SDK + Gradio UI**. You get the same user-facing Gradio app; Docker is just the shipping container. This is the standard monorepo pattern HF documents for custom runtimes.
|
| 60 |
+
|
| 61 |
+
A "simpler" alternative (Gradio SDK + `transformers` only, no llama-cpp) would be faster to write, but it:
|
| 62 |
+
|
| 63 |
+
- Drops **Off-the-Grid** / **Llama Champion** badge paths
|
| 64 |
+
- Heavier default install (`torch`) on CPU Spaces
|
| 65 |
+
- Forces you to flatten or duplicate inference code later
|
| 66 |
+
|
| 67 |
+
Given balanced priority, Docker + llama-cpp default is the better baseline.
|
| 68 |
+
|
| 69 |
+
## One fix to the existing plan
|
| 70 |
+
|
| 71 |
+
**Space card YAML must live in the repo-root [`README.md`](README.md), not only in `apps/gradio-space/README.md`.**
|
| 72 |
+
|
| 73 |
+
HF reads the YAML frontmatter from the **repository root** `README.md` when the Space is linked to this repo. Merge:
|
| 74 |
+
|
| 75 |
+
```yaml
|
| 76 |
+
---
|
| 77 |
+
title: <Your App Name>
|
| 78 |
+
emoji: ...
|
| 79 |
+
sdk: docker
|
| 80 |
+
app_port: 7860
|
| 81 |
+
---
|
| 82 |
+
```
|
| 83 |
+
|
| 84 |
+
into root `README.md`, then add dev/hackathon docs below the closing `---`. Keep `apps/gradio-space/README.md` as an optional short package note, or drop it to avoid duplication.
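
To catch a misplaced Space card before pushing, a small check like the following could be run against the root `README.md`. This is a sketch under assumptions: `read_frontmatter` and `check_space_card` are hypothetical helpers, and the parser only handles flat `key: value` pairs rather than full YAML.

```python
from typing import Dict, List


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading pair of `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def check_space_card(text: str) -> List[str]:
    """Return a list of problems; an empty list means the card looks right."""
    fields = read_frontmatter(text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk must be 'docker'")
    if fields.get("app_port") != "7860":
        problems.append("app_port must be 7860")
    return problems
```

For example, `check_space_card(pathlib.Path("README.md").read_text(encoding="utf-8"))` run from the repo root would return an empty list once the frontmatter is merged.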
|
| 85 |
+
|
| 86 |
+
## Balanced execution (your choice): two phases
|
| 87 |
+
|
| 88 |
+
### Phase 1: Minimal live Space (ship this first)
|
| 89 |
+
|
| 90 |
+
Goal: working Space under `build-small-hackathon` with a chat demo, even if rough.
|
| 91 |
+
|
| 92 |
+
1. **Bootstrap uv workspace** (plan sections 1–3): root + `apps/gradio-space` + `libs/inference`
|
| 93 |
+
2. **Minimal inference**: `llama_cpp` backend only; `get_backend().load()` lazy singleton; `chat()` wired to `gr.ChatInterface`
|
| 94 |
+
3. **Default model**: `Qwen/Qwen2.5-3B-Instruct-GGUF` + `qwen2.5-3b-instruct-q4_k_m.gguf` (small, under 32B, CPU-viable)
|
| 95 |
+
4. **Root Dockerfile** + fixed root README YAML (`sdk: docker`, `app_port: 7860`)
|
| 96 |
+
5. **Create Space**: Docker SDK, CPU basic hardware, env vars `MODEL_REPO`, `MODEL_FILE`, `N_CTX=4096`, `N_GPU_LAYERS=0`
|
| 97 |
+
6. **Smoke test**: `uv sync`, local Gradio on `:7860`, push, confirm Space builds
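
The environment variables from step 5 could be read in one place, with the step 3 model as the default. The following is a minimal sketch; `ModelSettings` and `load_settings` are hypothetical names, while the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    model_repo: str
    model_file: str
    n_ctx: int
    n_gpu_layers: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ModelSettings:
    """Read Space env vars, falling back to the Phase 1 defaults."""
    env = os.environ if env is None else env
    return ModelSettings(
        model_repo=env.get("MODEL_REPO", "Qwen/Qwen2.5-3B-Instruct-GGUF"),
        model_file=env.get("MODEL_FILE", "qwen2.5-3b-instruct-q4_k_m.gguf"),
        n_ctx=int(env.get("N_CTX", "4096")),
        # 0 keeps every layer on the CPU, matching CPU basic hardware.
        n_gpu_layers=int(env.get("N_GPU_LAYERS", "0")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test locally. How these values reach the llama-cpp loader depends on the installed `llama-cpp-python` version, so check its signature before wiring them through.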
|
| 98 |
+
|
| 99 |
+
Defer: `transformers` extra, custom UI (`gr.Server`), `scripts/download_model.py`, ruff/pytest, badge-specific polish.
|
| 100 |
+
|
| 101 |
+
### Phase 2: Polish before June 15
|
| 102 |
+
|
| 103 |
+
- Track-specific product logic (Backyard AI vs Thousand Token Wood)
|
| 104 |
+
- Better loading UX (progress/status while GGUF downloads on cold start)
|
| 105 |
+
- Optional GPU Space + `N_GPU_LAYERS` if latency is bad on CPU
|
| 106 |
+
- Custom UI / agent traces if chasing **Off-Brand** or **Sharing is Caring** badges
|
| 107 |
+
- `scripts/download_model.py` for offline dev
|
| 108 |
+
- Demo video + social post
|
| 109 |
+
|
| 110 |
+
## What stays out of scope (correctly)
|
| 111 |
+
|
| 112 |
+
The plan already defers fine-tuning, CI, and badge-specific features. Keep those deferred until Phase 1 Space is green.
|
| 113 |
+
|
| 114 |
+
## Risk notes (small, actionable)
|
| 115 |
+
|
| 116 |
+
- **Cold start**: the first request downloads the GGUF from the Hub, so show a Gradio status message; consider an HF Storage Bucket only if downloads are painfully slow on every restart
|
| 117 |
+
- **CPU latency**: 3B Q4 on CPU basic is acceptable for a demo; upgrade hardware only if needed
|
| 118 |
+
- **Docker build**: do not run GPU checks (`torch.cuda.is_available()`) at image build time, only at runtime (HF docs constraint)
|
| 119 |
+
- **UID 1000**: keep `USER 1000` in Dockerfile (HF requirement)
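
For the cold-start note, one possible approach is a generator chat function that yields a status line before blocking on the model load. This sketch assumes the chat UI replaces the displayed reply with each yielded value, which is how `gr.ChatInterface` handles generator functions; `_load_model` is a hypothetical placeholder for the real GGUF download and load.

```python
from typing import Callable, Iterator, Optional

_model: Optional[Callable[[str], str]] = None


def _load_model() -> Callable[[str], str]:
    """Hypothetical loader; a real one would download the GGUF and build the model."""
    return lambda prompt: f"reply to: {prompt}"


def chat(message: str, history: list) -> Iterator[str]:
    global _model
    if _model is None:
        # First request after a cold start: tell the user before blocking on the load.
        yield "Loading model, the first reply can take a while..."
        _model = _load_model()
    # The final yield replaces the status line with the actual reply.
    yield _model(message)
```

Only the first request pays the loading cost and sees the status line; later requests yield the reply directly.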
|
| 120 |
+
|
| 121 |
+
## Summary
|
| 122 |
+
|
| 123 |
+
| Question | Answer |
|
| 124 |
+
|----------|--------|
|
| 125 |
+
| Is the plan good for HF Space deploy? | **Yes**: it already targets Gradio on Spaces |
|
| 126 |
+
| Simpler Gradio-only deploy instead? | **No**: Gradio SDK is simpler to *author* but worse for monorepo + llama-cpp; you'd rework later |
|
| 127 |
+
| What to do now? | Execute the plan **Phase 1** with the root README YAML fix; polish in Phase 2 |
|