MSGEncrypted commited on
Commit
d1d46b8
Β·
1 Parent(s): a0b2364

hf basic plan

Browse files
.cursor/plans/hf_space_deploy_review_a7f8b3c3.plan.md ADDED
@@ -0,0 +1,127 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: HF Space deploy review
3
+ overview: "The existing uv monorepo plan is the right foundation for a Build Small Hackathon Space. It already deploys a simple Gradio chat app β€” Docker SDK is the correct choice for monorepo + llama-cpp, not a separate \"simpler\" path. With your balanced priority, execute the plan in two phases: minimal live Space first, polish and badges second."
4
+ todos:
5
+ - id: fix-readme-yaml
6
+ content: "Put HF Space YAML frontmatter (sdk: docker, app_port: 7860) in root README.md, not only apps/gradio-space/README.md"
7
+ status: pending
8
+ - id: phase1-bootstrap
9
+ content: "Phase 1: uv workspace + inference lib (llama_cpp only) + minimal gr.ChatInterface app"
10
+ status: pending
11
+ - id: phase1-docker
12
+ content: "Phase 1: root Dockerfile (uv sync, UID 1000, port 7860) and create Space under build-small-hackathon"
13
+ status: pending
14
+ - id: phase1-verify
15
+ content: "Phase 1: local uv sync + Gradio smoke test + confirm Space builds on CPU basic"
16
+ status: pending
17
+ - id: phase2-polish
18
+ content: "Phase 2: loading UX, optional GPU, track-specific logic, badges, demo video + social post by June 15"
19
+ status: pending
20
+ isProject: false
21
+ ---
22
+
23
+ # HF Space Deploy Review
24
+
25
+ ## Verdict: the plan is already good β€” and it *is* a simple Gradio deploy
26
+
27
+ Your plan in [`.cursor/plans/uv_monorepo_init_d9207227.plan.md`](.cursor/plans/uv_monorepo_init_d9207227.plan.md) does **not** over-engineer relative to hackathon requirements. The hackathon only requires:
28
+
29
+ - A **Gradio app** hosted as a **Hugging Face Space** under [build-small-hackathon](https://huggingface.co/build-small-hackathon)
30
+ - Model **≀ 32B**
31
+ - Submission by **June 15, 2026**: Space link + demo video + social post
32
+
33
+ The plan delivers exactly that: `gr.ChatInterface` in [`apps/gradio-space/src/gradio_space/app.py`](apps/gradio-space/src/gradio_space/app.py), exposed on port **7860**. The uv workspace and `libs/inference` are scaffolding so the UI stays thin and backends stay swappable β€” not extra product scope.
34
+
35
+ ```mermaid
36
+ flowchart TB
37
+ subgraph hfSpace [HF Space - Docker SDK]
38
+ Dockerfile[Root Dockerfile]
39
+ GradioApp[gradio_space.app]
40
+ end
41
+ subgraph monorepo [Monorepo]
42
+ InferenceLib[inference factory]
43
+ LlamaCpp[llama_cpp GGUF]
44
+ Transformers[transformers optional]
45
+ end
46
+ Dockerfile --> GradioApp
47
+ GradioApp --> InferenceLib
48
+ InferenceLib --> LlamaCpp
49
+ InferenceLib -.-> Transformers
50
+ ```
51
+
52
+ ## Gradio SDK vs Docker SDK β€” why not "just Gradio SDK"?
53
+
54
+ | Approach | Good for | Problem for this repo |
55
+ |----------|----------|------------------------|
56
+ | **Gradio SDK** (`app.py` + `requirements.txt` at repo root) | Single-file demos, fastest hello-world | No clean way to use `libs/inference` workspace package; `llama-cpp-python` needs compile/CMAKE control that Gradio SDK handles poorly |
57
+ | **Docker SDK** (root `Dockerfile`, still runs Gradio) | Monorepos, native deps, env control | ~30 extra lines of Dockerfile β€” already in your plan |
58
+
59
+ **Recommendation:** Keep **Docker SDK + Gradio UI**. You get the same user-facing Gradio app; Docker is just the shipping container. This is the standard monorepo pattern HF documents for custom runtimes.
60
+
61
+ A "simpler" alternative (Gradio SDK + `transformers` only, no llama-cpp) would boot faster to write but:
62
+
63
+ - Drops **Off-the-Grid** / **Llama Champion** badge paths
64
+ - Heavier default install (`torch`) on CPU Spaces
65
+ - Forces you to flatten or duplicate inference code later
66
+
67
+ Given balanced priority, Docker + llama-cpp default is the better baseline.
68
+
69
+ ## One fix to the existing plan
70
+
71
+ **Space card YAML must live in the repo-root [`README.md`](README.md), not only in `apps/gradio-space/README.md`.**
72
+
73
+ HF reads the YAML frontmatter from the **repository root** `README.md` when the Space is linked to this repo. Merge:
74
+
75
+ ```yaml
76
+ ---
77
+ title: <Your App Name>
78
+ emoji: ...
79
+ sdk: docker
80
+ app_port: 7860
81
+ ---
82
+ ```
83
+
84
+ into root `README.md`, then add dev/hackathon docs below the closing `---`. Keep `apps/gradio-space/README.md` as an optional short package note, or drop it to avoid duplication.
85
+
86
+ ## Balanced execution (your choice): two phases
87
+
88
+ ### Phase 1 β€” Minimal live Space (ship this first)
89
+
90
+ Goal: working Space under `build-small-hackathon` with a chat demo, even if rough.
91
+
92
+ 1. **Bootstrap uv workspace** (plan sections 1–3): root + `apps/gradio-space` + `libs/inference`
93
+ 2. **Minimal inference**: `llama_cpp` backend only; `get_backend().load()` lazy singleton; `chat()` wired to `gr.ChatInterface`
94
+ 3. **Default model**: `Qwen/Qwen2.5-3B-Instruct-GGUF` + `qwen2.5-3b-instruct-q4_k_m.gguf` (small, under 32B, CPU-viable)
95
+ 4. **Root Dockerfile** + fixed root README YAML (`sdk: docker`, `app_port: 7860`)
96
+ 5. **Create Space**: Docker SDK, CPU basic hardware, env vars `MODEL_REPO`, `MODEL_FILE`, `N_CTX=4096`, `N_GPU_LAYERS=0`
97
+ 6. **Smoke test**: `uv sync`, local Gradio on `:7860`, push, confirm Space builds
98
+
99
+ Defer: `transformers` extra, custom UI (`gr.Server`), `scripts/download_model.py`, ruff/pytest, badge-specific polish.
100
+
101
+ ### Phase 2 β€” Polish before June 15
102
+
103
+ - Track-specific product logic (Backyard AI vs Thousand Token Wood)
104
+ - Better loading UX (progress/status while GGUF downloads on cold start)
105
+ - Optional GPU Space + `N_GPU_LAYERS` if latency is bad on CPU
106
+ - Custom UI / agent traces if chasing **Off-Brand** or **Sharing is Caring** badges
107
+ - `scripts/download_model.py` for offline dev
108
+ - Demo video + social post
109
+
110
+ ## What stays out of scope (correctly)
111
+
112
+ The plan already defers fine-tuning, CI, and badge-specific features. Keep those deferred until Phase 1 Space is green.
113
+
114
+ ## Risk notes (small, actionable)
115
+
116
+ - **Cold start**: first request downloads GGUF from Hub β€” show a Gradio status message; consider HF Storage Bucket only if downloads are painfully slow on every restart
117
+ - **CPU latency**: 3B Q4 on CPU basic is acceptable for a demo; upgrade hardware only if needed
118
+ - **Docker build**: do not run GPU checks (`torch.cuda.is_available()`) at image build time β€” only at runtime (HF docs constraint)
119
+ - **UID 1000**: keep `USER 1000` in Dockerfile (HF requirement)
120
+
121
+ ## Summary
122
+
123
+ | Question | Answer |
124
+ |----------|--------|
125
+ | Is the plan good for HF Space deploy? | **Yes** β€” it already targets Gradio on Spaces |
126
+ | Simpler Gradio-only deploy instead? | **No** β€” Gradio SDK is simpler to *author* but worse for monorepo + llama-cpp; you'd rework later |
127
+ | What to do now? | Execute the plan **Phase 1** with the root README YAML fix; polish in Phase 2 |