lesson-agent / .cursor /plans /hf_space_deploy_review_a7f8b3c3.plan.md
MSGEncrypted's picture
wip script and monorepo
f173e0f
|
Raw
History Blame
6.47 kB
metadata
name: HF Space deploy review
overview: >-
  The existing uv monorepo plan is the right foundation for a Build Small
  Hackathon Space. It already deploys a simple Gradio chat app β€” Docker SDK is
  the correct choice for monorepo + llama-cpp, not a separate "simpler" path.
  With your balanced priority, execute the plan in two phases: minimal live
  Space first, polish and badges second.
todos:
  - id: fix-readme-yaml
    content: >-
      Put HF Space YAML frontmatter (sdk: docker, app_port: 7860) in root
      README.md, not only apps/gradio-space/README.md
    status: completed
  - id: phase1-bootstrap
    content: >-
      Phase 1: uv workspace + inference lib (llama_cpp only) + minimal
      gr.ChatInterface app
    status: completed
  - id: phase1-docker
    content: >-
      Phase 1: root Dockerfile (uv sync, UID 1000, port 7860) and create Space
      under build-small-hackathon
    status: in_progress
  - id: phase1-verify
    content: >-
      Phase 1: local uv sync + Gradio smoke test + confirm Space builds on CPU
      basic
    status: pending
  - id: phase2-polish
    content: >-
      Phase 2: loading UX, optional GPU, track-specific logic, badges, demo
      video + social post by June 15
    status: pending
isProject: false

HF Space Deploy Review

Verdict: the plan is already good β€” and it is a simple Gradio deploy

Your plan in .cursor/plans/uv_monorepo_init_d9207227.plan.md does not over-engineer relative to hackathon requirements. The hackathon only requires:

  • A Gradio app hosted as a Hugging Face Space under build-small-hackathon
  • Model ≀ 32B
  • Submission by June 15, 2026: Space link + demo video + social post

The plan delivers exactly that: gr.ChatInterface in apps/gradio-space/src/gradio_space/app.py, exposed on port 7860. The uv workspace and libs/inference are scaffolding so the UI stays thin and backends stay swappable β€” not extra product scope.

flowchart TB
  subgraph hfSpace [HF Space - Docker SDK]
    Dockerfile[Root Dockerfile]
    GradioApp[gradio_space.app]
  end
  subgraph monorepo [Monorepo]
    InferenceLib[inference factory]
    LlamaCpp[llama_cpp GGUF]
    Transformers[transformers optional]
  end
  Dockerfile --> GradioApp
  GradioApp --> InferenceLib
  InferenceLib --> LlamaCpp
  InferenceLib -.-> Transformers

Gradio SDK vs Docker SDK β€” why not "just Gradio SDK"?

Approach Good for Problem for this repo
Gradio SDK (app.py + requirements.txt at repo root) Single-file demos, fastest hello-world No clean way to use libs/inference workspace package; llama-cpp-python needs compile/CMAKE control that Gradio SDK handles poorly
Docker SDK (root Dockerfile, still runs Gradio) Monorepos, native deps, env control ~30 extra lines of Dockerfile β€” already in your plan

Recommendation: Keep Docker SDK + Gradio UI. You get the same user-facing Gradio app; Docker is just the shipping container. This is the standard monorepo pattern HF documents for custom runtimes.

A "simpler" alternative (Gradio SDK + transformers only, no llama-cpp) would boot faster to write but:

  • Drops Off-the-Grid / Llama Champion badge paths
  • Heavier default install (torch) on CPU Spaces
  • Forces you to flatten or duplicate inference code later

Given balanced priority, Docker + llama-cpp default is the better baseline.

One fix to the existing plan

Space card YAML must live in the repo-root README.md, not only in apps/gradio-space/README.md.

HF reads the YAML frontmatter from the repository root README.md when the Space is linked to this repo. Merge:

---
title: <Your App Name>
emoji: ...
sdk: docker
app_port: 7860
---

into root README.md, then add dev/hackathon docs below the closing ---. Keep apps/gradio-space/README.md as an optional short package note, or drop it to avoid duplication.

Balanced execution (your choice): two phases

Phase 1 β€” Minimal live Space (ship this first)

Goal: working Space under build-small-hackathon with a chat demo, even if rough.

  1. Bootstrap uv workspace (plan sections 1–3): root + apps/gradio-space + libs/inference
  2. Minimal inference: llama_cpp backend only; get_backend().load() lazy singleton; chat() wired to gr.ChatInterface
  3. Default model: Qwen/Qwen2.5-3B-Instruct-GGUF + qwen2.5-3b-instruct-q4_k_m.gguf (small, under 32B, CPU-viable)
  4. Root Dockerfile + fixed root README YAML (sdk: docker, app_port: 7860)
  5. Create Space: Docker SDK, CPU basic hardware, env vars MODEL_REPO, MODEL_FILE, N_CTX=4096, N_GPU_LAYERS=0
  6. Smoke test: uv sync, local Gradio on :7860, push, confirm Space builds

Defer: transformers extra, custom UI (gr.Server), scripts/download_model.py, ruff/pytest, badge-specific polish.

Phase 2 β€” Polish before June 15

  • Track-specific product logic (Backyard AI vs Thousand Token Wood)
  • Better loading UX (progress/status while GGUF downloads on cold start)
  • Optional GPU Space + N_GPU_LAYERS if latency is bad on CPU
  • Custom UI / agent traces if chasing Off-Brand or Sharing is Caring badges
  • scripts/download_model.py for offline dev
  • Demo video + social post

What stays out of scope (correctly)

The plan already defers fine-tuning, CI, and badge-specific features. Keep those deferred until Phase 1 Space is green.

Risk notes (small, actionable)

  • Cold start: first request downloads GGUF from Hub β€” show a Gradio status message; consider HF Storage Bucket only if downloads are painfully slow on every restart
  • CPU latency: 3B Q4 on CPU basic is acceptable for a demo; upgrade hardware only if needed
  • Docker build: do not run GPU checks (torch.cuda.is_available()) at image build time β€” only at runtime (HF docs constraint)
  • UID 1000: keep USER 1000 in Dockerfile (HF requirement)

Summary

Question Answer
Is the plan good for HF Space deploy? Yes β€” it already targets Gradio on Spaces
Simpler Gradio-only deploy instead? No β€” Gradio SDK is simpler to author but worse for monorepo + llama-cpp; you'd rework later
What to do now? Execute the plan Phase 1 with the root README YAML fix; polish in Phase 2