Spaces:

build-small-hackathon
/

lesson-agent

Running on Zero

App Files Files Community

lesson-agent / .cursor /plans /uv_monorepo_init_d9207227.plan.md

MSGEncrypted

init plan

a0b2364 25 days ago

8.9 kB

name: uv monorepo init
overview: >-
  Bootstrap a uv workspace monorepo from scratch with a Gradio HF Space app and
  a swappable local inference library (llama-cpp-python default, transformers
  optional), aligned with [Build Small
  Hackathon](https://huggingface.co/build-small-hackathon) constraints.
todos:
  - id: uv-workspace
    content: >-
      Run uv init at root + apps/gradio-space + libs/inference; configure
      workspace members, sources, and uv.lock
    status: pending
  - id: inference-lib
    content: >-
      Implement inference Protocol, llama_cpp backend, transformers backend
      (optional extra), and factory with env-based switching
    status: pending
  - id: gradio-app
    content: Create minimal Gradio chat app in apps/gradio-space wired to inference lib
    status: pending
  - id: hf-space
    content: >-
      Add Dockerfile, Space README YAML, .env.example, download_model script,
      and root README with dev/hackathon docs
    status: pending
  - id: verify
    content: Run uv sync, local Gradio smoke test, and confirm imports/backends work
    status: pending
isProject: false

uv Monorepo + Gradio + Local Llama Inference

Context

Repo today: only README.md — greenfield setup.
Your choices: generic track scaffold, abstract inference with llama-cpp default.
Hackathon hard rules: Gradio app on HF Space, models ≤ 32B, demo video + social post by June 15, 2026.

Target layout

small-model-hackathon/
├── pyproject.toml              # workspace root + shared dev tooling
├── uv.lock
├── .python-version             # 3.12
├── .gitignore
├── Dockerfile                  # HF Space (Docker SDK) — builds whole workspace
├── README.md                   # dev + hackathon checklist
├── apps/
│   └── gradio-space/
│       ├── pyproject.toml
│       ├── README.md           # HF Space card YAML (title, sdk, hardware hints)
│       └── src/gradio_space/
│           ├── __init__.py
│           └── app.py          # Gradio UI entrypoint
├── libs/
│   └── inference/
│       ├── pyproject.toml
│       └── src/inference/
│           ├── __init__.py
│           ├── base.py         # Protocol / ABC
│           ├── llama_cpp.py    # default backend (GGUF)
│           ├── transformers.py # optional HF backend
│           └── factory.py      # INFERENCE_BACKEND env switch
└── scripts/
    └── download_model.py       # pull GGUF from Hub to local cache

flowchart LR
  subgraph app [apps/gradio-space]
    GradioUI[app.py]
  end
  subgraph lib [libs/inference]
    Factory[factory.py]
    LlamaCpp[llama_cpp.py]
    Transformers[transformers.py]
  end
  GradioUI --> Factory
  Factory -->|default| LlamaCpp
  Factory -->|optional| Transformers
  LlamaCpp --> GGUF[Local GGUF file]
  Transformers --> HFModel[HF weights via transformers]

1. Initialize uv workspace

Run from repo root:

uv init --name small-model-hackathon
uv init --package apps/gradio-space
uv init --package libs/inference

Configure root pyproject.toml:

[tool.uv.workspace] with members = ["apps/*", "libs/*"]
Root depends on both workspace packages so uv sync installs everything:
- dependencies = ["gradio-space", "inference"]
- [tool.uv.sources] mapping each to { workspace = true }
Shared dev deps at root: ruff, pytest (optional but lightweight)
requires-python = ">=3.12" (matches your installed Python 3.12.9)

Lock and install:

uv lock
uv sync --all-packages

2. `libs/inference` — swappable local backends

Core interface in base.py:

class InferenceBackend(Protocol):
    def load(self) -> None: ...
    def generate(self, prompt: str, *, max_tokens: int = 512, temperature: float = 0.7) -> str: ...
    def chat(self, messages: list[dict[str, str]], **kwargs) -> str: ...

Default backend — llama_cpp.py

Dependency: llama-cpp-python (CPU build by default; GPU variant documented for local/CUDA Spaces)
Load GGUF via env config:
- MODEL_PATH — local file path, or
- MODEL_REPO + MODEL_FILE — download from Hugging Face Hub at startup (huggingface_hub.hf_hub_download)
Suggested default model for dev: Qwen/Qwen2.5-3B-Instruct-GGUF with a specific .gguf quant (well under 32B; laptop-friendly)

Optional backend — transformers.py

Dependencies kept in an optional extra: inference[transformers] → transformers, torch, accelerate
Same public methods; loads AutoModelForCausalLM + AutoTokenizer from MODEL_ID
Heavier; useful if you later fine-tune and publish on Hub

Factory — factory.py

INFERENCE_BACKEND=llama_cpp|transformers (default llama_cpp)
Lazy singleton so model loads once on first request (important for Gradio cold start)

3. `apps/gradio-space` — minimal chat UI

Dependencies: gradio, inference (workspace)

app.py skeleton:

gr.ChatInterface or simple Blocks with textbox + chat history
On startup: call get_backend().load() with a status message if model missing
Wire chat() to the inference backend
Expose demo.launch() guarded by if __name__ == "__main__"

Run locally:

uv run --package gradio-space python -m gradio_space.app
# or: uv run --package gradio-space gradio apps/gradio-space/src/gradio_space/app.py

Env template (.env.example at root):

INFERENCE_BACKEND=llama_cpp
MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF
MODEL_FILE=qwen2.5-3b-instruct-q4_k_m.gguf
N_CTX=4096
N_GPU_LAYERS=0

4. HF Space deployment (monorepo-friendly)

Use Docker SDK at repo root (HF Docker Spaces docs) so the whole workspace ships together.

Root Dockerfile (outline):

Base: python:3.12-slim
Install uv via official installer
COPY monorepo, uv sync --frozen --no-dev --package gradio-space
Run as UID 1000 (HF requirement)
EXPOSE 7860
CMD ["uv", "run", "--package", "gradio-space", "python", "-m", "gradio_space.app"]

apps/gradio-space/README.md — Space card frontmatter:

---
title: <Your App Name>
emoji: ...
colorFrom: ...
colorTo: ...
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
---

When creating the Space under build-small-hackathon:

New Space → SDK: Docker → link this repo
Hardware: start CPU basic for llama-cpp dev; upgrade to GPU Space if you offload layers
Add Space secrets/env vars for MODEL_REPO, MODEL_FILE, etc.
Optionally attach a Storage Bucket if you cache large GGUF files persistently

5. Repo hygiene

.gitignore: .venv/, __pycache__/, .env, models/, *.gguf, .ruff_cache/, .pytest_cache/

README.md sections:

Prerequisites: uv, Python 3.12
Quick start: sync, download model script, run Gradio locally
Monorepo commands cheat sheet (uv add --package ..., uv run --package ...)
Hackathon checklist: track choice, Space link, demo video, social post, badge targets (Off-the-Grid, Llama Champion, etc.)

scripts/download_model.py: small CLI using huggingface_hub to fetch the configured GGUF into ./models/ for offline dev.

6. Verification checklist (post-init)

Step	Command / check
Workspace resolves	`uv sync --all-packages` succeeds
Import chain	`uv run python -c "from inference.factory import get_backend"`
Gradio boots	`uv run --package gradio-space python -m gradio_space.app` → localhost:7860
Backend switch	`INFERENCE_BACKEND=transformers` fails gracefully until extra installed
Docker build	`docker build -t hackathon-space .` (optional local smoke test)

Out of scope for this init (pick up later)

Track-specific product logic (Backyard AI vs Thousand Token Wood)
Fine-tuning pipeline / custom model publish
Custom UI via gr.Server (Off-Brand badge)
Agent traces dataset upload (Sharing is Caring badge)
CI/GitHub Actions

Key design decisions

Decision	Rationale
uv workspace with `apps/` + `libs/`	Clean separation; Gradio app stays thin; inference reusable
llama-cpp default	Matches "Off the Grid" + "Llama Champion" badges; runs on laptop CPU
transformers as optional extra	Keeps default install light; swap via env when needed
Docker Space at repo root	Standard pattern for monorepos on HF (see eu-ai-act example)
Qwen2.5-3B-Instruct GGUF default	Small, capable, llama.cpp-compatible, well under 32B cap