Spaces:
Running on Zero
Running on Zero
metadata
name: uv monorepo init
overview: >-
Bootstrap a uv workspace monorepo from scratch with a Gradio HF Space app and
a swappable local inference library (llama-cpp-python default, transformers
optional), aligned with [Build Small
Hackathon](https://huggingface.co/build-small-hackathon) constraints.
todos:
- id: uv-workspace
content: >-
Run uv init at root + apps/gradio-space + libs/inference; configure
workspace members, sources, and uv.lock
status: pending
- id: inference-lib
content: >-
Implement inference Protocol, llama_cpp backend, transformers backend
(optional extra), and factory with env-based switching
status: pending
- id: gradio-app
content: Create minimal Gradio chat app in apps/gradio-space wired to inference lib
status: pending
- id: hf-space
content: >-
Add Dockerfile, Space README YAML, .env.example, download_model script,
and root README with dev/hackathon docs
status: pending
- id: verify
content: Run uv sync, local Gradio smoke test, and confirm imports/backends work
status: pending
isProject: false
uv Monorepo + Gradio + Local Llama Inference
Context
- Repo today: only
README.mdβ greenfield setup. - Your choices: generic track scaffold, abstract inference with llama-cpp default.
- Hackathon hard rules: Gradio app on HF Space, models β€ 32B, demo video + social post by June 15, 2026.
Target layout
small-model-hackathon/
βββ pyproject.toml # workspace root + shared dev tooling
βββ uv.lock
βββ .python-version # 3.12
βββ .gitignore
βββ Dockerfile # HF Space (Docker SDK) β builds whole workspace
βββ README.md # dev + hackathon checklist
βββ apps/
β βββ gradio-space/
β βββ pyproject.toml
β βββ README.md # HF Space card YAML (title, sdk, hardware hints)
β βββ src/gradio_space/
β βββ __init__.py
β βββ app.py # Gradio UI entrypoint
βββ libs/
β βββ inference/
β βββ pyproject.toml
β βββ src/inference/
β βββ __init__.py
β βββ base.py # Protocol / ABC
β βββ llama_cpp.py # default backend (GGUF)
β βββ transformers.py # optional HF backend
β βββ factory.py # INFERENCE_BACKEND env switch
βββ scripts/
βββ download_model.py # pull GGUF from Hub to local cache
flowchart LR
subgraph app [apps/gradio-space]
GradioUI[app.py]
end
subgraph lib [libs/inference]
Factory[factory.py]
LlamaCpp[llama_cpp.py]
Transformers[transformers.py]
end
GradioUI --> Factory
Factory -->|default| LlamaCpp
Factory -->|optional| Transformers
LlamaCpp --> GGUF[Local GGUF file]
Transformers --> HFModel[HF weights via transformers]
1. Initialize uv workspace
Run from repo root:
uv init --name small-model-hackathon
uv init --package apps/gradio-space
uv init --package libs/inference
Configure root pyproject.toml:
[tool.uv.workspace]withmembers = ["apps/*", "libs/*"]- Root depends on both workspace packages so
uv syncinstalls everything:dependencies = ["gradio-space", "inference"][tool.uv.sources]mapping each to{ workspace = true }
- Shared dev deps at root:
ruff,pytest(optional but lightweight) requires-python = ">=3.12"(matches your installed Python 3.12.9)
Lock and install:
uv lock
uv sync --all-packages
2. libs/inference β swappable local backends
Core interface in base.py:
class InferenceBackend(Protocol):
def load(self) -> None: ...
def generate(self, prompt: str, *, max_tokens: int = 512, temperature: float = 0.7) -> str: ...
def chat(self, messages: list[dict[str, str]], **kwargs) -> str: ...
Default backend β llama_cpp.py
- Dependency:
llama-cpp-python(CPU build by default; GPU variant documented for local/CUDA Spaces) - Load GGUF via env config:
MODEL_PATHβ local file path, orMODEL_REPO+MODEL_FILEβ download from Hugging Face Hub at startup (huggingface_hub.hf_hub_download)
- Suggested default model for dev:
Qwen/Qwen2.5-3B-Instruct-GGUFwith a specific.ggufquant (well under 32B; laptop-friendly)
Optional backend β transformers.py
- Dependencies kept in an optional extra:
inference[transformers]βtransformers,torch,accelerate - Same public methods; loads
AutoModelForCausalLM+AutoTokenizerfromMODEL_ID - Heavier; useful if you later fine-tune and publish on Hub
Factory β factory.py
INFERENCE_BACKEND=llama_cpp|transformers(defaultllama_cpp)- Lazy singleton so model loads once on first request (important for Gradio cold start)
3. apps/gradio-space β minimal chat UI
Dependencies: gradio, inference (workspace)
app.py skeleton:
gr.ChatInterfaceor simpleBlockswith textbox + chat history- On startup: call
get_backend().load()with a status message if model missing - Wire
chat()to the inference backend - Expose
demo.launch()guarded byif __name__ == "__main__"
Run locally:
uv run --package gradio-space python -m gradio_space.app
# or: uv run --package gradio-space gradio apps/gradio-space/src/gradio_space/app.py
Env template (.env.example at root):
INFERENCE_BACKEND=llama_cpp
MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF
MODEL_FILE=qwen2.5-3b-instruct-q4_k_m.gguf
N_CTX=4096
N_GPU_LAYERS=0
4. HF Space deployment (monorepo-friendly)
Use Docker SDK at repo root (HF Docker Spaces docs) so the whole workspace ships together.
Root Dockerfile (outline):
- Base:
python:3.12-slim - Install
uvvia official installer COPYmonorepo,uv sync --frozen --no-dev --package gradio-space- Run as UID 1000 (HF requirement)
EXPOSE 7860CMD ["uv", "run", "--package", "gradio-space", "python", "-m", "gradio_space.app"]
apps/gradio-space/README.md β Space card frontmatter:
---
title: <Your App Name>
emoji: ...
colorFrom: ...
colorTo: ...
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
---
When creating the Space under build-small-hackathon:
- New Space β SDK: Docker β link this repo
- Hardware: start CPU basic for llama-cpp dev; upgrade to GPU Space if you offload layers
- Add Space secrets/env vars for
MODEL_REPO,MODEL_FILE, etc. - Optionally attach a Storage Bucket if you cache large GGUF files persistently
5. Repo hygiene
.gitignore: .venv/, __pycache__/, .env, models/, *.gguf, .ruff_cache/, .pytest_cache/
README.md sections:
- Prerequisites:
uv, Python 3.12 - Quick start: sync, download model script, run Gradio locally
- Monorepo commands cheat sheet (
uv add --package ...,uv run --package ...) - Hackathon checklist: track choice, Space link, demo video, social post, badge targets (Off-the-Grid, Llama Champion, etc.)
scripts/download_model.py: small CLI using huggingface_hub to fetch the configured GGUF into ./models/ for offline dev.
6. Verification checklist (post-init)
| Step | Command / check |
|---|---|
| Workspace resolves | uv sync --all-packages succeeds |
| Import chain | uv run python -c "from inference.factory import get_backend" |
| Gradio boots | uv run --package gradio-space python -m gradio_space.app β localhost:7860 |
| Backend switch | INFERENCE_BACKEND=transformers fails gracefully until extra installed |
| Docker build | docker build -t hackathon-space . (optional local smoke test) |
Out of scope for this init (pick up later)
- Track-specific product logic (Backyard AI vs Thousand Token Wood)
- Fine-tuning pipeline / custom model publish
- Custom UI via
gr.Server(Off-Brand badge) - Agent traces dataset upload (Sharing is Caring badge)
- CI/GitHub Actions
Key design decisions
| Decision | Rationale |
|---|---|
uv workspace with apps/ + libs/ |
Clean separation; Gradio app stays thin; inference reusable |
| llama-cpp default | Matches "Off the Grid" + "Llama Champion" badges; runs on laptop CPU |
| transformers as optional extra | Keeps default install light; swap via env when needed |
| Docker Space at repo root | Standard pattern for monorepos on HF (see eu-ai-act example) |
| Qwen2.5-3B-Instruct GGUF default | Small, capable, llama.cpp-compatible, well under 32B cap |