# Usage

This guide explains how to run the Gradio chat app locally, test it in Docker, and deploy it to a Hugging Face Space for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).

## Prerequisites

- [uv](https://docs.astral.sh/uv/) installed
- Python 3.12 (see `.python-version`)
- For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org

## Local development

### 1. Install dependencies

```bash
uv sync --all-packages
```

### 2. Configure environment (optional)

```bash
cp .env.example .env
```

Edit `.env` if you want a different model or local GGUF path. Defaults work out of the box.

### 3. Pre-download the model (recommended)

The app can download the GGUF on first chat, but pre-downloading avoids a long wait during your first message:

```bash
uv run python scripts/download_model.py
```

Then add the printed path to `.env`:

```bash
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
```
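
The precedence between `MODEL_PATH` and the Hub settings can be pictured roughly as follows. This is a minimal sketch, not the app's actual code: `resolve_model_source` is a hypothetical helper, and the rule that a set `MODEL_PATH` must point at an existing file is an assumption.

```python
import os
from pathlib import Path


def resolve_model_source(env=None):
    """Return ("local", path) or ("hub", (repo, filename)).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    env = os.environ if env is None else env
    model_path = env.get("MODEL_PATH", "").strip()
    if model_path:
        path = Path(model_path).expanduser()
        # Assumption: fail early instead of silently falling back to the Hub.
        if not path.is_file():
            raise FileNotFoundError(f"MODEL_PATH does not exist: {path}")
        return "local", path
    # No local path set: use the Hub repo and filename (documented defaults).
    repo = env.get("MODEL_REPO", "Qwen/Qwen2.5-3B-Instruct-GGUF")
    filename = env.get("MODEL_FILE", "qwen2.5-3b-instruct-q4_k_m.gguf")
    return "hub", (repo, filename)
```

Calling `resolve_model_source()` with no arguments reads the current process environment; passing a plain dict makes the logic easy to try without touching `.env`.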

### 4. Run the Gradio app

```bash
uv run --package gradio-space python -m gradio_space.app
```

Open http://localhost:7860.

The model loads on the **first chat message** unless you set `MODEL_PATH`. After code changes, restart the process to pick up updates.

### 5. Quick sanity checks

```bash
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"

# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
```

### Local env reference

| Variable | Default | Description |
|----------|---------|-------------|
| `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
| `MODEL_PATH` | — | Local GGUF path (skips Hub download) |
| `N_CTX` | `4096` | Context window |
| `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
| `PORT` | `7860` | Gradio listen port |
| `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |

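As an illustration of how the variables in this table could be parsed with their documented defaults, here is a small sketch. `Settings` and `load_settings` are hypothetical names chosen for this example; the repo's own configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the env reference table; the class itself is hypothetical."""

    inference_backend: str = "llama_cpp"
    model_repo: str = "Qwen/Qwen2.5-3B-Instruct-GGUF"
    model_file: str = "qwen2.5-3b-instruct-q4_k_m.gguf"
    model_path: Optional[str] = None
    n_ctx: int = 4096
    n_gpu_layers: int = 0
    port: int = 7860
    model_id: str = "Qwen/Qwen2.5-3B-Instruct"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an env mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    backend = env.get("INFERENCE_BACKEND", defaults.inference_backend)
    if backend not in ("llama_cpp", "transformers"):
        raise ValueError(f"Unsupported INFERENCE_BACKEND: {backend!r}")
    return Settings(
        inference_backend=backend,
        model_repo=env.get("MODEL_REPO", defaults.model_repo),
        model_file=env.get("MODEL_FILE", defaults.model_file),
        model_path=env.get("MODEL_PATH") or None,  # empty string counts as unset
        n_ctx=int(env.get("N_CTX", defaults.n_ctx)),
        n_gpu_layers=int(env.get("N_GPU_LAYERS", defaults.n_gpu_layers)),
        port=int(env.get("PORT", defaults.port)),
        model_id=env.get("MODEL_ID", defaults.model_id),
    )
```

Converting the numeric variables with `int(...)` up front means a typo such as `N_CTX=4k` fails at startup with a `ValueError` rather than later during model loading.
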
### Optional: transformers backend

Heavier install; only needed if you switch away from llama.cpp:

```bash
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
  uv run --package gradio-space python -m gradio_space.app
```

---

## Docker (local prod-like test)

Run the same container image HF Spaces will build:

```bash
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
  -e MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF \
  -e MODEL_FILE=qwen2.5-3b-instruct-q4_k_m.gguf \
  -e N_CTX=4096 \
  -e N_GPU_LAYERS=0 \
  hackathon-space
```

Open http://localhost:7860. Stop with `Ctrl+C`.

To use a pre-downloaded local model inside Docker, mount it and set `MODEL_PATH`:

```bash
docker run --rm -p 7860:7860 \
  -v "$(pwd)/models:/app/models:ro" \
  -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
  hackathon-space
```
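
To confirm from a script that a container started with either command above is serving, a short poll against the published port may help. This is a sketch using only the standard library; `wait_for_app` is a hypothetical helper, and the assumption is that the app answers `GET /` with HTTP 200 once it is up.

```python
import time
import urllib.request


def wait_for_app(url="http://localhost:7860", timeout=60.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections and urllib's URLError/HTTPError:
            # the server is not ready yet, so wait and retry.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_app(timeout=120)` returns `True` as soon as the page loads and `False` if two minutes pass without a successful response. It only checks that the web server responds, not that the model has finished loading.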

---

## Hugging Face Space deployment

This repo uses the **Docker SDK**. The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md).

### 1. Push code to GitHub

Make sure `main` (or your deploy branch) contains at minimum:

- `Dockerfile`
- `README.md` (with `sdk: docker` and `app_port: 7860`)
- `pyproject.toml`, `uv.lock`
- `apps/gradio-space/` and `libs/inference/`

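Before pushing, the README metadata listed above can be checked with a few lines of Python. This is a deliberately simple sketch: `read_frontmatter` and `check_space_readme` are hypothetical helpers, and the parser only understands flat `key: value` lines rather than full YAML.

```python
def read_frontmatter(text):
    """Return top-level `key: value` pairs from a leading `---` block.

    Handles flat scalar keys only; nested YAML is ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # No closing delimiter: treat the frontmatter as missing.


def check_space_readme(text):
    """List problems with the Space metadata; an empty list means it looks fine."""
    fields = read_frontmatter(text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("frontmatter should set `sdk: docker`")
    if fields.get("app_port") != "7860":
        problems.append("frontmatter should set `app_port: 7860`")
    return problems
```

Pass it the contents of the root `README.md`; any returned strings describe what to fix before the Space build.
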
### 2. Create the Space

1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
2. **New Space**
3. Name: e.g. `small-model-hackathon`
4. SDK: **Docker**
5. Link your GitHub repo, or push directly to the Space repo

CLI alternative (if you have `hf` installed and org access):

```bash
hf repo create build-small-hackathon/<your-space-name> \
  --repo-type space \
  --space_sdk docker
```

### 3. Configure hardware

| Setting | Recommendation |
|---------|----------------|
| Hardware | **CPU basic** to start (llama.cpp with `N_GPU_LAYERS=0`) |
| Upgrade | GPU Space if you set `N_GPU_LAYERS > 0` for faster inference |

### 4. Set Space environment variables

In the Space **Settings → Variables and secrets**:

| Variable | Value |
|----------|-------|
| `INFERENCE_BACKEND` | `llama_cpp` |
| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` |
| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` |
| `N_CTX` | `4096` |
| `N_GPU_LAYERS` | `0` (or higher on GPU hardware) |

### 5. Build and verify

HF builds from the root `Dockerfile` and runs:

```bash
uv run --package gradio-space python -m gradio_space.app
```

Check the **Logs** tab while the Space builds. Once running, open the Space URL and send a test chat message. The first message may take several minutes on CPU while the GGUF downloads.

### 6. Optional: persistent model cache

If cold starts are too slow, attach a **Storage Bucket** in Space settings so downloaded GGUF files survive restarts.

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| First chat hangs or is slow | GGUF downloading from Hub | Pre-download the model locally; on a Space, wait or attach a Storage Bucket |
| `Failed to load model` in chat | Wrong `MODEL_REPO` / `MODEL_FILE` | Check env vars match a valid GGUF on Hub |
| Docker build fails on `llama-cpp-python` | Missing build tools | The Dockerfile already installs `build-essential` and `cmake`; if you modified it, confirm both are still installed |
| Space build fails | Missing `uv.lock` or README YAML | Commit `uv.lock` and ensure `sdk: docker` and `app_port: 7860` are in the root `README.md` frontmatter |
| `transformers` backend error | Optional deps not installed | Run `uv sync --package inference --extra transformers` |
| Port already in use locally | Another process on 7860 | `PORT=7861 uv run --package gradio-space python -m gradio_space.app` |

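For the port conflict in the last row, a quick way to find a usable `PORT` value is to try binding to candidate ports. This is a sketch with hypothetical helper names (`port_is_free`, `first_free_port`); it checks the loopback interface only, and a port can still be taken between the check and the app start.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if binding to host:port succeeds, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=7860, attempts=10):
    """Return the first free port in [start, start + attempts), capped at 65535."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port found starting at {start}")
```
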
---

## Entrypoint summary

All three environments start the app with the same underlying command; Docker and HF Spaces run it through the `Dockerfile` `CMD`:

```bash
uv run --package gradio-space python -m gradio_space.app
```

| Environment | How to run |
|-------------|------------|
| Local dev | `uv run --package gradio-space python -m gradio_space.app` |
| Docker | `docker run --rm -p 7860:7860 hackathon-space` |
| HF Space | Built and started automatically from `Dockerfile` `CMD` |