Space: Sathvik0101/obj_localizer (Running)

Commit 23db765 by 3v324v23, committed 26 days ago (0 parents)

feat: initialize SpaceDebris Localizer project

- Full project structure with src/, tests/, examples/, assets/
- LocateAnything-3B inference wrapper (src/inference.py)
- Bounding box parsing with normalized-to-pixel conversion (src/parsing.py)
- Space debris prompt templates (src/prompts.py)
- Image visualization with box drawing (src/visualization.py)
- Gradio UI with image upload, prompt input, annotated output (app.py)
- Comprehensive test suite for parsing, visualization, prompts
- CI workflow (ruff, black, pytest)
- HF Space sync workflow via GitHub Actions
- README with architecture, setup, deployment instructions

Files changed (22)

.env.example +16 -0
.github/workflows/ci.yml +36 -0
.github/workflows/sync-to-hf-space.yml +20 -0
.gitignore +22 -0
LICENSE +21 -0
README.md +144 -0
app.py +170 -0
assets/demo_placeholder.png +0 -0
examples/sample_queries.md +34 -0
pyproject.toml +44 -0
requirements.txt +14 -0
src/__init__.py +3 -0
src/config.py +17 -0
src/inference.py +168 -0
src/parsing.py +168 -0
src/prompts.py +108 -0
src/utils.py +62 -0
src/visualization.py +119 -0
tests/test_app_smoke.py +46 -0
tests/test_parsing.py +115 -0
tests/test_prompts.py +67 -0
tests/test_visualization.py +61 -0

.env.example ADDED

	@@ -0,0 +1,16 @@

+# Environment Variables for SpaceDebris Localizer
+# Copy this file to .env and fill in your values
+# Hugging Face credentials (for GitHub Actions sync)
+HF_TOKEN=
+HF_USERNAME=
+HF_SPACE_NAME=
+# Model configuration
+# MODEL_ID=nvidia/LocateAnything-3B
+# DEVICE=cuda
+# DTYPE=bfloat16
+# MAX_NEW_TOKENS=8192
+# GENERATION_MODE=hybrid
+# TEMPERATURE=0.7
+# PORT=7860

.github/workflows/ci.yml ADDED

	@@ -0,0 +1,36 @@

+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11"]
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+      - name: Lint with ruff
+        run: ruff check src/ tests/ app.py
+      - name: Format check with black
+        run: black --check src/ tests/ app.py
+      - name: Run tests
+        run: pytest tests/ -v --tb=short

.github/workflows/sync-to-hf-space.yml ADDED

	@@ -0,0 +1,20 @@

+name: Sync to Hugging Face Space
+on:
+  push:
+    branches: [main]
+jobs:
+  sync:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Push to Hugging Face Space
+        uses: cdanwards/action-push-to-hf-space@v2
+        with:
+          hf_token: ${{ secrets.HF_TOKEN }}
+          hf_space_name: ${{ secrets.HF_USERNAME }}/${{ secrets.HF_SPACE_NAME }}
+          branch: main

.gitignore ADDED

	@@ -0,0 +1,22 @@

+__pycache__/
+*.py[cod]
+*$py.class
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+.env
+.venv/
+venv/
+env/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+*.safetensors
+*.bin
+*.pt
+*.pth
+.DS_Store
+Thumbs.db
+*.log

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2026 SpaceDebris Localizer Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,144 @@

+# SpaceDebris Localizer
+Use **NVIDIA LocateAnything-3B** to locate space debris, satellite fragments, and spacecraft components in orbital imagery.
+Orbital debris is a growing threat to satellite operations and crewed spaceflight. This project demonstrates how state-of-the-art vision-language grounding models can be applied to identify and localize objects in space imagery — from satellite solar panels and antennas to rocket bodies and debris fields. Built as a Hugging Face Spaces application, it provides a natural-language interface: describe what you're looking for, and the model draws bounding boxes around matching objects in the image.
+## Why This Matters
+There are over 36,000 tracked objects in Earth orbit, and millions of smaller fragments too tiny to track. Traditional detection pipelines require specialized training data and domain-specific models. Vision-language grounding models like LocateAnything-3B offer a different approach: describe the target in natural language and let the model find it. This prototype explores whether general-purpose visual grounding can serve as a rapid-deployment tool for orbital debris awareness, satellite inspection, and space situational awareness workflows.
+## Architecture
+```
+User uploads image + text prompt
+         │
+         ▼
+┌────────────────────────┐
+│  Gradio Interface      │
+│  (app.py)              │
+└────────┬───────────────┘
+         │
+         ▼
+┌────────────────────────┐
+│  LocateAnythingWorker  │
+│  (src/inference.py)    │
+│  ┌──────────────────┐  │
+│  │ nvidia/          │  │
+│  │ LocateAnything-3B│  │
+│  │ (3B params)      │  │
+│  └──────────────────┘  │
+└────────┬───────────────┘
+         │ raw text with <box> tokens
+         ▼
+┌────────────────────────┐
+│  Output Parser         │
+│  (src/parsing.py)      │
+│  Regex → BBox list     │
+└────────┬───────────────┘
+         │ structured BBox objects
+         ▼
+┌────────────────────────┐
+│  Visualizer            │
+│  (src/visualization.py)│
+│  Draw boxes + labels   │
+└────────┬───────────────┘
+         │
+         ▼
+   Annotated image + JSON metadata
+```
+## Setup
+### Prerequisites
+- Python 3.10+
+- CUDA-capable GPU (recommended) or CPU (slow)
+- ~8GB GPU memory for bfloat16 inference
+### Local Installation
+```bash
+git clone https://github.com/YOUR_USERNAME/space-debris-localizer.git
+cd space-debris-localizer
+pip install -e ".[dev]"
+```
+### Run Locally
+```bash
+python app.py
+```
+The app launches at `http://localhost:7860`. First run downloads the model (~6GB).
+### Environment Variables
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `MODEL_ID` | `nvidia/LocateAnything-3B` | Hugging Face model ID |
+| `DEVICE` | `cuda` | Device (`cuda` or `cpu`) |
+| `DTYPE` | `bfloat16` | Model precision |
+| `MAX_NEW_TOKENS` | `8192` | Max generation tokens |
+| `GENERATION_MODE` | `hybrid` | Generation mode: `fast`, `slow`, or `hybrid` |
+| `TEMPERATURE` | `0.7` | Sampling temperature |
+| `PORT` | `7860` | Gradio server port |
+## Deployment to Hugging Face Spaces
+### Automatic Sync via GitHub Actions
+1. Create a Hugging Face Space at [huggingface.co/new-space](https://huggingface.co/new-space) (select Gradio SDK)
+2. Set these GitHub repository secrets:
+   - `HF_TOKEN` — your Hugging Face [access token](https://huggingface.co/settings/tokens)
+   - `HF_USERNAME` — your Hugging Face username
+   - `HF_SPACE_NAME` — your space name
+3. Push to `main`. GitHub Actions will sync the repo to your HF Space automatically.
+### Manual Push
+```bash
+# Clone your HF Space repo
+git clone https://huggingface.co/spaces/YOUR_USERNAME/space-debris-localizer
+cd space-debris-localizer
+# Copy project files
+cp -r /path/to/space-debris-localizer/* .
+git add . && git commit -m "deploy" && git push
+```
+## Example Prompts
+- `Locate all the instances that match the following description: space debris.`
+- `Locate all the instances that match the following description: solar panel.`
+- `Locate a single instance that matches the following description: spacecraft.`
+- `Locate all the instances that match the following description: antenna.`
+- `Locate all the instances that match the following description: rocket body.`
+- `Locate all the instances that match the following description: thermal blanket.`
+## Known Limitations
+- **Domain gap:** The model was trained on general grounding data (COCO, LVIS, RefCOCO, etc.), not specifically on orbital imagery. Performance on space scenes is exploratory.
+- **Small debris:** Objects below a few pixels are unlikely to be grounded reliably.
+- **Image quality:** Detection depends heavily on image resolution and contrast.
+- **No confidence calibration:** The model does not output calibrated confidence scores; the displayed confidence is a fixed placeholder (`DEFAULT_CONFIDENCE = 0.85` in `src/config.py`).
+- **GPU required:** CPU inference is extremely slow due to the 3B parameter size.
+## Future Work
+- Fine-tune on orbital debris datasets (e.g., ESA's DISCOS, ESA Clean Space imagery)
+- Integrate with real satellite imagery APIs (e.g., ESA Copernicus, Planet Labs)
+- Add temporal tracking across image sequences
+- Support video input for debris tracking
+- Add point-based localization for centroid estimation
+- Deploy with quantized model for faster CPU inference
+## Tech Stack
+- **Model:** [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B)
+- **Framework:** Gradio 5.x, Hugging Face Transformers
+- **Language:** Python 3.10+
+- **CI/CD:** GitHub Actions
+- **Deployment:** Hugging Face Spaces
+## License
+MIT License. The underlying LocateAnything-3B model is subject to the [NVIDIA License](https://huggingface.co/nvidia/LocateAnything-3B/blob/main/LICENSE) (non-commercial research use).

app.py ADDED

	@@ -0,0 +1,170 @@

+"""SpaceDebris Localizer - Gradio application.
+Uses nvidia/LocateAnything-3B to locate space debris, satellite fragments,
+and spacecraft components in space imagery.
+"""
+from __future__ import annotations
+import logging
+import os
+import gradio as gr
+from PIL import Image
+from src.config import APP_SUBTITLE, APP_TITLE
+from src.inference import LocateAnythingWorker, run_localization
+from src.prompts import get_example_prompts
+from src.utils import ensure_rgb, format_json_output, format_metadata, validate_image
+logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
+logger = logging.getLogger(__name__)
+worker: LocateAnythingWorker | None = None
+def get_worker() -> LocateAnythingWorker:
+    """Lazy-load the model worker on first use."""
+    global worker
+    if worker is None:
+        logger.info("Loading LocateAnything-3B model...")
+        worker = LocateAnythingWorker()
+        worker.load()
+        logger.info("Model loaded successfully.")
+    return worker
+def run_inference(
+    image: Image.Image | None,
+    prompt: str,
+) -> tuple[Image.Image | None, str, str, str, str]:
+    """Main inference function for Gradio interface.
+    Returns:
+        (annotated_image, metadata, raw_output, json_output, status_message)
+    """
+    is_valid, error_msg = validate_image(image)
+    if not is_valid:
+        return None, "", "", "", f"Error: {error_msg}"
+    if not prompt or not prompt.strip():
+        return None, "", "", "", "Error: Please enter a detection prompt."
+    try:
+        image_rgb = ensure_rgb(image)
+        w = get_worker()
+        annotated, raw_output, parsed = run_localization(image_rgb, prompt.strip(), worker=w)
+        metadata = format_metadata(parsed)
+        json_out = format_json_output(parsed)
+        import json
+        json_str = json.dumps(json_out, indent=2, ensure_ascii=False)
+        status = f"Done. Found {parsed.num_detections} object(s)."
+        if parsed.parse_errors:
+            status += f" ({len(parsed.parse_errors)} warning(s))"
+        return annotated, metadata, raw_output, json_str, status
+    except Exception as exc:
+        logger.exception("Inference failed")
+        return None, "", "", "", f"Inference error: {exc}"
+def build_app() -> gr.Blocks:
+    """Build the Gradio Blocks interface."""
+    with gr.Blocks(
+        title=APP_TITLE,
+        theme=gr.themes.Soft(),
+        css="""
+        .main-title { text-align: center; margin-bottom: 0; }
+        .subtitle { text-align: center; color: #666; margin-top: 0; }
+        .footer { text-align: center; color: #999; font-size: 0.85em; margin-top: 20px; }
+        """,
+    ) as app:
+        gr.HTML(f"""
+            <h1 class="main-title">{APP_TITLE}</h1>
+            <p class="subtitle">{APP_SUBTITLE}</p>
+        """)
+        gr.Markdown("""
+        > **How it works:** Upload a space or satellite image and enter a natural-language
+        > prompt describing what to locate. The model grounds your query in the image and
+        > returns bounding box coordinates. Detection quality depends on image resolution,
+        > object visibility, and model grounding capability.
+        """)
+        with gr.Row():
+            with gr.Column(scale=1):
+                input_image = gr.Image(type="pil", label="Upload Image")
+                prompt_input = gr.Textbox(
+                    label="Detection Prompt",
+                    placeholder="e.g. Locate all the instances that match the following description: space debris.",
+                    lines=2,
+                )
+                run_btn = gr.Button("Run Localization", variant="primary", size="lg")
+                status_text = gr.Textbox(label="Status", interactive=False, lines=1)
+            with gr.Column(scale=1):
+                output_image = gr.Image(type="pil", label="Annotated Image")
+                with gr.Tabs():
+                    with gr.TabItem("Metadata"):
+                        metadata_output = gr.Textbox(
+                            label="Detection Metadata", lines=6, interactive=False
+                        )
+                    with gr.TabItem("Raw Output"):
+                        raw_output = gr.Textbox(
+                            label="Raw Model Output",
+                            lines=8,
+                            interactive=False,
+                            show_copy_button=True,
+                        )
+                    with gr.TabItem("JSON Output"):
+                        json_output = gr.Code(label="Parsed JSON", language="json", lines=8)
+        gr.Markdown("### Example Prompts")
+        gr.Markdown("Click an example to load it into the prompt field.")
+        examples_list = get_example_prompts()
+        gr.Examples(
+            examples=examples_list,
+            inputs=[prompt_input],
+            label="Space Debris Prompts",
+        )
+        with gr.Accordion("About This Project", open=False):
+            gr.Markdown("""
+            **SpaceDebris Localizer** is a hackathon prototype demonstrating how NVIDIA's
+            **LocateAnything-3B** vision-language model can be applied to orbital debris
+            localization and satellite component identification.
+            ### Capabilities
+            - Open-set object detection from natural-language prompts
+            - Bounding-box grounding for arbitrary visual concepts
+            - Structured output with pixel-coordinate parsing
+            ### Limitations
+            - The model was trained on general grounding data, not specifically orbital imagery
+            - Detection quality depends heavily on image resolution and object clarity
+            - Small debris fragments may not be reliably detected
+            - This is a proof-of-concept, not a production debris tracking system
+            ### Model
+            - [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B) on Hugging Face
+            - 3B parameter vision-language model with Parallel Box Decoding
+            - Coordinates are normalized to [0, 1000] and converted to pixel space
+            """)
+        gr.HTML('<p class="footer">Powered by nvidia/LocateAnything-3B | SpaceDebris Localizer</p>')
+        run_btn.click(
+            fn=run_inference,
+            inputs=[input_image, prompt_input],
+            outputs=[output_image, metadata_output, raw_output, json_output, status_text],
+        )
+        prompt_input.submit(
+            fn=run_inference,
+            inputs=[input_image, prompt_input],
+            outputs=[output_image, metadata_output, raw_output, json_output, status_text],
+        )
+    return app
+if __name__ == "__main__":
+    app = build_app()
+    app.launch(server_name="0.0.0.0", server_port=int(os.getenv("PORT", "7860")))
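
`get_worker()` loads the model lazily on the first request through an unguarded global. Gradio applies its concurrency limit per event listener, and `run_btn.click` and `prompt_input.submit` are two separate listeners. A click and an Enter press arriving together could therefore both see `worker is None` and start loading the 3B model twice. The sketch below wraps the same lazy-load pattern in a double-checked lock. `LazySingleton` is a hypothetical name, and the factory would be a function that builds a `LocateAnythingWorker`, calls `.load()`, and returns it.

```python
import threading
from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build a value on first use, at most once, even under concurrent calls."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory  # must not return None (None means "not loaded yet")
        self._value: T | None = None
        self._lock = threading.Lock()

    def get(self) -> T:
        value = self._value
        if value is None:  # lock is only taken until the first load has finished
            with self._lock:
                # Re-check under the lock: another thread may have loaded it meanwhile.
                if self._value is None:
                    self._value = self._factory()
                value = self._value
        return value
```

In `app.py`, this could replace the global with something like `_worker = LazySingleton(_load_worker)`, where `_load_worker` is a hypothetical helper. `get_worker()` would then reduce to `return _worker.get()`.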

assets/demo_placeholder.png ADDED

examples/sample_queries.md ADDED

	@@ -0,0 +1,34 @@

+# Example Queries for SpaceDebris Localizer
+## Single Object Grounding
+| Prompt | Description |
+|--------|-------------|
+| `Locate a single instance that matches the following description: spacecraft.` | Find one spacecraft |
+| `Locate a single instance that matches the following description: solar panel.` | Find one solar panel |
+| `Locate a single instance that matches the following description: antenna.` | Find one antenna |
+## Multi-Object Detection
+| Prompt | Description |
+|--------|-------------|
+| `Locate all the instances that match the following description: space debris.` | Find all debris fragments |
+| `Locate all the instances that match the following description: satellite fragment.` | Find all satellite pieces |
+| `Locate all the instances that match the following description: solar panel.` | Find all solar panels |
+| `Locate all the instances that match the following description: rocket body.` | Find all rocket stages |
+| `Locate all the instances that match the following description: thermal blanket.` | Find all thermal blankets |
+## Multi-Category Detection
+| Prompt | Description |
+|--------|-------------|
+| `Locate all the instances that matches the following description: debris</c>antenna</c>solar panel.` | Find debris, antennas, and panels |
+| `Locate all the instances that matches the following description: spacecraft</c>satellite fragment.` | Find spacecraft and fragments |
+## Tips
+- Be specific with object descriptions for better grounding results
+- Use `all the instances` when you expect multiple objects
+- Use `a single instance` when targeting one specific object
+- Higher resolution images generally produce better results
+- The model works best with clearly visible, well-lit objects

pyproject.toml ADDED

	@@ -0,0 +1,44 @@

+[project]
+name = "space-debris-localizer"
+version = "1.0.0"
+description = "Locate space debris, satellite fragments, and spacecraft components in orbital imagery using NVIDIA LocateAnything-3B"
+readme = "README.md"
+license = {text = "MIT"}
+requires-python = ">=3.10"
+dependencies = [
+    "transformers>=4.57.0",
+    "torch>=2.0.0",
+    "torchvision",
+    "Pillow>=11.0.0",
+    "numpy>=1.25.0",
+    "opencv-python-headless>=4.11.0",
+    "gradio>=5.0.0",
+    "peft",
+    "decord>=0.6.0",
+    "lmdb>=1.7.5",
+    "python-dotenv>=1.0.0",
+]
+[project.optional-dependencies]
+dev = [
+    "ruff>=0.4.0",
+    "black>=24.0.0",
+    "pytest>=8.0.0",
+]
+[tool.black]
+line-length = 100
+target-version = ["py310"]
+[tool.ruff]
+line-length = 100
+target-version = "py310"
+[tool.ruff.lint]
+select = ["E", "F", "W", "I", "N", "UP", "B"]
+ignore = ["E501"]
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+python_functions = ["test_*"]

requirements.txt ADDED

	@@ -0,0 +1,14 @@

+transformers>=4.57.0
+torch>=2.0.0
+torchvision
+Pillow>=11.0.0
+numpy>=1.25.0
+opencv-python-headless>=4.11.0
+gradio>=5.0.0
+peft
+decord>=0.6.0
+lmdb>=1.7.5
+ruff>=0.4.0
+black>=24.0.0
+pytest>=8.0.0
+python-dotenv>=1.0.0

src/__init__.py ADDED

	@@ -0,0 +1,3 @@


+"""SpaceDebris Localizer - Locate space debris and satellite components in orbital imagery."""
+
+__version__ = "1.0.0"

src/config.py ADDED

	@@ -0,0 +1,17 @@

+"""Configuration constants for SpaceDebris Localizer."""
+import os
+MODEL_ID: str = os.getenv("MODEL_ID", "nvidia/LocateAnything-3B")
+DEVICE: str = os.getenv("DEVICE", "cuda")
+DTYPE: str = os.getenv("DTYPE", "bfloat16")
+MAX_NEW_TOKENS: int = int(os.getenv("MAX_NEW_TOKENS", "8192"))
+GENERATION_MODE: str = os.getenv("GENERATION_MODE", "hybrid")
+TEMPERATURE: float = float(os.getenv("TEMPERATURE", "0.7"))
+COORD_MAX: int = 1000
+DEFAULT_CONFIDENCE: float = 0.85
+APP_TITLE: str = "SpaceDebris Localizer"
+APP_SUBTITLE: str = (
+    "Use LocateAnything-3B to ground debris, satellite fragments, "
+    "and spacecraft components in space imagery."
+)
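
These values are read once, at import time, and are never validated. A typo such as `GENERATION_MODE=hybird`, or `DTYPE=bf16` (which `getattr(torch, dtype_str, torch.bfloat16)` silently replaces with bfloat16), only surfaces later or not at all. The hypothetical sketch below reads the same variables into a frozen dataclass and checks them up front. The allowed generation modes come from the README's environment-variable table, while the accepted dtype names and the `cuda[:N]` device form are assumptions. `Settings` and `load_settings` are illustrative names, not project code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Modes from the README table; the dtype list is an assumption (common torch dtype names).
GENERATION_MODES = ("fast", "slow", "hybrid")
DTYPES = ("bfloat16", "float16", "float32")


@dataclass(frozen=True)
class Settings:
    model_id: str
    device: str
    dtype: str
    max_new_tokens: int
    generation_mode: str
    temperature: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the same variables and defaults as src/config.py, then fail fast on bad values."""
    s = Settings(
        model_id=env.get("MODEL_ID", "nvidia/LocateAnything-3B"),
        device=env.get("DEVICE", "cuda"),
        dtype=env.get("DTYPE", "bfloat16"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "8192")),  # ValueError if not an int
        generation_mode=env.get("GENERATION_MODE", "hybrid"),
        temperature=float(env.get("TEMPERATURE", "0.7")),
    )
    errors: list[str] = []
    if s.generation_mode not in GENERATION_MODES:
        errors.append(f"GENERATION_MODE must be one of {GENERATION_MODES}, got {s.generation_mode!r}")
    if s.dtype not in DTYPES:
        errors.append(f"DTYPE must be one of {DTYPES}, got {s.dtype!r}")
    # Accept "cpu", "cuda" and indexed forms such as "cuda:1".
    if s.device != "cpu" and not s.device.startswith("cuda"):
        errors.append(f"DEVICE must be 'cpu' or 'cuda[:N]', got {s.device!r}")
    if s.max_new_tokens <= 0:
        errors.append("MAX_NEW_TOKENS must be positive")
    # predict() samples with do_sample=True, which needs a strictly positive temperature.
    if s.temperature <= 0:
        errors.append("TEMPERATURE must be > 0")
    if errors:
        raise ValueError("; ".join(errors))
    return s
```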

src/inference.py ADDED

	@@ -0,0 +1,168 @@

+"""Inference wrapper for nvidia/LocateAnything-3B."""
+from __future__ import annotations
+from typing import Any
+import torch
+from PIL import Image
+from transformers import AutoModel, AutoProcessor, AutoTokenizer
+from src.config import (
+    DEVICE,
+    DTYPE,
+    GENERATION_MODE,
+    MAX_NEW_TOKENS,
+    MODEL_ID,
+    TEMPERATURE,
+)
+from src.parsing import ParseResult, parse_boxes
+class LocateAnythingWorker:
+    """Stateful worker that loads LocateAnything-3B once and serves queries."""
+    def __init__(
+        self,
+        model_path: str = MODEL_ID,
+        device: str = DEVICE,
+        dtype_str: str = DTYPE,
+    ) -> None:
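+        # Fall back to CPU when CUDA is requested but unavailable (e.g. CPU-only hardware).
+        if device.startswith("cuda") and not torch.cuda.is_available():
+            device = "cpu"
+        self.device = device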
+        self.dtype = getattr(torch, dtype_str, torch.bfloat16)
+        self.model_path = model_path
+        self._loaded = False
+        self.tokenizer = None
+        self.processor = None
+        self.model = None
+    def load(self) -> None:
+        """Load model, tokenizer, and processor. Call once at startup."""
+        if self._loaded:
+            return
+        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True)
+        self.processor = AutoProcessor.from_pretrained(self.model_path, trust_remote_code=True)
+        self.model = (
+            AutoModel.from_pretrained(
+                self.model_path,
+                torch_dtype=self.dtype,
+                trust_remote_code=True,
+            )
+            .to(self.device)
+            .eval()
+        )
+        self._loaded = True
+    @torch.no_grad()
+    def predict(
+        self,
+        image: Image.Image,
+        question: str,
+        generation_mode: str = GENERATION_MODE,
+        max_new_tokens: int = MAX_NEW_TOKENS,
+        temperature: float = TEMPERATURE,
+    ) -> dict[str, Any]:
+        """Run inference on an image with a text prompt.
+        Returns dict with 'answer', optionally 'history' and 'stats'.
+        """
+        if not self._loaded:
+            self.load()
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "image", "image": image},
+                    {"type": "text", "text": question},
+                ],
+            }
+        ]
+        text = self.processor.py_apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True
+        )
+        images, videos = self.processor.process_vision_info(messages)
+        inputs = self.processor(
+            text=[text], images=images, videos=videos, return_tensors="pt"
+        ).to(self.device)
+        pixel_values = inputs["pixel_values"].to(self.dtype)
+        input_ids = inputs["input_ids"]
+        image_grid_hws = inputs.get("image_grid_hws", None)
+        response = self.model.generate(
+            pixel_values=pixel_values,
+            input_ids=input_ids,
+            attention_mask=inputs["attention_mask"],
+            image_grid_hws=image_grid_hws,
+            tokenizer=self.tokenizer,
+            max_new_tokens=max_new_tokens,
+            use_cache=True,
+            generation_mode=generation_mode,
+            temperature=temperature,
+            do_sample=True,
+            top_p=0.9,
+            repetition_penalty=1.1,
+            verbose=False,
+        )
+        answer = response[0] if isinstance(response, tuple) else response
+        result: dict[str, Any] = {"answer": answer}
+        if isinstance(response, tuple) and len(response) >= 3:
+            result["history"] = response[1]
+            result["stats"] = response[2]
+        return result
+    def detect(self, image: Image.Image, categories: list[str], **kwargs: Any) -> dict[str, Any]:
+        """Object detection with multiple categories."""
+        cats = "</c>".join(categories)
+        prompt = f"Locate all the instances that matches the following description: {cats}."
+        return self.predict(image, prompt, **kwargs)
+    def ground_single(self, image: Image.Image, phrase: str, **kwargs: Any) -> dict[str, Any]:
+        """Phrase grounding — single instance."""
+        prompt = f"Locate a single instance that matches the following description: {phrase}."
+        return self.predict(image, prompt, **kwargs)
+    def ground_multi(self, image: Image.Image, phrase: str, **kwargs: Any) -> dict[str, Any]:
+        """Phrase grounding — multiple instances."""
+        prompt = f"Locate all the instances that match the following description: {phrase}."
+        return self.predict(image, prompt, **kwargs)
+def run_localization(
+    image: Image.Image,
+    prompt: str,
+    worker: LocateAnythingWorker | None = None,
+) -> tuple[Image.Image, str, ParseResult]:
+    """High-level entry point: run localization and return annotated image + results.
+    Args:
+        image: Input PIL image.
+        prompt: Natural language prompt.
+        worker: Pre-loaded worker instance. If None, creates and loads one.
+    Returns:
+        Tuple of (annotated_image, raw_output, parse_result).
+    """
+    from src.visualization import create_no_detection_overlay, draw_boxes
+    if worker is None:
+        worker = LocateAnythingWorker()
+        worker.load()
+    result = worker.predict(image, prompt)
+    raw_output = result.get("answer", "")
+    img_w, img_h = image.size
+    parsed = parse_boxes(raw_output, img_w, img_h)
+    if parsed.boxes:
+        annotated = draw_boxes(image, parsed.boxes)
+    else:
+        annotated = create_no_detection_overlay(image)
+    return annotated, raw_output, parsed
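
`run_localization` accepts an optional `worker`, calls only `worker.predict(image, prompt)`, and then reads `result.get("answer", "")`. Tests can therefore inject a stand-in instead of downloading the model. The sketch below shows that duck-typed seam with a hypothetical `FakeWorker` and a `Predictor` protocol; neither exists in the project. `get_raw_answer` mirrors only the first step of `run_localization`, leaving out parsing and drawing to stay stdlib-only.

```python
from typing import Any, Protocol


class Predictor(Protocol):
    """Structural type for anything with a predict(image, question) -> dict method."""

    def predict(self, image: Any, question: str) -> dict[str, Any]: ...


class FakeWorker:
    """Hypothetical test double: returns a canned answer and records the prompts it saw."""

    def __init__(self, answer: str) -> None:
        self.answer = answer
        self.questions: list[str] = []

    def predict(self, image: Any, question: str) -> dict[str, Any]:
        self.questions.append(question)
        return {"answer": self.answer}


def get_raw_answer(image: Any, prompt: str, worker: Predictor) -> str:
    # Same access pattern run_localization uses on the worker's result dict.
    return worker.predict(image, prompt).get("answer", "")
```

With the real modules, `run_localization(pil_image, prompt, worker=FakeWorker(...))` would exercise `parse_boxes` and the drawing code on a real `PIL.Image`. A static type checker would flag that argument because the parameter is annotated as `LocateAnythingWorker | None`. Annotating it with a protocol like `Predictor` would be one way to make the seam explicit.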

src/parsing.py ADDED

	@@ -0,0 +1,168 @@

+"""Output parsing for LocateAnything-3B bounding box responses."""
+from __future__ import annotations
+import re
+from dataclasses import dataclass, field
+from typing import Any
+from src.config import COORD_MAX, DEFAULT_CONFIDENCE
+@dataclass
+class BBox:
+    """A parsed bounding box in pixel coordinates."""
+    x1: float
+    y1: float
+    x2: float
+    y2: float
+    confidence: float = DEFAULT_CONFIDENCE
+    label: str = ""
+    @property
+    def width(self) -> float:
+        return max(0.0, self.x2 - self.x1)
+    @property
+    def height(self) -> float:
+        return max(0.0, self.y2 - self.y1)
+    @property
+    def area(self) -> float:
+        return self.width * self.height
+    @property
+    def center(self) -> tuple[float, float]:
+        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)
+    def is_valid(self, img_w: int, img_h: int) -> bool:
+        """Return True if the box is inside the image (1 px slack) and wider/taller than 1 px."""
+        return (
+            self.x1 >= 0
+            and self.y1 >= 0
+            and self.x2 <= img_w + 1
+            and self.y2 <= img_h + 1
+            and self.width > 1
+            and self.height > 1
+        )
+    def clamp(self, img_w: int, img_h: int) -> BBox:
+        """Return a clamped copy within image bounds."""
+        return BBox(
+            x1=max(0, min(self.x1, img_w)),
+            y1=max(0, min(self.y1, img_h)),
+            x2=max(0, min(self.x2, img_w)),
+            y2=max(0, min(self.y2, img_h)),
+            confidence=self.confidence,
+            label=self.label,
+        )
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "x1": round(self.x1, 2),
+            "y1": round(self.y1, 2),
+            "x2": round(self.x2, 2),
+            "y2": round(self.y2, 2),
+            "width": round(self.width, 2),
+            "height": round(self.height, 2),
+            "confidence": self.confidence,
+            "label": self.label,
+        }
+@dataclass
+class ParseResult:
+    """Structured result from parsing model output."""
+    boxes: list[BBox] = field(default_factory=list)
+    raw_output: str = ""
+    parse_errors: list[str] = field(default_factory=list)
+    @property
+    def num_detections(self) -> int:
+        return len(self.boxes)
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "num_detections": self.num_detections,
+            "boxes": [b.to_dict() for b in self.boxes],
+            "raw_output": self.raw_output,
+            "parse_errors": self.parse_errors,
+        }
+BOX_PATTERN_4 = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")
+BOX_PATTERN_4_ALT = re.compile(r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>")
+BOX_PATTERN_2 = re.compile(r"<box><(\d+)><(\d+)></box>")
+def _norm_to_pixel(val: int, scale: int) -> float:
+    """Convert normalized [0, 1000] coordinate to pixel coordinate."""
+    return val / COORD_MAX * scale
+def parse_boxes(
+    raw_output: str,
+    image_width: int,
+    image_height: int,
+) -> ParseResult:
+    """Parse model output into structured bounding boxes.
+    The model outputs coordinates normalized to [0, 1000].
+    This function converts them to pixel coordinates.
+    """
+    result = ParseResult(raw_output=raw_output)
+    seen: set[tuple[float, float, float, float]] = set()
+    for match in BOX_PATTERN_4.finditer(raw_output):
+        try:
+            x1 = _norm_to_pixel(int(match.group(1)), image_width)
+            y1 = _norm_to_pixel(int(match.group(2)), image_height)
+            x2 = _norm_to_pixel(int(match.group(3)), image_width)
+            y2 = _norm_to_pixel(int(match.group(4)), image_height)
+            key = (round(x1, 1), round(y1, 1), round(x2, 1), round(y2, 1))
+            if key not in seen:
+                seen.add(key)
+                box = BBox(x1=x1, y1=y1, x2=x2, y2=y2)
+                if box.is_valid(image_width, image_height):
+                    result.boxes.append(box)
+                else:
+                    result.parse_errors.append(f"Out-of-bounds box discarded: {key}")
+        except (ValueError, IndexError) as exc:
+            result.parse_errors.append(f"Failed to parse box: {exc}")
+    if not result.boxes:
+        for match in BOX_PATTERN_4_ALT.finditer(raw_output):
+            try:
+                x1 = _norm_to_pixel(int(match.group(1)), image_width)
+                y1 = _norm_to_pixel(int(match.group(2)), image_height)
+                x2 = _norm_to_pixel(int(match.group(3)), image_width)
+                y2 = _norm_to_pixel(int(match.group(4)), image_height)
+                key = (round(x1, 1), round(y1, 1), round(x2, 1), round(y2, 1))
+                if key not in seen:
+                    seen.add(key)
+                    box = BBox(x1=x1, y1=y1, x2=x2, y2=y2)
+                    if box.is_valid(image_width, image_height):
+                        result.boxes.append(box)
+            except (ValueError, IndexError) as exc:
+                result.parse_errors.append(f"Failed to parse alt box: {exc}")
+    return result
+def parse_points(
+    raw_output: str,
+    image_width: int,
+    image_height: int,
+) -> list[dict[str, float]]:
+    """Parse model output into pixel-coordinate points."""
+    points = []
+    for match in BOX_PATTERN_2.finditer(raw_output):
+        try:
+            x = _norm_to_pixel(int(match.group(1)), image_width)
+            y = _norm_to_pixel(int(match.group(2)), image_height)
+            points.append({"x": round(x, 2), "y": round(y, 2)})
+        except (ValueError, IndexError):
+            # Skip malformed matches; unlike parse_boxes, no error is recorded.
+            pass
+    return points
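A standalone sketch of the normalized-to-pixel conversion that `parse_boxes` and `parse_points` rely on. It assumes `_norm_to_pixel` computes `value / COORD_MAX * size` with `COORD_MAX = 1000` (the scaling that `test_coordinate_scaling` and `test_config_values` imply) and that the primary format is `<box><x1><y1><x2><y2></box>`. The regexes and the names `norm_to_pixel`, `parse_boxes_sketch`, and `parse_points_sketch` are hypothetical stand-ins, not the module's actual `BOX_PATTERN_4` / `BOX_PATTERN_2`, and the in-bounds check is an assumption about what `BBox.is_valid` enforces.

```python
import re

# Assumption: the model emits integers normalized to [0, COORD_MAX].
COORD_MAX = 1000

# Hypothetical stand-ins for BOX_PATTERN_4 and BOX_PATTERN_2.
BOX_RE = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")
POINT_RE = re.compile(r"<box><(\d+)><(\d+)></box>")

Box = tuple[float, float, float, float]


def norm_to_pixel(value: int, size: int) -> float:
    """Map a normalized coordinate onto an axis that is `size` pixels long."""
    return value / COORD_MAX * size


def parse_boxes_sketch(raw: str, width: int, height: int) -> list[Box]:
    """Return de-duplicated, in-bounds (x1, y1, x2, y2) pixel boxes."""
    boxes: list[Box] = []
    seen: set[tuple[float, ...]] = set()
    for match in BOX_RE.finditer(raw):
        nx1, ny1, nx2, ny2 = (int(g) for g in match.groups())
        box = (
            norm_to_pixel(nx1, width),
            norm_to_pixel(ny1, height),
            norm_to_pixel(nx2, width),
            norm_to_pixel(ny2, height),
        )
        # Deduplicate on coordinates rounded to 0.1 px, as parse_boxes does.
        key = tuple(round(v, 1) for v in box)
        if key in seen:
            continue
        seen.add(key)
        x1, y1, x2, y2 = box
        # Keep only boxes with positive area that lie inside the image.
        if 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height:
            boxes.append(box)
    return boxes


def parse_points_sketch(raw: str, width: int, height: int) -> list[dict[str, float]]:
    """Return {"x", "y"} pixel points from two-coordinate <box> tags."""
    return [
        {"x": round(norm_to_pixel(int(mx), width), 2), "y": round(norm_to_pixel(int(my), height), 2)}
        for mx, my in POINT_RE.findall(raw)
    ]


print(parse_boxes_sketch("<box><500><500><1000><1000></box>", 640, 480))
# [(320.0, 240.0, 640.0, 480.0)]
print(parse_points_sketch("<box><500><500></box>", 1000, 1000))
# [{'x': 500.0, 'y': 500.0}]
```

Because the point regex requires `</box>` right after the second coordinate, a four-coordinate box is never misread as a point.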

src/prompts.py ADDED

	@@ -0,0 +1,108 @@

+"""Prompt templates for space debris localization tasks."""
+from dataclasses import dataclass
+@dataclass(frozen=True)
+class PromptTemplate:
+    """A reusable prompt template with metadata."""
+    name: str
+    template: str
+    description: str
+    category: str
+DETECTION_TEMPLATES: list[PromptTemplate] = [
+    PromptTemplate(
+        name="debris_single",
+        template="Locate a single instance that matches the following description: {phrase}.",
+        description="Locate one instance of a specific object",
+        category="grounding",
+    ),
+    PromptTemplate(
+        name="debris_multi",
+        template="Locate all the instances that match the following description: {phrase}.",
+        description="Locate all instances of a specific object type",
+        category="detection",
+    ),
+    PromptTemplate(
+        name="debris_categories",
+        template="Locate all the instances that matches the following description: {categories}.",
+        description="Detect multiple object categories at once",
+        category="detection",
+    ),
+    PromptTemplate(
+        name="text_grounding",
+        template="Please locate the text referred as {phrase}.",
+        description="Locate text labels or markings in the image",
+        category="text",
+    ),
+    PromptTemplate(
+        name="scene_text",
+        template="Detect all the text in box format.",
+        description="Detect all visible text in the scene",
+        category="text",
+    ),
+]
+SPACE_DEBRIS_EXAMPLES: list[dict[str, str]] = [
+    {
+        "phrase": "space debris",
+        "prompt": "Locate all the instances that match the following description: space debris.",
+        "description": "Find all visible space debris fragments",
+    },
+    {
+        "phrase": "satellite fragment",
+        "prompt": "Locate all the instances that match the following description: satellite fragment.",
+        "description": "Identify broken satellite pieces",
+    },
+    {
+        "phrase": "solar panel",
+        "prompt": "Locate all the instances that match the following description: solar panel.",
+        "description": "Find satellite solar panels",
+    },
+    {
+        "phrase": "antenna",
+        "prompt": "Locate all the instances that match the following description: antenna.",
+        "description": "Locate spacecraft antennas",
+    },
+    {
+        "phrase": "rocket body",
+        "prompt": "Locate all the instances that match the following description: rocket body.",
+        "description": "Find spent rocket stages",
+    },
+    {
+        "phrase": "spacecraft",
+        "prompt": "Locate a single instance that matches the following description: spacecraft.",
+        "description": "Locate a single spacecraft",
+    },
+    {
+        "phrase": "debris field",
+        "prompt": "Locate all the instances that match the following description: debris field.",
+        "description": "Find clusters of orbital debris",
+    },
+    {
+        "phrase": "thermal blanket",
+        "prompt": "Locate all the instances that match the following description: thermal blanket.",
+        "description": "Find loose thermal insulation material",
+    },
+]
+def build_detect_prompt(categories: list[str]) -> str:
+    """Build a multi-category detection prompt."""
+    joined = "</c>".join(categories)
+    return f"Locate all the instances that matches the following description: {joined}."
+def build_grounding_prompt(phrase: str, *, single: bool = False) -> str:
+    """Build a phrase grounding prompt."""
+    if single:
+        return f"Locate a single instance that matches the following description: {phrase}."
+    return f"Locate all the instances that match the following description: {phrase}."
+def get_example_prompts() -> list[list[str]]:
+    """Return example prompts for Gradio examples component."""
+    return [[ex["prompt"]] for ex in SPACE_DEBRIS_EXAMPLES]
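Within `src/prompts.py` as shown, nothing fills the `{phrase}` and `{categories}` placeholders in `DETECTION_TEMPLATES`; `build_detect_prompt` and `build_grounding_prompt` hard-code the same strings instead. A hypothetical `render_template` helper (not part of the module) could fill a named template with `str.format`, reusing the `</c>` separator for category lists. The sketch copies two templates verbatim so it runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    """Minimal copy of src.prompts.PromptTemplate so the example is standalone."""

    name: str
    template: str
    description: str
    category: str


# Two entries copied verbatim from DETECTION_TEMPLATES.
TEMPLATES: dict[str, PromptTemplate] = {
    t.name: t
    for t in (
        PromptTemplate(
            name="debris_multi",
            template="Locate all the instances that match the following description: {phrase}.",
            description="Locate all instances of a specific object type",
            category="detection",
        ),
        PromptTemplate(
            name="debris_categories",
            template="Locate all the instances that matches the following description: {categories}.",
            description="Detect multiple object categories at once",
            category="detection",
        ),
    )
}


def render_template(
    name: str, *, phrase: str | None = None, categories: list[str] | None = None
) -> str:
    """Hypothetical helper: fill the placeholder of a named template."""
    template = TEMPLATES[name].template
    if "{categories}" in template:
        if not categories:
            raise ValueError(f"template {name!r} needs categories")
        # Same separator that build_detect_prompt places between category names.
        return template.format(categories="</c>".join(categories))
    if "{phrase}" in template:
        if not phrase:
            raise ValueError(f"template {name!r} needs a phrase")
        return template.format(phrase=phrase)
    # Placeholder-free templates (such as scene_text) pass through unchanged.
    return template


print(render_template("debris_multi", phrase="solar panel"))
# Locate all the instances that match the following description: solar panel.
```

Keying the lookup on `PromptTemplate.name` would let a UI dropdown select a template directly, while the output stays byte-identical to the `build_*` functions.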

src/utils.py ADDED

	@@ -0,0 +1,62 @@

+"""Utility functions for SpaceDebris Localizer."""
+from __future__ import annotations
+import logging
+from typing import Any
+from PIL import Image
+logger = logging.getLogger(__name__)
+def validate_image(image: Any) -> tuple[bool, str]:
+    """Validate that an input is a usable PIL image.
+    Returns:
+        Tuple of (is_valid, error_message).
+    """
+    if image is None:
+        return False, "No image provided. Please upload an image."
+    if not isinstance(image, Image.Image):
+        return False, "Invalid image format. Please upload a valid image file."
+    if image.mode not in ("RGB", "RGBA", "L"):
+        return False, f"Unsupported image mode: {image.mode}. Use RGB, RGBA, or grayscale."
+    w, h = image.size
+    if w < 32 or h < 32:
+        return False, f"Image too small ({w}x{h}). Minimum 32x32 pixels."
+    if w > 8192 or h > 8192:
+        return False, f"Image too large ({w}x{h}). Maximum 8192x8192 pixels."
+    return True, ""
+def ensure_rgb(image: Image.Image) -> Image.Image:
+    """Convert image to RGB if needed."""
+    if image.mode == "RGBA":
+        background = Image.new("RGB", image.size, (0, 0, 0))
+        background.paste(image, mask=image.split()[3])
+        return background
+    if image.mode != "RGB":
+        # Covers grayscale ("L") and any other remaining mode.
+        return image.convert("RGB")
+    return image
+def format_metadata(parse_result: Any) -> str:
+    """Format parse result metadata as human-readable text."""
+    lines = [
+        f"Detected objects: {parse_result.num_detections}",
+        f"Raw output length: {len(parse_result.raw_output)} chars",
+    ]
+    if parse_result.parse_errors:
+        lines.append(f"Parse warnings: {len(parse_result.parse_errors)}")
+        for err in parse_result.parse_errors[:5]:
+            lines.append(f"  - {err}")
+    return "\n".join(lines)
+def format_json_output(parse_result: Any) -> dict[str, Any]:
+    """Return JSON-serializable dict from parse result."""
+    return parse_result.to_dict()
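To show what `format_metadata` produces, here is a standalone run against a hypothetical stand-in for `ParseResult`. `FakeParseResult` exposes only the three attributes the function reads (`num_detections`, `raw_output`, `parse_errors`); the function body is copied from above. Note that only the first five warnings are listed, while the count reflects all of them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeParseResult:
    """Hypothetical stand-in exposing only the attributes format_metadata reads."""

    raw_output: str = ""
    parse_errors: list[str] = field(default_factory=list)
    num_detections: int = 0


def format_metadata(parse_result: Any) -> str:
    """Copied from src/utils.py: human-readable summary of a parse result."""
    lines = [
        f"Detected objects: {parse_result.num_detections}",
        f"Raw output length: {len(parse_result.raw_output)} chars",
    ]
    if parse_result.parse_errors:
        lines.append(f"Parse warnings: {len(parse_result.parse_errors)}")
        # Only the first five warnings are listed; the count above covers all.
        for err in parse_result.parse_errors[:5]:
            lines.append(f"  - {err}")
    return "\n".join(lines)


result = FakeParseResult(
    raw_output="<box><999><999><1001><1001></box>",
    parse_errors=[f"warning {i}" for i in range(7)],
)
print(format_metadata(result))
```

This prints "Detected objects: 0", "Raw output length: 33 chars", "Parse warnings: 7", then warnings 0 through 4.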

src/visualization.py ADDED

	@@ -0,0 +1,119 @@

+"""Visualization utilities for drawing bounding boxes on images."""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from PIL import Image, ImageDraw, ImageFont
+if TYPE_CHECKING:
+    from src.parsing import BBox
+BOX_COLORS = [
+    "#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#FF00FF", "#00FFFF",
+    "#FF8800", "#8800FF", "#00FF88", "#FF0088", "#88FF00", "#0088FF",
+]
+MIN_BOX_SIZE = 4
+def _get_font(size: int = 14) -> ImageFont.FreeTypeFont | ImageFont.ImageFont:
+    """Try to load a reasonable font, fall back to default."""
+    try:
+        return ImageFont.truetype("arial.ttf", size)
+    except OSError:
+        try:
+            return ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", size)
+        except OSError:
+            return ImageFont.load_default()
+def _hex_to_rgba(hex_color: str, alpha: int = 80) -> tuple[int, int, int, int]:
+    """Convert hex color to RGBA tuple."""
+    h = hex_color.lstrip("#")
+    r, g, b = int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)
+    return (r, g, b, alpha)
+def draw_boxes(
+    image: Image.Image,
+    boxes: list[BBox],
+    labels: list[str] | None = None,
+    show_confidence: bool = True,
+    line_width: int = 3,
+    font_size: int = 14,
+) -> Image.Image:
+    """Draw bounding boxes with labels on an image.
+    Args:
+        image: Source PIL image.
+        boxes: List of BBox objects in pixel coordinates.
+        labels: Optional per-box labels. If None, uses box.label or index.
+        show_confidence: Whether to show confidence score in label.
+        line_width: Width of bounding box outlines.
+        font_size: Font size for labels.
+    Returns:
+        New image with drawn overlays.
+    """
+    if not boxes:
+        return image.copy()
+    img = image.convert("RGBA")
+    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
+    draw_overlay = ImageDraw.Draw(overlay)
+    font = _get_font(font_size)
+    img_w, img_h = img.size
+    # (label, x, y, color) collected in the first pass and drawn after compositing.
+    pending_labels: list[tuple[str, float, float, str]] = []
+    for i, box in enumerate(boxes):
+        color_hex = BOX_COLORS[i % len(BOX_COLORS)]
+        fill_rgba = _hex_to_rgba(color_hex, alpha=50)
+        # Clip to the image; replace very small boxes with a square around their center.
+        bx1, by1 = max(0, box.x1), max(0, box.y1)
+        bx2, by2 = min(img_w, box.x2), min(img_h, box.y2)
+        if (bx2 - bx1) < MIN_BOX_SIZE or (by2 - by1) < MIN_BOX_SIZE:
+            cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2
+            half = MIN_BOX_SIZE
+            bx1, by1 = cx - half, cy - half
+            bx2, by2 = cx + half, cy + half
+        draw_overlay.rectangle([bx1, by1, bx2, by2], fill=fill_rgba, outline=color_hex, width=line_width)
+        label = labels[i] if labels and i < len(labels) else (box.label or f"#{i+1}")
+        if show_confidence and box.confidence > 0:
+            label = f"{label} ({box.confidence:.0%})"
+        pending_labels.append((label, bx1, by1, color_hex))
+    # Composite the translucent boxes first so they neither tint nor cover the labels.
+    img = Image.alpha_composite(img, overlay).convert("RGB")
+    draw_text = ImageDraw.Draw(img)
+    for label, bx1, by1, color_hex in pending_labels:
+        text_bbox = draw_text.textbbox((0, 0), label, font=font)
+        text_w = text_bbox[2] - text_bbox[0]
+        text_h = text_bbox[3] - text_bbox[1]
+        # Place the label above the box, or just inside it when there is no room.
+        text_y = by1 - text_h - 4 if by1 - text_h - 4 > 0 else by1 + 4
+        text_x = max(0, bx1)
+        draw_text.rectangle(
+            [text_x, text_y, text_x + text_w + 6, text_y + text_h + 4],
+            fill=color_hex,
+        )
+        draw_text.text((text_x + 3, text_y + 2), label, fill="white", font=font)
+    return img
+def create_no_detection_overlay(image: Image.Image, message: str = "No detections found") -> Image.Image:
+    """Create an overlay indicating no objects were detected."""
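+    img = image.convert("RGB")
+    # Mode "RGBA" makes ImageDraw blend the semi-transparent banner fill below;
+    # without it, the alpha in (0, 0, 0, 180) is ignored on an RGB image.
+    draw = ImageDraw.Draw(img, "RGBA")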
+    font = _get_font(18)
+    text_bbox = draw.textbbox((0, 0), message, font=font)
+    text_w = text_bbox[2] - text_bbox[0]
+    text_h = text_bbox[3] - text_bbox[1]
+    img_w, img_h = img.size
+    x = (img_w - text_w) / 2
+    y = img_h - text_h - 20
+    draw.rectangle([x - 10, y - 5, x + text_w + 10, y + text_h + 5], fill=(0, 0, 0, 180))
+    draw.text((x, y), message, fill="yellow", font=font)
+    return img
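The per-box geometry in `draw_boxes` is plain arithmetic and can be checked without Pillow. Below, `box_geometry` and `label_origin` are hypothetical extractions of that logic (the module keeps it inline), `hex_to_rgba` mirrors `_hex_to_rgba`, and the palette is truncated to its first three colors for brevity.

```python
MIN_BOX_SIZE = 4  # same value as src/visualization.py
BOX_COLORS = ["#FF0000", "#00FF00", "#0000FF"]  # first three entries of the module's palette

Box = tuple[float, float, float, float]


def hex_to_rgba(hex_color: str, alpha: int = 80) -> tuple[int, int, int, int]:
    """Same conversion as _hex_to_rgba: '#RRGGBB' -> (r, g, b, alpha)."""
    h = hex_color.lstrip("#")
    return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16), alpha)


def box_geometry(box: Box, img_w: int, img_h: int) -> Box:
    """Clip to the image; replace boxes thinner than MIN_BOX_SIZE with a centered square."""
    x1, y1, x2, y2 = box
    bx1, by1 = max(0, x1), max(0, y1)
    bx2, by2 = min(img_w, x2), min(img_h, y2)
    if (bx2 - bx1) < MIN_BOX_SIZE or (by2 - by1) < MIN_BOX_SIZE:
        # MIN_BOX_SIZE acts as a half-width, so the square is 2 * MIN_BOX_SIZE wide.
        cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2
        bx1, by1 = cx - MIN_BOX_SIZE, cy - MIN_BOX_SIZE
        bx2, by2 = cx + MIN_BOX_SIZE, cy + MIN_BOX_SIZE
    return (bx1, by1, bx2, by2)


def label_origin(bx1: float, by1: float, text_h: float) -> tuple[float, float]:
    """Top-left corner of the label: above the box if it fits, else just inside it."""
    above = by1 - text_h - 4
    return (max(0, bx1), above if above > 0 else by1 + 4)


for i, raw in enumerate([(100, 100, 101, 101), (-50, -50, 800, 600)]):
    color = BOX_COLORS[i % len(BOX_COLORS)]  # colors cycle through the palette
    print(color, hex_to_rgba(color, alpha=50), box_geometry(raw, 640, 480))
```

With this logic a 1 px box at (100, 100) is drawn as an 8 px square spanning 96.5 to 104.5, and a box that spills past a 640x480 frame is clipped to (0, 0, 640, 480).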

tests/test_app_smoke.py ADDED

	@@ -0,0 +1,46 @@

+"""Smoke tests for the Gradio app — import-level checks only."""
+import pytest
+def test_app_module_imports():
+    """Verify lightweight source modules can be imported without error."""
+    import src.config
+    import src.parsing
+    import src.prompts
+    import src.utils
+def test_config_values():
+    from src.config import APP_TITLE, APP_SUBTITLE, MODEL_ID, COORD_MAX
+    assert MODEL_ID == "nvidia/LocateAnything-3B"
+    assert COORD_MAX == 1000
+    assert len(APP_TITLE) > 0
+    assert len(APP_SUBTITLE) > 0
+def test_visualization_import():
+    """Verify visualization module imports."""
+    import src.visualization
+    assert hasattr(src.visualization, "draw_boxes")
+    assert hasattr(src.visualization, "create_no_detection_overlay")
+def test_app_build_callable():
+    """Verify the Gradio app builder is importable and callable."""
+    try:
+        from app import build_app
+        assert callable(build_app)
+    except ImportError:
+        pytest.skip("Gradio not available in test environment")
+def test_inference_module_imports():
+    """Verify inference module structure without heavy imports."""
+    try:
+        from src.inference import LocateAnythingWorker
+        w = LocateAnythingWorker.__new__(LocateAnythingWorker)
+        assert not getattr(w, "_loaded", True)
+    except ImportError:
+        pytest.skip("transformers/torch not available in test environment")
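`test_inference_module_imports` builds the worker with `__new__`, which allocates the instance without running `__init__`. As a result, `getattr(w, "_loaded", True)` is falsy only if `LocateAnythingWorker` declares `_loaded` at class level; `src/inference.py` is not shown in this section, so that is an assumption worth checking. The two hypothetical classes below show both outcomes.

```python
class WorkerWithClassDefault:
    """Hypothetical worker that declares _loaded at class level."""

    _loaded = False

    def __init__(self) -> None:
        self._loaded = True  # stands in for an expensive model load


class WorkerWithInstanceOnly:
    """Hypothetical worker that only sets _loaded inside __init__."""

    def __init__(self) -> None:
        self._loaded = False


# __new__ allocates the instance but does not call __init__.
a = WorkerWithClassDefault.__new__(WorkerWithClassDefault)
b = WorkerWithInstanceOnly.__new__(WorkerWithInstanceOnly)
print(getattr(a, "_loaded", True))  # False: falls back to the class attribute
print(getattr(b, "_loaded", True))  # True: no attribute exists, so the default wins
```

With the second layout the smoke test's `assert not getattr(w, "_loaded", True)` would fail even though the module imports cleanly.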

tests/test_parsing.py ADDED

	@@ -0,0 +1,115 @@

+"""Tests for the output parsing module."""
+from src.parsing import BBox, ParseResult, parse_boxes, parse_points
+class TestBBox:
+    def test_valid_box_within_bounds(self):
+        box = BBox(x1=10, y1=20, x2=100, y2=200)
+        assert box.is_valid(640, 480)
+        assert box.width == 90
+        assert box.height == 180
+        assert box.area == 16200
+    def test_box_center(self):
+        box = BBox(x1=0, y1=0, x2=100, y2=100)
+        assert box.center == (50.0, 50.0)
+    def test_invalid_box_zero_area(self):
+        box = BBox(x1=50, y1=50, x2=50, y2=50)
+        assert not box.is_valid(640, 480)
+    def test_invalid_box_out_of_bounds(self):
+        box = BBox(x1=-10, y1=0, x2=100, y2=100)
+        assert not box.is_valid(640, 480)
+    def test_clamp(self):
+        box = BBox(x1=-10, y1=-5, x2=700, y2=500)
+        clamped = box.clamp(640, 480)
+        assert clamped.x1 == 0
+        assert clamped.y1 == 0
+        assert clamped.x2 == 640
+        assert clamped.y2 == 480
+    def test_to_dict(self):
+        box = BBox(x1=10, y1=20, x2=100, y2=200, confidence=0.9, label="test")
+        d = box.to_dict()
+        assert d["x1"] == 10
+        assert d["label"] == "test"
+        assert d["confidence"] == 0.9
+class TestParseBoxes:
+    def test_single_box(self):
+        raw = "<box><100><200><300><400></box>"
+        result = parse_boxes(raw, 1000, 1000)
+        assert result.num_detections == 1
+        assert result.boxes[0].x1 == 100.0
+        assert result.boxes[0].y1 == 200.0
+        assert result.boxes[0].x2 == 300.0
+        assert result.boxes[0].y2 == 400.0
+    def test_multiple_boxes(self):
+        raw = "<box><100><100><200><200></box> some text <box><500><500><600><600></box>"
+        result = parse_boxes(raw, 1000, 1000)
+        assert result.num_detections == 2
+    def test_duplicate_boxes_deduplicated(self):
+        raw = "<box><100><100><200><200></box> <box><100><100><200><200></box>"
+        result = parse_boxes(raw, 1000, 1000)
+        assert result.num_detections == 1
+    def test_no_boxes(self):
+        raw = "No objects detected in this image."
+        result = parse_boxes(raw, 1000, 1000)
+        assert result.num_detections == 0
+    def test_coordinate_scaling(self):
+        raw = "<box><500><500><1000><1000></box>"
+        result = parse_boxes(raw, 640, 480)
+        assert result.num_detections == 1
+        assert abs(result.boxes[0].x2 - 640.0) < 0.1
+        assert abs(result.boxes[0].y2 - 480.0) < 0.1
+    def test_out_of_bounds_box_discarded(self):
+        raw = "<box><999><999><1001><1001></box>"
+        result = parse_boxes(raw, 100, 100)
+        assert result.num_detections == 0
+        assert len(result.parse_errors) > 0
+    def test_alt_format(self):
+        raw = "<box>100, 200, 300, 400</box>"
+        result = parse_boxes(raw, 1000, 1000)
+        assert result.num_detections == 1
+class TestParsePoints:
+    def test_single_point(self):
+        raw = "<box><500><500></box>"
+        points = parse_points(raw, 1000, 1000)
+        assert len(points) == 1
+        assert points[0]["x"] == 500.0
+        assert points[0]["y"] == 500.0
+    def test_no_points(self):
+        raw = "nothing here"
+        points = parse_points(raw, 1000, 1000)
+        assert len(points) == 0
+class TestParseResult:
+    def test_empty_result(self):
+        r = ParseResult()
+        assert r.num_detections == 0
+        d = r.to_dict()
+        assert d["num_detections"] == 0
+    def test_result_with_boxes(self):
+        r = ParseResult(
+            boxes=[BBox(10, 20, 100, 200)],
+            raw_output="<box><10><20><100><200></box>",
+        )
+        assert r.num_detections == 1
+        d = r.to_dict()
+        assert len(d["boxes"]) == 1

tests/test_prompts.py ADDED

	@@ -0,0 +1,67 @@

+"""Tests for prompt templates."""
+from src.prompts import (
+    SPACE_DEBRIS_EXAMPLES,
+    DETECTION_TEMPLATES,
+    PromptTemplate,
+    build_detect_prompt,
+    build_grounding_prompt,
+    get_example_prompts,
+)
+class TestPromptTemplate:
+    def test_template_fields(self):
+        t = DETECTION_TEMPLATES[0]
+        assert isinstance(t, PromptTemplate)
+        assert t.name
+        assert t.template
+        assert t.description
+        assert t.category
+    def test_template_has_placeholder(self):
+        for t in DETECTION_TEMPLATES:
+            assert "{" in t.template or "Detect" in t.template
+class TestBuildPrompts:
+    def test_build_detect_prompt_single(self):
+        prompt = build_detect_prompt(["debris"])
+        assert "debris" in prompt
+        assert "Locate" in prompt
+    def test_build_detect_prompt_multiple(self):
+        prompt = build_detect_prompt(["debris", "antenna", "panel"])
+        assert "</c>" in prompt
+        assert "debris" in prompt
+        assert "antenna" in prompt
+    def test_build_grounding_prompt_multi(self):
+        prompt = build_grounding_prompt("solar panel")
+        assert "solar panel" in prompt
+        assert "all the instances" in prompt
+    def test_build_grounding_prompt_single(self):
+        prompt = build_grounding_prompt("spacecraft", single=True)
+        assert "spacecraft" in prompt
+        assert "single instance" in prompt
+class TestExamples:
+    def test_examples_not_empty(self):
+        assert len(SPACE_DEBRIS_EXAMPLES) > 0
+    def test_example_structure(self):
+        for ex in SPACE_DEBRIS_EXAMPLES:
+            assert "phrase" in ex
+            assert "prompt" in ex
+            assert "description" in ex
+    def test_get_example_prompts(self):
+        prompts = get_example_prompts()
+        assert len(prompts) == len(SPACE_DEBRIS_EXAMPLES)
+        for p in prompts:
+            assert len(p) == 1
+            assert isinstance(p[0], str)
+            assert len(p[0]) > 0

tests/test_visualization.py ADDED

	@@ -0,0 +1,61 @@

+"""Tests for the visualization module."""
+import pytest
+from PIL import Image
+from src.parsing import BBox
+from src.visualization import draw_boxes, create_no_detection_overlay
+@pytest.fixture
+def sample_image():
+    return Image.new("RGB", (640, 480), color=(30, 30, 60))
+@pytest.fixture
+def sample_boxes():
+    return [
+        BBox(x1=50, y1=50, x2=200, y2=150, confidence=0.9, label="debris"),
+        BBox(x1=300, y1=200, x2=450, y2=350, confidence=0.75, label="satellite"),
+    ]
+class TestDrawBoxes:
+    def test_returns_image(self, sample_image, sample_boxes):
+        result = draw_boxes(sample_image, sample_boxes)
+        assert isinstance(result, Image.Image)
+        assert result.size == sample_image.size
+    def test_empty_boxes_returns_copy(self, sample_image):
+        result = draw_boxes(sample_image, [])
+        assert isinstance(result, Image.Image)
+        assert result.size == sample_image.size
+    def test_custom_labels(self, sample_image, sample_boxes):
+        labels = ["fragment", "panel"]
+        result = draw_boxes(sample_image, sample_boxes, labels=labels)
+        assert isinstance(result, Image.Image)
+    def test_tiny_boxes_expanded(self, sample_image):
+        tiny_boxes = [BBox(x1=100, y1=100, x2=101, y2=101)]
+        result = draw_boxes(sample_image, tiny_boxes)
+        assert isinstance(result, Image.Image)
+    def test_out_of_bounds_boxes_clipped(self, sample_image):
+        boxes = [BBox(x1=-50, y1=-50, x2=800, y2=600)]
+        result = draw_boxes(sample_image, boxes)
+        assert isinstance(result, Image.Image)
+    def test_no_confidence_display(self, sample_image, sample_boxes):
+        result = draw_boxes(sample_image, sample_boxes, show_confidence=False)
+        assert isinstance(result, Image.Image)
+class TestNoDetectionOverlay:
+    def test_returns_image(self, sample_image):
+        result = create_no_detection_overlay(sample_image)
+        assert isinstance(result, Image.Image)
+        assert result.size == sample_image.size
+    def test_custom_message(self, sample_image):
+        result = create_no_detection_overlay(sample_image, "Custom message")
+        assert isinstance(result, Image.Image)