Commit 23db765 by 3v324v23 · 0 parent(s)

feat: initialize SpaceDebris Localizer project


- Full project structure with src/, tests/, examples/, assets/
- LocateAnything-3B inference wrapper (src/inference.py)
- Bounding box parsing with normalized-to-pixel conversion (src/parsing.py)
- Space debris prompt templates (src/prompts.py)
- Image visualization with box drawing (src/visualization.py)
- Gradio UI with image upload, prompt input, annotated output (app.py)
- Comprehensive test suite for parsing, visualization, prompts
- CI workflow (ruff, black, pytest)
- HF Space sync workflow via GitHub Actions
- README with architecture, setup, deployment instructions

.env.example ADDED
@@ -0,0 +1,16 @@
+ # Environment Variables for SpaceDebris Localizer
+ # Copy this file to .env and fill in your values
+
+ # Hugging Face credentials (for GitHub Actions sync)
+ HF_TOKEN=
+ HF_USERNAME=
+ HF_SPACE_NAME=
+
+ # Model configuration
+ # MODEL_ID=nvidia/LocateAnything-3B
+ # DEVICE=cuda
+ # DTYPE=bfloat16
+ # MAX_NEW_TOKENS=8192
+ # GENERATION_MODE=hybrid
+ # TEMPERATURE=0.7
+ # PORT=7860
.github/workflows/ci.yml ADDED
@@ -0,0 +1,36 @@
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   lint-and-test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.10", "3.11"]
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install -e ".[dev]"
+
+       - name: Lint with ruff
+         run: ruff check src/ tests/ app.py
+
+       - name: Format check with black
+         run: black --check src/ tests/ app.py
+
+       - name: Run tests
+         run: pytest tests/ -v --tb=short
.github/workflows/sync-to-hf-space.yml ADDED
@@ -0,0 +1,20 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches: [main]
+
+ jobs:
+   sync:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+
+       - name: Push to Hugging Face Space
+         uses: cdanwards/action-push-to-hf-space@v2
+         with:
+           hf_token: ${{ secrets.HF_TOKEN }}
+           hf_space_name: ${{ secrets.HF_USERNAME }}/${{ secrets.HF_SPACE_NAME }}
+           branch: main
.gitignore ADDED
@@ -0,0 +1,22 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.egg-info/
+ dist/
+ build/
+ .eggs/
+ *.egg
+ .env
+ .venv/
+ venv/
+ env/
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ *.safetensors
+ *.bin
+ *.pt
+ *.pth
+ .DS_Store
+ Thumbs.db
+ *.log
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 SpaceDebris Localizer Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,145 @@
+ # SpaceDebris Localizer
+
+ Use **NVIDIA LocateAnything-3B** to locate space debris, satellite fragments, and spacecraft components in orbital imagery.
+
+ Orbital debris is a growing threat to satellite operations and crewed spaceflight. This project demonstrates how state-of-the-art vision-language grounding models can be applied to identify and localize objects in space imagery, from satellite solar panels and antennas to rocket bodies and debris fields. Built as a Hugging Face Spaces application, it provides a natural-language interface: describe what you're looking for, and the model draws bounding boxes around matching objects in the image.
+
+ ## Why This Matters
+
+ There are over 36,000 tracked objects in Earth orbit, and millions of smaller fragments too small to track. Traditional detection pipelines require specialized training data and domain-specific models. Vision-language grounding models like LocateAnything-3B offer a different approach: describe the target in natural language and let the model find it. This prototype explores whether general-purpose visual grounding can serve as a rapid-deployment tool for orbital debris awareness, satellite inspection, and space situational awareness workflows.
+
+ ## Architecture
+
+ ```
+ User uploads image + text prompt
+
+
+ ┌─────────────────────┐
+ │  Gradio Interface   │
+ │      (app.py)       │
+ └────────┬────────────┘
+
+
+ ┌─────────────────────┐
+ │ LocateAnythingWorker│
+ │ (src/inference.py)  │
+ │  ┌─────────────────┐│
+ │  │ nvidia/         ││
+ │  │ LocateAnything- ││
+ │  │ 3B (3B params)  ││
+ │  └─────────────────┘│
+ └────────┬────────────┘
+          │ raw text with <box> tokens
+
+ ┌─────────────────────┐
+ │    Output Parser    │
+ │  (src/parsing.py)   │
+ │  Regex → BBox list  │
+ └────────┬────────────┘
+          │ structured BBox objects
+
+ ┌─────────────────────┐
+ │     Visualizer      │
+ │ (src/visualization) │
+ │ Draw boxes + labels │
+ └────────┬────────────┘
+
+
+ Annotated image + JSON metadata
+ ```
+
+ ## Setup
+
+ ### Prerequisites
+
+ - Python 3.10+
+ - CUDA-capable GPU (recommended) or CPU (slow)
+ - ~8GB GPU memory for bfloat16 inference
+
+ ### Local Installation
+
+ ```bash
+ git clone https://github.com/YOUR_USERNAME/space-debris-localizer.git
+ cd space-debris-localizer
+ pip install -e ".[dev]"
+ ```
+
+ ### Run Locally
+
+ ```bash
+ python app.py
+ ```
+
+ The app launches at `http://localhost:7860`. The first run downloads the model (~6GB).
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `MODEL_ID` | `nvidia/LocateAnything-3B` | Hugging Face model ID |
+ | `DEVICE` | `cuda` | Device (`cuda` or `cpu`) |
+ | `DTYPE` | `bfloat16` | Model precision |
+ | `MAX_NEW_TOKENS` | `8192` | Max generation tokens |
+ | `GENERATION_MODE` | `hybrid` | `fast`, `slow`, or `hybrid` |
+ | `TEMPERATURE` | `0.7` | Sampling temperature |
+ | `PORT` | `7860` | Gradio server port |
+
+ ## Deployment to Hugging Face Spaces
+
+ ### Automatic Sync via GitHub Actions
+
+ 1. Create a Hugging Face Space at [huggingface.co/new-space](https://huggingface.co/new-space) (select Gradio SDK)
+ 2. Set these GitHub repository secrets:
+    - `HF_TOKEN`: your Hugging Face [access token](https://huggingface.co/settings/tokens)
+    - `HF_USERNAME`: your Hugging Face username
+    - `HF_SPACE_NAME`: your space name
+ 3. Push to `main`. GitHub Actions will sync the repo to your HF Space automatically.
+
+ ### Manual Push
+
+ ```bash
+ # Clone your HF Space repo
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/space-debris-localizer
+ cd space-debris-localizer
+ # Copy project files
+ cp -r /path/to/space-debris-localizer/* .
+ git add . && git commit -m "deploy" && git push
+ ```
+
+ ## Example Prompts
+
+ - `Locate all the instances that match the following description: space debris.`
+ - `Locate all the instances that match the following description: solar panel.`
+ - `Locate a single instance that matches the following description: spacecraft.`
+ - `Locate all the instances that match the following description: antenna.`
+ - `Locate all the instances that match the following description: rocket body.`
+ - `Locate all the instances that match the following description: thermal blanket.`
+
+ ## Known Limitations
+
+ - **Domain gap:** The model was trained on general grounding data (COCO, LVIS, RefCOCO, etc.), not specifically on orbital imagery. Performance on space scenes is exploratory.
+ - **Small debris:** Objects below a few pixels are unlikely to be grounded reliably.
+ - **Image quality:** Detection depends heavily on image resolution and contrast.
+ - **No confidence calibration:** The model does not output calibrated confidence scores; the displayed confidence is a fixed placeholder (`DEFAULT_CONFIDENCE`, 0.85).
+ - **GPU strongly recommended:** CPU inference is extremely slow due to the 3B parameter size.
+
+ ## Future Work
+
+ - Fine-tune on orbital debris datasets (e.g., ESA's DISCOS, ESA Clean Space imagery)
+ - Integrate with real satellite imagery APIs (e.g., ESA Copernicus, Planet Labs)
+ - Add temporal tracking across image sequences
+ - Support video input for debris tracking
+ - Add point-based localization for centroid estimation
+ - Deploy with quantized model for faster CPU inference
+
+ ## Tech Stack
+
+ - **Model:** [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B)
+ - **Framework:** Gradio 5.x, Hugging Face Transformers
+ - **Language:** Python 3.10+
+ - **CI/CD:** GitHub Actions
+ - **Deployment:** Hugging Face Spaces
+
+ ## License
+
+ MIT License. The underlying LocateAnything-3B model is subject to the [NVIDIA License](https://huggingface.co/nvidia/LocateAnything-3B/blob/main/LICENSE) (non-commercial research use).
app.py ADDED
@@ -0,0 +1,168 @@
+ """SpaceDebris Localizer - Gradio application.
+
+ Uses nvidia/LocateAnything-3B to locate space debris, satellite fragments,
+ and spacecraft components in space imagery.
+ """
+
+ from __future__ import annotations
+
+ import json
+ import logging
+ import os
+
+ import gradio as gr
+ from PIL import Image
+
+ from src.config import APP_SUBTITLE, APP_TITLE
+ from src.inference import LocateAnythingWorker, run_localization
+ from src.prompts import get_example_prompts
+ from src.utils import ensure_rgb, format_json_output, format_metadata, validate_image
+
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
+ logger = logging.getLogger(__name__)
+
+ worker: LocateAnythingWorker | None = None
+
+
+ def get_worker() -> LocateAnythingWorker:
+     """Lazy-load the model worker on first use."""
+     global worker
+     if worker is None:
+         logger.info("Loading LocateAnything-3B model...")
+         worker = LocateAnythingWorker()
+         worker.load()
+         logger.info("Model loaded successfully.")
+     return worker
+
+
+ def run_inference(
+     image: Image.Image | None,
+     prompt: str,
+ ) -> tuple[Image.Image | None, str, str, str, str]:
+     """Main inference function for Gradio interface.
+
+     Returns:
+         (annotated_image, metadata, raw_output, json_output, status_message)
+     """
+     is_valid, error_msg = validate_image(image)
+     if not is_valid:
+         return None, "", "", "", f"Error: {error_msg}"
+
+     if not prompt or not prompt.strip():
+         return None, "", "", "", "Error: Please enter a detection prompt."
+
+     try:
+         image_rgb = ensure_rgb(image)
+         w = get_worker()
+         annotated, raw_output, parsed = run_localization(image_rgb, prompt.strip(), worker=w)
+
+         metadata = format_metadata(parsed)
+         json_out = format_json_output(parsed)
+         json_str = json.dumps(json_out, indent=2, ensure_ascii=False)
+
+         status = f"Done. Found {parsed.num_detections} object(s)."
+         if parsed.parse_errors:
+             status += f" ({len(parsed.parse_errors)} warning(s))"
+
+         return annotated, metadata, raw_output, json_str, status
+
+     except Exception as exc:
+         logger.exception("Inference failed")
+         return None, "", "", "", f"Inference error: {exc}"
+
+
+ def build_app() -> gr.Blocks:
+     """Build the Gradio Blocks interface."""
+     with gr.Blocks(
+         title=APP_TITLE,
+         theme=gr.themes.Soft(),
+         css="""
+         .main-title { text-align: center; margin-bottom: 0; }
+         .subtitle { text-align: center; color: #666; margin-top: 0; }
+         .footer { text-align: center; color: #999; font-size: 0.85em; margin-top: 20px; }
+         """,
+     ) as app:
+         gr.HTML(f"""
+         <h1 class="main-title">{APP_TITLE}</h1>
+         <p class="subtitle">{APP_SUBTITLE}</p>
+         """)
+
+         gr.Markdown("""
+         > **How it works:** Upload a space or satellite image and enter a natural-language
+         > prompt describing what to locate. The model grounds your query in the image and
+         > returns bounding box coordinates. Detection quality depends on image resolution,
+         > object visibility, and model grounding capability.
+         """)
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 input_image = gr.Image(type="pil", label="Upload Image")
+                 prompt_input = gr.Textbox(
+                     label="Detection Prompt",
+                     placeholder="e.g. Locate all the instances that match the following description: space debris.",
+                     lines=2,
+                 )
+                 run_btn = gr.Button("Run Localization", variant="primary", size="lg")
+                 status_text = gr.Textbox(label="Status", interactive=False, lines=1)
+
+             with gr.Column(scale=1):
+                 output_image = gr.Image(type="pil", label="Annotated Image")
+                 with gr.Tabs():
+                     with gr.TabItem("Metadata"):
+                         metadata_output = gr.Textbox(label="Detection Metadata", lines=6, interactive=False)
+                     with gr.TabItem("Raw Output"):
+                         raw_output = gr.Textbox(label="Raw Model Output", lines=8, interactive=False, show_copy_button=True)
+                     with gr.TabItem("JSON Output"):
+                         json_output = gr.Code(label="Parsed JSON", language="json", lines=8)
+
+         gr.Markdown("### Example Prompts")
+         gr.Markdown("Click an example to load it into the prompt field.")
+         examples_list = get_example_prompts()
+         gr.Examples(
+             examples=examples_list,
+             inputs=[prompt_input],
+             label="Space Debris Prompts",
+         )
+
+         with gr.Accordion("About This Project", open=False):
+             gr.Markdown("""
+             **SpaceDebris Localizer** is a hackathon prototype demonstrating how NVIDIA's
+             **LocateAnything-3B** vision-language model can be applied to orbital debris
+             localization and satellite component identification.
+
+             ### Capabilities
+             - Open-set object detection from natural-language prompts
+             - Bounding-box grounding for arbitrary visual concepts
+             - Structured output with pixel-coordinate parsing
+
+             ### Limitations
+             - The model was trained on general grounding data, not specifically orbital imagery
+             - Detection quality depends heavily on image resolution and object clarity
+             - Small debris fragments may not be reliably detected
+             - This is a proof-of-concept, not a production debris tracking system
+
+             ### Model
+             - [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B) on Hugging Face
+             - 3B parameter vision-language model with Parallel Box Decoding
+             - Coordinates are normalized to [0, 1000] and converted to pixel space
+             """)
+
+         gr.HTML('<p class="footer">Powered by nvidia/LocateAnything-3B | SpaceDebris Localizer</p>')
+
+         run_btn.click(
+             fn=run_inference,
+             inputs=[input_image, prompt_input],
+             outputs=[output_image, metadata_output, raw_output, json_output, status_text],
+         )
+         prompt_input.submit(
+             fn=run_inference,
+             inputs=[input_image, prompt_input],
+             outputs=[output_image, metadata_output, raw_output, json_output, status_text],
+         )
+
+     return app
+
+
+ if __name__ == "__main__":
+     app = build_app()
+     app.launch(server_name="0.0.0.0", server_port=int(os.getenv("PORT", "7860")))
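
Review note on `get_worker`: the `worker is None` check runs without a lock, so if two requests arrive at the same moment on a cold start, both could begin loading the model. The sketch below shows one common way to guard a lazy singleton with double-checked locking. It is an illustration under assumptions, not the project's code: `StubWorker` is a hypothetical stand-in for `LocateAnythingWorker` so the example runs without a GPU, and `_worker` and `_worker_lock` are names invented here.

```python
import threading
from typing import Optional


class StubWorker:
    """Hypothetical stand-in for LocateAnythingWorker (no model is loaded)."""

    load_calls = 0  # counts how many times load() ran, for demonstration only

    def load(self) -> None:
        StubWorker.load_calls += 1


_worker: Optional[StubWorker] = None
_worker_lock = threading.Lock()


def get_worker() -> StubWorker:
    """Create and load the worker once, even when called from several threads."""
    global _worker
    if _worker is None:  # fast path: skip the lock once the worker exists
        with _worker_lock:
            if _worker is None:  # re-check, another thread may have won the race
                candidate = StubWorker()
                candidate.load()
                _worker = candidate  # publish only after load() has finished
    return _worker
```

Assigning the global only after `load()` returns means no caller can observe a half-initialized worker through the fast path.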
assets/demo_placeholder.png ADDED
examples/sample_queries.md ADDED
@@ -0,0 +1,34 @@
+ # Example Queries for SpaceDebris Localizer
+
+ ## Single Object Grounding
+
+ | Prompt | Description |
+ |--------|-------------|
+ | `Locate a single instance that matches the following description: spacecraft.` | Find one spacecraft |
+ | `Locate a single instance that matches the following description: solar panel.` | Find one solar panel |
+ | `Locate a single instance that matches the following description: antenna.` | Find one antenna |
+
+ ## Multi-Object Detection
+
+ | Prompt | Description |
+ |--------|-------------|
+ | `Locate all the instances that match the following description: space debris.` | Find all debris fragments |
+ | `Locate all the instances that match the following description: satellite fragment.` | Find all satellite pieces |
+ | `Locate all the instances that match the following description: solar panel.` | Find all solar panels |
+ | `Locate all the instances that match the following description: rocket body.` | Find all rocket stages |
+ | `Locate all the instances that match the following description: thermal blanket.` | Find all thermal blankets |
+
+ ## Multi-Category Detection
+
+ | Prompt | Description |
+ |--------|-------------|
+ | `Locate all the instances that matches the following description: debris</c>antenna</c>solar panel.` | Find debris, antennas, and panels |
+ | `Locate all the instances that matches the following description: spacecraft</c>satellite fragment.` | Find spacecraft and fragments |
+
+ ## Tips
+
+ - Be specific with object descriptions for better grounding results
+ - Use `all the instances` when you expect multiple objects
+ - Use `a single instance` when targeting one specific object
+ - Higher resolution images generally produce better results
+ - The model works best with clearly visible, well-lit objects
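
Review note on the three prompt shapes in `examples/sample_queries.md`: they can be produced by one small helper, sketched below. `build_prompt` is a hypothetical name, not a function from this commit. The wording of each template is copied verbatim from the tables, including the "that matches" form used by the multi-category rows, because the sketch makes no assumption about how sensitive the model is to prompt wording.

```python
def build_prompt(descriptions: list[str], single: bool = False) -> str:
    """Format a grounding prompt in the style of the example queries.

    Hypothetical helper for illustration; not part of the committed code.
    """
    if not descriptions:
        raise ValueError("at least one description is required")
    if single:
        if len(descriptions) != 1:
            raise ValueError("single-instance prompts take exactly one description")
        return (
            "Locate a single instance that matches the following description: "
            f"{descriptions[0]}."
        )
    if len(descriptions) == 1:
        return (
            "Locate all the instances that match the following description: "
            f"{descriptions[0]}."
        )
    # Multi-category prompts join the categories with the </c> separator.
    joined = "</c>".join(descriptions)
    return f"Locate all the instances that matches the following description: {joined}."


print(build_prompt(["debris", "antenna", "solar panel"]))
```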
pyproject.toml ADDED
@@ -0,0 +1,44 @@
+ [project]
+ name = "space-debris-localizer"
+ version = "1.0.0"
+ description = "Locate space debris, satellite fragments, and spacecraft components in orbital imagery using NVIDIA LocateAnything-3B"
+ readme = "README.md"
+ license = {text = "MIT"}
+ requires-python = ">=3.10"
+ dependencies = [
+     "transformers>=4.57.0",
+     "torch>=2.0.0",
+     "torchvision",
+     "Pillow>=11.0.0",
+     "numpy>=1.25.0",
+     "opencv-python-headless>=4.11.0",
+     "gradio>=5.0.0",
+     "peft",
+     "decord>=0.6.0",
+     "lmdb>=1.7.5",
+     "python-dotenv>=1.0.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "ruff>=0.4.0",
+     "black>=24.0.0",
+     "pytest>=8.0.0",
+ ]
+
+ [tool.black]
+ line-length = 100
+ target-version = ["py310"]
+
+ [tool.ruff]
+ line-length = 100
+ target-version = "py310"
+
+ [tool.ruff.lint]
+ select = ["E", "F", "W", "I", "N", "UP", "B"]
+ ignore = ["E501"]
+
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
+ python_files = ["test_*.py"]
+ python_functions = ["test_*"]
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ transformers>=4.57.0
+ torch>=2.0.0
+ torchvision
+ Pillow>=11.0.0
+ numpy>=1.25.0
+ opencv-python-headless>=4.11.0
+ gradio>=5.0.0
+ peft
+ decord>=0.6.0
+ lmdb>=1.7.5
+ ruff>=0.4.0
+ black>=24.0.0
+ pytest>=8.0.0
+ python-dotenv>=1.0.0
src/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """SpaceDebris Localizer - Locate space debris and satellite components in orbital imagery."""
+
+ __version__ = "1.0.0"
src/config.py ADDED
@@ -0,0 +1,17 @@
+ """Configuration constants for SpaceDebris Localizer."""
+
+ import os
+
+ MODEL_ID: str = os.getenv("MODEL_ID", "nvidia/LocateAnything-3B")
+ DEVICE: str = os.getenv("DEVICE", "cuda")
+ DTYPE: str = os.getenv("DTYPE", "bfloat16")
+ MAX_NEW_TOKENS: int = int(os.getenv("MAX_NEW_TOKENS", "8192"))
+ GENERATION_MODE: str = os.getenv("GENERATION_MODE", "hybrid")
+ TEMPERATURE: float = float(os.getenv("TEMPERATURE", "0.7"))
+ COORD_MAX: int = 1000
+ DEFAULT_CONFIDENCE: float = 0.85
+ APP_TITLE: str = "SpaceDebris Localizer"
+ APP_SUBTITLE: str = (
+     "Use LocateAnything-3B to ground debris, satellite fragments, "
+     "and spacecraft components in space imagery."
+ )
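
Review note on `src/config.py`: `MAX_NEW_TOKENS` and `TEMPERATURE` are converted with `int()` and `float()` at import time, so a malformed value in the environment raises `ValueError` as soon as the module is imported. If a silent fallback is preferred, a small helper could absorb bad input. The sketch below is one possible shape, assuming that falling back to the default is acceptable; `env_int` and its `minimum` parameter are hypothetical and not part of this commit.

```python
import os


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Read an integer environment variable, falling back to `default`.

    Hypothetical helper: the default is used when the variable is unset,
    empty, not an integer, or below `minimum`.
    """
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= minimum else default


# Mirrors the MAX_NEW_TOKENS default from src/config.py.
MAX_NEW_TOKENS = env_int("MAX_NEW_TOKENS", 8192)
```

Whether to fall back quietly or fail loudly is a design choice; failing at import, as the committed code does, surfaces misconfiguration earlier.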
src/inference.py ADDED
@@ -0,0 +1,165 @@
+ """Inference wrapper for nvidia/LocateAnything-3B."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ import torch
+ from PIL import Image
+ from transformers import AutoModel, AutoProcessor, AutoTokenizer
+
+ from src.config import (
+     DEVICE,
+     DTYPE,
+     GENERATION_MODE,
+     MAX_NEW_TOKENS,
+     MODEL_ID,
+     TEMPERATURE,
+ )
+ from src.parsing import ParseResult, parse_boxes
+
+
+ class LocateAnythingWorker:
+     """Stateful worker that loads LocateAnything-3B once and serves queries."""
+
+     def __init__(
+         self,
+         model_path: str = MODEL_ID,
+         device: str = DEVICE,
+         dtype_str: str = DTYPE,
+     ) -> None:
+         self.device = device
+         self.dtype = getattr(torch, dtype_str, torch.bfloat16)
+         self.model_path = model_path
+         self._loaded = False
+         self.tokenizer = None
+         self.processor = None
+         self.model = None
+
+     def load(self) -> None:
+         """Load model, tokenizer, and processor. Call once at startup."""
+         if self._loaded:
+             return
+         self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True)
+         self.processor = AutoProcessor.from_pretrained(self.model_path, trust_remote_code=True)
+         self.model = (
+             AutoModel.from_pretrained(
+                 self.model_path,
+                 torch_dtype=self.dtype,
+                 trust_remote_code=True,
+             )
+             .to(self.device)
+             .eval()
+         )
+         self._loaded = True
+
+     @torch.no_grad()
+     def predict(
+         self,
+         image: Image.Image,
+         question: str,
+         generation_mode: str = GENERATION_MODE,
+         max_new_tokens: int = MAX_NEW_TOKENS,
+         temperature: float = TEMPERATURE,
+     ) -> dict[str, Any]:
+         """Run inference on an image with a text prompt.
+
+         Returns dict with 'answer', optionally 'history' and 'stats'.
+         """
+         if not self._loaded:
+             self.load()
+
+         messages = [
+             {
+                 "role": "user",
+                 "content": [
+                     {"type": "image", "image": image},
+                     {"type": "text", "text": question},
+                 ],
+             }
+         ]
+
+         text = self.processor.py_apply_chat_template(
+             messages, tokenize=False, add_generation_prompt=True
+         )
+         images, videos = self.processor.process_vision_info(messages)
+         inputs = self.processor(
+             text=[text], images=images, videos=videos, return_tensors="pt"
+         ).to(self.device)
+
+         pixel_values = inputs["pixel_values"].to(self.dtype)
+         input_ids = inputs["input_ids"]
+         image_grid_hws = inputs.get("image_grid_hws", None)
+
+         response = self.model.generate(
+             pixel_values=pixel_values,
+             input_ids=input_ids,
+             attention_mask=inputs["attention_mask"],
+             image_grid_hws=image_grid_hws,
+             tokenizer=self.tokenizer,
+             max_new_tokens=max_new_tokens,
+             use_cache=True,
+             generation_mode=generation_mode,
+             temperature=temperature,
+             do_sample=True,
+             top_p=0.9,
+             repetition_penalty=1.1,
+             verbose=False,
+         )
+
+         result: dict[str, Any] = {"answer": response[0] if isinstance(response, tuple) else response}
+         if isinstance(response, tuple) and len(response) >= 3:
+             result["history"] = response[1]
+             result["stats"] = response[2]
+         return result
+
+     def detect(self, image: Image.Image, categories: list[str], **kwargs: Any) -> dict[str, Any]:
+         """Object detection with multiple categories."""
+         cats = "</c>".join(categories)
+         prompt = f"Locate all the instances that matches the following description: {cats}."
+         return self.predict(image, prompt, **kwargs)
+
+     def ground_single(self, image: Image.Image, phrase: str, **kwargs: Any) -> dict[str, Any]:
+         """Phrase grounding (single instance)."""
+         prompt = f"Locate a single instance that matches the following description: {phrase}."
+         return self.predict(image, prompt, **kwargs)
+
+     def ground_multi(self, image: Image.Image, phrase: str, **kwargs: Any) -> dict[str, Any]:
+         """Phrase grounding (multiple instances)."""
+         prompt = f"Locate all the instances that match the following description: {phrase}."
+         return self.predict(image, prompt, **kwargs)
+
+
+ def run_localization(
+     image: Image.Image,
+     prompt: str,
+     worker: LocateAnythingWorker | None = None,
+ ) -> tuple[Image.Image, str, ParseResult]:
+     """High-level entry point: run localization and return annotated image + results.
+
+     Args:
+         image: Input PIL image.
+         prompt: Natural language prompt.
+         worker: Pre-loaded worker instance. If None, creates and loads one.
+
+     Returns:
+         Tuple of (annotated_image, raw_output, parse_result).
+     """
+     from src.visualization import create_no_detection_overlay, draw_boxes
+
+     if worker is None:
+         worker = LocateAnythingWorker()
+         worker.load()
+
+     result = worker.predict(image, prompt)
+     raw_output = result.get("answer", "")
+
+     img_w, img_h = image.size
+     parsed = parse_boxes(raw_output, img_w, img_h)
+
+     if parsed.boxes:
+         annotated = draw_boxes(image, parsed.boxes)
+     else:
+         annotated = create_no_detection_overlay(image)
+
+     return annotated, raw_output, parsed
src/parsing.py ADDED
@@ -0,0 +1,168 @@
1
+ """Output parsing for LocateAnything-3B bounding box responses."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import re
6
+ from dataclasses import dataclass, field
7
+ from typing import Any
8
+
9
+ from src.config import COORD_MAX, DEFAULT_CONFIDENCE
10
+
11
+
12
+ @dataclass
13
+ class BBox:
14
+ """A parsed bounding box in pixel coordinates."""
15
+
16
+ x1: float
17
+ y1: float
18
+ x2: float
19
+ y2: float
20
+ confidence: float = DEFAULT_CONFIDENCE
21
+ label: str = ""
22
+
23
+ @property
24
+ def width(self) -> float:
25
+ return max(0.0, self.x2 - self.x1)
26
+
27
+ @property
28
+ def height(self) -> float:
29
+ return max(0.0, self.y2 - self.y1)
30
+
31
+ @property
32
+ def area(self) -> float:
33
+ return self.width * self.height
34
+
35
+ @property
36
+ def center(self) -> tuple[float, float]:
37
+ return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)
38
+
39
+ def is_valid(self, img_w: int, img_h: int) -> bool:
40
+ """Check if box is within image bounds and has positive area."""
41
+ return (
42
+ self.x1 >= 0
43
+ and self.y1 >= 0
44
+ and self.x2 <= img_w + 1
45
+ and self.y2 <= img_h + 1
46
+ and self.width > 1
47
+ and self.height > 1
48
+ )
49
+
50
+ def clamp(self, img_w: int, img_h: int) -> BBox:
51
+ """Return a clamped copy within image bounds."""
52
+ return BBox(
53
+ x1=max(0, min(self.x1, img_w)),
54
+ y1=max(0, min(self.y1, img_h)),
55
+ x2=max(0, min(self.x2, img_w)),
56
+ y2=max(0, min(self.y2, img_h)),
57
+ confidence=self.confidence,
58
+ label=self.label,
59
+ )
60
+
61
+ def to_dict(self) -> dict[str, Any]:
62
+ return {
63
+ "x1": round(self.x1, 2),
64
+ "y1": round(self.y1, 2),
65
+ "x2": round(self.x2, 2),
66
+ "y2": round(self.y2, 2),
67
+ "width": round(self.width, 2),
68
+ "height": round(self.height, 2),
69
+ "confidence": self.confidence,
70
+ "label": self.label,
71
+ }
72
+
73
+
74
+ @dataclass
75
+ class ParseResult:
76
+ """Structured result from parsing model output."""
77
+
78
+     boxes: list[BBox] = field(default_factory=list)
+     raw_output: str = ""
+     parse_errors: list[str] = field(default_factory=list)
+
+     @property
+     def num_detections(self) -> int:
+         return len(self.boxes)
+
+     def to_dict(self) -> dict[str, Any]:
+         return {
+             "num_detections": self.num_detections,
+             "boxes": [b.to_dict() for b in self.boxes],
+             "raw_output": self.raw_output,
+             "parse_errors": self.parse_errors,
+         }
+
+
+ BOX_PATTERN_4 = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")
+ BOX_PATTERN_4_ALT = re.compile(r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>")
+ BOX_PATTERN_2 = re.compile(r"<box><(\d+)><(\d+)></box>")
+
+
+ def _norm_to_pixel(val: int, scale: int) -> float:
+     """Convert normalized [0, 1000] coordinate to pixel coordinate."""
+     return val / COORD_MAX * scale
+
+
+ def parse_boxes(
+     raw_output: str,
+     image_width: int,
+     image_height: int,
+ ) -> ParseResult:
+     """Parse model output into structured bounding boxes.
+
+     The model outputs coordinates normalized to [0, 1000].
+     This function converts them to pixel coordinates.
+     """
+     result = ParseResult(raw_output=raw_output)
+     seen: set[tuple[float, float, float, float]] = set()
+
+     for match in BOX_PATTERN_4.finditer(raw_output):
+         try:
+             x1 = _norm_to_pixel(int(match.group(1)), image_width)
+             y1 = _norm_to_pixel(int(match.group(2)), image_height)
+             x2 = _norm_to_pixel(int(match.group(3)), image_width)
+             y2 = _norm_to_pixel(int(match.group(4)), image_height)
+             key = (round(x1, 1), round(y1, 1), round(x2, 1), round(y2, 1))
+             if key not in seen:
+                 seen.add(key)
+                 box = BBox(x1=x1, y1=y1, x2=x2, y2=y2)
+                 if box.is_valid(image_width, image_height):
+                     result.boxes.append(box)
+                 else:
+                     result.parse_errors.append(f"Out-of-bounds box discarded: {key}")
+         except (ValueError, IndexError) as exc:
+             result.parse_errors.append(f"Failed to parse box: {exc}")
+
+     if not result.boxes:
+         for match in BOX_PATTERN_4_ALT.finditer(raw_output):
+             try:
+                 x1 = _norm_to_pixel(int(match.group(1)), image_width)
+                 y1 = _norm_to_pixel(int(match.group(2)), image_height)
+                 x2 = _norm_to_pixel(int(match.group(3)), image_width)
+                 y2 = _norm_to_pixel(int(match.group(4)), image_height)
+                 key = (round(x1, 1), round(y1, 1), round(x2, 1), round(y2, 1))
+                 if key not in seen:
+                     seen.add(key)
+                     box = BBox(x1=x1, y1=y1, x2=x2, y2=y2)
+                     if box.is_valid(image_width, image_height):
+                         result.boxes.append(box)
+             except (ValueError, IndexError) as exc:
+                 result.parse_errors.append(f"Failed to parse alt box: {exc}")
+
+     return result
+
+
+ def parse_points(
+     raw_output: str,
+     image_width: int,
+     image_height: int,
+ ) -> list[dict[str, float]]:
+     """Parse model output into pixel-coordinate points."""
+     points = []
+     for match in BOX_PATTERN_2.finditer(raw_output):
+         try:
+             x = _norm_to_pixel(int(match.group(1)), image_width)
+             y = _norm_to_pixel(int(match.group(2)), image_height)
+             points.append({"x": round(x, 2), "y": round(y, 2)})
+         except (ValueError, IndexError):
+             pass
+     return points
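The coordinate conversion in `parse_boxes` can be illustrated in isolation. The following is a minimal standalone sketch, not the project's module: `boxes_in_pixels` and `BOX_RE` are hypothetical names, `COORD_MAX = 1000` is assumed to match `src.config.COORD_MAX`, and the deduplication and bounds checks from the diff are left out on purpose.

```python
import re

# Assumed to match src.config.COORD_MAX (the model emits coordinates in [0, 1000]).
COORD_MAX = 1000

# Same four-value tag format as BOX_PATTERN_4 in the diff above.
BOX_RE = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")


def norm_to_pixel(val: int, scale: int) -> float:
    """Map a coordinate in [0, COORD_MAX] onto an axis that is `scale` pixels long."""
    return val / COORD_MAX * scale


def boxes_in_pixels(raw: str, width: int, height: int) -> list[tuple[float, float, float, float]]:
    """Return (x1, y1, x2, y2) pixel tuples for every four-value <box> tag in `raw`."""
    out = []
    for m in BOX_RE.finditer(raw):
        nx1, ny1, nx2, ny2 = (int(g) for g in m.groups())
        # x values scale with the image width, y values with the image height.
        out.append(
            (
                norm_to_pixel(nx1, width),
                norm_to_pixel(ny1, height),
                norm_to_pixel(nx2, width),
                norm_to_pixel(ny2, height),
            )
        )
    return out


pixel_boxes = boxes_in_pixels("<box><500><500><1000><1000></box>", 640, 480)
print(pixel_boxes)  # [(320.0, 240.0, 640.0, 480.0)]
```

Because x and y are scaled independently, a box that is square in normalized space generally becomes a rectangle in pixel space unless the image itself is square.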
src/prompts.py ADDED
@@ -0,0 +1,108 @@
+ """Prompt templates for space debris localization tasks."""
+
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True)
+ class PromptTemplate:
+     """A reusable prompt template with metadata."""
+
+     name: str
+     template: str
+     description: str
+     category: str
+
+
+ DETECTION_TEMPLATES: list[PromptTemplate] = [
+     PromptTemplate(
+         name="debris_single",
+         template="Locate a single instance that matches the following description: {phrase}.",
+         description="Locate one instance of a specific object",
+         category="grounding",
+     ),
+     PromptTemplate(
+         name="debris_multi",
+         template="Locate all the instances that match the following description: {phrase}.",
+         description="Locate all instances of a specific object type",
+         category="detection",
+     ),
+     PromptTemplate(
+         name="debris_categories",
+         template="Locate all the instances that matches the following description: {categories}.",
+         description="Detect multiple object categories at once",
+         category="detection",
+     ),
+     PromptTemplate(
+         name="text_grounding",
+         template="Please locate the text referred as {phrase}.",
+         description="Locate text labels or markings in the image",
+         category="text",
+     ),
+     PromptTemplate(
+         name="scene_text",
+         template="Detect all the text in box format.",
+         description="Detect all visible text in the scene",
+         category="text",
+     ),
+ ]
+
+ SPACE_DEBRIS_EXAMPLES: list[dict[str, str]] = [
+     {
+         "phrase": "space debris",
+         "prompt": "Locate all the instances that match the following description: space debris.",
+         "description": "Find all visible space debris fragments",
+     },
+     {
+         "phrase": "satellite fragment",
+         "prompt": "Locate all the instances that match the following description: satellite fragment.",
+         "description": "Identify broken satellite pieces",
+     },
+     {
+         "phrase": "solar panel",
+         "prompt": "Locate all the instances that match the following description: solar panel.",
+         "description": "Find satellite solar panels",
+     },
+     {
+         "phrase": "antenna",
+         "prompt": "Locate all the instances that match the following description: antenna.",
+         "description": "Locate spacecraft antennas",
+     },
+     {
+         "phrase": "rocket body",
+         "prompt": "Locate all the instances that match the following description: rocket body.",
+         "description": "Find spent rocket stages",
+     },
+     {
+         "phrase": "spacecraft",
+         "prompt": "Locate a single instance that matches the following description: spacecraft.",
+         "description": "Locate a single spacecraft",
+     },
+     {
+         "phrase": "debris field",
+         "prompt": "Locate all the instances that match the following description: debris field.",
+         "description": "Find clusters of orbital debris",
+     },
+     {
+         "phrase": "thermal blanket",
+         "prompt": "Locate all the instances that match the following description: thermal blanket.",
+         "description": "Find loose thermal insulation material",
+     },
+ ]
+
+
+ def build_detect_prompt(categories: list[str]) -> str:
+     """Build a multi-category detection prompt."""
+     joined = "</c>".join(categories)
+     return f"Locate all the instances that matches the following description: {joined}."
+
+
+ def build_grounding_prompt(phrase: str, *, single: bool = False) -> str:
+     """Build a phrase grounding prompt."""
+     if single:
+         return f"Locate a single instance that matches the following description: {phrase}."
+     return f"Locate all the instances that match the following description: {phrase}."
+
+
+ def get_example_prompts() -> list[list[str]]:
+     """Return example prompts for Gradio examples component."""
+     return [[ex["prompt"]] for ex in SPACE_DEBRIS_EXAMPLES]
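As written, `build_detect_prompt([])` joins an empty list and so returns a prompt whose description is empty. A caller that wants to guard against this could wrap the same join in a small check. The sketch below is one possible way to do that, not part of the commit: `build_detect_prompt_checked` and `SEPARATOR` are hypothetical names, and the prompt wording is copied verbatim from `build_detect_prompt`, since the model presumably expects that exact phrasing.

```python
# Category separator used by build_detect_prompt in the diff above.
SEPARATOR = "</c>"


def build_detect_prompt_checked(categories: list[str]) -> str:
    """Hypothetical variant of build_detect_prompt that rejects empty input."""
    # Drop blank entries and surrounding whitespace before joining.
    cleaned = [c.strip() for c in categories if c.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty category is required")
    joined = SEPARATOR.join(cleaned)
    # Wording reproduced unchanged from the original function.
    return f"Locate all the instances that matches the following description: {joined}."


prompt = build_detect_prompt_checked(["debris", " antenna ", ""])
print(prompt)
# Locate all the instances that matches the following description: debris</c>antenna.
```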
src/utils.py ADDED
@@ -0,0 +1,62 @@
+ """Utility functions for SpaceDebris Localizer."""
+
+ from __future__ import annotations
+
+ import io
+ import logging
+ from typing import Any
+
+ from PIL import Image
+
+ logger = logging.getLogger(__name__)
+
+
+ def validate_image(image: Any) -> tuple[bool, str]:
+     """Validate that an input is a usable PIL image.
+
+     Returns:
+         Tuple of (is_valid, error_message).
+     """
+     if image is None:
+         return False, "No image provided. Please upload an image."
+     if not isinstance(image, Image.Image):
+         return False, "Invalid image format. Please upload a valid image file."
+     if image.mode not in ("RGB", "RGBA", "L"):
+         return False, f"Unsupported image mode: {image.mode}. Use RGB or grayscale."
+     w, h = image.size
+     if w < 32 or h < 32:
+         return False, f"Image too small ({w}x{h}). Minimum 32x32 pixels."
+     if w > 8192 or h > 8192:
+         return False, f"Image too large ({w}x{h}). Maximum 8192x8192 pixels."
+     return True, ""
+
+
+ def ensure_rgb(image: Image.Image) -> Image.Image:
+     """Convert image to RGB if needed."""
+     if image.mode == "RGBA":
+         background = Image.new("RGB", image.size, (0, 0, 0))
+         background.paste(image, mask=image.split()[3])
+         return background
+     if image.mode == "L":
+         return image.convert("RGB")
+     if image.mode != "RGB":
+         return image.convert("RGB")
+     return image
+
+
+ def format_metadata(parse_result: Any) -> str:
+     """Format parse result metadata as human-readable text."""
+     lines = [
+         f"Detected objects: {parse_result.num_detections}",
+         f"Raw output length: {len(parse_result.raw_output)} chars",
+     ]
+     if parse_result.parse_errors:
+         lines.append(f"Parse warnings: {len(parse_result.parse_errors)}")
+         for err in parse_result.parse_errors[:5]:
+             lines.append(f"  - {err}")
+     return "\n".join(lines)
+
+
+ def format_json_output(parse_result: Any) -> dict[str, Any]:
+     """Return JSON-serializable dict from parse result."""
+     return parse_result.to_dict()
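`ensure_rgb` flattens RGBA input by pasting it onto a black background with the alpha channel as the mask, so transparent regions end up black rather than white. The per-pixel arithmetic can be sketched in plain Python. This is an illustration of the blend only: `flatten_on_black` is a hypothetical helper, and Pillow's internal integer rounding may differ from `round()` by one level for some values.

```python
def flatten_on_black(pixel: tuple[int, int, int, int]) -> tuple[int, int, int]:
    """Blend one RGBA pixel over a black background.

    With a black background the general blend
    out = fg * a/255 + bg * (1 - a/255) reduces to out = fg * a/255.
    """
    r, g, b, a = pixel
    return (round(r * a / 255), round(g * a / 255), round(b * a / 255))


print(flatten_on_black((255, 128, 0, 255)))  # opaque pixel is unchanged: (255, 128, 0)
print(flatten_on_black((255, 128, 0, 0)))    # fully transparent pixel becomes black: (0, 0, 0)
```

For imagery where dark pixels already mean empty space, a black background seems a sensible default; a different background color would only change the `bg` term of the blend.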
src/visualization.py ADDED
@@ -0,0 +1,119 @@
+ """Visualization utilities for drawing bounding boxes on images."""
+
+ from __future__ import annotations
+
+ from typing import TYPE_CHECKING
+
+ import numpy as np
+ from PIL import Image, ImageDraw, ImageFont
+
+ if TYPE_CHECKING:
+     from src.parsing import BBox
+
+ BOX_COLORS = [
+     "#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#FF00FF", "#00FFFF",
+     "#FF8800", "#8800FF", "#00FF88", "#FF0088", "#88FF00", "#0088FF",
+ ]
+
+ MIN_BOX_SIZE = 4
+
+
+ def _get_font(size: int = 14) -> ImageFont.FreeTypeFont | ImageFont.ImageFont:
+     """Try to load a reasonable font, fall back to default."""
+     try:
+         return ImageFont.truetype("arial.ttf", size)
+     except (OSError, IOError):
+         try:
+             return ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", size)
+         except (OSError, IOError):
+             return ImageFont.load_default()
+
+
+ def _hex_to_rgba(hex_color: str, alpha: int = 80) -> tuple[int, int, int, int]:
+     """Convert hex color to RGBA tuple."""
+     h = hex_color.lstrip("#")
+     r, g, b = int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)
+     return (r, g, b, alpha)
+
+
+ def draw_boxes(
+     image: Image.Image,
+     boxes: list[BBox],
+     labels: list[str] | None = None,
+     show_confidence: bool = True,
+     line_width: int = 3,
+     font_size: int = 14,
+ ) -> Image.Image:
+     """Draw bounding boxes with labels on an image.
+
+     Args:
+         image: Source PIL image.
+         boxes: List of BBox objects in pixel coordinates.
+         labels: Optional per-box labels. If None, uses box.label or index.
+         show_confidence: Whether to show confidence score in label.
+         line_width: Width of bounding box outlines.
+         font_size: Font size for labels.
+
+     Returns:
+         New image with drawn overlays.
+     """
+     if not boxes:
+         return image.copy()
+
+     img = image.copy().convert("RGBA")
+     overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
+     draw_overlay = ImageDraw.Draw(overlay)
+     draw_text = ImageDraw.Draw(img)
+     font = _get_font(font_size)
+     img_w, img_h = img.size
+
+     for i, box in enumerate(boxes):
+         color_hex = BOX_COLORS[i % len(BOX_COLORS)]
+         fill_rgba = _hex_to_rgba(color_hex, alpha=50)
+         outline_rgb = color_hex
+
+         bx1, by1 = max(0, box.x1), max(0, box.y1)
+         bx2, by2 = min(img_w, box.x2), min(img_h, box.y2)
+
+         if (bx2 - bx1) < MIN_BOX_SIZE or (by2 - by1) < MIN_BOX_SIZE:
+             cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2
+             half = MIN_BOX_SIZE
+             bx1, by1 = cx - half, cy - half
+             bx2, by2 = cx + half, cy + half
+
+         draw_overlay.rectangle([bx1, by1, bx2, by2], fill=fill_rgba, outline=outline_rgb, width=line_width)
+
+         label = labels[i] if labels and i < len(labels) else (box.label or f"#{i+1}")
+         if show_confidence and box.confidence > 0:
+             label = f"{label} ({box.confidence:.0%})"
+
+         text_bbox = draw_text.textbbox((0, 0), label, font=font)
+         text_w = text_bbox[2] - text_bbox[0]
+         text_h = text_bbox[3] - text_bbox[1]
+         text_y = by1 - text_h - 4 if by1 - text_h - 4 > 0 else by1 + 4
+         text_x = max(0, bx1)
+
+         draw_text.rectangle(
+             [text_x, text_y, text_x + text_w + 6, text_y + text_h + 4],
+             fill=color_hex,
+         )
+         draw_text.text((text_x + 3, text_y + 2), label, fill="white", font=font)
+
+     img = Image.alpha_composite(img, overlay).convert("RGB")
+     return img
+
+
+ def create_no_detection_overlay(image: Image.Image, message: str = "No detections found") -> Image.Image:
+     """Create an overlay indicating no objects were detected."""
+     img = image.copy()
+     draw = ImageDraw.Draw(img)
+     font = _get_font(18)
+     text_bbox = draw.textbbox((0, 0), message, font=font)
+     text_w = text_bbox[2] - text_bbox[0]
+     text_h = text_bbox[3] - text_bbox[1]
+     img_w, img_h = img.size
+     x = (img_w - text_w) / 2
+     y = img_h - text_h - 20
+     draw.rectangle([x - 10, y - 5, x + text_w + 10, y + text_h + 5], fill=(0, 0, 0, 180))
+     draw.text((x, y), message, fill="yellow", font=font)
+     return img
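The clamping and tiny-box expansion inside `draw_boxes` is plain arithmetic and can be checked without Pillow. The sketch below copies that logic into a hypothetical helper, `display_rect`, which is not part of the commit; `MIN_BOX_SIZE = 4` matches the constant in the diff.

```python
MIN_BOX_SIZE = 4  # same value as in src/visualization.py


def display_rect(x1, y1, x2, y2, img_w, img_h):
    """Return the rectangle draw_boxes would actually draw for one box."""
    # Step 1: clamp the box to the image bounds.
    bx1, by1 = max(0, x1), max(0, y1)
    bx2, by2 = min(img_w, x2), min(img_h, y2)

    # Step 2: if either side is below MIN_BOX_SIZE, re-center and expand.
    if (bx2 - bx1) < MIN_BOX_SIZE or (by2 - by1) < MIN_BOX_SIZE:
        cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2
        half = MIN_BOX_SIZE  # as in the source: the half-width equals MIN_BOX_SIZE
        bx1, by1 = cx - half, cy - half
        bx2, by2 = cx + half, cy + half
    return (bx1, by1, bx2, by2)


print(display_rect(100, 100, 101, 101, 640, 480))  # (96.5, 96.5, 104.5, 104.5)
print(display_rect(-50, -50, 800, 600, 640, 480))  # (0, 0, 640, 480)
print(display_rect(0, 0, 1, 1, 640, 480))          # (-3.5, -3.5, 4.5, 4.5)
```

Two details follow from this arithmetic. An expanded box is `2 * MIN_BOX_SIZE` (8 px) on each side, because the constant is used as the half-width. Expansion also happens after clamping, so a tiny box near an edge can extend slightly outside the image again, as the last call shows.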
tests/test_app_smoke.py ADDED
@@ -0,0 +1,46 @@
+ """Smoke tests for the Gradio app: import-level checks only."""
+
+ import pytest
+
+
+ def test_app_module_imports():
+     """Verify lightweight source modules can be imported without error."""
+     import src.config
+     import src.parsing
+     import src.prompts
+     import src.utils
+
+
+ def test_config_values():
+     from src.config import APP_TITLE, APP_SUBTITLE, MODEL_ID, COORD_MAX
+
+     assert MODEL_ID == "nvidia/LocateAnything-3B"
+     assert COORD_MAX == 1000
+     assert len(APP_TITLE) > 0
+     assert len(APP_SUBTITLE) > 0
+
+
+ def test_visualization_import():
+     """Verify visualization module imports."""
+     import src.visualization
+     assert hasattr(src.visualization, "draw_boxes")
+     assert hasattr(src.visualization, "create_no_detection_overlay")
+
+
+ def test_app_build_callable():
+     """Verify the Gradio app builder is importable and callable."""
+     try:
+         from app import build_app
+         assert callable(build_app)
+     except ImportError:
+         pytest.skip("Gradio not available in test environment")
+
+
+ def test_inference_module_imports():
+     """Verify inference module structure without heavy imports."""
+     try:
+         from src.inference import LocateAnythingWorker
+         w = LocateAnythingWorker.__new__(LocateAnythingWorker)
+         assert not getattr(w, "_loaded", True)
+     except ImportError:
+         pytest.skip("transformers/torch not available in test environment")
tests/test_parsing.py ADDED
@@ -0,0 +1,115 @@
+ """Tests for the output parsing module."""
+
+ import pytest
+ from src.parsing import BBox, ParseResult, parse_boxes, parse_points
+
+
+ class TestBBox:
+     def test_valid_box_within_bounds(self):
+         box = BBox(x1=10, y1=20, x2=100, y2=200)
+         assert box.is_valid(640, 480)
+         assert box.width == 90
+         assert box.height == 180
+         assert box.area == 16200
+
+     def test_box_center(self):
+         box = BBox(x1=0, y1=0, x2=100, y2=100)
+         assert box.center == (50.0, 50.0)
+
+     def test_invalid_box_zero_area(self):
+         box = BBox(x1=50, y1=50, x2=50, y2=50)
+         assert not box.is_valid(640, 480)
+
+     def test_invalid_box_out_of_bounds(self):
+         box = BBox(x1=-10, y1=0, x2=100, y2=100)
+         assert not box.is_valid(640, 480)
+
+     def test_clamp(self):
+         box = BBox(x1=-10, y1=-5, x2=700, y2=500)
+         clamped = box.clamp(640, 480)
+         assert clamped.x1 == 0
+         assert clamped.y1 == 0
+         assert clamped.x2 == 640
+         assert clamped.y2 == 480
+
+     def test_to_dict(self):
+         box = BBox(x1=10, y1=20, x2=100, y2=200, confidence=0.9, label="test")
+         d = box.to_dict()
+         assert d["x1"] == 10
+         assert d["label"] == "test"
+         assert d["confidence"] == 0.9
+
+
+ class TestParseBoxes:
+     def test_single_box(self):
+         raw = "<box><100><200><300><400></box>"
+         result = parse_boxes(raw, 1000, 1000)
+         assert result.num_detections == 1
+         assert result.boxes[0].x1 == 100.0
+         assert result.boxes[0].y1 == 200.0
+         assert result.boxes[0].x2 == 300.0
+         assert result.boxes[0].y2 == 400.0
+
+     def test_multiple_boxes(self):
+         raw = "<box><100><100><200><200></box> some text <box><500><500><600><600></box>"
+         result = parse_boxes(raw, 1000, 1000)
+         assert result.num_detections == 2
+
+     def test_duplicate_boxes_deduplicated(self):
+         raw = "<box><100><100><200><200></box> <box><100><100><200><200></box>"
+         result = parse_boxes(raw, 1000, 1000)
+         assert result.num_detections == 1
+
+     def test_no_boxes(self):
+         raw = "No objects detected in this image."
+         result = parse_boxes(raw, 1000, 1000)
+         assert result.num_detections == 0
+
+     def test_coordinate_scaling(self):
+         raw = "<box><500><500><1000><1000></box>"
+         result = parse_boxes(raw, 640, 480)
+         assert result.num_detections == 1
+         assert abs(result.boxes[0].x2 - 640.0) < 0.1
+         assert abs(result.boxes[0].y2 - 480.0) < 0.1
+
+     def test_out_of_bounds_box_discarded(self):
+         raw = "<box><999><999><1001><1001></box>"
+         result = parse_boxes(raw, 100, 100)
+         assert result.num_detections == 0
+         assert len(result.parse_errors) > 0
+
+     def test_alt_format(self):
+         raw = "<box>100, 200, 300, 400</box>"
+         result = parse_boxes(raw, 1000, 1000)
+         assert result.num_detections == 1
+
+
+ class TestParsePoints:
+     def test_single_point(self):
+         raw = "<box><500><500></box>"
+         points = parse_points(raw, 1000, 1000)
+         assert len(points) == 1
+         assert points[0]["x"] == 500.0
+         assert points[0]["y"] == 500.0
+
+     def test_no_points(self):
+         raw = "nothing here"
+         points = parse_points(raw, 1000, 1000)
+         assert len(points) == 0
+
+
+ class TestParseResult:
+     def test_empty_result(self):
+         r = ParseResult()
+         assert r.num_detections == 0
+         d = r.to_dict()
+         assert d["num_detections"] == 0
+
+     def test_result_with_boxes(self):
+         r = ParseResult(
+             boxes=[BBox(10, 20, 100, 200)],
+             raw_output="<box><10><20><100><200></box>",
+         )
+         assert r.num_detections == 1
+         d = r.to_dict()
+         assert len(d["boxes"]) == 1
tests/test_prompts.py ADDED
@@ -0,0 +1,67 @@
+ """Tests for prompt templates."""
+
+ import pytest
+ from src.prompts import (
+     SPACE_DEBRIS_EXAMPLES,
+     DETECTION_TEMPLATES,
+     PromptTemplate,
+     build_detect_prompt,
+     build_grounding_prompt,
+     get_example_prompts,
+ )
+
+
+ class TestPromptTemplate:
+     def test_template_fields(self):
+         t = DETECTION_TEMPLATES[0]
+         assert isinstance(t, PromptTemplate)
+         assert t.name
+         assert t.template
+         assert t.description
+         assert t.category
+
+     def test_template_has_placeholder(self):
+         for t in DETECTION_TEMPLATES:
+             assert "{" in t.template or "Detect" in t.template
+
+
+ class TestBuildPrompts:
+     def test_build_detect_prompt_single(self):
+         prompt = build_detect_prompt(["debris"])
+         assert "debris" in prompt
+         assert "Locate" in prompt
+
+     def test_build_detect_prompt_multiple(self):
+         prompt = build_detect_prompt(["debris", "antenna", "panel"])
+         assert "</c>" in prompt
+         assert "debris" in prompt
+         assert "antenna" in prompt
+
+     def test_build_grounding_prompt_multi(self):
+         prompt = build_grounding_prompt("solar panel")
+         assert "solar panel" in prompt
+         assert "all the instances" in prompt
+
+     def test_build_grounding_prompt_single(self):
+         prompt = build_grounding_prompt("spacecraft", single=True)
+         assert "spacecraft" in prompt
+         assert "single instance" in prompt
+
+
+ class TestExamples:
+     def test_examples_not_empty(self):
+         assert len(SPACE_DEBRIS_EXAMPLES) > 0
+
+     def test_example_structure(self):
+         for ex in SPACE_DEBRIS_EXAMPLES:
+             assert "phrase" in ex
+             assert "prompt" in ex
+             assert "description" in ex
+
+     def test_get_example_prompts(self):
+         prompts = get_example_prompts()
+         assert len(prompts) == len(SPACE_DEBRIS_EXAMPLES)
+         for p in prompts:
+             assert len(p) == 1
+             assert isinstance(p[0], str)
+             assert len(p[0]) > 0
tests/test_visualization.py ADDED
@@ -0,0 +1,61 @@
+ """Tests for the visualization module."""
+
+ import pytest
+ from PIL import Image
+ from src.parsing import BBox
+ from src.visualization import draw_boxes, create_no_detection_overlay, MIN_BOX_SIZE
+
+
+ @pytest.fixture
+ def sample_image():
+     return Image.new("RGB", (640, 480), color=(30, 30, 60))
+
+
+ @pytest.fixture
+ def sample_boxes():
+     return [
+         BBox(x1=50, y1=50, x2=200, y2=150, confidence=0.9, label="debris"),
+         BBox(x1=300, y1=200, x2=450, y2=350, confidence=0.75, label="satellite"),
+     ]
+
+
+ class TestDrawBoxes:
+     def test_returns_image(self, sample_image, sample_boxes):
+         result = draw_boxes(sample_image, sample_boxes)
+         assert isinstance(result, Image.Image)
+         assert result.size == sample_image.size
+
+     def test_empty_boxes_returns_copy(self, sample_image):
+         result = draw_boxes(sample_image, [])
+         assert isinstance(result, Image.Image)
+         assert result.size == sample_image.size
+
+     def test_custom_labels(self, sample_image, sample_boxes):
+         labels = ["fragment", "panel"]
+         result = draw_boxes(sample_image, sample_boxes, labels=labels)
+         assert isinstance(result, Image.Image)
+
+     def test_tiny_boxes_expanded(self, sample_image):
+         tiny_boxes = [BBox(x1=100, y1=100, x2=101, y2=101)]
+         result = draw_boxes(sample_image, tiny_boxes)
+         assert isinstance(result, Image.Image)
+
+     def test_out_of_bounds_boxes_clipped(self, sample_image):
+         boxes = [BBox(x1=-50, y1=-50, x2=800, y2=600)]
+         result = draw_boxes(sample_image, boxes)
+         assert isinstance(result, Image.Image)
+
+     def test_no_confidence_display(self, sample_image, sample_boxes):
+         result = draw_boxes(sample_image, sample_boxes, show_confidence=False)
+         assert isinstance(result, Image.Image)
+
+
+ class TestNoDetectionOverlay:
+     def test_returns_image(self, sample_image):
+         result = create_no_detection_overlay(sample_image)
+         assert isinstance(result, Image.Image)
+         assert result.size == sample_image.size
+
+     def test_custom_message(self, sample_image):
+         result = create_no_detection_overlay(sample_image, "Custom message")
+         assert isinstance(result, Image.Image)