qwen3-vl-32b-soccer-v11-fp8
LoRA-merged FP8 quantized variant of Qwen3-VL-32B for soccer event classification.
This is the production checkpoint that powers the dual-pass event detector in the soccer-video-pipeline project. It's a single ~34 GB artifact you can serve directly with vLLM — no separate base + adapter merge step needed.
What it does
Given a short window of soccer match frames (4-8 frames sampled at 1 Hz over a 5-10 second clip), the model classifies the event happening in the window as one of:
goalshot_on_targetfree_kick_shotcatchshot_stop_diving,shot_stop_standingcorner_kick,goal_kick,throw_inkickoff_restart,active_play,idle(auxiliary states)
The model was fine-tuned to suppress some noisy auxiliary labels (notably
kickoff_restart) for cleaner downstream event classification. For
detecting kickoff restarts (used in the goal-recall pipeline), use the
base Qwen/Qwen3-VL-32B-Instruct-FP8 instead — see the architecture doc
in the GitHub repo.
How to serve
vLLM 0.19.1 is the only known-working version. Newer vLLM releases silently break this checkpoint (garbage token output). Pin it.
pip install vllm==0.19.1
vllm serve acatorcini/qwen3-vl-32b-soccer-v11-fp8 \
--tensor-parallel-size 2 \
--max-model-len 16384 \
--gpu-memory-utilization 0.92 \
--max-num-seqs 16 \
--port 8000 \
--host 0.0.0.0 \
--dtype auto \
--served-model-name qwen3-vl-32b \
--quantization compressed-tensors
--quantization compressed-tensors is required (this is a LoRA-merged FP8
checkpoint). Using --quantization fp8 will fail. Conversely, the base
model is served with --quantization fp8 — don't mix them up.
Hardware
- Minimum: 2× RTX 3090 / 4090 over NVLink (48 GB VRAM total), tensor-parallel 2
- Single GPU: needs ≥40 GB VRAM (A100, H100)
Training data
Custom-curated set of ~10,000 short soccer event clips with manual labels, drawn from amateur and youth-level matches (1080p, sideline camera at ~50m). Multiple games, multiple venues, varied lighting. Training data is not redistributed.
Intended use
Personal soccer analytics, research on amateur sports video understanding,
component of the open-source soccer-video-pipeline system. Not intended
for professional broadcast use.
Limitations
- The model's ViT cannot reliably distinguish the ball at >50m camera distance (the ball is 3-5 px). This affects raw goal recall — the upstream pipeline compensates with a kickoff-restart ensemble.
- Performance degrades on dramatically different camera framings than the training corpus (e.g., behind-goal cameras, drone footage).
- Trained on English commentary / labels only.
License
Inherits the Qwen3-VL Tongyi Qianwen License. Commercial use is permitted for products with <100M MAU; otherwise license terms apply. Read the linked license for the authoritative terms.
Related artifacts
- LoRA adapter alone (2.27 GB, for users who want to merge against a different base or continue training): acatorcini/qwen3-vl-32b-soccer-v11-lora
- Ball-detection YOLOv9: acatorcini/yolov9-soccer-ball
- Pipeline source: github.com/acato/soccer-video-pipeline
- Downloads last month
- 16
Model tree for acatorcini/qwen3-vl-32b-soccer-v11-fp8
Base model
Qwen/Qwen3-VL-32B-Instruct