qwen3-vl-32b-soccer-v11-fp8

LoRA-merged FP8 quantized variant of Qwen3-VL-32B for soccer event classification.

This is the production checkpoint that powers the dual-pass event detector in the soccer-video-pipeline project. It's a single ~34 GB artifact you can serve directly with vLLM — no separate base + adapter merge step needed.

What it does

Given a short window of soccer match frames (4-8 frames sampled at 1 Hz over a 5-10 second clip), the model classifies the event happening in the window as one of:

  • goal
  • shot_on_target
  • free_kick_shot
  • catch
  • shot_stop_diving, shot_stop_standing
  • corner_kick, goal_kick, throw_in
  • kickoff_restart, active_play, idle (auxiliary states)

The model was fine-tuned to suppress some noisy auxiliary labels (notably kickoff_restart) for cleaner downstream event classification. For detecting kickoff restarts (used in the goal-recall pipeline), use the base Qwen/Qwen3-VL-32B-Instruct-FP8 instead — see the architecture doc in the GitHub repo.

How to serve

vLLM 0.19.1 is the only known-working version. Newer vLLM releases silently break this checkpoint (garbage token output). Pin it.

pip install vllm==0.19.1

vllm serve acatorcini/qwen3-vl-32b-soccer-v11-fp8 \
  --tensor-parallel-size 2 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.92 \
  --max-num-seqs 16 \
  --port 8000 \
  --host 0.0.0.0 \
  --dtype auto \
  --served-model-name qwen3-vl-32b \
  --quantization compressed-tensors

--quantization compressed-tensors is required (this is a LoRA-merged FP8 checkpoint). Using --quantization fp8 will fail. Conversely, the base model is served with --quantization fp8 — don't mix them up.

Hardware

  • Minimum: 2× RTX 3090 / 4090 over NVLink (48 GB VRAM total), tensor-parallel 2
  • Single GPU: needs ≥40 GB VRAM (A100, H100)

Training data

Custom-curated set of ~10,000 short soccer event clips with manual labels, drawn from amateur and youth-level matches (1080p, sideline camera at ~50m). Multiple games, multiple venues, varied lighting. Training data is not redistributed.

Intended use

Personal soccer analytics, research on amateur sports video understanding, component of the open-source soccer-video-pipeline system. Not intended for professional broadcast use.

Limitations

  • The model's ViT cannot reliably distinguish the ball at >50m camera distance (the ball is 3-5 px). This affects raw goal recall — the upstream pipeline compensates with a kickoff-restart ensemble.
  • Performance degrades on dramatically different camera framings than the training corpus (e.g., behind-goal cameras, drone footage).
  • Trained on English commentary / labels only.

License

Inherits the Qwen3-VL Tongyi Qianwen License. Commercial use is permitted for products with <100M MAU; otherwise license terms apply. Read the linked license for the authoritative terms.

Related artifacts

Downloads last month
16
Safetensors
Model size
33B params
Tensor type
BF16
·
F8_E4M3
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for acatorcini/qwen3-vl-32b-soccer-v11-fp8

Quantized
(2)
this model