title: Celebrity Deathmatch
emoji: 🥊
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: true
short_description: Two photos in, a claymation celebrity death match out
tags:
- track:wood
- sponsor:modal
- sponsor:openbmb
- achievement:offbrand
- thousand-token-wood
- off-brand
- best-demo
- text-to-video
- image-generation
- text-to-speech
- gradio
- modal
models:
- openbmb/MiniCPM-V-2_6
- black-forest-labs/FLUX.1-schnell
- Lightricks/LTX-Video
- Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
🥊 Celebrity Deathmatch
Upload two photos. Our AI ring director books the brawl (a claymation fight script, a rendered keyframe reel, a declared winner), then turns it into one continuous fight video with two ring announcers screaming over the action and the crowd going wild.
It's MTV's Celebrity Deathmatch as an AI-native toy: pure spectacle, zero practical value, maximum fun. (That's the Thousand Token Wood track in one sentence.)
⚠️ Parody. Every visual is an AI-generated claymation caricature of a public figure, for comedic effect. Not real. No real people were harmed.
▶️ See it in action
🎬 Watch the 60-second demo · 🥊 Try it live · Launch post
Why it's worth a look
- 🎙️ Two-announcer voiceover, not a silent clip. Every beat is called by Nick (dry, sarcastic) and Johnny (loud, over-excited), two designed voices from Qwen3-TTS VoiceDesign, mixed over a bell, a crowd murmur bed, and a winner roar. The fight has a soundtrack, like the real show.
- 🎨 Off-brand UI. No default Gradio look: a custom claymation-fight art direction with Anton display type, a fire-and-clay palette, tale-of-the-tape stat bars, and an animated winner banner.
- 🧱 Real caricatures, real stakes. MiniCPM reads both photos, invents a fighting persona, signature move, and stat line for each, then choreographs a 5-beat arc and picks a winner.
How it works
A four-model pipeline, with every model under the hackathon's 32B cap:
| Stage | Role | Model | Params |
|---|---|---|---|
| 1 | Fight director: reads both photos → fight card JSON | MiniCPM-V-2_6 (OpenBMB) | 8B |
| 2 | Claymation keyframe reel | FLUX.1-schnell (BFL) | 12B |
| 3 | Keyframes → continuous fight video (opt-in) | LTX-Video (Lightricks) | 2B |
| 3 | Two-announcer voiceover | Qwen3-TTS VoiceDesign (Qwen) | 1.7B |
The entire fight pipeline runs on small models: all ≤12B, well under the 32B cap. No giant foundation model anywhere: a clever chain of small specialists (read → draw → animate → voice) does the whole show. That's the Build Small ethos.
    photo A + photo B
      ├─▶ Stage 1            MiniCPM-V-2_6   → fight card (fighters, 5 beats, 2-announcer commentary, winner)
      ├─▶ Stage 2            FLUX.1-schnell  → 5 claymation keyframes
      └─▶ Stage 3 (Animate)  LTX-Video       → chained clips
                           + Qwen3-TTS       → Nick/Johnny voiceover + crowd SFX + burned-in captions
                                             → one MP4
The reel (Stages 1–2) is the fast default; Animate (Stage 3) is opt-in because it's the GPU-heavy step.
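The staging above can be sketched as a small orchestrator. This is only an illustration of the "reel by default, Animate on request" flow: `run_fight`, `FightResult`, and the three stage callables are hypothetical names, and the stand-in stages return dummy values instead of calling any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FightResult:
    card: dict                     # Stage 1 output (fight card)
    keyframes: List[str]           # Stage 2 output (one image path per beat)
    video: Optional[str] = None    # Stage 3 output, only set when animate=True


def run_fight(
    photo_a: str,
    photo_b: str,
    direct: Callable[[str, str], dict],
    draw: Callable[[dict], List[str]],
    animate_fn: Callable[[dict, List[str]], str],
    animate: bool = False,
) -> FightResult:
    """Chain the stages; skip the GPU-heavy Stage 3 unless asked."""
    card = direct(photo_a, photo_b)        # Stage 1: photos -> fight card
    keyframes = draw(card)                 # Stage 2: fight card -> keyframes
    result = FightResult(card=card, keyframes=keyframes)
    if animate:                            # Stage 3 is opt-in
        result.video = animate_fn(card, keyframes)
    return result


# Stand-in stages (assumptions) so the sketch runs without GPU or network.
def fake_direct(a: str, b: str) -> dict:
    return {"beats": [f"beat {i}" for i in range(1, 6)], "winner": "A"}


def fake_draw(card: dict) -> List[str]:
    return [f"frame_{i}.png" for i in range(len(card["beats"]))]


def fake_animate(card: dict, keyframes: List[str]) -> str:
    return "fight.mp4"


reel = run_fight("a.jpg", "b.jpg", fake_direct, fake_draw, fake_animate)
full = run_fight("a.jpg", "b.jpg", fake_direct, fake_draw, fake_animate, animate=True)
```

Passing the stages in as callables keeps the fast path testable on CPU and makes it easy to swap real backends for stand-ins.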
Tech
- Frontend: this Gradio 6 Space (custom CSS, no template look).
- Backend: two Modal GPU apps: `deathmatch` (MiniCPM A10G + FLUX L40S + ASGI gateway) and `deathmatch-video` (ComfyUI + LTX on H100, Qwen3-TTS on A10G).
- Audio/video post: pure `ffmpeg`: xfade stitch, fit-to-beat TTS mixing, synthesized crowd SFX, burned-in captions.
- Wired to the backend via `DEATHMATCH_API_URL` (set automatically on deploy).
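For the xfade stitch, one plausible approach is to chain ffmpeg's `xfade` filter across the clips. The sketch below only builds the `filter_complex` string; the helper name `xfade_filter`, the 0.5 s fade, and the clip durations are assumptions, not the project's actual settings. Each `offset` is measured on the timeline of the video stitched so far, so it equals the running length minus the fade duration.

```python
from typing import List


def xfade_filter(durations: List[float], fade: float = 0.5) -> str:
    """Build an ffmpeg filter_complex string that crossfades N clips in order."""
    if len(durations) < 2:
        raise ValueError("need at least two clips to crossfade")
    parts = []
    prev = "[0:v]"           # label of the video stitched so far
    total = durations[0]     # length of the video stitched so far, in seconds
    for i in range(1, len(durations)):
        offset = total - fade                      # start the fade before the end
        out = "[vout]" if i == len(durations) - 1 else f"[v{i}]"
        parts.append(
            f"{prev}[{i}:v]xfade=transition=fade:"
            f"duration={fade:g}:offset={offset:g}{out}"
        )
        total = total + durations[i] - fade        # the overlap shortens the result
        prev = out
    return ";".join(parts)


print(xfade_filter([4.0, 4.0, 4.0], fade=0.5))
# [0:v][1:v]xfade=transition=fade:duration=0.5:offset=3.5[v1];[v1][2:v]xfade=transition=fade:duration=0.5:offset=7[vout]
```

The resulting string would be passed to `ffmpeg -filter_complex` with `-map "[vout]"`, after one `-i` per clip.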
Sponsors we built on
- OpenBMB (MiniCPM-V-2_6). The whole show starts here: one 8B vision-language model reads both fighter photos in a single call and returns the entire fight card as JSON: names, personas, signature moves, stat lines, a 5-beat script, two-announcer commentary, and a winner. It is the ring director.
- Modal. Every GPU stage runs on Modal: MiniCPM (A10G), FLUX (L40S), and the ComfyUI + LTX-Video + Qwen3-TTS video app (H100 / A10G), behind warm-pooled `@app.cls` containers and an ASGI gateway. The HF Space stays CPU-only and calls Modal over HTTP; one `modal deploy` ships each app.
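Because the whole show hangs on the fight card JSON from the first stage, it can help to sanity-check the model's reply before drawing anything. The sketch below is a minimal example under stated assumptions: the key names (`fighters`, `beats`, `winner`, `name`) are hypothetical, and the real schema may differ. It tolerates extra prose around the JSON object, which vision-language models sometimes add.

```python
import json

REQUIRED_KEYS = {"fighters", "beats", "winner"}  # hypothetical field names


def parse_fight_card(raw: str) -> dict:
    """Extract and sanity-check a fight card from a model's text reply."""
    # Take the span from the first "{" to the last "}" to skip surrounding prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    card = json.loads(raw[start : end + 1])

    missing = REQUIRED_KEYS - card.keys()
    if missing:
        raise ValueError(f"fight card is missing keys: {sorted(missing)}")
    if len(card["fighters"]) != 2:
        raise ValueError("expected exactly two fighters")
    if len(card["beats"]) != 5:
        raise ValueError("expected a 5-beat script")
    names = [fighter["name"] for fighter in card["fighters"]]
    if card["winner"] not in names:
        raise ValueError("winner must be one of the two fighters")
    return card


# A made-up reply, for illustration only.
reply = "Here is the card:\n" + json.dumps({
    "fighters": [{"name": "Clay A"}, {"name": "Clay B"}],
    "beats": ["bell", "jab", "signature move", "reversal", "finisher"],
    "winner": "Clay B",
})
card = parse_fight_card(reply)
```

Failing fast here avoids spending GPU time on keyframes for a card with a missing beat or an unknown winner.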
Run it locally (no GPU)
DEATHMATCH_MOCK=1 python app.py
Mock mode exercises the entire UX on CPU (canned fight card, placeholder keyframe reel, and a stand-in fight video), so you can see the whole flow without burning a single GPU minute.
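How `app.py` chooses between mock and live mode is not shown here; a minimal sketch of one way to do it, assuming the two environment variables named in this README and a hypothetical helper `backend_mode`, could look like this:

```python
import os
from typing import Mapping, Optional


def backend_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "mock" or "remote" based on environment variables.

    Assumption: DEATHMATCH_MOCK=1 wins over DEATHMATCH_API_URL when both are set.
    """
    env = os.environ if env is None else env
    if env.get("DEATHMATCH_MOCK") == "1":
        return "mock"        # canned fight card, placeholder reel, stand-in video
    if env.get("DEATHMATCH_API_URL"):
        return "remote"      # call the Modal gateway over HTTP
    raise RuntimeError("set DEATHMATCH_MOCK=1 or DEATHMATCH_API_URL")


print(backend_mode({"DEATHMATCH_MOCK": "1"}))
```

Accepting the environment as a parameter keeps the check easy to test without touching the real process environment.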
