---
title: Text-to-3d Flux Trellis
emoji: 📉
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.16.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
short_description: Text to 3D using DeepSeek-R1, FLUX.1-dev and TRELLIS
---

# Text to 3D — from a sentence to a downloadable 3D object

This Space turns a **one-line product idea** into an **interactive, downloadable
3D asset** by chaining three specialised models. It is geared toward *product
design*: describe an object, get a catalogue-style image, then a textured 3D mesh.

## Pipeline

```
Simple prompt ──▶ DeepSeek-R1 ──▶ FLUX.1-dev ──▶ TRELLIS ──▶ Video + GLB
   (text)         (reasoning      (product       (3D object)
                   + rich prompt)  image)
```

1. **DeepSeek-R1-Distill-Llama-8B — prompt design (text → text).**
   Acting as a product designer, the model *reasons* about the request (its
   chain-of-thought is shown in a dedicated accordion) and writes a detailed,
   photorealistic prompt for FLUX. Only the final prompt — never the reasoning —
   is forwarded downstream, so the image generator is not polluted by the
   `<think>` trace.

2. **FLUX.1-dev — image generation (text → image).**
   The detailed prompt is rendered as a clean product shot on a white background.

3. **TRELLIS (`microsoft/TRELLIS-image-large`) — 3D generation (image → 3D).**
   The image is converted into a 3D asset: a color + normals preview video, and a
   textured **GLB** that is viewable in the interactive 3D viewer and downloadable.

**Why generate an image before the 3D step?** TRELLIS is image-conditioned.
Producing a sharp, well-framed image first yields a much cleaner mesh and texture
than trying to go straight from text to 3D.
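
The hand-off from step 1 to step 2 depends on separating the reasoning from the final prompt. Below is a minimal sketch of such a parser, assuming the model closes its reasoning with `</think>`. The function name `split_reasoning` and the header pattern are hypothetical and are not taken from `app.py`.

```python
import re

THINK_CLOSE = "</think>"
# Hypothetical header pattern: lines such as "Prompt:" or "**Final prompt:**".
HEADER_RE = re.compile(r"^\W*(final\s+)?prompt\W*$", re.IGNORECASE)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, prompt) from a raw DeepSeek-R1 style response."""
    if THINK_CLOSE in response:
        reasoning, _, answer = response.partition(THINK_CLOSE)
        reasoning = reasoning.replace("<think>", "").strip()
    else:
        # Fallback: no closing tag, so treat the whole output as the prompt.
        reasoning, answer = "", response

    # Drop empty lines and header-only lines, then join what is left.
    lines = [ln.strip() for ln in answer.splitlines()]
    lines = [ln for ln in lines if ln and not HEADER_RE.match(ln)]
    prompt = " ".join(lines).strip().strip("\"'“”").strip()
    return reasoning, prompt
```

Only the second element of the returned pair would be passed to the image stage; the first is what a reasoning accordion could display.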

## Using the Space

- **Générer tout** ("Generate all") runs the whole pipeline in one click and reports a status for each step.
- Each stage also has its own button, so it can be rerun in isolation.
- The **Galerie** ("Gallery") tab shows pre-rendered examples for an instant preview even when
  the ZeroGPU hardware is cold.
- The **Comment ça marche** ("How it works") tab documents the pipeline in-app.
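
The one-click run can be pictured as a generator that calls the three stages in order and reports a status after each one. This is a minimal sketch, assuming each stage is a plain callable; `run_pipeline` and its stage arguments are hypothetical stand-ins, not the functions in `app.py`.

```python
from typing import Any, Callable, Iterator


def run_pipeline(
    idea: str,
    design_prompt: Callable[[str], str],   # text -> detailed prompt
    render_image: Callable[[str], Any],    # prompt -> image
    build_3d: Callable[[Any], Any],        # image -> 3D asset
) -> Iterator[tuple[str, Any]]:
    """Yield (status, result) after each stage so a UI can update per step."""
    prompt = design_prompt(idea)
    yield "1/3 prompt ready", prompt

    image = render_image(prompt)
    yield "2/3 image ready", image

    asset = build_3d(image)
    yield "3/3 3D asset ready", asset
```

Because each stage is passed in as an argument, the same callables can also be wired to the per-stage buttons, and the generator can be exercised with cheap dummy stages before any model is loaded.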

> ⏳ **Cold start (ZeroGPU):** the first run loads DeepSeek-R1 (8B), FLUX and
> TRELLIS and can take several minutes; subsequent runs are much faster.

## Tech stack

| Role | Model |
| --- | --- |
| Reasoning + prompt engineering | `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` |
| Text → image | `black-forest-labs/FLUX.1-dev` |
| Image → 3D | `microsoft/TRELLIS-image-large` |
| UI | Gradio 5 (Blocks, `Model3D`, theming) |

## Technical choices

- **ZeroGPU + Blackwell GPU (sm_120).** The Space targets the newer ZeroGPU
  hardware. `xformers` and `flash-attn` ship no kernels for `sm_120`, so the app
  forces PyTorch-native **`sdpa`** attention everywhere (`ATTN_BACKEND`,
  `SPARSE_ATTN_BACKEND`) and disables xformers for the `torch.hub` DINOv2
  conditioning model (`XFORMERS_DISABLED=1`).
- **Runtime-built CUDA extensions.** `diff_gaussian_rasterization` (the
  Mip-Splatting fork, needed for TRELLIS' kernel-size rasterization settings) and
  `nvdiffrast` are compiled from source at startup against the installed torch and
  `sm_120` (`TORCH_CUDA_ARCH_LIST=12.0+PTX`, with PTX so the driver can JIT for
  newer archs) instead of shipping prebuilt wheels.
- **Reasoning/prompt separation.** A small parser splits the DeepSeek-R1 response
  on `</think>`, strips header lines and quotes, and gracefully falls back to
  treating the whole output as the prompt if the closing tag is missing.
- **gradio_client patch.** A shim works around a gradio_client 1.7.0 bug where
  boolean JSON schemas (produced by `gr.State`) crash the `/info` endpoint.
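
The attention and build settings in the first two bullets come down to a handful of environment variables that would need to be in place before the relevant libraries are imported or compiled. This is a minimal sketch, assuming they are set from Python at the top of the entry point. The helper name `configure_blackwell_env` is hypothetical; the variable names and values are the ones listed above.

```python
import os


def configure_blackwell_env(env=os.environ) -> dict:
    """Force sdpa attention and the sm_120 build target in the given mapping."""
    settings = {
        "ATTN_BACKEND": "sdpa",              # dense attention backend
        "SPARSE_ATTN_BACKEND": "sdpa",       # sparse attention backend
        "XFORMERS_DISABLED": "1",            # DINOv2 loaded via torch.hub
        "TORCH_CUDA_ARCH_LIST": "12.0+PTX",  # arch list for the source builds
    }
    for key, value in settings.items():
        env[key] = value  # overwrite any earlier value
    return settings
```

Taking the mapping as a parameter keeps the helper easy to check with a plain `dict`; in the app it would presumably be called once, with the default `os.environ`, before anything imports the affected packages.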

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference