---
title: Text-to-3d Flux Trellis
emoji: 📉
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.16.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
short_description: Text to 3D using DeepSeek-R1, FLUX.1-dev and TRELLIS
---

# Text to 3D: from a sentence to a downloadable 3D object

This Space turns a **one-line product idea** into an **interactive, downloadable 3D asset** by chaining three specialised models. It is geared toward *product design*: describe an object, get a catalogue-style image, then a textured 3D mesh.

## Pipeline

```
Simple prompt ──▶ DeepSeek-R1 ──▶ FLUX.1-dev ──▶ TRELLIS ──▶ Video + GLB
   (text)         (reasoning      (product       (3D object)
                  + rich prompt)   image)
```

1. **DeepSeek-R1-Distill-Llama-8B: prompt design (text → text).** Acting as a product designer, the model *reasons* about the request (its chain-of-thought is shown in a dedicated accordion) and writes a detailed, photorealistic prompt for FLUX. Only the final prompt, never the reasoning, is forwarded downstream, so the image generator is not polluted by the `<think>` trace.
2. **FLUX.1-dev: image generation (text → image).** The detailed prompt is rendered as a clean product shot on a white background.
3. **TRELLIS (`microsoft/TRELLIS-image-large`): 3D generation (image → 3D).** The image is converted into a 3D asset: a color + normals preview video, and a textured **GLB** that can be inspected in the interactive 3D viewer and downloaded.

**Why generate an image before the 3D step?** TRELLIS is image-conditioned. Producing a sharp, well-framed image first yields a much cleaner mesh and texture than trying to go straight from text to 3D.

## Using the Space

- **Générer tout** ("Generate all") runs the whole pipeline in one click and reports a status for each step.
- Each stage also has its own button, so it can be re-run in isolation.
- The **Galerie** ("Gallery") tab shows pre-rendered examples for an instant preview even when the ZeroGPU hardware is cold.
- The **Comment ça marche** ("How it works") tab documents the pipeline in-app.

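
The one-click run with per-step status can be pictured as a generator that calls the three stages in order and yields a status message before each one. The sketch below is only an illustration under stated assumptions: `run_all`, `design_prompt`, `render_image` and `build_3d` are hypothetical names, not the functions in `app.py`, and the stages are passed in as plain callables so that the example runs without any model loaded.

```python
from typing import Any, Callable, Iterator


def run_all(
    idea: str,
    design_prompt: Callable[[str], str],   # hypothetical: text -> detailed prompt
    render_image: Callable[[str], Any],    # hypothetical: prompt -> image
    build_3d: Callable[[Any], Any],        # hypothetical: image -> 3D asset
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Run the three stages in order, yielding (status, results so far)."""
    results: dict[str, Any] = {}

    yield "1/3 Designing prompt...", dict(results)
    results["prompt"] = design_prompt(idea)

    yield "2/3 Generating image...", dict(results)
    results["image"] = render_image(results["prompt"])

    yield "3/3 Building 3D asset...", dict(results)
    results["asset"] = build_3d(results["image"])

    yield "Done", dict(results)


if __name__ == "__main__":
    # Stand-in stages; the real ones would call DeepSeek-R1, FLUX and TRELLIS.
    for status, partial in run_all(
        "a minimalist kettle",
        design_prompt=lambda text: f"studio photo of {text}, white background",
        render_image=lambda prompt: f"<image for: {prompt}>",
        build_3d=lambda image: f"<glb from: {image}>",
    ):
        print(status, sorted(partial))
```

Because Gradio event handlers may be generator functions, a function shaped like this could update a status component after every `yield`; whether the Space does exactly that is an assumption.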
> ⏳ **Cold start (ZeroGPU):** the first run loads DeepSeek-R1 (8B), FLUX and
> TRELLIS and can take several minutes; subsequent runs are much faster.

## Tech stack

| Role | Model |
| --- | --- |
| Reasoning + prompt engineering | `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` |
| Text → image | `black-forest-labs/FLUX.1-dev` |
| Image → 3D | `microsoft/TRELLIS-image-large` |
| UI | Gradio 5 (Blocks, `Model3D`, theming) |

## Technical choices

- **ZeroGPU + Blackwell GPU (sm_120).** The Space targets the newer ZeroGPU hardware. `xformers` and `flash-attn` ship no kernels for `sm_120`, so the app forces PyTorch-native **`sdpa`** attention everywhere (`ATTN_BACKEND`, `SPARSE_ATTN_BACKEND`) and disables xformers for the `torch.hub` DINOv2 conditioning model (`XFORMERS_DISABLED=1`).
- **Runtime-built CUDA extensions.** `diff_gaussian_rasterization` (the Mip-Splatting fork, needed for TRELLIS' kernel-size rasterization settings) and `nvdiffrast` are not shipped as prebuilt wheels. They are compiled from source at startup against the installed torch and `sm_120` (`TORCH_CUDA_ARCH_LIST=12.0+PTX`; the PTX lets the driver JIT-compile for newer architectures).
- **Reasoning/prompt separation.** A small parser splits the DeepSeek-R1 response on `</think>`, strips header lines and quotes, and falls back to treating the whole output as the prompt if the closing tag is missing.
- **gradio_client patch.** A shim works around a gradio_client 1.7.0 bug where boolean JSON schemas (produced by `gr.State`) crash the `/info` endpoint.

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
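
The reasoning/prompt separation described under the technical choices can be sketched roughly as follows. This is a minimal illustration, not the parser in `app.py`: the function name `split_reasoning`, the header pattern, and the set of quote characters are all assumptions, and only bare header lines such as `**Prompt:**` are dropped.

```python
import re

THINK_CLOSE = "</think>"

# Assumed pattern for bare label lines such as "Prompt:" or "**Final prompt:**".
_HEADER_RE = re.compile(r"^[#*\s]*(final\s+)?prompt[*\s]*:?[*\s]*$", re.IGNORECASE)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, prompt) from a raw DeepSeek-R1 style response."""
    if THINK_CLOSE in response:
        # Everything before the closing tag is the reasoning trace.
        reasoning, _, answer = response.partition(THINK_CLOSE)
        reasoning = reasoning.replace("<think>", "").strip()
    else:
        # Fallback: no closing tag, so the whole output is treated as the prompt.
        reasoning, answer = "", response

    lines = [line.strip() for line in answer.strip().splitlines()]
    # Drop blank lines and bare header lines, then join what is left.
    kept = [line for line in lines if line and not _HEADER_RE.match(line)]
    prompt = " ".join(kept).strip().strip("\"'“”").strip()
    return reasoning, prompt


if __name__ == "__main__":
    raw = (
        "<think>\nThe user wants a kettle.\n</think>\n\n"
        "**Prompt:**\n\"A matte black kettle on a white background\""
    )
    reasoning, prompt = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("prompt:", prompt)
```

Only the second element of the returned pair would be handed to the image model; the first is what a reasoning accordion could display.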