---
license: other
license_name: ideogram-4-non-commercial
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8
base_model: ideogram-ai/ideogram-4-fp8
pipeline_tag: text-to-image
tags: [text-to-image, diffusion, flow-matching, quantization, gguf, q4_k, ideogram]
---

# Ideogram 4: GGUF Q4_K (Transformer Lab)

A **GGUF Q4_K** (4.5 bits/weight) quantization of the Ideogram 4 DiT, sized for consumer GPUs.

> :warning: **Not a llama.cpp / stable-diffusion.cpp file.** Despite the `.gguf` extension, this
> loads **only** via the included PyTorch `gguf_loader.py` + the `ideogram4` pipeline. It is
> **not** compatible with llama.cpp, stable-diffusion.cpp, Ollama, etc.

> ℹ️ **Quantized DiT only.** This checkpoint is the DiT (both CFG branches). To generate images, you
> also need the **Qwen3-VL text encoder and VAE** from the base repo [`ideogram-ai/ideogram-4-fp8`](https://huggingface.co/ideogram-ai/ideogram-4-fp8)
> and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
> The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).

## Why this one
In the preliminary evaluation reported here, Q4_K is the **Pareto winner** on the quality-vs-memory
frontier: at **10.4 GB** (the same on-disk size class as the published NF4 build) it
**beats NF4 on quality** by +0.84 Pick / +2.93 CLIP on a 50-prompt slice. If you're tight on
VRAM, this is the build to grab.

## Method
Weight-only GGUF Q4_K quantization of the DiT linear-layer weights (custom NumPy quantizer,
verified bit-exact against the gguf-py reference decoder); all other tensors are kept in F16.

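The 4.5 bits/weight figure follows from the standard GGUF Q4_K super-block layout. The sketch below is an illustration under that assumption, not the quantizer used for this checkpoint: the constant and function names are hypothetical, and `dequant_sub_block` takes already-unpacked fields rather than the packed on-disk bytes.

```python
# Standard GGUF Q4_K super-block layout (assumed; mirrors ggml's block_q4_K).
WEIGHTS_PER_SUPERBLOCK = 256   # 8 sub-blocks of 32 weights each
D_BYTES = 2                    # fp16 super-scale applied to the sub-block scales
DMIN_BYTES = 2                 # fp16 super-scale applied to the sub-block mins
PACKED_SCALES_BYTES = 12       # 8 scales + 8 mins at 6 bits each = 96 bits
QUANT_BYTES = WEIGHTS_PER_SUPERBLOCK // 2  # 4-bit codes, two per byte


def q4_k_bits_per_weight() -> float:
    """Storage cost per weight for one Q4_K super-block."""
    block_bytes = D_BYTES + DMIN_BYTES + PACKED_SCALES_BYTES + QUANT_BYTES
    return block_bytes * 8 / WEIGHTS_PER_SUPERBLOCK


def dequant_sub_block(d, dmin, scale6, min6, codes):
    """Dequantize one sub-block from unpacked fields (hypothetical helper).

    Each weight is reconstructed as (d * scale6) * q - (dmin * min6),
    where q is the 4-bit code in [0, 15].
    """
    return [d * scale6 * q - dmin * min6 for q in codes]


print(q4_k_bits_per_weight())  # 4.5
```

Each super-block therefore occupies 144 bytes for 256 weights, which is where the 4.5 bits/weight comes from.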
## Numbers (preliminary, single n=50 slice)
- Q4_K: Pick 19.08 / CLIP 18.68; NF4: Pick 18.24 / CLIP 15.75, at equal size.
- Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
- Full-battery validation is in progress.

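As a quick sanity check, the quality deltas quoted in this card can be recomputed from the scores in the list above. This is plain arithmetic on the reported figures, not a new measurement; the variable names are only for illustration.

```python
# Scores copied from the list above (single n=50 slice, preliminary).
q4_k = {"pick": 19.08, "clip": 18.68}
nf4 = {"pick": 18.24, "clip": 15.75}

# Quality advantage of Q4_K over NF4 on each metric.
deltas = {name: round(q4_k[name] - nf4[name], 2) for name in q4_k}
print(deltas)  # {'pick': 0.84, 'clip': 2.93}

# Rough per-step cost implied by ~203 s for a 48-step image on the RTX 3090.
seconds_per_step = 203 / 48
print(f"{seconds_per_step:.2f} s/step")  # 4.23 s/step
```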
## How to run (self-contained)

All the scripts you need are in this repo. The GGUF is the **quantized DiT only**, so
step 1 installs the inference package and fetches the text encoder and VAE.

```bash
# 1) one-time: install the ideogram4 package + download the base components
#    (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py

# 2) generate
python usage.py "a poster that says HELLO"
```

Files here:
- `ideogram4-q4_k.gguf`: the Q4_K quantized DiT (both CFG branches).
- `gguf_loader.py`: loads and dequantizes the GGUF into the pipeline (reference implementation).
- `download_deps.py`, `usage.py`: setup and a minimal generation example.
- `recipe-q4_k.json`: the exact quantization recipe and tensor layout.

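Before running the full pipeline, it can be useful to confirm that a download is intact. A minimal sketch, assuming the file uses the standard little-endian GGUF container header (version 2 or later); the function names are hypothetical and this is not part of `gguf_loader.py`:

```python
import struct

GGUF_MAGIC = b"GGUF"
# Standard GGUF v2+ header: magic (4 bytes), version (uint32),
# tensor count (uint64), metadata key/value count (uint64), little-endian.
_HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(raw: bytes) -> dict:
    """Decode the fixed-size GGUF header from the first bytes of a file."""
    if len(raw) < _HEADER.size:
        raise ValueError("file too short to be a GGUF container")
    magic, version, n_tensors, n_kv = _HEADER.unpack_from(raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"bad magic {magic!r}, expected {GGUF_MAGIC!r}")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }


def read_gguf_header(path: str) -> dict:
    """Read only the header bytes, so multi-GB files are not loaded."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(_HEADER.size))
```

Calling `read_gguf_header("ideogram4-q4_k.gguf")` should return the container version and tensor count if the assumption holds. A `ValueError` usually points to a truncated or mislabeled download.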
> `gguf_loader.py` is a **reference**: the dequant math is validated bit-exact, but the
> standalone loader hasn't been GPU-tested end-to-end yet, so verify before production use.
> Like the GGUF itself, it works only with this PyTorch path + the `ideogram4` pipeline,
> not with llama.cpp or stable-diffusion.cpp.

## License
Derived from Ideogram 4 under its **non-commercial, research-only** license. See `LICENSE`.