---
license: other
license_name: ideogram-4-non-commercial
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8
base_model: ideogram-ai/ideogram-4-fp8
pipeline_tag: text-to-image
tags: [text-to-image, diffusion, flow-matching, quantization, gguf, q4_k, ideogram]
---

# Ideogram 4 — GGUF Q4_K (Transformer Lab)

A **GGUF Q4_K** (4.5 bits/weight) quantization of the Ideogram 4 DiT, sized for consumer GPUs.
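
The 4.5 bits/weight figure follows from the standard ggml Q4_K super-block layout. Below is a minimal sketch of the arithmetic, assuming this repo stores its linear weights in that standard layout. `weight_gib` is a hypothetical helper for rough budgeting only. It ignores the F16 tensors, activations, and the text encoder and VAE.

```python
# Standard ggml Q4_K super-block layout (assumed to match this repo's GGUF).
QK_K = 256        # weights per super-block
SUBBLOCKS = 8     # eight sub-blocks of 32 weights each

quant_bits = QK_K * 4                  # one 4-bit quantized value per weight
scale_min_bits = SUBBLOCKS * (6 + 6)   # 6-bit scale + 6-bit min per sub-block
super_bits = 2 * 16                    # F16 super-block factors d and dmin

block_bytes = (quant_bits + scale_min_bits + super_bits) // 8
bits_per_weight = block_bytes * 8 / QK_K


def weight_gib(n_params: int, bpw: float = bits_per_weight) -> float:
    """Hypothetical helper: storage in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 2**30


print(block_bytes, bits_per_weight)  # 144 4.5
```

For example, one billion Q4_K weights take about 0.52 GiB (`weight_gib(1_000_000_000)`), versus about 1.86 GiB for the same weights in F16 (`weight_gib(1_000_000_000, 16)`).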

⚠️ **Not a llama.cpp / stable-diffusion.cpp file.** Despite the `.gguf` extension, this
loads **only** via the included PyTorch `gguf_loader.py` + the `ideogram4` pipeline. It is
**not** compatible with llama.cpp, stable-diffusion.cpp, Ollama, etc.

ℹ️ **Quantized DiT only.** This checkpoint contains only the DiT (both CFG branches). To generate images, you
also need the **Qwen3-VL text encoder and VAE** from the base repo [`ideogram-ai/ideogram-4-fp8`](https://huggingface.co/ideogram-ai/ideogram-4-fp8)
and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
The quantization recipe and loader are included **in this repo** (`recipe-q4_k.json`, `gguf_loader.py`).

## Why Q4_K
Q4_K is the **Pareto winner** on the quality-vs-memory frontier. At **10.4 GB** on disk (the same
size class as the published NF4 build), it **beats NF4 on quality** by +0.84 Pick and
+2.93 CLIP on a 50-prompt slice, at the cost of roughly 23% higher latency. If you're tight on VRAM, this is the build to grab.

## Samples

![Sample output](https://cdn-uploads.huggingface.co/production/uploads/6316131329411a6864b13751/1gGu1ZK500Sw4F02Qofil.png)

## Benchmarks (preliminary — single n=50 slice)
- Quality at equal size: Pick 19.08 / CLIP 18.68 for Q4_K vs 18.24 / 15.75 for NF4.
- Latency: ~203 s/img for Q4_K (48 steps, 1024×1024, RTX 3090), about 23% slower than NF4.
- Full-battery validation is in progress.
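
You can recompute the quality deltas quoted under *Why Q4_K* from these numbers. In the quick check below, the NF4 latency is only implied by the approximate "~23% slower" figure. It is not a separate measurement.

```python
# Preliminary n=50 scores from the list above (Q4_K vs NF4 at equal size).
scores = {
    "Q4_K": {"pick": 19.08, "clip": 18.68},
    "NF4": {"pick": 18.24, "clip": 15.75},
}

# Absolute quality deltas, rounded to the two decimals the scores are reported with.
deltas = {
    metric: round(scores["Q4_K"][metric] - scores["NF4"][metric], 2)
    for metric in ("pick", "clip")
}

# Q4_K takes ~203 s/img and is ~23% slower than NF4, so NF4 is *implied*
# (not measured here) to take roughly 203 / 1.23 s/img.
implied_nf4_latency_s = 203 / 1.23

print(deltas)                        # {'pick': 0.84, 'clip': 2.93}
print(round(implied_nf4_latency_s))  # 165
```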

## Method
Weight-only GGUF Q4_K quantization of the DiT's linear layers. A custom NumPy quantizer produced the weights, and its output was verified
bit-exact against the gguf-py reference decoder. All remaining (non-linear) tensors are kept in F16.
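
The minimal NumPy sketch below makes the format concrete. It dequantizes Q4_K using the public ggml block layout, which the gguf-py reference decoder also implements. It is *not* this repo's `gguf_loader.py`. It assumes a tensor's data is a contiguous run of standard 144-byte super-blocks, and it ignores tensor shapes, alignment, and the F16 tensors. Within each 32-weight sub-block `j`, each weight is reconstructed as `d * scale[j] * q - dmin * min[j]`.

```python
import numpy as np

QK_K = 256         # weights per Q4_K super-block
BLOCK_BYTES = 144  # 2 (d) + 2 (dmin) + 12 (packed scales/mins) + 128 (4-bit quants)


def unpack_scales_mins(packed: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Unpack 12 bytes per block into eight 6-bit scales and eight 6-bit mins."""
    n = packed.shape[0]
    a, b, c = np.split(packed.reshape(n, 3, 4), 3, axis=1)  # each (n, 1, 4)
    # Sub-blocks 0-3 use the low 6 bits of a (scales) and b (mins); sub-blocks 4-7
    # combine a nibble of c with the top 2 bits of a (scales) or b (mins).
    sc = np.concatenate([a & 0x3F, (c & 0x0F) | ((a >> 2) & 0x30)], axis=-1)
    mn = np.concatenate([b & 0x3F, (c >> 4) | ((b >> 2) & 0x30)], axis=-1)
    return sc.reshape(n, 8), mn.reshape(n, 8)


def dequantize_q4_k(raw: bytes) -> np.ndarray:
    """Dequantize a buffer of standard Q4_K super-blocks to a flat float32 array."""
    blocks = np.frombuffer(raw, dtype=np.uint8).reshape(-1, BLOCK_BYTES)
    n = blocks.shape[0]
    d = blocks[:, 0:2].copy().view(np.float16).astype(np.float32)     # (n, 1)
    dmin = blocks[:, 2:4].copy().view(np.float16).astype(np.float32)  # (n, 1)
    sc, mn = unpack_scales_mins(blocks[:, 4:16])
    qs = blocks[:, 16:]                                               # (n, 128)

    # Effective per-sub-block scale and offset, shape (n, 8, 1).
    scale = (d * sc.astype(np.float32)).reshape(n, 8, 1)
    offset = (dmin * mn.astype(np.float32)).reshape(n, 8, 1)

    # Each 32-byte chunk holds two sub-blocks: low nibbles first, then high nibbles.
    shifts = np.array([0, 4], dtype=np.uint8).reshape(1, 1, 2, 1)
    q = (qs.reshape(n, 4, 1, 32) >> shifts) & 0x0F
    q = q.reshape(n, 8, 32).astype(np.float32)

    return (scale * q - offset).reshape(n * QK_K)
```

Returning a flat array keeps the sketch independent of tensor shapes. A real loader would reshape the result to each tensor's dimensions.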

## How to run (self-contained)

The scripts you need are in this repo. Because the GGUF contains only the **quantized DiT**,
step 1 installs the `ideogram4` inference package and fetches the text encoder and VAE from the gated base repo.

```bash
# 1) one-time: install the ideogram4 package + download the base components
#    (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py

# 2) generate
python usage.py "a poster that says HELLO"
```

Files here:
- `ideogram4-q4_k.gguf` — the Q4_K quantized DiT (both CFG branches).
- `gguf_loader.py` — loads + dequantizes the GGUF into the pipeline (reference impl).
- `download_deps.py`, `usage.py` — setup + a minimal generation example.
- `recipe-q4_k.json` — the exact quantization recipe / tensor layout.

> `gguf_loader.py` is a **reference** implementation. Its dequantization math is validated bit-exact, but the
> standalone loader has not yet been GPU-tested end to end. Verify it before production use.
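
The loader is untested end to end, so a cheap first check after downloading is whether the file has a well-formed GGUF header. The minimal stdlib sketch below assumes the file uses the standard little-endian GGUF v2/v3 container header, which the card does not state explicitly. `read_gguf_header` is a hypothetical helper and is not part of `gguf_loader.py`.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumed little-endian, GGUF v2/v3 layout)."""
    with open(path, "rb") as f:
        head = f.read(24)  # 4-byte magic + uint32 version + two uint64 counts
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError(f"{path}: not a GGUF file (bad magic or truncated header)")
    version, n_tensors, n_kv = struct.unpack("<IQQ", head[4:24])
    if version < 2:
        # GGUF v1 used 32-bit counts, so the layout above would misread it.
        raise ValueError(f"{path}: GGUF v{version} uses 32-bit counts; not handled here")
    return GGUFHeader(version, n_tensors, n_kv)
```

For example, `read_gguf_header("ideogram4-q4_k.gguf")` should report a nonzero tensor count. This validates only the header, not the tensor data.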

## License
Derived from Ideogram 4 under its **non-commercial, research-only** license. See `LICENSE`.