	---
	license: other
	license_name: ideogram-4-non-commercial
	license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8
	base_model: ideogram-ai/ideogram-4-fp8
	pipeline_tag: text-to-image
	tags: [text-to-image, diffusion, flow-matching, quantization, gguf, q4_k, ideogram]
	---

	# Ideogram 4 — GGUF Q4_K (Transformer Lab)

	A GGUF Q4_K (4.5 bits/weight) quantization of the Ideogram 4 DiT, sized for consumer GPUs.

	> ⚠️ Not a llama.cpp / stable-diffusion.cpp file. Despite the `.gguf` extension, this
	> file loads only through the included PyTorch `gguf_loader.py` and the `ideogram4` pipeline.
	> It is not compatible with llama.cpp, stable-diffusion.cpp, Ollama, or similar runtimes.

	> ℹ️ Quantized DiT only. This checkpoint contains the DiT (both CFG branches) and nothing else.
	> To generate images, you also need the Qwen3-VL text encoder and VAE from the base repo [`ideogram-ai/ideogram-4-fp8`](https://huggingface.co/ideogram-ai/ideogram-4-fp8)
	> and the custom inference code at [`github.com/ideogram-oss/ideogram4`](https://github.com/ideogram-oss/ideogram4).
	> The quantization recipe and loader are included in this repo (`recipe-q4_k.json`, `gguf_loader.py`).

	## Why this one
	On the preliminary numbers below, Q4_K gives the best quality for its memory footprint:
	at 10.4 GB (the same on-disk size class as the published NF4 build) it scores +0.84 Pick /
	+2.93 CLIP over NF4 on a 50-prompt slice. If you're tight on VRAM, this is the build to grab.

	## Method
	Weight-only GGUF Q4_K quantization of the DiT linear layers, produced with a custom NumPy
	quantizer and verified bit-exact against the gguf-py reference decoder. All tensors that
	are not linear-layer weights are kept in F16.
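
The 4.5 bits/weight figure follows from the Q4_K super-block layout used by ggml: 256 weights packed into 144 bytes (two fp16 super-scales, 12 bytes of packed 6-bit sub-block scales and mins, and 128 bytes of 4-bit quants). A small sketch of that arithmetic follows; the `estimated_bytes` helper is hypothetical and is not taken from `recipe-q4_k.json`:

```python
# Q4_K super-block layout as defined in ggml (block_q4_K).
QK_K = 256                 # weights per super-block
SUPER_SCALE_BYTES = 2 * 2  # d and dmin, one fp16 each
SUB_SCALE_BYTES = 12       # 8 sub-block scales + 8 mins, 6 bits each
QUANT_BYTES = QK_K // 2    # 4-bit quants, two per byte

BLOCK_BYTES = SUPER_SCALE_BYTES + SUB_SCALE_BYTES + QUANT_BYTES
BITS_PER_WEIGHT = BLOCK_BYTES * 8 / QK_K


def estimated_bytes(n_linear_weights: int, n_f16_weights: int) -> int:
    """Rough tensor-data size: Q4_K linear weights plus F16 everything else.

    Hypothetical helper; ignores GGUF metadata and alignment padding.
    """
    if n_linear_weights % QK_K:
        raise ValueError("Q4_K needs a multiple of 256 weights")
    return n_linear_weights // QK_K * BLOCK_BYTES + n_f16_weights * 2


print(BLOCK_BYTES, BITS_PER_WEIGHT)  # 144 4.5
```

Because the remaining tensors stay in F16 (16 bits/weight), the file's average bits per weight is somewhat above 4.5.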

	## Numbers (preliminary — single n=50 slice)
	- Q4_K: Pick 19.08 / CLIP 18.68. NF4: Pick 18.24 / CLIP 15.75. Both builds are in the same size class.
	- Latency ~203 s/img (48 steps, 1024², RTX 3090); ~23% slower than NF4.
	- Full-battery validation is in progress.
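
The headline deltas and a couple of derived figures can be recomputed from the bullets above. This sketch assumes "~23% slower" means 23% more wall-clock time per image, which the card does not spell out, so the implied NF4 latency is only an estimate:

```python
# Figures copied from the bullets above.
q4k = {"pick": 19.08, "clip": 18.68}
nf4 = {"pick": 18.24, "clip": 15.75}
latency_s, steps = 203.0, 48
slowdown = 0.23  # ASSUMPTION: 23% more time per image than NF4

# Score differences, Q4_K minus NF4.
deltas = {k: round(q4k[k] - nf4[k], 2) for k in q4k}
seconds_per_step = latency_s / steps
implied_nf4_latency_s = latency_s / (1 + slowdown)

print(deltas)
print(f"{seconds_per_step:.2f} s/step, NF4 ~{implied_nf4_latency_s:.0f} s/img")
```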

	## How to run (self-contained)

	The scripts you need are in this repo. The GGUF contains only the quantized DiT, so
	step 1 installs the inference package and downloads the text encoder and VAE.

	```bash
	# 1) one-time: install the ideogram4 package + download the base components
	# (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
	python download_deps.py

	# 2) generate
	python usage.py "a poster that says HELLO"
	```

	Files here:
	- `ideogram4-q4_k.gguf` — the Q4_K quantized DiT (both CFG branches).
	- `gguf_loader.py` — loads + dequantizes the GGUF into the pipeline (reference impl).
	- `download_deps.py`, `usage.py` — setup + a minimal generation example.
	- `recipe-q4_k.json` — the exact quantization recipe / tensor layout.
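
Before running the scripts, it can help to confirm that the download is complete. A minimal preflight sketch, using the file names listed above; the `missing_files` helper is hypothetical and not part of this repo:

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = (
    "ideogram4-q4_k.gguf",
    "gguf_loader.py",
    "download_deps.py",
    "usage.py",
    "recipe-q4_k.json",
)


def missing_files(repo_dir: str) -> list[str]:
    """Return the expected files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("missing:", ", ".join(missing) if missing else "none")
```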

	> `gguf_loader.py` is a reference implementation: the dequantization math is validated bit-exact,
	> but the standalone loader hasn't been GPU-tested end-to-end yet, so verify it before production use.
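
As a cheap first check before a full load, the GGUF container header can be read with the standard library alone. This is a sketch that assumes the little-endian GGUF v2+ header layout (4-byte magic, `uint32` version, `uint64` tensor count, `uint64` metadata key/value count); `read_gguf_header` is a hypothetical helper, not a function from `gguf_loader.py`:

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_BYTES = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(path: str) -> dict:
    """Read the magic, version, tensor count and metadata KV count."""
    with open(path, "rb") as f:
        raw = f.read(HEADER_BYTES)
    if len(raw) < HEADER_BYTES or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor_count, uint64 kv_count.
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:HEADER_BYTES])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

Calling `read_gguf_header("ideogram4-q4_k.gguf")` should succeed on an intact download. It validates only the container header and says nothing about whether the tensors load correctly in the pipeline.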

	## License
	Derived from Ideogram 4 under its non-commercial, research-only license. See `LICENSE`.