TokForge — DreamShaper-7 + LCM GGUF (Q4_0)

A single, self-contained Q4_0-quantized GGUF for stable-diffusion.cpp. It packages DreamShaper-7 (SD1.5, realistic finetune) with the LCM-LoRA fused into the UNet for fast, guidance-free few-step sampling.

This is the 6 GB-tier fast image route for the TokForge apps, and the smaller sibling of TokForge-DreamShaper-LCM-GGUF (f16, ~2.1 GB). It renders the same coherent people and hands (no SD-Turbo body-horror) in the same few-step LCM speed class, but with a smaller download and RAM footprint, so it fits 6 GB devices, where the f16 build is gated out.

Files

File	Size	Precision	Contents
`dreamshaper-7-lcm-q4_0.gguf`	~1.63 GB	Q4_0 (mixed)	CLIP text encoder (F16) + LCM-fused UNet (Q4_0 linears/1×1 convs, F16 3×3 convs) + VAE

MD5SUMS and manifest.json carry the integrity hash and the render defaults.
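
A minimal sketch of checking the download against MD5SUMS before first use. It assumes MD5SUMS uses the usual md5sum layout (hex digest, whitespace, file name); the helper names are illustrative and not part of the TokForge apps.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse md5sum-style lines into {file name: digest}.

    Assumption: each line is "<hex digest>  <file name>", optionally with
    a leading "*" on the name (binary-mode marker).
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            expected[name.lstrip("*").strip()] = digest.lower()
    return expected


def verify(model_path, md5sums_path="MD5SUMS"):
    """Return True if the model file's MD5 matches its MD5SUMS entry."""
    expected = parse_md5sums(Path(md5sums_path).read_text())
    name = Path(model_path).name
    return expected.get(name) == md5_of_file(model_path)
```

With both files in the current directory, `verify("dreamshaper-7-lcm-q4_0.gguf")` returns True only when the digest matches the recorded one.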

Precision details (CLIP-safe quantization)

This GGUF was quantized with stable-diffusion.cpp's own converter (-M convert --type q4_0). Its tensor_should_be_converted rule protects embeddings and norms: it keeps every tensor whose name contains embedding, along with all .bias, .scale, and norm tensors, at F16. The CLIP text encoder's token_embedding.weight therefore stays F16.

This is deliberate and required. A block-quantized CLIP token embedding (the failure mode of some external blanket quantizers) produces empty CLIP conditioning through ggml_get_rows, and sd.cpp then aborts in conditioner.hpp at GGML_ASSERT(!chunk_hidden_states.empty()). Keeping CLIP at F16 avoids that entirely.
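
The name-based protection described above can be paraphrased in a few lines. This is a sketch of the rule as stated in this section, not a copy of sd.cpp's tensor_should_be_converted; the tensor names in the usage lines are illustrative.

```python
# Name fragments that this section says are kept at F16.
# Paraphrase of the described behaviour, not sd.cpp's actual source.
PROTECTED_SUBSTRINGS = ("embedding", "norm")
PROTECTED_SUFFIXES = (".bias", ".scale")


def stays_f16_by_name(tensor_name):
    """Return True if the name-based rule keeps this tensor at F16."""
    if any(fragment in tensor_name for fragment in PROTECTED_SUBSTRINGS):
        return True
    return tensor_name.endswith(PROTECTED_SUFFIXES)


# Illustrative names only.
print(stays_f16_by_name("text_model.embeddings.token_embedding.weight"))  # True
print(stays_f16_by_name("attn1.to_q.weight"))                             # False
```

A blanket quantizer that skips a check like this is what would block-quantize the token embedding and trigger the assertion.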

Note that the file is ~1.63 GB rather than ~0.6 GB. SD-1.5's UNet is dominated by 3×3 convolution weights, whose row length (ne[0] = 3) is not a multiple of Q4_0's block size of 32, so ggml keeps them at F16. This is the same reason the working gpustack Q4_0 SD-1.5 GGUF is ~1.75 GB. The Q4_0 win comes from the attention/projection linears and 1×1 convs. Verified type histogram: 690 F16 tensors + 440 Q4_0 tensors (1,130 in total); CLIP token_embedding.weight = F16.
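
To make the block-size argument concrete, here is a small sketch of the row-length check and the resulting storage cost. It models only the ne[0] % 32 test described above and uses ggml's Q4_0 block layout (32 weights in 18 bytes); the channel counts are illustrative, and how sd.cpp lays out other tensor kinds is not modeled.

```python
QK4_0 = 32             # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 18  # 2-byte F16 scale + 16 bytes of packed 4-bit values
F16_BYTES = 2


def q4_0_compatible(shape):
    """A row can be block-quantized only if ne[0] is a multiple of 32."""
    return shape[0] % QK4_0 == 0


def tensor_bytes(shape):
    """Stored size: Q4_0 when the row length allows it, otherwise F16."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    if q4_0_compatible(shape):
        return n_elements // QK4_0 * Q4_0_BLOCK_BYTES
    return n_elements * F16_BYTES


# Shapes are in ggml order (ne[0] first); channel counts are illustrative.
conv3x3 = (3, 3, 320, 320)  # 3x3 conv: ne[0] = 3, stays F16
linear = (320, 320)         # attention projection: ne[0] = 320, becomes Q4_0

print(q4_0_compatible(conv3x3), tensor_bytes(conv3x3))  # False 1843200
print(q4_0_compatible(linear), tensor_bytes(linear))    # True 57600
```

The linear shrinks from 204,800 bytes at F16 to 57,600 bytes (4.5 bits per weight), while the 3×3 conv keeps its full F16 size, which is why the overall file shrinks far less than a uniform 4-bit estimate would suggest.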

Recommended render settings (LCM, few-step, guidance-free)

sampler:      lcm
scheduler:    lcm
steps:        6   (4 = fast floor, 8 = extra refinement)
cfg-scale:    1.5
resolution:   512x512 (SD1.5 native; 256/384 presets also work)

stable-diffusion.cpp CLI example

sd -M img_gen \
  -m dreamshaper-7-lcm-q4_0.gguf \
  -p "a busy outdoor street market crowded with people shopping, candid street photo" \
  --sampling-method lcm --scheduler lcm --steps 6 --cfg-scale 1.5 \
  -W 512 -H 512 -o out.png
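
For callers that launch the CLI from a script, the same invocation can be assembled programmatically. This is a sketch that uses only the flags shown above; the function name, the default binary name sd, and the prompt are assumptions for illustration.

```python
import shlex

# Recommended LCM settings from this README.
LCM_DEFAULTS = {"steps": 6, "cfg_scale": 1.5, "width": 512, "height": 512}


def build_sd_command(prompt, model="dreamshaper-7-lcm-q4_0.gguf",
                     output="out.png", binary="sd", **overrides):
    """Build the argv list for the CLI call shown above."""
    settings = {**LCM_DEFAULTS, **overrides}
    return [
        binary, "-M", "img_gen",
        "-m", model,
        "-p", prompt,
        "--sampling-method", "lcm", "--scheduler", "lcm",
        "--steps", str(settings["steps"]),
        "--cfg-scale", str(settings["cfg_scale"]),
        "-W", str(settings["width"]), "-H", str(settings["height"]),
        "-o", output,
    ]


# Example: drop to the 4-step fast floor for a quick preview.
cmd = build_sd_command("a lighthouse at dusk, candid photo", steps=4)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas or quotes.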

Provenance & how this was built

Started from TokForge-DreamShaper-LCM-GGUF dreamshaper-7-lcm-f16.gguf (DreamShaper-7 with LCM-LoRA fused into the UNet, exported to a single SD1.5 GGUF, f16).
Quantized to Q4_0 with stable-diffusion.cpp (leejet): sd -M convert -m dreamshaper-7-lcm-f16.gguf -o dreamshaper-7-lcm-q4_0.gguf --type q4_0. sd.cpp's quantizer keeps CLIP / embeddings / norms at F16 (model_loader.cpp tensor_should_be_converted).
Verified the CLIP token_embedding.weight is F16 in the output header, and ran a 6-step LCM smoke render that produced a coherent on-prompt image.
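
The header check in the last step can be reproduced along these lines. This sketch assumes the gguf Python package (from llama.cpp's gguf-py) is installed and can parse an sd.cpp-produced file, which has not been confirmed here; the helper names are illustrative.

```python
from collections import Counter


def type_histogram(tensors):
    """Count tensors per type from (name, type_name) pairs."""
    return Counter(type_name for _, type_name in tensors)


def token_embedding_types(tensors):
    """Map each tensor ending in token_embedding.weight to its type."""
    return {
        name: type_name
        for name, type_name in tensors
        if name.endswith("token_embedding.weight")
    }


def read_tensor_types(path):
    """Read (name, type_name) pairs from a GGUF header.

    Assumption: the `gguf` package is installed and accepts this file.
    """
    from gguf import GGUFReader  # lazy import: helpers above work without it

    reader = GGUFReader(path)
    return [(t.name, t.tensor_type.name) for t in reader.tensors]


# Usage, with the model file in the current directory:
#   tensors = read_tensor_types("dreamshaper-7-lcm-q4_0.gguf")
#   print(type_histogram(tensors))
#   print(token_embedding_types(tensors))
```

For this file the histogram should match the counts reported above (690 F16, 440 Q4_0), and the token embedding entry should read F16.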

License & attribution

License: CreativeML OpenRAIL-M (inherited from DreamShaper-7 / Stable Diffusion 1.5). Use is subject to the OpenRAIL-M use restrictions.
Base model: DreamShaper-7 by Lykon — https://huggingface.co/Lykon/dreamshaper-7
Adapter: LCM-LoRA SD1.5 by Latent Consistency — https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
Quantization tooling: stable-diffusion.cpp by leejet
Built on top of Stable Diffusion 1.5 (Runway/CompVis/Stability).

No additional restrictions are imposed by this repackaging; the original OpenRAIL-M terms and attribution requirements propagate to this GGUF and any images generated with it.