TokForge: DreamShaper-7 + LCM GGUF (Q4_0)

A single, self-contained, Q4_0-quantized GGUF for stable-diffusion.cpp, packaging DreamShaper-7 (SD1.5, realistic finetune) with the LCM-LoRA fused into the UNet for fast, guidance-free few-step sampling.

This is the 6 GB-tier fast image route for the TokForge apps: the smaller sibling of TokForge-DreamShaper-LCM-GGUF (f16, ~2.1 GB). It renders the same coherent people and hands (no SD-Turbo body-horror) in the same few-step LCM speed class, but with a smaller download and RAM footprint, so it fits the 6 GB device tier where the f16 build is gated out.

Files

| File | Size | Precision | Contents |
| --- | --- | --- | --- |
| dreamshaper-7-lcm-q4_0.gguf | ~1.63 GB | Q4_0 (mixed) | CLIP text encoder (F16) + LCM-fused UNet (Q4_0 linears/1×1 convs, F16 3×3 convs) + VAE |

MD5SUMS and manifest.json carry the integrity hash + render defaults.
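
One way to check a download against MD5SUMS is sketched below in Python. It assumes MD5SUMS uses the standard md5sum line format (hex digest, whitespace, file name) and sits next to the GGUF; neither is spelled out in this README, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-GB GGUF never sits in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse md5sum-style lines into {file name: hex digest}.

    Assumed layout per line: "<hex digest>  <name>" or "<hex digest> *<name>".
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hex_digest, name = line.split(None, 1)
        expected[name.lstrip("*")] = hex_digest.lower()
    return expected


def verify(md5sums_path):
    """Return {file name: True/False} for every entry listed in MD5SUMS."""
    md5sums_path = Path(md5sums_path)
    expected = parse_md5sums(md5sums_path.read_text())
    return {
        name: md5_of_file(md5sums_path.parent / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify("MD5SUMS")` in the download directory returns a dict whose values should all be `True` for an intact download.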

Precision details (CLIP-safe quantization)

This GGUF was quantized with stable-diffusion.cpp's own -M convert --type q4_0, whose tensor_should_be_converted rule protects embeddings and norms: it keeps every tensor whose name contains embedding (and all .bias / .scale / norm tensors) at F16. So the CLIP text encoder's token_embedding.weight stays F16.
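
The protection rule described above can be paraphrased as a small name-based predicate. This Python sketch mirrors only the wording of this README; it is not the actual tensor_should_be_converted code in model_loader.cpp, which may apply further checks. The function name and the tensor names in the comments are illustrative.

```python
def stays_f16(tensor_name: str) -> bool:
    """Return True if, per the rule as described in this README, the tensor
    would be left at F16 instead of being block-quantized.

    Paraphrase only: the real sd.cpp rule may include additional conditions.
    """
    if "embedding" in tensor_name:               # e.g. ...token_embedding.weight
        return True
    if tensor_name.endswith((".bias", ".scale")):  # biases and scale tensors
        return True
    if "norm" in tensor_name:                    # layer/group norm tensors
        return True
    return False


# Illustrative names in the SD1.5 checkpoint naming style:
#   stays_f16("cond_stage_model.transformer.text_model.embeddings.token_embedding.weight")
#   stays_f16("model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight")
```

Under this paraphrase the CLIP token embedding is protected by name alone, independent of its shape, while an attention projection weight remains a quantization candidate.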

This is deliberate and required: a block-quantized CLIP token-embedding (the failure mode of some external blanket-quantizers) produces empty CLIP conditioning through ggml_get_rows and makes sd.cpp abort at conditioner.hpp GGML_ASSERT(!chunk_hidden_states.empty()). Keeping CLIP at F16 avoids that entirely.

Note that the file is ~1.63 GB rather than ~0.6 GB because SD-1.5's UNet is dominated by 3×3 convolution weights. Their innermost dimension (ne[0] = 3) is not divisible by Q4_0's block size of 32, so ggml keeps them at F16 (the same reason the working gpustack Q4_0 SD-1.5 GGUF is ~1.75 GB). The Q4_0 win comes from the attention/projection linears and 1×1 convs. Verified type histogram: 690 F16 tensors + 440 Q4_0 tensors; CLIP token_embedding.weight = F16.
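
The size effect can be made concrete with a little arithmetic. A ggml Q4_0 block stores 32 weights in 18 bytes (a 2-byte F16 scale plus 16 bytes of packed 4-bit values, i.e. 4.5 bits per weight), while F16 costs 2 bytes per weight. The Python sketch below applies only the divisibility rule from the paragraph above and ignores the name-based protection; the 320-wide shapes are illustrative examples, not read from the file.

```python
from math import prod

Q4_0_BLOCK = 32        # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 18  # 2-byte F16 scale + 16 bytes of packed 4-bit values
F16_BYTES = 2          # bytes per F16 weight


def q4_0_eligible(ne0: int) -> bool:
    """A row can be split into Q4_0 blocks only if ne[0] is a multiple of 32."""
    return ne0 % Q4_0_BLOCK == 0


def stored_bytes(ne) -> int:
    """Bytes used by a tensor with ggml shape `ne` (ne[0] is the innermost dim)."""
    n_weights = prod(ne)
    if q4_0_eligible(ne[0]):
        return n_weights // Q4_0_BLOCK * Q4_0_BLOCK_BYTES
    return n_weights * F16_BYTES


linear = (320, 320)         # illustrative attention projection, ne[0] = 320
conv3x3 = (3, 3, 320, 320)  # illustrative 3x3 conv, ne[0] = 3

print(stored_bytes(linear))   # 57600 bytes as Q4_0 (204800 as F16)
print(stored_bytes(conv3x3))  # 1843200 bytes, stays F16
```

In this example the 3×3 conv holds 9 times as many weights as the linear layer but takes 32 times the space, which is why the file shrinks far less than a uniform 4-bit estimate would suggest.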

Recommended render settings (LCM, few-step, guidance-free)

sampler:      lcm
scheduler:    lcm
steps:        6   (4 = fast floor, 8 = extra refinement)
cfg-scale:    1.5
resolution:   512x512 (SD1.5 native; 256/384 presets also work)

stable-diffusion.cpp CLI example

sd -M img_gen \
  -m dreamshaper-7-lcm-q4_0.gguf \
  -p "a busy outdoor street market crowded with people shopping, candid street photo" \
  --sampling-method lcm --scheduler lcm --steps 6 --cfg-scale 1.5 \
  -W 512 -H 512 -o out.png

Provenance & how this was built

  1. Started from TokForge-DreamShaper-LCM-GGUF dreamshaper-7-lcm-f16.gguf (DreamShaper-7 with LCM-LoRA fused into the UNet, exported to a single SD1.5 GGUF, f16).
  2. Quantized to Q4_0 with stable-diffusion.cpp (leejet): sd -M convert -m dreamshaper-7-lcm-f16.gguf -o dreamshaper-7-lcm-q4_0.gguf --type q4_0. sd.cpp's quantizer keeps CLIP / embeddings / norms at F16 (model_loader.cpp tensor_should_be_converted).
  3. Verified the CLIP token_embedding.weight is F16 in the output header, and ran a 6-step LCM smoke render that produced a coherent on-prompt image.
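
The header check in step 3 could be reproduced along the following lines. This Python sketch assumes the `gguf` package published by the llama.cpp project is installed and that its `GGUFReader` can parse files written by stable-diffusion.cpp; both are assumptions, and this is not necessarily the tool that was used for the verification reported here. The helper names are illustrative.

```python
from collections import Counter


def type_histogram(tensors):
    """Count tensors per storage type. `tensors` is a list of (name, type_name)."""
    return Counter(type_name for _, type_name in tensors)


def embedding_types(tensors, suffix="token_embedding.weight"):
    """Return {name: type_name} for tensors whose name ends with `suffix`."""
    return {name: type_name for name, type_name in tensors if name.endswith(suffix)}


def read_tensor_types(path):
    """Read (name, type_name) pairs from a GGUF header.

    Assumes `pip install gguf`; imported lazily so the helpers above
    stay usable without the package.
    """
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return [(t.name, t.tensor_type.name) for t in reader.tensors]


# Usage (requires the downloaded file and the gguf package):
#   tensors = read_tensor_types("dreamshaper-7-lcm-q4_0.gguf")
#   print(type_histogram(tensors))   # compare with the 690 F16 / 440 Q4_0 reported above
#   print(embedding_types(tensors))  # token_embedding.weight should report F16
```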

License & attribution

No additional restrictions are imposed by this repackaging; the original OpenRAIL-M terms and attribution requirements propagate to this GGUF and any images generated with it.
