	---
	license: creativeml-openrail-m
	license_name: tokforge-sd15-ipadapter-bundle
	tags:
	- text-to-image
	- image-to-image
	- stable-diffusion
	- ip-adapter
	- dreamshaper
	- reference-image
	- identity
	- gguf
	- stable-diffusion-cpp
	- tokforge
	base_model:
	- Lykon/dreamshaper-7
	- h94/IP-Adapter
	pipeline_tag: text-to-image
	---

	# TokForge — SD1.5 IP-Adapter (Reference Identity) bundle

	The reference-identity image route for the [TokForge](https://tokforge.ai) Android app.
	Attach a photo of a person, then render that person in any scene
	("me as a superhero flying over New York"). The plus-face IP-Adapter transfers
	only the face, while the prompt drives the whole scene.

	This bundle runs on the on-device [`stable-diffusion.cpp`](https://github.com/leejet/stable-diffusion.cpp)
	GGUF engine (TokForge's IP-Adapter port) on CPU and Adreno OpenCL. SD1.5 is
	light enough for any 8 GB+ phone, which makes this the broadest-reach identity tier
	(lighter than the SDXL PhotoMaker tier).

	## Files

	| File | Size | License | Contents |
	|------|------|---------|----------|
	| `sd15-base-f16.gguf` | ~2.2 GB | CreativeML-OpenRAIL-M | DreamShaper-7 (SD1.5 realistic finetune): CLIP text encoder + UNet + VAE in one f16 GGUF |
	| `ip-adapter-plus-face_sd15.safetensors` | ~98 MB | Apache-2.0 | IP-Adapter plus-face (`h94/IP-Adapter`): 16-token Resampler + decoupled cross-attn |
	| `ip_adapter_clip_vision_vith.safetensors` | ~2.5 GB | MIT | OpenCLIP ViT-H-14 image encoder (the plus-face path needs ViT-H, not bigG) |

	`manifest.json` and `MD5SUMS` carry the integrity hashes and the render defaults.
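
The hashes can be checked after download, before the files are handed to the engine. A minimal sketch, assuming `MD5SUMS` uses the standard `md5sum` line format with paths relative to the bundle directory, and that a coreutils-style `md5sum` is available; `verify_bundle` is a hypothetical helper name, not part of TokForge:

```bash
#!/bin/sh
# Verify every file listed in MD5SUMS inside the bundle directory.
# Assumption: MD5SUMS is in `md5sum` format ("<hash>  <filename>")
# and sits next to the model files.
verify_bundle() {
    bundle_dir=$1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$bundle_dir" && md5sum -c MD5SUMS )
}

# Usage: verify_bundle ./bundle-dir && echo "bundle OK"
```

The function returns non-zero if any listed file is missing or does not match its hash, so it can gate a later render step.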

	### Why this base, and why f16 (not Q4)

	The base is the standard, non-LCM DreamShaper-7, the same realistic SD1.5 finetune
	TokForge ships on its other image tiers. It is converted at f16 (half precision,
	unquantized) so that the IP-Adapter's decoupled cross-attention and the face Resampler
	keep subject quality high. A `q4_0`/emaonly base measurably weakens the transferred
	identity, so this bundle deliberately uses f16.

	### Why plus-face (not the base adapter)

	The base `ip-adapter_sd15` projects the whole pooled CLIP embedding into 4 tokens, so it
	drags the reference's entire scene through (a car selfie came out as "the person in his car").
	The plus-face Resampler extracts only the face (16 tokens from the ViT-H penultimate
	hidden state), so identity is preserved while the prompt controls the scene. The TokForge
	sd.cpp IP-Adapter loader auto-detects plus-face by the presence of `image_proj.latents`.
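
The same signal can be inspected from the command line. The sketch below is not the TokForge loader; it only illustrates the idea, relying on the safetensors layout (an 8-byte little-endian header length followed by a JSON header that lists tensor names). `is_plus_face` is a hypothetical helper name:

```bash
#!/bin/sh
# Succeed if a .safetensors file declares a tensor named
# "image_proj.latents" in its JSON header.
# Layout: bytes 0-7 hold the header length (unsigned little-endian),
# followed by that many bytes of JSON.
is_plus_face() {
    file=$1
    # Read the 8 length bytes as decimals. Headers are far below
    # 4 GiB, so the low four bytes are enough.
    set -- $(head -c 8 "$file" | od -An -v -tu1)
    header_len=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
    # Search only the header, never the tensor data that follows it.
    tail -c +9 "$file" | head -c "$header_len" \
        | grep -qF '"image_proj.latents"'
}

# Usage: is_plus_face ip-adapter-plus-face_sd15.safetensors && echo plus-face
```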

	## How TokForge uses it

	In the app: Image model picker → download "SD1.5 IP-Adapter (Reference Identity)" →
	attach a face photo as a reference under chat → prompt the scene. The engine is invoked as:

	```bash
	sd -M img_gen \
	-m sd15-base-f16.gguf \
	-p "as a superhero flying over New York" \
	-n "<strong negative>" \
	--clip_vision ip_adapter_clip_vision_vith.safetensors \
	--ip-adapter ip-adapter-plus-face_sd15.safetensors \
	--ip-adapter-image "<your_face.jpg>" \
	--ip-adapter-scale 0.6 \
	--cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
	--steps 30 -H 512 -W 512
	```

	### Recommended render settings

	| Setting | Value |
	|---------|-------|
	| sampler | `euler_a` |
	| scheduler | `discrete` |
	| steps | `30` (full quality; fewer = faster) |
	| cfg-scale | `7.0` |
	| ip-adapter-scale | `0.6` (≈0.5–0.6 keeps the scene with recognizable identity; ~0.8 reconstructs the reference) |
	| resolution | `512×512` (SD1.5 native) |
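
To see the `ip-adapter-scale` trade-off for a particular face, it can help to render the same prompt at several scales. A rough sketch, assuming `SD_BIN` points at the TokForge `sd` binary and that the build keeps upstream's `-o` output flag; `sweep_scales` and the output file names are hypothetical, and the negative prompt is omitted for brevity:

```bash
#!/bin/sh
# Render one image per ip-adapter-scale value so the identity/scene
# trade-off can be compared side by side.
# Assumptions: SD_BIN is the TokForge sd.cpp binary, and -o (output
# path) behaves as in upstream stable-diffusion.cpp.
SD_BIN=${SD_BIN:-sd}

sweep_scales() {
    face=$1
    shift
    for scale in "$@"; do
        "$SD_BIN" -M img_gen \
            -m sd15-base-f16.gguf \
            -p "as a superhero flying over New York" \
            --clip_vision ip_adapter_clip_vision_vith.safetensors \
            --ip-adapter ip-adapter-plus-face_sd15.safetensors \
            --ip-adapter-image "$face" \
            --ip-adapter-scale "$scale" \
            --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
            --steps 30 -H 512 -W 512 \
            -o "out_scale_${scale}.png" || return 1
    done
}

# Usage: sweep_scales your_face.jpg 0.5 0.6 0.8
```

All other settings stay at the recommended values, so any difference between the outputs comes from the adapter scale alone.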

	## Licenses

	This is an aggregate of three independently licensed components; each retains its own license:

	- DreamShaper-7 base (`sd15-base-f16.gguf`) — CreativeML-OpenRAIL-M ([Lykon/dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7)). Use must comply with the OpenRAIL-M use-based restrictions.
	- IP-Adapter plus-face (`ip-adapter-plus-face_sd15.safetensors`) — Apache-2.0 ([h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter)).
	- OpenCLIP ViT-H-14 image encoder (`ip_adapter_clip_vision_vith.safetensors`) — MIT (OpenCLIP / LAION ViT-H-14).

	> The non-commercial IP-Adapter-FaceID / InsightFace path is NOT used here — only the
	> Apache-2.0 base + plus-face adapters from `h94/IP-Adapter`.

	## Provenance

	- Base converted from `Lykon/dreamshaper-7` (diffusers) to a single f16 GGUF via the TokForge
	`stable-diffusion.cpp` convert path (`-M convert --type f16`).
	- Adapter + image encoder copied verbatim from `h94/IP-Adapter` (`models/ip-adapter-plus-face_sd15.safetensors`,
	`models/image_encoder/model.safetensors`).
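
The base conversion step could look roughly like the following. This is a sketch under assumptions, not the exact command TokForge ran: it assumes `SD_BIN` is the `sd` binary, that the source is a local download of `Lykon/dreamshaper-7`, and that `-m` (input model) and `-o` (output path) behave as in upstream `stable-diffusion.cpp`. `convert_base` is a hypothetical helper name:

```bash
#!/bin/sh
# Convert a local copy of the base model into a single f16 GGUF.
# Assumptions: SD_BIN is the sd.cpp binary; -m and -o are upstream's
# input-model and output-path flags; the source path is a local
# download of Lykon/dreamshaper-7.
SD_BIN=${SD_BIN:-sd}

convert_base() {
    src=$1
    dst=$2
    "$SD_BIN" -M convert -m "$src" -o "$dst" --type f16
}

# Usage: convert_base ./dreamshaper-7 sd15-base-f16.gguf
```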