darkmaniac7/TokForge-SD15-IPAdapter

+---
+license: creativeml-openrail-m
+license_name: tokforge-sd15-ipadapter-bundle
+tags:
+- text-to-image
+- image-to-image
+- stable-diffusion
+- ip-adapter
+- dreamshaper
+- reference-image
+- identity
+- gguf
+- stable-diffusion-cpp
+- tokforge
+base_model:
+- Lykon/dreamshaper-7
+- h94/IP-Adapter
+pipeline_tag: text-to-image
+---
+# TokForge — SD1.5 IP-Adapter (Reference Identity) bundle
+The **reference-identity** image route for the [TokForge](https://tokforge.ai) Android app.
+Attach a photo of a person, then render **that person** in any scene
+(*"me as a superhero flying over New York"*). The **plus-face** IP-Adapter transfers
+the **face only** while the **prompt drives the whole scene**.
+This bundle runs on the on-device [`stable-diffusion.cpp`](https://github.com/leejet/stable-diffusion.cpp)
+GGUF engine (TokForge's IP-Adapter port) on **CPU** and **Adreno OpenCL**. SD1.5 is
+light enough for any 8 GB+ phone — the broadest-reach identity tier (lighter than the
+SDXL PhotoMaker tier).
+## Files
+| File | Size | License | Contents |
+|------|------|---------|----------|
+| `sd15-base-f16.gguf` | ~2.2 GB | CreativeML-OpenRAIL-M | **DreamShaper-7** (SD1.5 realistic finetune) — CLIP text encoder + UNet + VAE in one **f16** GGUF |
+| `ip-adapter-plus-face_sd15.safetensors` | ~98 MB | Apache-2.0 | IP-Adapter **plus-face** (`h94/IP-Adapter`) — 16-token Resampler + decoupled cross-attn |
+| `ip_adapter_clip_vision_vith.safetensors` | ~2.5 GB | MIT | OpenCLIP **ViT-H-14** image encoder (the plus-face path needs ViT-H, not bigG) |
+`manifest.json` and `MD5SUMS` carry the integrity hashes + render defaults.
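A download can be checked against `MD5SUMS` before the models are loaded. The sketch below assumes `MD5SUMS` uses the coreutils `md5sum` line format (`<hash>  <filename>`) and sits in the same directory as the model files; the bundle does not document either detail, and `verify_bundle` is a hypothetical helper name.

```bash
# Verify downloaded bundle files against MD5SUMS.
# Assumption: MD5SUMS is in coreutils `md5sum` format ("<hash>  <file>").
verify_bundle() {
    local dir="${1:-.}"
    # Run in a subshell so the caller's working directory is unchanged.
    # --quiet suppresses the per-file "OK" lines; failures are still printed.
    ( cd "$dir" && md5sum --check --quiet MD5SUMS )
}

# Usage: verify_bundle /path/to/bundle && echo "bundle OK"
```

The function returns non-zero if any listed file is missing or its hash differs, so it can gate a load step in a script.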
+### Why this base, and why f16 (not Q4)
+The base is the **standard, non-LCM DreamShaper-7**, the same realistic SD1.5 finetune
+TokForge ships on its other image tiers. It is converted at **f16** (16-bit half precision,
+with no further quantization) so the IP-Adapter's decoupled cross-attention and the face
+Resampler keep **subject quality** high. A `q4_0`/emaonly base measurably weakens the
+transferred identity, so this bundle deliberately uses f16.
+### Why plus-face (not the base adapter)
+The **base** `ip-adapter_sd15` projects the pooled CLIP image embedding into 4 tokens, so it
+drags the reference's *entire scene* through (a car selfie came out as *"the person in his car"*).
+The **plus-face** Resampler extracts the **face only** (16 tokens from the ViT-H penultimate
+hidden state), so identity is preserved while the **prompt** controls the scene. The TokForge
+sd.cpp IP-Adapter loader auto-detects plus-face by the presence of `image_proj.latents`.
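The detection rule above can be reproduced outside the engine. A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header whose keys are the tensor names, so the check reduces to a key lookup. This is a minimal sketch, not the TokForge loader: `safetensors_header` and `is_plus_adapter` are hypothetical helpers, and it assumes GNU `od` (for `-t u8`) on a little-endian host.

```bash
# Print the JSON header of a .safetensors file.
# Layout: 8-byte little-endian length N, then N bytes of JSON.
safetensors_header() {
    local file="$1" len
    # od -t u8 reads the first 8 bytes as one unsigned 64-bit integer
    # in host byte order (little-endian on ARM and x86).
    len=$(od -An -t u8 -N 8 "$file" | tr -d ' \n')
    # Skip the 8 length bytes, then emit exactly N header bytes.
    tail -c +9 "$file" | head -c "$len"
}

# Exit 0 if the adapter carries Resampler latents (a "plus" variant), 1 otherwise.
is_plus_adapter() {
    safetensors_header "$1" | grep -q '"image_proj\.latents"'
}

# Usage: is_plus_adapter ip-adapter-plus-face_sd15.safetensors && echo "plus variant"
```

The key only shows that the adapter has a Resampler. It does not distinguish plus-face from other plus variants.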
+## How TokForge uses it
+In the app: **Image** model picker → download **"SD1.5 IP-Adapter (Reference Identity)"** →
+attach a face photo as a reference under chat → prompt the scene. The engine is invoked as:
+```bash
+sd -M img_gen \
+   -m sd15-base-f16.gguf \
+   -p "as a superhero flying over New York" \
+   -n "<strong negative>" \
+   --clip_vision ip_adapter_clip_vision_vith.safetensors \
+   --ip-adapter ip-adapter-plus-face_sd15.safetensors \
+   --ip-adapter-image "<your_face.jpg>" \
+   --ip-adapter-scale 0.6 \
+   --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
+   --steps 30 -H 512 -W 512
+```
+### Recommended render settings
+| Setting | Value |
+|---------|-------|
+| sampler | `euler_a` |
+| scheduler | `discrete` |
+| steps | `30` (full quality; fewer = faster) |
+| cfg-scale | `7.0` |
+| ip-adapter-scale | `0.6` (≈0.5–0.6 keeps the scene with recognizable identity; ~0.8 reconstructs the reference) |
+| resolution | `512×512` (SD1.5 native) |
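To see where identity and scene balance for a given photo, the same render can be repeated at several `ip-adapter-scale` values with everything else fixed. The sketch below only prints the commands (a dry run). It assumes `sd` is on `PATH`, the bundle files are in the current directory, and `-s` (seed) and `-o` (output path) behave as in upstream `stable-diffusion.cpp`; `scale_sweep` is a hypothetical helper, and the negative prompt is omitted for brevity.

```bash
# Print one `sd` command per ip-adapter-scale so the outputs can be compared.
# Dry run only: nothing is rendered until the printed lines are executed.
scale_sweep() {
    local face="$1" prompt="$2" scale
    for scale in 0.5 0.6 0.8; do
        # %q shell-quotes user-supplied values so the printed line is safe to run.
        printf 'sd -M img_gen -m sd15-base-f16.gguf -p %q' "$prompt"
        printf ' --clip_vision ip_adapter_clip_vision_vith.safetensors'
        printf ' --ip-adapter ip-adapter-plus-face_sd15.safetensors'
        printf ' --ip-adapter-image %q --ip-adapter-scale %s' "$face" "$scale"
        printf ' --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete'
        # A fixed seed keeps the scale as the only variable between renders.
        printf ' --steps 30 -H 512 -W 512 -s 42 -o %q\n' "out_scale_${scale}.png"
    done
}

# Usage: scale_sweep your_face.jpg "as a superhero flying over New York"
```

Piping the output to `bash` runs the three renders in sequence.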
+## Licenses
+This is an aggregate of three independently-licensed components — each retains its own license:
+- **DreamShaper-7 base** (`sd15-base-f16.gguf`) — **CreativeML-OpenRAIL-M** ([Lykon/dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7)). Use must comply with the OpenRAIL-M use-based restrictions.
+- **IP-Adapter plus-face** (`ip-adapter-plus-face_sd15.safetensors`) — **Apache-2.0** ([h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter)).
+- **OpenCLIP ViT-H-14 image encoder** (`ip_adapter_clip_vision_vith.safetensors`) — **MIT** (OpenCLIP / LAION ViT-H-14).
+> The non-commercial **IP-Adapter-FaceID** / InsightFace path is **NOT** used here — only the
+> Apache-2.0 base + plus-face adapters from `h94/IP-Adapter`.
+## Provenance
+- Base converted from `Lykon/dreamshaper-7` (diffusers) to a single f16 GGUF via the TokForge
+  `stable-diffusion.cpp` convert path (`-M convert --type f16`).
+- Adapter + image encoder copied verbatim from `h94/IP-Adapter` (`models/ip-adapter-plus-face_sd15.safetensors` and
+  `models/image_encoder/model.safetensors`, the latter shipped here as `ip_adapter_clip_vision_vith.safetensors`).