| --- |
| license: creativeml-openrail-m |
| license_name: tokforge-sd15-ipadapter-bundle |
| tags: |
| - text-to-image |
| - image-to-image |
| - stable-diffusion |
| - ip-adapter |
| - dreamshaper |
| - reference-image |
| - identity |
| - gguf |
| - stable-diffusion-cpp |
| - tokforge |
| base_model: |
| - Lykon/dreamshaper-7 |
| - h94/IP-Adapter |
| pipeline_tag: text-to-image |
| --- |
| |
| # TokForge: SD1.5 IP-Adapter (Reference Identity) bundle |
|
|
| The **reference-identity** image route for the [TokForge](https://tokforge.ai) Android app. |
| Attach a photo of a person, then render **that person** in any scene |
| (*"me as a superhero flying over New York"*). The **plus-face** IP-Adapter transfers |
| the **face only** while the **prompt drives the whole scene**. |
|
|
| This bundle runs on the on-device [`stable-diffusion.cpp`](https://github.com/leejet/stable-diffusion.cpp) |
| GGUF engine (TokForge's IP-Adapter port) on **CPU** and **Adreno OpenCL**. SD1.5 is |
| light enough for any 8 GB+ phone, which makes this the broadest-reach identity tier (lighter than the |
| SDXL PhotoMaker tier). |
|
|
| ## Files |
|
|
| | File | Size | License | Contents | |
| |------|------|---------|----------| |
| | `sd15-base-f16.gguf` | ~2.2 GB | CreativeML-OpenRAIL-M | **DreamShaper-7** (SD1.5 realistic finetune): CLIP text encoder + UNet + VAE in one **f16** GGUF | |
| | `ip-adapter-plus-face_sd15.safetensors` | ~98 MB | Apache-2.0 | IP-Adapter **plus-face** (`h94/IP-Adapter`): 16-token Resampler + decoupled cross-attention | |
| | `ip_adapter_clip_vision_vith.safetensors` | ~2.5 GB | MIT | OpenCLIP **ViT-H-14** image encoder (the plus-face path needs ViT-H, not bigG) | |
|
|
| `manifest.json` and `MD5SUMS` carry the integrity hashes + render defaults. |
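
A downloaded bundle can be checked against `MD5SUMS` before it is loaded. The sketch below is a minimal example that assumes `MD5SUMS` uses the common `md5sum` layout (`<hash>  <filename>` per line); that layout is an assumption, and the helper names `md5_of` and `verify_md5sums` are hypothetical, not part of TokForge.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB weights never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(bundle_dir: str, sums_name: str = "MD5SUMS") -> dict[str, bool]:
    """Return {filename: hash_matches} for every entry in the checksum file.

    ASSUMPTION: each line looks like "<32 hex chars>  <filename>" (md5sum style).
    """
    root = Path(bundle_dir)
    results: dict[str, bool] = {}
    for line in (root / sums_name).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = root / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_md5sums("path/to/bundle")` returns a mapping such as `{"sd15-base-f16.gguf": True, ...}`; any `False` entry points to a missing or corrupted file.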
|
|
| ### Why this base, and why f16 (not Q4) |
|
|
| The base is the **standard, non-LCM DreamShaper-7**, the same realistic SD1.5 finetune |
| TokForge ships on its other image tiers. It is converted at **f16** (half-precision floats, |
| not integer-quantized) so the IP-Adapter's decoupled cross-attention and the face Resampler |
| keep **subject quality** high. A `q4_0`/emaonly base measurably weakens the transferred |
| identity, so this bundle deliberately uses f16. |
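
To put the size cost of that choice in rough numbers, the sketch below estimates the base model's footprint at f16 and at `q4_0`. The parameter counts are rounded public figures for SD1.5 and are an assumption here, not values read from this bundle's GGUF. A real `q4_0` file usually keeps some tensors at higher precision, so treat the `q4_0` figure as a lower bound.

```python
# Approximate SD1.5 parameter counts (ASSUMPTION: rounded public figures,
# not measured from sd15-base-f16.gguf).
PARAMS = {"unet": 860e6, "clip_text": 123e6, "vae": 84e6}

# Bytes per weight: f16 stores 2 bytes; ggml q4_0 packs 32 weights into
# 18 bytes (16 bytes of 4-bit values plus one 2-byte f16 scale).
BYTES_PER_WEIGHT = {"f16": 2.0, "q4_0": 18 / 32}


def estimated_size_gb(dtype: str) -> float:
    """Estimate file size in GB (10^9 bytes), ignoring metadata overhead."""
    total_params = sum(PARAMS.values())
    return total_params * BYTES_PER_WEIGHT[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_WEIGHT:
        print(f"{dtype}: ~{estimated_size_gb(dtype):.1f} GB")
```

Under these assumptions f16 comes out at a little over 2 GB, in line with the ~2.2 GB file listed above, while a fully quantized `q4_0` base would be well under 1 GB. That gap is the saving this bundle gives up in exchange for identity quality.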
|
|
| ### Why plus-face (not the base adapter) |
|
|
| The **base** `ip-adapter_sd15` projects the whole pooled CLIP embedding (4 tokens), so it |
| drags the reference's *entire scene* through (a car selfie came out *"the person in his car"*). |
| The **plus-face** Resampler extracts the **face only** (16 tokens from the ViT-H penultimate |
| hidden state), so identity is preserved while the **prompt** controls the scene. The TokForge |
| sd.cpp IP-Adapter loader auto-detects plus-face by the presence of `image_proj.latents`. |
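
The detection rule can be reproduced outside the engine by reading only the header of the `.safetensors` file, which is an 8-byte little-endian length followed by a JSON table of tensor names. The sketch below illustrates that idea and is not the TokForge loader's actual code; the function names are hypothetical.

```python
import json
import struct


def safetensors_tensor_names(path: str) -> set[str]:
    """Read only the JSON header of a .safetensors file and return tensor names."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian u64
        header = json.loads(fh.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return set(header)


def is_plus_adapter(path: str) -> bool:
    """Hypothetical check mirroring the rule above: Resampler latents present."""
    return "image_proj.latents" in safetensors_tensor_names(path)
```

Because only the header is parsed, the check is cheap even for large files. For example, `is_plus_adapter("ip-adapter-plus-face_sd15.safetensors")` should return `True` if the file carries the Resampler latents named above.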
|
|
| ## How TokForge uses it |
|
|
| In the app: **Image** model picker → download **"SD1.5 IP-Adapter (Reference Identity)"** → |
| attach a face photo as a reference under chat → prompt the scene. The engine is invoked as: |
|
|
| ```bash |
| sd -M img_gen \ |
| -m sd15-base-f16.gguf \ |
| -p "as a superhero flying over New York" \ |
| -n "<strong negative>" \ |
| --clip_vision ip_adapter_clip_vision_vith.safetensors \ |
| --ip-adapter ip-adapter-plus-face_sd15.safetensors \ |
| --ip-adapter-image "<your_face.jpg>" \ |
| --ip-adapter-scale 0.6 \ |
| --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \ |
| --steps 30 -H 512 -W 512 |
| ``` |
|
|
| ### Recommended render settings |
|
|
| | Setting | Value | |
| |---------|-------| |
| | sampler | `euler_a` | |
| | scheduler | `discrete` | |
| | steps | `30` (full quality; fewer = faster) | |
| | cfg-scale | `7.0` | |
| | ip-adapter-scale | `0.6` (≈0.5–0.6 keeps the scene with recognizable identity; ~0.8 reconstructs the reference) | |
| | resolution | `512×512` (SD1.5 native) | |
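
For scripted runs outside the app, the recommended settings can be kept in one place and turned into an argument list. The sketch below reuses the flag names from the command shown earlier in this README; the helper `build_sd_args`, its `overrides` parameter, and the `sd` binary name on the host are assumptions for illustration.

```python
import shlex

# Defaults copied from the settings table; flag names are the ones used in
# the command shown in this README.
DEFAULTS = {
    "--ip-adapter-scale": "0.6",
    "--cfg-scale": "7.0",
    "--sampling-method": "euler_a",
    "--scheduler": "discrete",
    "--steps": "30",
    "-H": "512",
    "-W": "512",
}


def build_sd_args(prompt, face_image, negative="", overrides=None, sd_binary="sd"):
    """Return an argv list for one reference-identity render (hypothetical helper)."""
    args = [sd_binary, "-M", "img_gen", "-m", "sd15-base-f16.gguf", "-p", prompt]
    if negative:
        args += ["-n", negative]
    args += [
        "--clip_vision", "ip_adapter_clip_vision_vith.safetensors",
        "--ip-adapter", "ip-adapter-plus-face_sd15.safetensors",
        "--ip-adapter-image", face_image,
    ]
    settings = {**DEFAULTS, **(overrides or {})}  # overrides win over defaults
    for flag, value in settings.items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    argv = build_sd_args("as a superhero flying over New York", "your_face.jpg",
                         overrides={"--steps": 20})
    print(shlex.join(argv))  # shell-quoted command line for inspection
```

Passing the list straight to `subprocess.run(argv)` avoids shell-quoting problems with prompts that contain spaces or quotes.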
|
|
| ## Licenses |
|
|
| This is an aggregate of three independently licensed components; each retains its own license: |
|
|
| - **DreamShaper-7 base** (`sd15-base-f16.gguf`): **CreativeML-OpenRAIL-M** ([Lykon/dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7)). Use must comply with the OpenRAIL-M use-based restrictions. |
| - **IP-Adapter plus-face** (`ip-adapter-plus-face_sd15.safetensors`): **Apache-2.0** ([h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter)). |
| - **OpenCLIP ViT-H-14 image encoder** (`ip_adapter_clip_vision_vith.safetensors`): **MIT** (OpenCLIP / LAION ViT-H-14). |
|
|
| > The non-commercial **IP-Adapter-FaceID** / InsightFace path is **NOT** used here, only the |
| > Apache-2.0 base + plus-face adapters from `h94/IP-Adapter`. |
|
|
| ## Provenance |
|
|
| - Base converted from `Lykon/dreamshaper-7` (diffusers) to a single f16 GGUF via the TokForge |
| `stable-diffusion.cpp` convert path (`-M convert --type f16`). |
| - Adapter + image encoder copied verbatim from `h94/IP-Adapter` (`models/ip-adapter-plus-face_sd15.safetensors`, |
| `models/image_encoder/model.safetensors`). |
|
|