Committed by darkmaniac7
Commit 37b6fb7 · verified · 1 Parent(s): 9403772

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
---
license: creativeml-openrail-m
license_name: tokforge-sd15-ipadapter-bundle
tags:
- text-to-image
- image-to-image
- stable-diffusion
- ip-adapter
- dreamshaper
- reference-image
- identity
- gguf
- stable-diffusion-cpp
- tokforge
base_model:
- Lykon/dreamshaper-7
- h94/IP-Adapter
pipeline_tag: text-to-image
---

# TokForge: SD1.5 IP-Adapter (Reference Identity) bundle

This bundle is the **reference-identity** image route for the [TokForge](https://tokforge.ai) Android app.
Attach a photo of a person, then render **that person** in any scene
(*"me as a superhero flying over New York"*). The **plus-face** IP-Adapter transfers
the **face only** while the **prompt drives the whole scene**.

This bundle runs on the on-device [`stable-diffusion.cpp`](https://github.com/leejet/stable-diffusion.cpp)
GGUF engine (TokForge's IP-Adapter port) on **CPU** and **Adreno OpenCL**. SD1.5 is
light enough for any 8 GB+ phone, which makes this the broadest-reach identity tier
(lighter than the SDXL PhotoMaker tier).

## Files

| File | Size | License | Contents |
|------|------|---------|----------|
| `sd15-base-f16.gguf` | ~2.2 GB | CreativeML-OpenRAIL-M | **DreamShaper-7** (SD1.5 realistic finetune): CLIP text encoder + UNet + VAE in one **f16** GGUF |
| `ip-adapter-plus-face_sd15.safetensors` | ~98 MB | Apache-2.0 | IP-Adapter **plus-face** (`h94/IP-Adapter`): 16-token Resampler + decoupled cross-attention |
| `ip_adapter_clip_vision_vith.safetensors` | ~2.5 GB | MIT | OpenCLIP **ViT-H-14** image encoder (the plus-face path needs ViT-H, not bigG) |

`manifest.json` and `MD5SUMS` carry the integrity hashes and the render defaults.
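
As an illustration of how the integrity hashes might be checked after download, here is a minimal Python sketch. It assumes `MD5SUMS` follows the usual `md5sum` layout (one `<hex digest>  <filename>` pair per line), which this README does not state. The helper names `file_md5`, `parse_md5sums` and `verify_bundle` are hypothetical and not part of TokForge.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text: str) -> dict[str, str]:
    """Parse md5sum-style lines ("<hex>  <name>") into {name: hex}.

    Assumption: the file uses the md5sum layout; a leading "*" on the
    name (binary-mode marker) is dropped.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_bundle(bundle_dir: Path) -> dict[str, bool]:
    """Return {filename: True/False} for every entry listed in MD5SUMS."""
    expected = parse_md5sums((bundle_dir / "MD5SUMS").read_text())
    return {
        name: (bundle_dir / name).is_file()
        and file_md5(bundle_dir / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify_bundle(Path("."))` from the download directory would report a missing or corrupted file as `False`.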

### Why this base, and why f16 (not Q4)

The base is the **standard, non-LCM DreamShaper-7**, the same realistic SD1.5 finetune
TokForge ships on its other image tiers. It is converted at **f16** (unquantized half
precision) so the IP-Adapter's decoupled cross-attention and the face Resampler keep
**subject quality** high. A `q4_0`/emaonly base measurably weakens the transferred
identity, so this bundle deliberately uses f16.

### Why plus-face (not the base adapter)

The **base** `ip-adapter_sd15` projects the whole pooled CLIP embedding (4 tokens), so it
drags the reference's *entire scene* through (a car selfie came out as *"the person in his car"*).
The **plus-face** Resampler extracts the **face only** (16 tokens from the ViT-H penultimate
hidden state), so identity is preserved while the **prompt** controls the scene. The TokForge
sd.cpp IP-Adapter loader auto-detects plus-face by the presence of `image_proj.latents`.
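
The key-presence check can be sketched in Python without loading any weights, because a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header that names every tensor. This is only an illustration of the idea, not the TokForge loader itself. The function names are hypothetical, and the sketch can only tell a Resampler-style adapter (one with `image_proj.latents`) from the base adapter.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path: Path) -> list[str]:
    """List tensor names by reading only the JSON header of a .safetensors file."""
    with path.open("rb") as fh:
        # First 8 bytes: unsigned 64-bit little-endian size of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def adapter_variant(names: list[str]) -> str:
    """Hypothetical classifier mirroring the key check described above.

    Returns "plus" when a Resampler latents tensor is present, else "base".
    """
    if any(name.startswith("image_proj.latents") for name in names):
        return "plus"
    return "base"
```

For example, `adapter_variant(safetensors_tensor_names(Path("ip-adapter-plus-face_sd15.safetensors")))` should report `"plus"` if the file carries the key named in this README.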

## How TokForge uses it

In the app: **Image** model picker → download **"SD1.5 IP-Adapter (Reference Identity)"** →
attach a face photo as a reference in chat → prompt the scene. The engine is invoked as:

```bash
# "<strong negative>" and "<your_face.jpg>" are placeholders; they are quoted so
# the shell does not treat < and > as redirections.
sd -M img_gen \
  -m sd15-base-f16.gguf \
  -p "as a superhero flying over New York" \
  -n "<strong negative>" \
  --clip_vision ip_adapter_clip_vision_vith.safetensors \
  --ip-adapter ip-adapter-plus-face_sd15.safetensors \
  --ip-adapter-image "<your_face.jpg>" \
  --ip-adapter-scale 0.6 \
  --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
  --steps 30 -H 512 -W 512
```

### Recommended render settings

| Setting | Value |
|---------|-------|
| sampler | `euler_a` |
| scheduler | `discrete` |
| steps | `30` (full quality; fewer = faster) |
| cfg-scale | `7.0` |
| ip-adapter-scale | `0.6` (≈0.5–0.6 keeps the scene with recognizable identity; ~0.8 reconstructs the reference) |
| resolution | `512×512` (SD1.5 native) |
88
+ ## Licenses
89
+
90
+ This is an aggregate of three independently-licensed components β€” each retains its own license:
91
+
92
+ - **DreamShaper-7 base** (`sd15-base-f16.gguf`) β€” **CreativeML-OpenRAIL-M** ([Lykon/dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7)). Use must comply with the OpenRAIL-M use-based restrictions.
93
+ - **IP-Adapter plus-face** (`ip-adapter-plus-face_sd15.safetensors`) β€” **Apache-2.0** ([h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter)).
94
+ - **OpenCLIP ViT-H-14 image encoder** (`ip_adapter_clip_vision_vith.safetensors`) β€” **MIT** (OpenCLIP / LAION ViT-H-14).
95
+
96
+ > The non-commercial **IP-Adapter-FaceID** / InsightFace path is **NOT** used here β€” only the
97
+ > Apache-2.0 base + plus-face adapters from `h94/IP-Adapter`.
98
+
99
+ ## Provenance
100
+
101
+ - Base converted from `Lykon/dreamshaper-7` (diffusers) to a single f16 GGUF via the TokForge
102
+ `stable-diffusion.cpp` convert path (`-M convert --type f16`).
103
+ - Adapter + image encoder copied verbatim from `h94/IP-Adapter` (`models/ip-adapter-plus-face_sd15.safetensors`,
104
+ `models/image_encoder/model.safetensors`).