Commit 7c41485 (verified) by froggeric · 1 Parent(s): 2f14381

Upload README.md with huggingface_hub

  1. README.md +20 -0
README.md CHANGED
@@ -170,6 +170,26 @@ This approach was submitted as a pull request to Heretic but was not merged —
 
  ---
 
+ ## How it compares
+
+ ### Community results
+
+ r/LocalLLaMA users have been A/B-testing various uncensored Qwen 3.6 variants: [Heretic](https://github.com/p-e-w/heretic), HauhauCS Aggressive, abliterix, and simple orthogonal projection. The pattern in those reports is consistent: **Heretic produces the best balance of refusal removal and output quality**.
+
+ [Community discussion →](https://www.reddit.com/r/LocalLLaMA/comments/1sw5fb7/qwen36_35b_a3b_heretic_kld_00015_incredible_model/)
+
+ ### Why
+
+ Most abliteration methods treat all layers identically. Qwen 3.6 uses hybrid attention (a 3:1 linear-to-softmax ratio), so a single parameter set either under-abliterates the DeltaNet (linear attention) blocks or over-abliterates the softmax blocks. Architecture-aware abliteration, with separate parameters per attention type, is the key differentiator.
+
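As a rough illustration of what "separate parameters per attention type" can look like, here is a minimal pure-Python sketch of directional ablation with one strength per attention kind. Everything in it is an assumption made for illustration: the layer layout (every fourth layer softmax, the rest linear), the `ABLATION_WEIGHT` values, and the helper names `layer_kinds`, `ablate`, and `abliterate_layers`. It is not the code used to produce this model.

```python
from math import sqrt

# Hypothetical per-attention-type ablation strengths (illustrative values only).
ABLATION_WEIGHT = {"linear": 1.0, "softmax": 0.6}


def layer_kinds(num_layers: int, period: int = 4) -> list[str]:
    """Assumed layout: every `period`-th layer is softmax attention, the rest linear."""
    return ["softmax" if (i + 1) % period == 0 else "linear" for i in range(num_layers)]


def normalize(v: list[float]) -> list[float]:
    """Scale `v` to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate(weight: list[list[float]], direction: list[float], alpha: float) -> list[list[float]]:
    """Return W - alpha * r r^T W, where r is the unit-normalized direction.

    `weight` is a (d_out x d_in) matrix that writes into the residual stream,
    so `direction` has length d_out. alpha = 1.0 removes the component fully.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[weight[i][j] - alpha * r[i] * proj[j] for j in range(d_in)] for i in range(d_out)]


def abliterate_layers(layers, direction, weights_by_kind=ABLATION_WEIGHT):
    """Ablate each layer's matrix with the strength chosen for its attention kind."""
    kinds = layer_kinds(len(layers))
    return [ablate(w, direction, weights_by_kind[k]) for w, k in zip(layers, kinds)]


# Toy usage: four identical 2x2 "layers" and a refusal direction along the first axis.
toy_layers = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
edited = abliterate_layers(toy_layers, direction=[2.0, 0.0])
```

With these toy values, the three linear layers lose their whole component along the direction, while the softmax layer keeps 40% of it. A single shared strength would force both kinds to the same value, which is the trade-off described above.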
+ ### A note on SSM conv1d "repair"
+
+ Some uncensored variants apply a pre-processing step that rescales SSM conv1d weights before abliteration, claiming to fix "outlier" tensors in the DeltaNet linear attention layers. This technique (originating as "Sig-ScaleSync") was benchmarked with **284 data points** across perplexity, needle-in-a-haystack (NIAH), and repetition tests at multiple context lengths (4K–128K). Result: **perplexity degraded at every context length, with no improvement** in NIAH or repetition. The unrepaired original weights perform best.
+
+ Abliterating a degraded baseline can yield a lower measured KL divergence, but that number measures distance from a worse starting point, not better preservation of the original model's capabilities.
+
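The baseline effect can be seen in a toy calculation. The sketch below uses three made-up next-token distributions over a three-token vocabulary. They are illustrative values, not measurements from any model, and the KL direction (reference first) is an assumption about how the metric is reported.

```python
from math import log


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy distributions (hypothetical, chosen only to illustrate the argument).
original = [0.70, 0.20, 0.10]     # unmodified model
repaired = [0.50, 0.30, 0.20]     # after a "repair" step that already shifted the output
abliterated = [0.48, 0.31, 0.21]  # abliterated starting from the repaired weights

kl_vs_repaired = kl_divergence(repaired, abliterated)  # what gets reported
kl_vs_original = kl_divergence(original, abliterated)  # what capability preservation depends on

print(f"KL vs repaired baseline: {kl_vs_repaired:.4f}")
print(f"KL vs original model:    {kl_vs_original:.4f}")
```

In this toy setup the divergence from the repaired baseline is far smaller than the divergence from the original model, because most of the drift happened in the repair step and is not counted when the repaired weights serve as the reference.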
+ ---
+
  ## Sampling
 
  From the official Qwen authors. Reserve 128K+ context for thinking mode.