---
license: apache-2.0
pipeline_tag: feature-extraction
base_model: mlabonne/gemma-3-12b-it-qat-abliterated
tags:
- nvfp4
- gemma-3
- comfyui
- abliterated
- qat
- ltx-2
- ltx-2.3
- uncensored
model_type: gemma
language:
- en
---

# 🌍 Gemma‑3‑12B‑QAT‑Abliterated — Sikaworld FP4 Editions

**Blackwell‑optimized FP4 text encoders for LTX‑2 and LTX‑2.3, based on mlabonne’s improved abliteration technique.**

## 🌐 Overview
The NVIDIA Blackwell architecture introduced first‑class support for FP4/NVFP4 inference, enabling fast and memory‑efficient text encoders. At the same time, the LTX‑2 development team officially recommends **Gemma‑QAT‑based encoders** for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.

This repository provides two custom **FP4 variants** of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method.

Both models are fully uncensored, optimized for LTX‑2 and LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.

---

## 📦 The Two FP4 Editions

### 🛡️ FP4 High‑Fidelity Edition (Protected Layers) *[Recommended]*
This version uses a surgical **mixed‑precision stabilizer** to preserve facial symmetry and spatial coherence. 
* **Layers 0–1** (Input embeddings) kept in **BF16**.
* **Layers 44–47** (Final output projections) kept in **BF16**.
* All **LayerNorms and Biases** kept in **BF16**.
* All mid-transformer layers quantized to **FP4**.

**Best for:** Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.
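
The layer split above can be written down as a small lookup. This is a minimal sketch, not the script used to build these files: the tensor naming scheme (`model.layers.<N>.…`) and the function name `target_dtype` are assumptions for illustration, and tensors outside the numbered layers are reported as `unspecified` because this card does not state how they are stored.

```python
import re

# Layer indices kept in BF16 in the High-Fidelity Edition (taken from the list above).
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}
_LAYER_RE = re.compile(r"\blayers\.(\d+)\.")  # assumed naming: "...layers.<N>...."

def target_dtype(tensor_name: str, protect_layers: bool = True) -> str:
    """Return 'bf16', 'fp4', or 'unspecified' for one tensor name."""
    # Norm weights and biases stay in BF16 in both editions.
    if "norm" in tensor_name or tensor_name.endswith(".bias"):
        return "bf16"
    match = _LAYER_RE.search(tensor_name)
    if match is None:
        return "unspecified"  # e.g. embeddings: not described in this card
    if protect_layers and int(match.group(1)) in PROTECTED_LAYERS:
        return "bf16"
    return "fp4"

print(target_dtype("model.layers.0.self_attn.q_proj.weight"))
print(target_dtype("model.layers.20.mlp.up_proj.weight"))
```

Calling `target_dtype(name, protect_layers=False)` mirrors the Pure Edition, where only norms and biases stay in BF16.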

### 🚀 FP4 Pure Edition (No Protected Layers)
This version is a flat FP4/NVFP4 quantization of the Abliterated QAT model, with no protected layers.
* **All** transformer layers (0–47) quantized to **FP4**.
* Only LayerNorms and Biases remain in BF16.

**Best for:** Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.

---

## 🧰 Usage in ComfyUI

1. Download your preferred `.safetensors` file.
2. Place the file inside your ComfyUI models folder:
   `ComfyUI/models/text_encoders/`
3. Load the model via the standard **DualCLIPLoader** or **LTX‑2 Text Encoder Loader**.
4. **Recommended dtype:** `fp8_e4m3fn` (Note: The BF16‑protected layers will automatically be respected and kept in BF16 by ComfyUI's loader).
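
To see which storage dtypes a downloaded file actually contains before loading it, the safetensors header can be read directly. The sketch below relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function name `dtype_histogram` and the path in the usage comment are placeholders, and how quantized tensors are labeled depends on how the file was exported.

```python
import json
import struct
from collections import Counter

def dtype_histogram(path: str) -> Counter:
    """Count tensors per dtype string by reading only the safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return Counter(entry["dtype"] for entry in header.values())

# Usage (placeholder file name):
# print(dtype_histogram("ComfyUI/models/text_encoders/your-file.safetensors"))
```

Only the header is read, so this works on large files without loading any weights into memory.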

> **💡 Prompting Tip:** Start your prompts with direct action verbs (e.g., *"running"*, *"falling"*, *"embracing"*, *"exploding"*). FP4 models respond extremely well to dynamic, upfront phrasing.

---

## 🔬 Technical Background

### Why Gemma‑QAT for LTX‑2?
The LTX‑2 base model is highly sensitive to the text encoder's conditioning. The LTX team recommends QAT (Quantization-Aware Training) encoders because they provide:
* Stable activation distributions
* Smooth residual streams
* Strong temporal gradients
* Robust spatial alignment
* Greatly reduced “frozen video” (motion collapse) behavior

### The Abliteration v2 Magic
These models are derived from `mlabonne/gemma-3-12b-it-qat-abliterated`. Abliteration is a multi‑step orthogonalization process rather than a simple deletion. It compares residual streams from harmful and harmless samples, computes a "refusal direction", and subtracts that direction from the hidden states of target modules. The result is a fully uncensored, high‑fidelity instruction model with **loud and uninhibited semantic gradients**, which helps counter static/frozen LTX‑2 generations.
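
The core idea can be illustrated on toy vectors. This is a simplified sketch and not mlabonne's implementation: it assumes the refusal direction is the normalized difference of mean activations, and the helper names (`mean_vector`, `refusal_direction`, `ablate`) and sample values are made up for the example.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful, harmless):
    """Unit vector along the difference of mean activations."""
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(hidden, direction):
    """Remove the component of `hidden` along the unit vector `direction`."""
    proj = sum(h * d for h, d in zip(hidden, direction))
    return [h - proj * d for h, d in zip(hidden, direction)]

harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]   # toy "harmful" activations
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # toy "harmless" activations
direction = refusal_direction(harmful, harmless)
print(ablate([5.0, 2.0, -1.0], direction))  # [0.0, 2.0, -1.0]
```

After ablation the hidden state has no component along the refusal direction, while everything orthogonal to it is left untouched.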

### Why FP4 for Blackwell GPUs?
NVIDIA's latest Blackwell Tensor Cores natively accelerate FP4/NVFP4 operations. This format offers:
* Significantly higher throughput than FP8
* Extremely low VRAM footprint 
* Faster long‑prompt (prefill) inference
* Decreased pressure on memory bandwidth

The FP4 tensors in both editions use a pure FP4 layout (with appropriate micro-block and global scales) that is fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.
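
For intuition about micro-block scaling, here is a minimal sketch of block-wise FP4 quantization. It is not the conversion code used for these files: the block size of 16 and the E2M1 value grid are assumptions based on the public NVFP4 format description, the per-block scale is kept as a plain Python float instead of being rounded to FP8, and the global scale is omitted.

```python
import math

# Magnitudes representable by a 4-bit E2M1 value (sign stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # micro-block size assumed for this sketch

def quantize_block(values):
    """Return (scale, codes); each code is a signed value from the E2M1 grid."""
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(math.copysign(mag, v))
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate values from a scale and its E2M1 codes."""
    return [scale * c for c in codes]

block = [12.0, -1.0] + [0.0] * (BLOCK - 2)
scale, codes = quantize_block(block)
print(scale, dequantize_block(scale, codes)[:2])  # 2.0 [12.0, -1.0]
```

Because every block carries its own scale, a single outlier only coarsens the 16 values that share its block rather than the whole tensor.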

---

## 📊 Technical Summary

| Component | 🛡️ High‑Fidelity Edition | 🚀 Pure Edition |
| :--- | :--- | :--- |
| **Base Model** | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated |
| **Quantization** | FP4 + BF16 stabilizer | Pure FP4 |
| **Protected Layers** | `0–1`, `44–47` | None |
| **Norms & Biases** | BF16 | BF16 |
| **Inference Speed** | Fast | Fastest |
| **Stability** | Highest | Moderate |
| **VRAM Usage** | Low | Lowest |
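
As a rough back-of-the-envelope check on the VRAM row, weight storage can be estimated from bits per weight. The numbers below are assumptions, not measurements of these files: a nominal 12 billion weights, all stored as 4-bit values with one 8-bit scale per 16-weight block, compared against plain BF16. The helper name `weight_bytes` is illustrative.

```python
def weight_bytes(n_params: float, bits_per_weight: float,
                 scale_bits: float = 0.0, block: int = 16) -> float:
    """Approximate storage in bytes, amortizing one scale over each block."""
    return n_params * (bits_per_weight + scale_bits / block) / 8

N_PARAMS = 12e9  # nominal parameter count (assumption)
fp4 = weight_bytes(N_PARAMS, 4, scale_bits=8)  # 4.5 bits per weight
bf16 = weight_bytes(N_PARAMS, 16)
print(f"{fp4 / 1e9:.2f} GB vs {bf16 / 1e9:.2f} GB")  # 6.75 GB vs 24.00 GB
```

The estimate ignores BF16-protected layers, norms, biases, global scales, and activation memory, so the High‑Fidelity Edition will land somewhat above the FP4 figure.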

---

## 🏷️ Credits & Acknowledgments
* **Base Model & Abliteration v2:** [mlabonne](https://huggingface.co/mlabonne)
* **QAT Architecture & Gemma Weights:** [Google](https://huggingface.co/google)
* **FP4 Optimization, Hybrid Architecture & Stabilization:** Sikaworld
* **LTX‑2 & QAT Recommendation:** Lightricks / LTX‑Team