---
license: apache-2.0
pipeline_tag: feature-extraction
base_model: mlabonne/gemma-3-12b-it-qat-abliterated
tags:
- nvfp4
- gemma-3
- comfyui
- abliterated
- qat
- ltx-2
- ltx-2.3
- uncensored
model_type: gemma
language:
- en
---

# 🌍 Gemma‑3‑12B‑QAT‑Abliterated - Sikaworld FP4 Editions

**Blackwell‑optimized FP4 text encoders for LTX‑2 and LTX‑2.3, based on mlabonne’s improved Abliteration technique.**

## 🌐 Overview

The NVIDIA Blackwell architecture introduced first‑class support for FP4/NVFP4 inference, enabling extremely fast and memory‑efficient text encoders. At the same time, the LTX‑2 development team officially recommends **Gemma‑QAT‑based encoders** for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.

This repository provides two custom **FP4 variants** of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method. Both models are fully uncensored, explicitly optimized for LTX‑2 and LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.

---

## 📦 The Two FP4 Editions

### 🛡️ FP4 High‑Fidelity Edition (Protected Layers) *[Recommended]*

This version uses a surgical **mixed‑precision stabilizer** to preserve facial symmetry and spatial coherence.

* **Layers 0–1** (Input embeddings) kept in **BF16**.
* **Layers 44–47** (Final output projections) kept in **BF16**.
* All **LayerNorms and Biases** kept in **BF16**.
* All mid‑transformer layers quantized to **FP4**.

**Best for:** Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.

### 🚀 FP4 Pure Edition (No Protected Layers)

This version is a flat FP4/NVFP4 quantization of the Abliterated QAT model with no protected layers.

* **All** transformer layers (0–47) quantized to **FP4**.
* Only LayerNorms and Biases remain in BF16.

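
The two layer layouts described above can be summarized as a small lookup. This is only a sketch of the rule as the README states it, not the script used to build the files: the function name `precision_for`, the edition labels `"high_fidelity"` and `"pure"`, and the assumption that tensor names contain `layers.N.` are all hypothetical.

```python
import re

# Layer indices the README lists as BF16-protected in the High-Fidelity Edition.
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

# Assumption: tensors belonging to block N contain "layers.N." in their name.
_LAYER_RE = re.compile(r"layers\.(\d+)\.")


def precision_for(tensor_name: str, edition: str) -> str:
    """Return "bf16", "fp4" or "unspecified" for one tensor name.

    `edition` is a hypothetical label: "high_fidelity" or "pure".
    """
    if edition not in ("high_fidelity", "pure"):
        raise ValueError(f"unknown edition: {edition!r}")

    # Norms and biases stay in BF16 in both editions.
    if "norm" in tensor_name.lower() or tensor_name.endswith(".bias"):
        return "bf16"

    match = _LAYER_RE.search(tensor_name)
    if match is None:
        # The README does not say how tensors outside the 48 layers are stored.
        return "unspecified"

    layer = int(match.group(1))
    if edition == "high_fidelity" and layer in PROTECTED_LAYERS:
        return "bf16"
    return "fp4"


if __name__ == "__main__":
    name = "model.layers.45.mlp.down_proj.weight"  # hypothetical tensor name
    for edition in ("high_fidelity", "pure"):
        print(edition, precision_for(name, edition))
```

Under these assumptions, the only tensors that differ between the two editions are the non-norm weights in layers 0–1 and 44–47.
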

**Best for:** Maximum performance, the lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a small amount of spatial stability for raw speed and more intense, aggressive motion vectors.

---

## 🧰 Usage in ComfyUI

1. Download your preferred `.safetensors` file.
2. Place the file inside your ComfyUI models folder: `ComfyUI/models/text_encoders/`
3. Load the model via the standard **DualCLIPLoader** or **LTX‑2 Text Encoder Loader**.
4. **Recommended dtype:** `fp8_e4m3fn` (Note: The BF16‑protected layers will automatically be respected and kept in BF16 by ComfyUI's loader).

> **💡 Prompting Tip:** Start your prompts with direct action verbs (e.g., *"running"*, *"falling"*, *"embracing"*, *"exploding"*). FP4 models respond extremely well to dynamic, upfront phrasing.

---

## 🔬 Technical Background

### Why Gemma‑QAT for LTX‑2?

The LTX‑2 base model architecture is very sensitive to the text encoder's conditioning. The LTX team recommends QAT (Quantization‑Aware Training) encoders because they provide:

* Stable activation distributions
* Smooth residual streams
* Strong temporal gradients
* Robust spatial alignment
* Heavily reduced “frozen video” (motion collapse) behavior

### The Abliteration V2 Magic

These models are derived from `mlabonne/gemma-3-12b-it-qat-abliterated`. Abliteration is a multi‑step orthogonalization process, not just a simple deletion. It compares residual streams from harmful vs. harmless samples, computes a "refusal direction", and subtracts this direction natively from the hidden states of target modules.

The result is a fully uncensored, high‑fidelity instruction model with **loud and uninhibited semantic gradients**, acting as the perfect cure for static/frozen LTX‑2 generations.

### Why FP4 for Blackwell GPUs?

NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations.
This format offers:

* Significantly higher throughput than FP8
* Extremely low VRAM footprint
* Faster long‑prompt (prefill) inference
* Decreased pressure on memory bandwidth

These FP4 editions feature a pure FP4 tensor layout (with appropriate micro‑block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.

---

## 📊 Technical Summary

| Component | 🛡️ High‑Fidelity Edition | 🚀 Pure Edition |
| :--- | :--- | :--- |
| **Base Model** | `mlabonne/gemma-3-12b-it-qat-abliterated` | `mlabonne/gemma-3-12b-it-qat-abliterated` |
| **Quantization** | FP4 + BF16 stabilizer | Pure FP4 |
| **Protected Layers** | `0–1`, `44–47` | None |
| **Norms & Biases** | BF16 | BF16 |
| **Inference Speed** | Fast | Fastest |
| **Stability** | Highest | Moderate |
| **VRAM Usage** | Low | Lowest |

---

## 🏷️ Credits & Acknowledgments

* **Base Model & Abliteration v2:** [mlabonne](https://huggingface.co/mlabonne)
* **QAT Architecture & Gemma Weights:** [Google](https://huggingface.co/google)
* **FP4 Optimization, Hybrid Architecture & Stabilization:** Sikaworld
* **LTX‑2 & QAT Recommendation:** Lightricks / LTX‑Team