Unexpected text model architecture type in GGUF file: 'gemma3'

#12

by Theoldsong - opened Jan 10

Discussion

Theoldsong

Jan 10

Unexpected text model architecture type in GGUF file: 'gemma3'

cadenlol

Jan 10

Same. It does not work with ComfyUI. Not sure why it is embedded in the workflow.

MrRyukami

Jan 10

I also ran into the same problem

ychristian008

Jan 11

follow this, it works: https://github.com/city96/ComfyUI-GGUF/pull/404

yui7854

Jan 13

Nothing works. LTX 2 is the worst AI model on ComfyUI. Without the audio, it would have already been forgotten.

doublemathew

Jan 21

This repo is the wrong one for LTX2. LTX2 uses a QAT checkpoint, so unsloth/gemma-3-12b-it-qat-GGUF is the correct repo to use.

Check https://huggingface.co/unsloth/LTX-2-GGUF/discussions/7 for a workflow reference. The relevant part is pasted below for convenience.

The LTX2 GGUFs require a few extra components to be loaded, since the GGUFs don't have the VAEs and embedding connectors packaged in the transformer model.

You also need to install two custom node packages:
https://github.com/city96/ComfyUI-GGUF
https://github.com/kijai/ComfyUI-KJNodes

Navigate to your ComfyUI model folder and run the following to download all the model weights:

# Can try any quant type 
ln -s "$(hf download unsloth/LTX-2-GGUF ltx-2-19b-dev-UD-Q2_K_XL.gguf --quiet)" unet/ltx-2-19b-dev-UD-Q2_K_XL.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_audio_vae.safetensors --quiet)" vae/ltx-2-19b-dev_audio_vae.safetensors
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_video_vae.safetensors --quiet)" vae/ltx-2-19b-dev_video_vae.safetensors

# Can try any quant type 
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-mmproj-BF16.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-19b-distilled-lora-384.safetensors --quiet)" loras/ltx-2-19b-distilled-lora-384.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/ltx-2-spatial-upscaler-x2-1.0.safetensors

# Optional
ln -s "$(hf download Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-Left ltx-2-19b-lora-camera-control-dolly-left.safetensors --quiet)" loras/ltx-2-19b-lora-camera-control-dolly-left.safetensors

This mp4 should have the ltx2 workflow used to generate the mp4 embedded in it, which references the models downloaded above.
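For anyone who prefers to script the download-and-symlink step, here is a minimal Python sketch of what each `ln -s "$(hf download ...)"` line above does. It assumes the `huggingface_hub` package is installed. `link_model` and its `link_name` and `download` parameters are hypothetical names introduced for illustration; they are not part of ComfyUI or of the `hf` CLI.

```python
from pathlib import Path
from typing import Callable, Optional


def link_model(
    repo_id: str,
    filename: str,
    dest_dir: str,
    link_name: Optional[str] = None,
    download: Optional[Callable[..., str]] = None,
) -> Path:
    """Download one file into the Hugging Face cache and symlink it into dest_dir.

    `download` defaults to huggingface_hub.hf_hub_download; it is a parameter
    only so the function can be exercised without network access.
    """
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download

    cached = Path(download(repo_id=repo_id, filename=filename))

    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / (link_name or Path(filename).name)

    if link.is_symlink():
        # Replace a stale link from an earlier run.
        link.unlink()
    elif link.exists():
        # Never delete a real file the user put there by hand.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(cached)
    return link
```

Run from the ComfyUI models folder, `link_model("unsloth/gemma-3-12b-it-qat-GGUF", "mmproj-BF16.gguf", "text_encoders", link_name="gemma-3-12b-it-qat-mmproj-BF16.gguf")` would mirror the mmproj line above, including the renamed link.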

T1-Faker1

Jan 28

Where did it go wrong?

shimmyshimmer

Unsloth AI org Jan 28

Check that you're using the latest ComfyUI and ComfyUI-GGUF.

Kwissbeats

Jan 29

"Check that you're using the latest comfyui and comfyui gguf."

Had the exact same issue. Updating ComfyUI-GGUF to x.x.10 got it past CLIP encoding. Thanks.
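To see which architecture string a given file declares, the GGUF header can be read directly. The sketch below assumes the loader takes the name in the error message from the `general.architecture` metadata key, and that the file is GGUF version 2 or later (64-bit counts). `read_architecture` is a hypothetical helper name, not part of ComfyUI-GGUF.

```python
import struct
from typing import BinaryIO, Optional

# struct formats for the fixed-size GGUF metadata value types.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one little-endian value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, value_type: int):
    if value_type in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[value_type])
    if value_type == _STRING:
        return _read_string(f)
    if value_type == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of general.architecture, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _version = _read(f, "<I")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

Open the file in binary mode and pass the handle, for example `read_architecture(open(path, "rb"))`. If the result is the same string quoted in the error, the file itself is probably fine and the installed loader simply does not recognise that architecture yet, which would fit the reports that updating the nodes fixed it.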

RJAcelite

Jan 31

This is my third day trying to fix this, and I quit. I gave up, LTX2 sucks. They really hyped LTX2 up, saying it works even on low VRAM or no VRAM, which looks suspicious and like a misleading ad now that nothing works. Wan 2.2 is simple enough that it works even with 6 GB of VRAM, and I have 12 GB. I just really wanted to try the audio part. I don't know how others make it work, but it is too complicated and my hair has already turned white. I'm just waiting for Wan to integrate audio in open source, and then LTX2 will really be abandoned. Simplicity wins for me.

MachineMinded

Mar 8

edited Mar 8

Anyone figure this out? I'm trying to use a basic LTX 2.3 flow but having this exact issue when trying to load GGUF gemma3 in the dual clip loader.

doublemathew

Mar 9

This has a sample workflow embedded: https://huggingface.co/unsloth/LTX-2.3-GGUF/blob/main/unsloth_flowers.mp4

I would start with a fresh Python venv and install ComfyUI and the custom nodes.

python3 -m venv .diffusion
source .diffusion/bin/activate
git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
pip install huggingface_hub
cd custom_nodes/
git clone https://github.com/city96/ComfyUI-GGUF.git
cd ComfyUI-GGUF/
pip install -r requirements.txt 
cd ..
git clone https://github.com/kijai/ComfyUI-KJNodes.git
cd ComfyUI-KJNodes/
pip install -r requirements.txt 
cd ../../models

Once you are in the models directory, download the models:

ln -s "$(hf download unsloth/LTX-2.3-GGUF ltx-2.3-22b-dev-Q4_K_M.gguf --quiet)" unet/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_video_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_audio_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/.

ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors --quiet)" loras/.
ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/.

Notably, it uses the QAT model and not this one, and it needs some custom nodes to get GGUF inference working.
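Before starting ComfyUI, it can help to confirm that every link landed where the workflow expects it. The following is a small sketch, assuming the exact file names from the commands above. `EXPECTED_LTX23` and `find_missing` are hypothetical names, and the list needs editing if you chose a different quant.

```python
from pathlib import Path
from typing import Dict, Iterable

# Relative paths the symlink commands above should create inside ComfyUI/models.
EXPECTED_LTX23 = [
    "unet/ltx-2.3-22b-dev-Q4_K_M.gguf",
    "vae/ltx-2.3-22b-dev_video_vae.safetensors",
    "vae/ltx-2.3-22b-dev_audio_vae.safetensors",
    "text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors",
    "loras/ltx-2.3-22b-distilled-lora-384.safetensors",
    "latent_upscale_models/ltx-2.3-spatial-upscaler-x2-1.0.safetensors",
    "text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf",
    "text_encoders/mmproj-BF16.gguf",
]


def find_missing(models_dir: str, expected: Iterable[str] = EXPECTED_LTX23) -> Dict[str, str]:
    """Map each expected path that is unusable to a short reason."""
    root = Path(models_dir)
    problems: Dict[str, str] = {}
    for rel in expected:
        path = root / rel
        if path.exists():
            # exists() follows symlinks, so a working link counts as present.
            continue
        # A link whose cache target is gone is a symlink that does not "exist".
        problems[rel] = "broken symlink" if path.is_symlink() else "missing"
    return problems
```

Calling `find_missing(".")` from the models directory returns an empty dict when everything is in place. A "broken symlink" entry usually means the download failed and `ln -s` linked to an empty or invalid path.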

Run Comfy

cd ..
python main.py

Then open the mp4 and the workflow will load. Clicking Run will generate a video.

plugtwo

Mar 22

If you're still getting that dual clip loader error, make sure you have the latest of everything. I had to manually ensure the various nodes were up to date by clicking the "Extensions" button in the top right corner to bring up the "Node Manager", and then "Updates Available" in the sidebar. It's bizarre that it's so hard to find this and that there's no prompt to automatically update nodes.
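If the custom nodes were installed with `git clone`, as in the setup steps earlier in this thread, the same check can be done from a terminal. This is a rough sketch under that assumption. `outdated_nodes` and `commits_behind` are hypothetical helper names, `git` must be on the PATH, and nodes installed some other way are simply skipped.

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional


def commits_behind(repo: Path) -> Optional[int]:
    """Fetch, then count commits on the upstream branch that HEAD lacks.

    Returns None when git cannot answer (no upstream, detached HEAD, offline).
    """
    try:
        subprocess.run(
            ["git", "-C", str(repo), "fetch", "--quiet"],
            check=True, capture_output=True,
        )
        out = subprocess.run(
            ["git", "-C", str(repo), "rev-list", "--count", "HEAD..@{upstream}"],
            check=True, capture_output=True, text=True,
        )
    except subprocess.CalledProcessError:
        return None
    return int(out.stdout.strip())


def outdated_nodes(custom_nodes_dir: str) -> Dict[str, Optional[int]]:
    """Report git-managed node folders that are behind upstream or unknown."""
    report: Dict[str, Optional[int]] = {}
    for child in sorted(Path(custom_nodes_dir).iterdir()):
        if not child.is_dir() or not (child / ".git").exists():
            continue  # not a git checkout, nothing to compare against
        behind = commits_behind(child)
        if behind != 0:
            report[child.name] = behind
    return report
```

For example, `outdated_nodes("ComfyUI/custom_nodes")` returns a dict such as `{"ComfyUI-GGUF": 3}` when that checkout is three commits behind. A `git pull` in that folder, followed by reinstalling its requirements, brings it up to date.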
