Unexpected text model architecture type in GGUF file: 'gemma3'
Same. Does not work with ComfyUI. Not sure why it is embedded in the workflow.
I also ran into the same problem
Nothing works. LTX 2 is the worst AI model on ComfyUI. Without the audio, it would have already been forgotten.
This repo is the wrong one for LTX2. LTX2 uses a QAT checkpoint, so unsloth/gemma-3-12b-it-qat-GGUF is the correct repo to use.
Check https://huggingface.co/unsloth/LTX-2-GGUF/discussions/7 for a workflow reference. It is pasted below for convenience.
The LTX2 GGUFs require a few extra components to be loaded, since the GGUFs don't have the VAEs and embedding connectors packaged in the transformer model.
You also need to install two custom node packages:
https://github.com/city96/ComfyUI-GGUF
https://github.com/kijai/ComfyUI-KJNodes
Navigate to your ComfyUI model folder and run the following to download all the model weights:
# Can try any quant type
ln -s "$(hf download unsloth/LTX-2-GGUF ltx-2-19b-dev-UD-Q2_K_XL.gguf --quiet)" unet/ltx-2-19b-dev-UD-Q2_K_XL.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_audio_vae.safetensors --quiet)" vae/ltx-2-19b-dev_audio_vae.safetensors
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_video_vae.safetensors --quiet)" vae/ltx-2-19b-dev_video_vae.safetensors
# Can try any quant type
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-mmproj-BF16.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-19b-distilled-lora-384.safetensors --quiet)" loras/ltx-2-19b-distilled-lora-384.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/ltx-2-spatial-upscaler-x2-1.0.safetensors
# Optional
ln -s "$(hf download Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-Left ltx-2-19b-lora-camera-control-dolly-left.safetensors --quiet)" loras/ltx-2-19b-lora-camera-control-dolly-left.safetensors
This mp4 should have the ltx2 workflow used to generate the mp4 embedded in it, which references the models downloaded above.
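If the workflow still cannot find a model, it can help to confirm that every link created by the commands above actually resolves. The following is a minimal sketch, not part of the original instructions: `check_models` and `EXPECTED` are hypothetical names, and the paths are copied from the `ln -s` commands above (the optional camera-control LoRA is left out). Adjust the list if you picked a different quant.

```python
from pathlib import Path

# Relative paths copied from the ln -s commands above (assumed layout).
EXPECTED = [
    "unet/ltx-2-19b-dev-UD-Q2_K_XL.gguf",
    "vae/ltx-2-19b-dev_audio_vae.safetensors",
    "vae/ltx-2-19b-dev_video_vae.safetensors",
    "text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf",
    "text_encoders/gemma-3-12b-it-qat-mmproj-BF16.gguf",
    "text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors",
    "loras/ltx-2-19b-distilled-lora-384.safetensors",
    "latent_upscale_models/ltx-2-spatial-upscaler-x2-1.0.safetensors",
]


def check_models(models_dir, expected=EXPECTED):
    """Return {relative_path: status}; status is 'ok', 'dangling' or 'missing'."""
    root = Path(models_dir)
    report = {}
    for rel in expected:
        path = root / rel
        if path.exists():
            # exists() follows symlinks, so the link target is present too
            report[rel] = "ok"
        elif path.is_symlink():
            # the link itself is there, but its target is gone
            report[rel] = "dangling"
        else:
            report[rel] = "missing"
    return report
```

Calling `check_models(".")` from the ComfyUI models folder returns one status per path. A "dangling" entry usually means the Hugging Face cache was moved or cleaned after the link was made.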
Check that you're using the latest comfyui and comfyui gguf.
Had the exact same issue. Updating ComfyUI-GGUF to x.x.10 got it past CLIP encoding. Thanks.
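The string quoted in the error ('gemma3') is the `general.architecture` value stored in the GGUF file's metadata, so the message means the loader did not recognize that architecture, not that the file is corrupt. As a rough sketch for reading that field yourself, the code below assumes the GGUF version 2 or 3 little-endian header layout. The function names are hypothetical and this is not how ComfyUI-GGUF itself reads the file.

```python
import struct

# GGUF metadata value type ids mapped to struct formats for fixed-size scalars
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one scalar described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file while reading GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_architecture(path):
    """Return the general.architecture string, or None if the key is absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version = _read(f, "<I")       # assumed to be 2 or 3
        _tensor_count = _read(f, "<Q")
        kv_count = _read(f, "<Q")
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.architecture":
                return value
    return None
```

Running `gguf_architecture(...)` on the text encoder file in `text_encoders/` shows the value the loader sees. If it matches the one in the error, the fix is on the loader side, which is consistent with the update advice above.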
This is my third day trying to fix this bs, and I quit. LTX2 sucks. They really hyped LTX2 up, saying it works even on low VRAM or no VRAM, which is kinda sus and reads like a bs ad now that nothing works. Wan 2.2 is so simple that it even works with 6 GB of VRAM, and I have 12 GB btw. I just really want to try the audio part. I don't know how others make it work, but I gave up. It's too complicated; my hair has turned white already. I'm just waiting for Wan to integrate audio in open source, and then LTX2 will really be abandoned. Simplicity wins for me.
Anyone figure this out? I'm trying to use a basic LTX 2.3 flow but having this exact issue when trying to load GGUF gemma3 in the dual clip loader.
This has a sample workflow embedded: https://huggingface.co/unsloth/LTX-2.3-GGUF/blob/main/unsloth_flowers.mp4
I would start with a fresh python venv and install comfy and the custom nodes.
python3 -m venv .diffusion
source .diffusion/bin/activate
git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
pip install huggingface_hub
cd custom_nodes/
git clone https://github.com/city96/ComfyUI-GGUF.git
cd ComfyUI-GGUF/
pip install -r requirements.txt
cd ..
git clone https://github.com/kijai/ComfyUI-KJNodes.git
cd ComfyUI-KJNodes/
pip install -r requirements.txt
cd ../../models
Download the models once, from inside the models dir.
ln -s "$(hf download unsloth/LTX-2.3-GGUF ltx-2.3-22b-dev-Q4_K_M.gguf --quiet)" unet/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_video_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_audio_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/.
ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors --quiet)" loras/.
ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/.
Notably, it uses the QAT model and not this one, and it needs some custom nodes to get GGUF inference working.
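Since `huggingface_hub` is already installed in the venv, the download-and-link pattern above can also be done from Python. This is only a sketch of the same idea: `link_into` and `download_and_link` are hypothetical helper names, while `hf_hub_download` is the library's function for fetching a single file into the local cache.

```python
import os
from pathlib import Path


def link_into(src, dest_dir):
    """Symlink src into dest_dir under its own file name, replacing a stale link."""
    # abspath (not resolve) keeps the readable file name of the cache snapshot path
    src = Path(os.path.abspath(src))
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    link = dest_dir / src.name
    if link.is_symlink():
        link.unlink()  # refresh an existing link instead of failing
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(src, link)
    return link


def download_and_link(repo_id, filename, dest_dir):
    """Download one file from the Hub cache and link it into dest_dir."""
    # Imported here so link_into() stays usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    cached = hf_hub_download(repo_id=repo_id, filename=filename)
    return link_into(cached, dest_dir)
```

For example, `download_and_link("unsloth/gemma-3-12b-it-qat-GGUF", "mmproj-BF16.gguf", "text_encoders")`, run from the models dir, mirrors the last `ln -s` command above. Unlike plain `ln -s`, it replaces an existing link, so it can be re-run safely.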
Run Comfy
cd ..
python main.py
Then open the mp4 in ComfyUI and the workflow will load. Clicking Run will generate a video.
If you're still getting that dual clip loader error, make sure you have the latest of everything. I had to manually ensure the various nodes were up to date by clicking the "Extensions" button in the top right corner to bring up the "Node Manager", and then "Updates Available" in the sidebar. It's bizarre that it's so hard to find this and that there's no prompt to automatically update nodes.

