Vision👍
This model has better vision than Qwen 3.6 35B-A3B and was able to get further when playing Pokemon Crystal using just vision.
Hi @dpe1 -
Thanks for sharing your feedback! It's great to hear that you were impressed by the vision capabilities of the Gemma 4 12B variant.
The default resolution of input images is set way too low. The 12B model is not able to pick up text on a paper bag in a photo, which all Qwen models read without a hitch in all my tests.
There should be a way to change it for the other Gemma 4 models in llama.cpp, but llama.cpp currently doesn't support local MCP servers. So I'm stuck choosing between LM Studio, where I get local MCP servers but myopic vision, and llama.cpp, where I get vision but no local MCP.
I wish the resolution parameters had been exposed in the chat template.
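To illustrate why input resolution matters for small text, here is a rough sketch of the arithmetic, not anything taken from Gemma 4's actual preprocessing. It assumes the image is shrunk so that its longer side matches some target size and that nothing is upscaled. The function name, the photo dimensions, the 40 px text height, and the candidate target sizes are all hypothetical values chosen for illustration.

```python
def downscaled_text_height(image_size, text_height_px, target_long_side):
    """Estimate how tall text is, in pixels, after the image is resized.

    Assumption (hypothetical preprocessing): the image is scaled so its
    longer side equals target_long_side, and images are never upscaled.
    """
    width, height = image_size
    scale = min(target_long_side / max(width, height), 1.0)
    return text_height_px * scale


# Hypothetical example: a 4032x3024 phone photo where the lettering on
# the bag is about 40 px tall in the original image.
photo = (4032, 3024)
text_px = 40

for side in (448, 896, 1792):  # illustrative target sizes, not real defaults
    remaining = downscaled_text_height(photo, text_px, side)
    print(f"long side {side}: text is about {remaining:.1f} px tall")
```

Under these assumptions, lettering that is comfortably legible in the original photo shrinks to a handful of pixels at the smallest target size, which would be consistent with the model failing to read it.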