Instructions to use google/gemma-4-12B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-4-12B-it with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("google/gemma-4-12B-it") model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-12B-it") - Notebooks
- Google Colab
- Kaggle
What languages are supported in audio input?
#25
by J22 - opened
E4B supports Chinese (although not quite good, frankly speaking), but 12B does not.
Please provide this information on the model card.
Can confirm Korean works for audio input — a short (~10s) Korean phone-call clip transcribes correctly via llama-server's input_audio field (Q4_K_M + the bf16 mmproj, recent master, --jinja). Quality is solid for short clips but the model card caps audio at ~30s, and longer clips degrade. Note it's also a reasoning model, so the transcription comes after a think block — keep max_tokens generous and read reasoning_content vs content separately on the OpenAI endpoint.