Instructions to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small", filename="gemma-3-12b-it-q4_0_s.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S # Run inference directly in the terminal: llama-cli -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S # Run inference directly in the terminal: llama-cli -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S # Run inference directly in the terminal: ./llama-cli -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S # Run inference directly in the terminal: ./build/bin/llama-cli -hf stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
Use Docker
docker model run hf.co/stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
- LM Studio
- Jan
- Ollama
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with Ollama:
ollama run hf.co/stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
- Unsloth Studio
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small to start chatting
- Atomic Chat new
- Docker Model Runner
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with Docker Model Runner:
docker model run hf.co/stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
- Lemonade
How to use stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Q4_0_S
Run and chat with the model
lemonade run user.google-gemma-3-12b-it-qat-q4_0-gguf-small-Q4_0_S
List all available models
lemonade list
| import argparse | |
| from gguf.gguf_reader import GGUFReader | |
| from gguf import GGUFWriter,GGUFValueType,ReaderField,OrderedDict # noqa: E402 | |
| def add_keys(writer: GGUFWriter, fields: "OrderedDict[str, ReaderField]"): | |
| for key, field in fields.items(): | |
| if(key not in ["general.architecture","GGUF.version","GGUF.tensor_count","GGUF.kv_count"]): | |
| if field.types[0] == GGUFValueType.STRING: | |
| writer.add_string(key=key, val=''.join(chr(i) for i in field.parts[field.data[0]])) | |
| elif field.types[0] == GGUFValueType.ARRAY: | |
| writer.add_array(key=key, val=field.contents()) | |
| else: | |
| writer.add_key_value(key=key, val=field.parts[field.data[0]][0],vtype=field.types[0]) | |
| if __name__ == "__main__": | |
| parser = argparse.ArgumentParser(description="Merge GGUF models, especially embedding tables.") | |
| parser.add_argument("model_src_path", help="Path to the main model GGUF file.") | |
| parser.add_argument("embed_src_path", help="Path to the model GGUF file to take the embeddings table from (or other tensors).") | |
| parser.add_argument("dst_path", help="Path to the output GGUF file.") | |
| parser.add_argument("--target_blocks", nargs="+", default=["token_embd.weight"], | |
| help="List of tensor names to merge from the embedding file. Default: token_embd.weight") | |
| args = parser.parse_args() | |
| reader_model = GGUFReader(args.model_src_path) | |
| reader_embed = GGUFReader(args.embed_src_path) | |
| archField = reader_model.get_field("general.architecture") | |
| if archField is None: | |
| print("Couldn't get arch from src0 file") | |
| exit(-1) | |
| arch = str(''.join(chr(i) for i in archField.parts[archField.data[0]])) | |
| archField = reader_model.get_field("general.architecture") | |
| if archField is None: | |
| print("Couldn't get arch from src1 file") | |
| exit(-1) | |
| if str(''.join(chr(i) for i in archField.parts[archField.data[0]])) != arch: | |
| print("src0 and sdc1 have different architectures") | |
| exit(-1) | |
| writer = GGUFWriter(path=args.dst_path, arch=arch) | |
| add_keys(writer,reader_model.fields) | |
| for tensor in reader_model.tensors: | |
| # print(tensor.name) | |
| if tensor.name in args.target_blocks: | |
| name = tensor.name | |
| for tensorSrc in reader_embed.tensors: | |
| if tensorSrc.name == name: | |
| writer.add_tensor(name = tensorSrc.name, tensor=tensorSrc.data, raw_shape=tensorSrc.shape.tolist().reverse(),raw_dtype= tensorSrc.tensor_type) | |
| break | |
| else: | |
| writer.add_tensor(name = tensor.name, tensor=tensor.data, raw_shape=tensor.shape.tolist().reverse(),raw_dtype= tensor.tensor_type) | |
| writer.write_header_to_file() | |
| writer.write_kv_data_to_file() | |
| writer.write_tensors_to_file() | |
| # exit(0) | |
| writer.close() | |