How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf priyankapathak/gemma-4-E4B-it-Q5_K_M:Q5_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "priyankapathak/gemma-4-E4B-it-Q5_K_M:Q5_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

gemma-4-E4B-it — GGUF (Q5_K_M)


📊 Performance Metrics

  • Hardware: AMD EPYC 7B12 (4 vCPUs)
  • Size: 5.37 GB
  • Speed (Generation): 4.03 tokens/sec
  • Speed (Prompt): 8.83 tokens/sec
  • KV Cache Usage: 0.0143 GB
  • Quantization: Q5_K_M

🔷 Model Overview

This repository contains a GGUF quantized version of:

  • Base Model: gemma-4-E4B-it
  • Format: GGUF (optimized for llama.cpp inference)
  • Precision: Q5_K_M
  • Efficiency Score: 0.7509 (TPS/GB)

GGUF format provides:

  • Fast loading via memory mapping
  • Single-file model distribution
  • Cross-platform compatibility
  • Efficient inference with llama.cpp

📦 Files

File Description
gemma-4-E4B-it-Q5_K_M.gguf Quantized GGUF model file

⚙️ Technical Details

Parameter Value
Architecture gemma-4-E4B-it
Format GGUF
Precision Q5_K_M
Runtime llama.cpp
Benchmark Hardware AMD EPYC 7B12 (4 vCPUs)
Context Latency 36.22s
Memory (KV) 0.0143 GB

⚡ Why GGUF?

GGUF is designed for efficient inference:

  • Optimized for llama.cpp
  • Supports CPU and GPU inference
  • Single-file deployment
  • Memory-mapped loading for speed
  • Ideal for edge / local environments

⚠️ License & Usage

This is a converted derivative model.

  • You must comply with the original model license of gemma-4-E4B-it
  • This is not an official release
  • No additional rights are granted
  • Original ownership remains with the base model creator

🚀 Quick Start (llama.cpp)

./llama-cli -m gemma-4-E4B-it-Q5_K_M.gguf -p "Explain AI simply"
Downloads last month
7
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support