How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
# Run inference directly in the terminal:
llama-cli -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
# Run inference directly in the terminal:
llama-cli -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Roestblik/OMEGA-V21-Full-Merged-GGUF:
Use Docker
docker model run hf.co/Roestblik/OMEGA-V21-Full-Merged-GGUF:
Quick Links

OMEGA V21 - Full Merged GGUF

Autonomous AI agent fine-tuned on OBLITERATED Gemma 4 E4B for Android/Termux deployment.

Files (5 quantization levels)

File Size Use Case Quality
omega-v21-F16.gguf ~15 GB Maximum quality, reference Perfect
omega-v21-Q8_0.gguf ~8.0 GB Highest practical quality Excellent
omega-v21-Q6_K.gguf ~6.2 GB Balanced (recommended) Very Good
omega-v21-Q5_K_M.gguf ~5.7 GB Lighter, good quality Good
omega-v21-Q4_K_M.gguf ~5.3 GB Smallest, mobile-friendly Acceptable

Quick Start on Termux (Android)

pkg install llama-cpp
wget https://huggingface.co/Abdllahd/OMEGA-V21-Full-Merged-GGUF/resolve/main/omega-v21-Q6_K.gguf

llama-cli \
  -m omega-v21-Q6_K.gguf \
  --ctx-size 4096 \
  --threads $(nproc) \
  -cnv \
  --chat-template gemma \
  --temp 0.3

Features

  • 0% refusal rate (OBLITERATED base)
  • Bilingual: Arabic (70%) + English (30%)
  • ReAct reasoning with 6-point think structure
  • Real bash code (no placeholders)
  • Termux/PRoot-aware
  • Runs on 7.6GB RAM Android devices (Q4/Q5)

Hardware Requirements

Quant Min RAM Speed on Android
F16 18 GB Reference only
Q8_0 10 GB ~2 t/s
Q6_K 8 GB ~3 t/s
Q5_K_M 7 GB ~3-4 t/s
Q4_K_M 6 GB ~4 t/s

Part of OMEGA v22 Architecture

Four-layer autonomous agent:

  • Spiders (data collectors)
  • Tools (action executors)
  • Orchestrator (this model)
  • SpatialCache (memory system)

Training Details

  • Base: OBLITERATUS/gemma-4-E4B-it-OBLITERATED (0% refusal)
  • Method: LoRA fine-tuning (rank 64)
  • Dataset: 10,000 bilingual examples (Arabic/English)
  • Format: Strict JSON with think reasoning + bash code blocks

Related Repositories

License

Apache 2.0 (inherited from OBLITERATED base)

Downloads last month
82
GGUF
Model size
7B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Roestblik/OMEGA-V21-Full-Merged-GGUF

Quantized
(26)
this model