How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
Use Docker
docker model run hf.co/prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF:
Quick Links

llama-3.2-3b-it-grpo-250404-GGUF

ReZero-v0.1-llama-3.2-3b-it-grpo-250404 is a research project focused on enhancing the search abilities of small language models by training them to develop robust search strategies rather than memorizing static data. The model, built on a Llama-3.2-3B backbone, interacts with multiple synthetic search engines that each have unique retrieval mechanisms, enabling it to refine queries iteratively and persist in finding exact answers using reinforcement learning. The repository provides setup instructions, including environment configuration and dependency installation, as well as scripts to train the model or regenerate synthetic training data. Demonstrations can be run through a Gradio interface, and the release includes comprehensive experiment logs on reward strategies and search quality. The model and associated resources are open-source and accessible to the research community, with further details on experiments and references provided in the documentation.

Model Files

File name Size Quant Type
llama-3.2-3b-it-grpo-250404.F32.gguf 12.9 GB F32
llama-3.2-3b-it-grpo-250404.BF16.gguf 6.43 GB BF16
llama-3.2-3b-it-grpo-250404.F16.gguf 6.43 GB F16
llama-3.2-3b-it-grpo-250404.Q8_0.gguf 3.42 GB Q8_0
llama-3.2-3b-it-grpo-250404.Q6_K.gguf 2.64 GB Q6_K
llama-3.2-3b-it-grpo-250404.Q5_K_M.gguf 2.32 GB Q5_K_M
llama-3.2-3b-it-grpo-250404.Q5_K_S.gguf 2.27 GB Q5_K_S
llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf 2.02 GB Q4_K_M
llama-3.2-3b-it-grpo-250404.Q4_K_S.gguf 1.93 GB Q4_K_S
llama-3.2-3b-it-grpo-250404.Q3_K_L.gguf 1.82 GB Q3_K_L
llama-3.2-3b-it-grpo-250404.Q3_K_M.gguf 1.69 GB Q3_K_M
llama-3.2-3b-it-grpo-250404.Q3_K_S.gguf 1.54 GB Q3_K_S
llama-3.2-3b-it-grpo-250404.Q2_K.gguf 1.36 GB Q2_K

Quants Usage

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

image.png

Downloads last month
18
GGUF
Model size
3B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF