Instructions to use Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (Llama 3.2 3B is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404")
model = AutoModelForCausalLM.from_pretrained("Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and move the tokenized inputs to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Local Apps
- vLLM
How to use Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404
- SGLang
How to use Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 with Docker Model Runner:
docker model run hf.co/Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404
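
The curl calls to the vLLM and SGLang servers can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `ask` are made up for this illustration, and the default base URL assumes the vLLM port (8000) shown earlier; pass `http://localhost:30000` for the SGLang server.

```python
import json
import urllib.request

MODEL_ID = "Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route.

    `base_url` is an assumption: 8000 matches the vLLM example,
    30000 matches the SGLang example.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the text under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` sends the same payload as the curl examples. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.
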
ReZero: Enhancing LLM search ability by trying one-more-time
ReZero trains a small language model to develop effective search behaviors instead of memorizing static data. It interacts with multiple synthetic search engines, each with unique retrieval mechanisms, to refine queries and persist in searching until it finds exact answers. The project focuses on reinforcement learning, preventing overfitting, and optimizing for efficiency in real-world search applications.
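
To make the "try one more time" idea concrete, here is a toy sketch of a search loop that reformulates its query and retries when the first attempt fails. Everything in it is hypothetical (the `search_with_retries` helper, the two-sentence corpus, the query-rewriting rule) and it is not the ReZero implementation: in ReZero the retry behavior is learned by the model through reinforcement learning rather than hard-coded.

```python
from typing import Callable, List, Optional, Tuple


def search_with_retries(
    question: str,
    search: Callable[[str], List[str]],
    rewrite_query: Callable[[str, int], str],
    is_answer: Callable[[str], bool],
    max_turns: int = 10,
) -> Tuple[Optional[str], int]:
    """Return (answer, turns_used); answer is None if every turn failed."""
    query = question
    for turn in range(1, max_turns + 1):
        for passage in search(query):
            if is_answer(passage):
                return passage, turn
        # No passage answered the question: reformulate and try again
        query = rewrite_query(question, turn)
    return None, max_turns


# Toy stand-ins, purely illustrative
CORPUS = [
    "Apollo 13 launched on April 11, 1970.",
    "The Saturn V had three stages.",
]


def toy_search(query: str) -> List[str]:
    """Return documents that contain every query term as a substring."""
    terms = query.lower().split()
    return [doc for doc in CORPUS if all(t in doc.lower() for t in terms)]


def drop_trailing_words(question: str, turn: int) -> str:
    """Relax the query by dropping `turn` words from the end (at least one stays)."""
    words = question.split()
    return " ".join(words[: max(1, len(words) - turn)])


answer, turns = search_with_retries(
    "apollo 13 launch date",
    search=toy_search,
    rewrite_query=drop_trailing_words,
    is_answer=lambda passage: "1970" in passage,
)
```

In this toy run the first query finds nothing because no document contains the word "date". The relaxed query on the second turn retrieves the launch sentence.
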
Quick Demo 🚀
Run the interactive web interface to see ReZero in action:
python app.py
This will launch a Gradio interface where you can interact with the model and test different search behaviors.
Setup 🛠️
# Clone the repository
git clone https://github.com/menloresearch/ReZero
cd ReZero
# Create virtual environment
python -m venv .venv
# Activate the environment
source .venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -e .
# Set up environment variables (required for websearch demo)
cp .env.example .env
# Edit .env and add your Tavily API key if you want to use the websearch demo
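
Before launching the websearch demo, it can help to confirm that the `.env` file actually contains the key. The sketch below is a small standalone check and is not part of the repository. The variable name `TAVILY_API_KEY` is an assumption; use whatever name appears in `.env.example`.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Hypothetical usage (the key name is an assumption, check .env.example):
# env = read_env_file(".env")
# print(missing_keys(env, ["TAVILY_API_KEY"]))
```
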
Data and Training 🧠
All necessary training data is included in the data/ folder. To train:
python train_grpo.py
If you want to regenerate the data, please run:
python scripts/generate_data.py
Models 🤖
You can find our models on Hugging Face 🤗! We're committed to open-source and easy access for the research community.
| Model | Backbone | Size | Link | GGUF |
|---|---|---|---|---|
| ReZero-v0.1 | Llama-3.2-3B | 3B | 🤗 Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404 | 🤗 GGUF |
Experiments 🧪
| Run ID | Model Config | Dataset | Steps | Duration / Hardware | TensorBoard | Description |
|---|---|---|---|---|---|---|
| exp-01 | Llama-3.2-3b-instruct | Apollo Mission Report | 300 | ~2 hours on 1xH200 | 📊 | Added reward_search_strategy and reward_search_quality. Reward weights: [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]. Loss crashed after step 400. Best accuracy: 31.25% at step 400. Max agent turns: 10. |
| exp-02 | Llama-3.2-3b-instruct | Apollo Mission Report | 1000 | ~7 hours on 1xH200 | 📊 | Improved reward_retry logic to only reward search when answers found. Increased max agent turns to 20. Reward weights: [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]. Best accuracy: 46.88% at step 250. Higher early reward_correctness (~0.6 vs 0.4-0.5). Loss stable but reward crashed after step 350. |
| exp-03 | Llama-3.2-3b-instruct | Apollo Mission Report | 1000 | ~7 hours on 1xH200 | 📊 | Same as exp-02 but without the retry reward function. |
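
The reward weights in the table can be read as coefficients of a weighted sum over six reward functions. The sketch below shows one way such a sum might be computed and then turned into group-relative advantages, as is usual in GRPO-style training. It is an illustration under assumptions, not the code in train_grpo.py: the table does not say which weight belongs to which reward function, and the use of the population standard deviation and the `eps` value are choices made here.

```python
from statistics import mean, pstdev

# Weights as listed in the experiments table; the mapping to individual
# reward functions is not stated there, so positions are placeholders.
REWARD_WEIGHTS = [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]


def weighted_reward(component_scores, weights=REWARD_WEIGHTS):
    """Combine per-function reward scores for one completion into a scalar."""
    if len(component_scores) != len(weights):
        raise ValueError("expected one score per reward weight")
    return sum(w * s for w, s in zip(weights, component_scores))


def group_advantages(rewards, eps=1e-4):
    """Normalize rewards within a group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of three completions, each scored by six reward functions
group_scores = [
    [1.0, 1.0, 0.5, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.5, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]
rewards = [weighted_reward(scores) for scores in group_scores]
advantages = group_advantages(rewards)
```

Because the first weight is 4.0, a change in the first component moves the total reward twice as much as the second component and four times as much as any of the remaining ones.
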
References 📖
arxiv.org/abs/2504.11001
Acknowledgements 🤝
- This project was kickstarted from the source code of AutoDidact.
Model tree for Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404
Base model
meta-llama/Llama-3.2-3B-Instruct
