Instructions to use alvarobartt/Mistral-7B-v0.1-ORPO with libraries, inference providers, notebooks, and local apps.

Libraries

How to use alvarobartt/Mistral-7B-v0.1-ORPO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alvarobartt/Mistral-7B-v0.1-ORPO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (Mistral 7B is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alvarobartt/Mistral-7B-v0.1-ORPO")
model = AutoModelForCausalLM.from_pretrained("alvarobartt/Mistral-7B-v0.1-ORPO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use alvarobartt/Mistral-7B-v0.1-ORPO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "alvarobartt/Mistral-7B-v0.1-ORPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alvarobartt/Mistral-7B-v0.1-ORPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
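
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable at localhost:8000, and the helper names `build_chat_request` and `send_chat` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "alvarobartt/Mistral-7B-v0.1-ORPO"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send_chat("What is the capital of France?")` mirrors the curl command above. For the SGLang server described below, passing `base_url="http://localhost:30000"` should be the only change needed, since both expose an OpenAI-compatible API.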

Use Docker

docker model run hf.co/alvarobartt/Mistral-7B-v0.1-ORPO

SGLang

How to use alvarobartt/Mistral-7B-v0.1-ORPO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "alvarobartt/Mistral-7B-v0.1-ORPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alvarobartt/Mistral-7B-v0.1-ORPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "alvarobartt/Mistral-7B-v0.1-ORPO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alvarobartt/Mistral-7B-v0.1-ORPO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use alvarobartt/Mistral-7B-v0.1-ORPO with Docker Model Runner:
```
docker model run hf.co/alvarobartt/Mistral-7B-v0.1-ORPO
```

ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

Image generated with Stable Diffusion XL from the prompt "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of mistralai/Mistral-7B-v0.1 with alvarobartt/dpo-mix-7k-simplified.

⚠️ Note that the code is still experimental because the ORPOTrainer PR has not been merged yet; follow its progress at 🤗trl - ORPOTrainer PR.

About the fine-tuning

To fine-tune mistralai/Mistral-7B-v0.1 with ORPO, the orpo branch of 🤗trl was used, thanks to the invaluable and quick contribution of @kashif.

ORPO stands for Odds Ratio Preference Optimization and defines a new paradigm for fine-tuning LLMs: it “combines” the SFT and PPO/DPO stages into a single stage, thanks to the proposed loss function, which starts from a preference dataset, i.e. chosen-rejected pairs.
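
For a single preference pair, the loss can be sketched as follows. This is a minimal illustration based on the formulation in the ORPO paper, not the 🤗trl implementation. It assumes `logp_chosen` and `logp_rejected` are length-normalized (mean per-token) log-likelihoods of each response and that the SFT term is the mean negative log-likelihood of the chosen response; the weight `lam` and its default value are placeholders.

```python
import math


def log_odds(logp):
    """Return log(p / (1 - p)) given log p, for a log-likelihood logp < 0."""
    # log1p(-exp(logp)) is log(1 - p), computed without forming 1 - p directly
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """SFT loss on the chosen response plus a weighted odds-ratio penalty."""
    # Assumption: the SFT term is the mean token NLL of the chosen response
    sft_loss = -logp_chosen
    # Log of the odds ratio between the chosen and the rejected response
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    return sft_loss + lam * odds_ratio_loss


# The loss is lower when the model already prefers the chosen response
print(orpo_loss(-0.2, -1.5) < orpo_loss(-1.5, -0.2))
```

Because both terms depend only on the policy being trained, no second (reference) model has to be evaluated, which is where the memory saving listed below comes from.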

Some key features about ORPO:

⚡️ Faster to train, as it’s now a single-stage fine-tuning
👨🏻‍🏫 Requires preference data, i.e. (prompt, chosen, rejected)-like datasets
⬇️ Uses less memory than PPO/DPO, as it doesn’t need a reference model
🏆 SOTA results for Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) when fine-tuned using single-turn UltraFeedback

Some notes on the experiments mentioned in the paper:

📌 Up to 7B parameter LLMs were fine-tuned, achieving better performance compared to 7B counterparts and even 13B LLMs
📌 Not yet trained with multi-turn datasets such as Capybara (may be an interesting experiment to run)
📌 OPT models were fine-tuned with HH-RLHF from Anthropic, truncated and padded to 1024 tokens, filtering out the prompts with > 1024 tokens
📌 Phi-2, Mistral (7B), and Llama 2 (7B) were fine-tuned with UltraFeedback from OpenBMB, truncated and padded to 2048 tokens, filtering out the prompts with > 1024 tokens
📌 Fine-tuned for 10 epochs, using the evaluation loss as the metric for selecting the best models
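
The length handling described in these notes can be sketched as below. This is an illustration only, not the paper's preprocessing code: the function name, the record fields, and the whitespace tokenizer are stand-ins, the truncation limit is assumed to apply to prompt plus response, and padding is omitted.

```python
def filter_and_truncate(records, tokenize, max_prompt_tokens=1024, max_length=2048):
    """Drop records with over-long prompts, then truncate prompt + response."""
    kept = []
    for record in records:
        prompt_ids = tokenize(record["prompt"])
        if len(prompt_ids) > max_prompt_tokens:
            continue  # the whole record is filtered out, not truncated
        response_ids = tokenize(record["response"])
        # Tokens left for the response once the prompt is accounted for
        budget = max_length - len(prompt_ids)
        kept.append({"prompt": prompt_ids, "response": response_ids[:budget]})
    return kept


# Tiny limits and str.split as a stand-in tokenizer, to keep the demo readable
records = [
    {"prompt": "a b c", "response": "d e f g"},
    {"prompt": "a b c d e f", "response": "x"},
]
print(filter_and_truncate(records, str.split, max_prompt_tokens=4, max_length=5))
```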

For more information about ORPO, I highly recommend reading the paper, titled ORPO: Monolithic Preference Optimization without Reference Model, as it contains a lot of detail not only on the ORPO method, but also on the experiments the authors ran, the results they got, and much more.

📅 Fine-tuning code will be shared soon, stay tuned!

About the dataset

The dataset used for this fine-tune is alvarobartt/dpo-mix-7k-simplified, which is a simplified version of argilla/dpo-mix-7k.

The simplification is that the prompt column is detached from both the chosen and rejected columns, so no extra pre-processing is needed when applying the chat template to the dataset before fine-tuning. Otherwise, the dataset remains as is, with an additional column for the prompt.
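
One way to picture the detached prompt column: if `chosen` and `rejected` are stored as full conversations, the shared leading turns can be split off into their own field. The sketch below assumes that record layout for illustration and is not the script used to build alvarobartt/dpo-mix-7k-simplified; the function name and the sample messages are hypothetical.

```python
def detach_prompt(chosen, rejected):
    """Split two conversations into (shared prefix, chosen rest, rejected rest)."""
    shared = 0
    for chosen_msg, rejected_msg in zip(chosen, rejected):
        if chosen_msg != rejected_msg:
            break
        shared += 1
    return chosen[:shared], chosen[shared:], rejected[shared:]


# Hypothetical record in which both conversations start with the same user turn
user_turn = {"role": "user", "content": "Name a large rodent."}
chosen = [user_turn, {"role": "assistant", "content": "The capybara."}]
rejected = [user_turn, {"role": "assistant", "content": "The killer whale."}]

prompt, chosen_only, rejected_only = detach_prompt(chosen, rejected)
print(prompt)
print(chosen_only)
print(rejected_only)
```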

The dataset is a small cocktail combining Argilla's latest efforts on DPO datasets, mixing the following datasets:

The samples have been randomly selected from the original datasets with a proportion of 0.33 each, as can be seen via the dataset column of the dataset.
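
The per-source proportions can be checked by counting the values of the dataset column. The rows below are made-up placeholders with hypothetical source names; with the real data, the rows would come from the loaded dataset instead.

```python
from collections import Counter


def source_proportions(rows, column="dataset"):
    """Return the fraction of rows contributed by each source."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Placeholder rows: three hypothetical sources with two samples each
rows = [
    {"dataset": "source_a"},
    {"dataset": "source_b"},
    {"dataset": "source_c"},
] * 2
print(source_proportions(rows))
```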

For more information about the original dataset check the README.md file of argilla/dpo-mix-7k.