ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K
Stable Diffusion XL "A capybara, a killer whale, and a robot named Ultra being friends"
This is an ORPO fine-tune of mistralai/Mistral-7B-v0.1 with
alvarobartt/dpo-mix-7k-simplified.
⚠️ Note that the code is still experimental, as the ORPOTrainer PR has not been merged yet; follow its progress
at 🤗trl - ORPOTrainer PR.
About the fine-tuning
In order to fine-tune mistralai/Mistral-7B-v0.1 using ORPO, the branch
orpo from 🤗trl has been used, thanks to the invaluable and quick contribution of
@kashif.
ORPO stands for Odds Ratio Preference Optimization. It defines a new paradigm for fine-tuning LLMs, “combining” the SFT and the PPO/DPO stages into a single stage, thanks to a proposed loss function that starts off from a preference dataset, i.e. chosen-rejected pairs.
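To make the "single stage" idea concrete, here is a minimal sketch of the ORPO objective for one (prompt, chosen, rejected) triple, as I understand it from the paper: an SFT term (negative log-likelihood of the chosen response) plus a weighted odds-ratio term. It is plain Python, not the 🤗trl implementation. The function names, the `lam` weight and its default value are illustrative assumptions, and the inputs are assumed to be length-normalized (average per-token) log-probabilities that are strictly negative.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given logp = log(p), for 0 < p < 1."""
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO-style loss for a single preference pair (illustrative sketch).

    logp_chosen / logp_rejected: average per-token log-probabilities of the
    chosen and rejected responses under the model being trained (assumed < 0).
    lam: weight of the odds-ratio term (assumed value, not taken from this card).
    """
    # SFT term: negative log-likelihood of the chosen response.
    sft_loss = -logp_chosen
    # Log of the odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    return sft_loss + lam * odds_ratio_loss


# The loss is lower when the model prefers the chosen response.
print(orpo_loss(math.log(0.6), math.log(0.2)))
print(orpo_loss(math.log(0.2), math.log(0.6)))
```

Note that no reference model appears anywhere in the computation: both log-probabilities come from the same model, which is where the memory savings over DPO come from.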
Some key features about ORPO:
- ⚡️ Faster to train as it’s now a single-stage fine-tuning
- 👨🏻‍🏫 Requires preference data, i.e. (prompt, chosen, rejected)-like datasets
- ⬇️ Less memory than PPO/DPO as it doesn’t need a reference model
- 🏆 SOTA results for Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) when fine-tuned using single-turn UltraFeedback
Some notes on the experiments mentioned in the paper:
- 📌 Up to 7B parameter LLMs were fine-tuned, achieving better performance compared to 7B counterparts and even 13B LLMs
- 📌 Not yet trained with multi-turn datasets such as Capybara (may be an interesting experiment to run)
- 📌 OPT models were fine-tuned with HH-RLHF from Anthropic, truncated and padded to 1024 tokens, filtering out the prompts with > 1024 tokens
- 📌 Phi-2, Mistral (7B) and Llama 2 (7B) were fine-tuned with UltraFeedback from OpenBMB, truncated and padded to 2048 tokens, filtering out the prompts with > 1024 tokens
- 📌 Fine-tuned for 10 epochs, and using the evaluation loss as the metric for selecting the best models
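The length handling described in the notes above can be sketched as follows. This is only an illustration on lists of token ids, not the paper's preprocessing code: the `prepare_example` helper, the padding id, and the choice of right-padding and right-truncation are all assumptions.

```python
from typing import List, Optional

PAD_ID = 0  # hypothetical padding token id


def prepare_example(
    prompt_ids: List[int],
    response_ids: List[int],
    max_prompt_len: int = 1024,
    max_len: int = 2048,
    pad_id: int = PAD_ID,
) -> Optional[List[int]]:
    """Filter, truncate and pad one tokenized example (illustrative sketch)."""
    # Drop the example entirely if the prompt alone is too long.
    if len(prompt_ids) > max_prompt_len:
        return None
    # Truncate prompt + response to the maximum sequence length.
    ids = (prompt_ids + response_ids)[:max_len]
    # Right-pad up to the maximum sequence length (assumed padding side).
    return ids + [pad_id] * (max_len - len(ids))


# Tiny limits so the behaviour is easy to inspect.
print(prepare_example([1, 1], [2, 2, 2], max_prompt_len=4, max_len=6))
print(prepare_example([1] * 5, [2] * 3, max_prompt_len=4, max_len=8))
```

With the defaults, this corresponds to the UltraFeedback setting in the notes (2048-token sequences, prompts over 1024 tokens filtered out); the HH-RLHF setting would use `max_len=1024`.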
For more information about ORPO, I highly recommend reading the paper titled ORPO: Monolithic Preference Optimization without Reference Model,
as it contains a lot of information and details not only on the ORPO method, but also on the experiments the authors ran, the results they got, and much more.
📅 Fine-tuning code will be shared soon, stay tuned!
About the dataset
The dataset used for this fine-tune is alvarobartt/dpo-mix-7k-simplified,
which is a simplified version of argilla/dpo-mix-7k.
The simplification comes from the fact that the prompt column is detached from both the chosen and rejected
columns so that there's no need for extra pre-processing while applying the chat template to the dataset before the
fine-tuning. Otherwise, the dataset remains as is, with an additional column for the prompt.
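As a rough illustration of what "detaching the prompt" means, the sketch below splits a pair of chat-style conversations into separate prompt, chosen and rejected fields. The record layout is an assumption (chosen and rejected each holding the full conversation as a list of `{"role", "content"}` messages that share every turn except the last), and `detach_prompt` is a hypothetical helper, not the script actually used to build the dataset.

```python
from typing import Dict, List

Message = Dict[str, str]


def detach_prompt(chosen: List[Message], rejected: List[Message]) -> Dict[str, List[Message]]:
    """Split full conversations into prompt / chosen / rejected (illustrative sketch).

    Assumes both conversations share all turns except the final assistant reply.
    """
    return {
        "prompt": chosen[:-1],      # shared turns before the final reply
        "chosen": chosen[-1:],      # preferred final assistant reply
        "rejected": rejected[-1:],  # dispreferred final assistant reply
    }


user_turn = {"role": "user", "content": "What is the capital of France?"}
record = detach_prompt(
    chosen=[user_turn, {"role": "assistant", "content": "Paris."}],
    rejected=[user_turn, {"role": "assistant", "content": "Lyon."}],
)
print(record["prompt"])
print(record["chosen"])
print(record["rejected"])
```

With the prompt in its own column, the chat template can be applied to the prompt and to each completion separately, without first having to locate where the prompt ends.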
The dataset is a small cocktail combining Argilla's latest efforts on DPO datasets, mixing the following datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
- argilla/distilabel-intel-orca-dpo-pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
The samples have been randomly selected from the original datasets with a proportion of 0.33 each, as can be seen via the dataset column of the dataset.
For more information about the original dataset check the README.md file of argilla/dpo-mix-7k.
