Instructions to use mistralai/Mistral-7B-Instruct-v0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mistralai/Mistral-7B-Instruct-v0.1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.1") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- Local Apps Settings
- vLLM
How to use mistralai/Mistral-7B-Instruct-v0.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Install mistral-common: pip install --upgrade mistral-common # Start the vLLM server: vllm serve "mistralai/Mistral-7B-Instruct-v0.1" --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mistralai/Mistral-7B-Instruct-v0.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/mistralai/Mistral-7B-Instruct-v0.1
- SGLang
How to use mistralai/Mistral-7B-Instruct-v0.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "mistralai/Mistral-7B-Instruct-v0.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mistralai/Mistral-7B-Instruct-v0.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "mistralai/Mistral-7B-Instruct-v0.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mistralai/Mistral-7B-Instruct-v0.1", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use mistralai/Mistral-7B-Instruct-v0.1 with Docker Model Runner:
docker model run hf.co/mistralai/Mistral-7B-Instruct-v0.1
Unable to load checkpoint shards
Got an error something like:
.cache\huggingface\hub\models--mistralai--Mistral-7B-Instruct-v0.1\snapshots\d635d39671aaceec5ef84b745bc21625b324b7f8\pytorch_model-00001-of-00002.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
I have the same issue, any news?
@19Peppe95 The error is because system is running out of RAM to load the model in one go.
You can use CTransformers to load the model or can try GGUF model versions of your model which is basically much smaller version of it.
Gist - Use GGUF version of this model https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
and use CTransformers to load it from the dowloads and run the program, hopefully it should work.
@19Peppe95 The error is because system is running out of RAM to load the model in one go.
You can use CTransformers to load the model or can try GGUF model versions of your model which is basically much smaller version of it.
Gist - Use GGUF version of this model https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
and use CTransformers to load it from the dowloads and run the program, hopefully it should work.
I also have the RAM issue. But it is werid that the model is only 14GB in totally and I have 64GB RAM and 24GB VRAM available.
Just found out that there was a download issue, the bins are broken so the memory usage when loading the files became uncontrollable.
Hi everyone
In case you are facing CPU OOM issues while loading the model please consider using sharded models with small shards, for this model I would recommend using this repository: https://huggingface.co/bn22/Mistral-7B-Instruct-v0.1-sharded
hi @shantanudave
Indeed, please use: https://huggingface.co/alexsherstinsky/Mistral-7B-v0.1-sharded instead