Congrats on this, and thanks for supporting GGUF models. However, the model is not loading correctly out of the box.
I haven't really jumped into debugging yet. This was just a quick attempt to load it with the llama-server CLI, as follows:
/git/llama.cpp/build/bin/llama-server -c 0 --top-p 0.95 --temp 0.7 -ngl 2 -m /models/Magistral-Small-2506_gguf/ --host 0.0.0.0 --port 7070
This results in:
main: loading model
srv load_model: loading model '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
srv load_model: failed to load model, '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
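The "invalid magic characters" error above means the first four bytes of the file are not `GGUF`. A minimal sketch of checking this before launching the server is shown below. It assumes, without confirmation from this thread, that a file starting with `vers` may be a Git LFS pointer (those text files begin with `version https://git-lfs...`) rather than the real weights. The function name `inspect_gguf_header` is hypothetical and not part of llama.cpp.

```python
GGUF_MAGIC = b"GGUF"
# Assumption: Git LFS pointer files are small text files starting with this prefix.
LFS_POINTER_PREFIX = b"version https://git-lfs"


def inspect_gguf_header(path):
    """Classify a file by its leading bytes: 'gguf', 'lfs-pointer', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head[:4] == GGUF_MAGIC:
        return "gguf"
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    return "unknown"
```

For example, `inspect_gguf_header("/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf")` would return `"gguf"` for a valid file. Any other result suggests the download should be repeated.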
The 8-bit quantized version has the same issue.
For the record, in case it is helpful, I also attempted to run the automated conversion tool:
https://huggingface.co/spaces/ggml-org/gguf-my-repo
The result was:
.....
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 5120
INFO:hf-to-gguf:gguf: feed forward length = 32768
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message
Traceback (most recent call last):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1865, in set_vocab
self._set_vocab_sentencepiece()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 902, in _set_vocab_sentencepiece
tokens, scores, toktypes = self._create_vocab_sentencepiece()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 919, in _create_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: downloads/tmpxdyj7bte/Magistral-Small-2506/tokenizer.model
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1868, in set_vocab
self._set_vocab_llama_hf()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 997, in _set_vocab_llama_hf
vocab = gguf.LlamaHfVocab(self.dir_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/llama.cpp/gguf-py/gguf/vocab.py", line 379, in __init__
with open(fname_tokenizer, encoding='utf-8') as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'downloads/tmpxdyj7bte/Magistral-Small-2506/tokenizer.json'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 6533, in <module>
main()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 6527, in main
model_instance.write()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 404, in write
self.prepare_metadata(vocab_only=False)
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 517, in prepare_metadata
self.set_vocab()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1871, in set_vocab
self._set_vocab_gpt2()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 838, in _set_vocab_gpt2
tokens, toktypes, tokpre = self.get_vocab_base()
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 603, in get_vocab_base
tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1032, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2025, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2063, in _from_pretrained
slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2278, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
tokenizer.Load(self.vocab_file)
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/sentencepiece/__init__.py", line 961, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
Hi! Unfortunately, I cannot reproduce your issue. Did you try pointing to the actual file instead of the folder in your command?
Also, maybe try updating llama.cpp if you haven't.
Unfortunately, this was a very DOH! moment: I never checked the integrity of the model after download :( Redownloading now, and will report back.
And yes, this was my fault. Closing.
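For anyone landing here with the same symptom, the integrity check mentioned above can be sketched roughly as follows. It assumes you have a known-good SHA-256 for the file, for example one copied from the model repository's file listing. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and only Python's standard `hashlib` is used.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading in chunks so large models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's hash with an expected hex digest (assumed to come from a trusted source)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If `matches_expected(...)` returns `False` for a freshly downloaded `.gguf` file, the download is likely incomplete or corrupted and should be repeated before debugging anything else.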