Send me your support to help me feed the data beast! also taking comissions for universe specific models

Model Description

This model is a specialized, text-only version of Mistral AI's powerful Magistral Small 1.2. It was derived from the official release by carefully removing the vision encoder and related multimodal layers. The result is a more streamlined and efficient 24B parameter model that excels at text-based tasks, retaining the exceptional reasoning capabilities of its progenitor.

Built upon Mistral Small 3.2 (2506), this model underwent Supervised Fine-Tuning (SFT) from Magistral Medium traces and Reinforcement Learning (RL) on top, inheriting a deep capacity for logical deduction and problem-solving. By focusing exclusively on text, this version offers a smaller footprint and potentially faster inference for applications where vision is not required.

Important: Reasoning Format & Backend Setup

This model uses a special reasoning format. There are two methods to enable it: the official format designed by MistralAI, and a legacy format that works due to the base model's pre-training. The correct method depends on your backend software (e.g., llama.cpp, Kobold.cpp).

Official Format: `[THINK]` (Recommended for llama.cpp)

This is the official instruction format from MistralAI and is the recommended method. It is confirmed to work with backends like llama.cpp (with specific flags) and mistral-common.

Llama.cpp Prerequisite: Launch llama.cpp with the --special and --jinja arguments enabled.
Instruction Format: The model uses [THINK] and [/THINK] tags.
Activation (2 steps):
1. Set your prefill sequence (in your frontend like SillyTavern) to start with [THINK].
2. You must also include the keyword /think anywhere in your system prompt to activate the reasoning module.

Recommended System Prompt for Official Format

Add the following to your system prompt to guide the model's output structure:

First draft your thinking process (inner monologue) until you arrive at a response. You must use the /think keyword. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input. Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.[/THINK]Here, provide a self-contained response.

SillyTavern Quick Setup

For a complete SillyTavern configuration, you can download and import this JSON file:

Download SillyTavern JSON →

Legacy Format: `<think>` (For Kobold.cpp & TabbyAPI)

This format is not official but is highly effective with backends like Kobold.cpp and TabbyAPI. It works because the model's predecessor was trained on these angle-bracket tags, and the model inherits this behavior.

Instruction Format: Wrap the model's reasoning in <think> and </think> tags.
Activation: In your frontend, set your prefill sequence to start with <think>.

See the GitHub Issue for technical details →

Key Features

Reasoning: Capable of producing long, coherent chains of thought to break down complex problems before providing an answer.
Multilingual: Supports dozens of languages, including English, French, German, Spanish, Italian, Japanese, Korean, Chinese, Arabic, and many more.
Apache 2.0 License: Features a permissive, open license allowing for both commercial and non-commercial use.
Context Window: A 128k context window. Performance might degrade past 40k, but the model should still provide good results.

Usage Guide

Recommended Sampler Settings

Temperature 0.7

Top P 0.95

Max Tokens 131072

For best results, you must use the recommended system prompt and response format. The model uses special [THINK] and [/THINK] tokens to encapsulate its reasoning process before delivering the final answer.

System Prompt: First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input. Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.[/THINK]Here, provide a self-contained response.

Important: The [THINK] and [/THINK] tags are special tokens and must be encoded as such. Please ensure you are using a recent version of a library that supports the Magistral chat template, such as mistral-common.

Benchmark Results

The following benchmarks are from the official release of the original multimodal Magistral Small 1.2. As this version is a direct derivative with only vision components removed, performance on these text-based reasoning benchmarks is expected to be identical.

Model	AIME24 pass@1	AIME25 pass@1	GPQA Diamond	Livecodebench (v5)
Magistral Medium 1.2	91.82%	83.48%	76.26%	75.00%
Magistral Small 1.2	86.14%	77.34%	70.07%	70.88%
Magistral Small 1.1	70.52%	62.03%	65.78%	59.17%

Intended Use & Limitations

Intended Use: This model is designed for text-based tasks that require strong reasoning, instruction following, and multilingual chat capabilities, without the computational overhead of multimodal features.
Limitations & Quirks:
- This is a text-only model and cannot process image or other non-text inputs.
- Performance on tasks outside of its core training domain (e.g., highly specialized coding, non-chat formats) is not guaranteed.
- The model may "hallucinate" or generate plausible but incorrect information. Always verify critical facts.
- Safety: This model has not undergone additional safety alignment beyond what was included in its base Magistral model. Standard responsible AI practices should be followed.

Acknowledgements

Credit to Mistral AI for the powerful Magistral architecture and for releasing their work openly.

Magistral-Small-1.2-Text-Only