---
base_model: Qwen/Qwen3-1.7B
language:
- tr
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- turkish
- legal
- turkish-legal
- mecellem
- qwen
- decoder-only
- continual-pretraining
- TRUBA
- MN5
---
# Mecellem-Qwen3-1.7B-TR
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
Mecellem-Qwen3-1.7B-TR is a Turkish legal language model presented in [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
**Resources:**
- **Code:** [GitHub Repository](https://github.com/newmindai/mecellem-models)
- **Paper:** [arXiv:2601.16018](https://arxiv.org/abs/2601.16018)
## Model Description
Mecellem-Qwen3-1.7B-TR is a Turkish legal language model adapted through Continual Pre-training (CPT) on Turkish legal and official texts. It is based on the Qwen3-1.7B decoder architecture (1.7B parameters) and was trained with a four-phase curriculum learning strategy designed to account for Turkish linguistic complexity. The CPT process moves progressively from general-purpose texts to domain-specific legal content and achieves a 36.2% perplexity reduction on Turkish legal text compared to the base Qwen3-1.7B model.
**Key Features:**
- Continual pre-training on approximately 250 billion tokens across four phases
- Four-phase curriculum learning:
  - Phase 1: ~3.7B tokens
  - Phase 2: ~57B tokens
  - Phase 3: ~165B tokens
  - Phase 4: ~24.9B tokens
- Dataset includes Turkish legal sources (Yargıtay, Danıştay, YÖKTEZ) and general Turkish web data (FineWeb2, CulturaX)
- Preserves general language capabilities while injecting domain-specific legal knowledge
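The phase sizes listed above can be summed to see how the token budget is distributed. The snippet below is a small arithmetic sketch that uses only the approximate per-phase figures from this card; the variable names are illustrative.

```python
# Approximate per-phase token counts in billions, as listed in this card.
phase_tokens_b = {
    "Phase 1": 3.7,
    "Phase 2": 57.0,
    "Phase 3": 165.0,
    "Phase 4": 24.9,
}

# Total budget and each phase's share of it.
total_b = sum(phase_tokens_b.values())
shares = {name: tokens / total_b for name, tokens in phase_tokens_b.items()}

for name, tokens in phase_tokens_b.items():
    print(f"{name}: {tokens:6.1f}B tokens ({shares[name]:.1%} of total)")
print(f"Total:   {total_b:6.1f}B tokens")
```

Because the per-phase figures are rounded, the sum only approximates the exact token count reported under Training Details.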
**Model Type:** Decoder-only Language Model
**Parameters:** 1.7B
**Base Model:** Qwen/Qwen3-1.7B
**Architecture:** Qwen3 decoder with grouped query attention (GQA)
### Architecture Details
- **Max Position Embeddings:** 40,960 tokens
- **Number of Layers:** 28 transformer layers
- **Hidden Size:** 2,048
- **FFN Hidden Size:** 6,144
- **Number of Heads:** 16
- **Number of KV Heads (GQA):** 8
- **Activation Function:** SwiGLU
- **Position Encodings:** RoPE (Rotary Position Embeddings)
- **Layer Norm:** RMSNorm
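To illustrate what grouped query attention means in practice, the sketch below estimates the key/value cache size at the maximum context length from the figures above. Two values are assumptions that this card does not state: a per-head dimension of 128 (taken here as hidden size divided by number of heads) and a BF16 cache at 2 bytes per value.

```python
# Values taken from the Architecture Details list.
num_layers = 28
hidden_size = 2048
num_heads = 16
num_kv_heads = 8
max_positions = 40_960

# Assumptions, not stated in this card:
head_dim = hidden_size // num_heads  # assumed per-head dimension (128)
bytes_per_value = 2                  # assumed BF16 cache


def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers keys plus values, stored for every layer.
    return 2 * num_layers * kv_heads * head_dim * bytes_per_value * seq_len


gqa_bytes = kv_cache_bytes(max_positions, num_kv_heads)
mha_bytes = kv_cache_bytes(max_positions, num_heads)

print(f"GQA (8 KV heads):         {gqa_bytes / 2**30:.3f} GiB")
print(f"Hypothetical MHA (16):    {mha_bytes / 2**30:.3f} GiB")
print(f"Cache size ratio GQA/MHA: {gqa_bytes / mha_bytes:.2f}")
```

Under these assumptions, halving the number of KV heads halves the cache relative to a hypothetical layout with one KV head per query head.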
### Training Details
**Continual Pre-training (CPT):**
- **Total Training Tokens:** ~250.7 billion tokens (250,739,476,454 tokens across four phases)
- **Training Method:** Four-phase curriculum learning
- **Framework:** NVIDIA NeMo with Megatron-Core
- **Hardware:** MareNostrum 5 supercomputer (BSC), H100 GPUs
- **Precision:** BF16
**Dataset Composition:**
- **Legal Sources:**
  - Court of Cassation (Yargıtay): 10.3M sequences, ~3.43B tokens
  - Council of State (Danıştay): 151K sequences, ~0.11B tokens
  - Academic theses (YÖKTEZ): 21.1M sequences, ~9.61B tokens (after DocsOCR processing)
- **General Turkish Sources:**
  - FineWeb2: General Turkish web data
  - CulturaX: Multilingual corpus (Turkish subset)
  - Total general Turkish: 212M sequences, ~96.17B tokens
- **Additional Categories:** English, Mathematics, Python code, multilingual content (Spanish, Arabic, Russian, Chinese)
**Phase 1 (~3.7B tokens):**
- Focus: Short, general-purpose Turkish texts
- Purpose: Adapt model to Turkish language patterns while maintaining stability
- Learning Rate: Higher with extended warmup
- Dataset: Academic-focused data with semantic deduplication and FineWeb quality filtering
**Phase 2 (~57B tokens):**
- Focus: Legal content with domain-specific terminology
- Includes: Court decisions, legal articles, regulatory documents
- Data Replay: YÖKTEZ academic legal data from Phase 1
- Dataset: Lighter pipeline with FineWeb quality filtering, preserving topical diversity
**Phase 3 (~165B tokens):**
- Focus: Long, structurally complex normative texts
- Includes: Full court decisions, legislative documents, academic legal theses
- Purpose: Refine model's understanding of legal reasoning patterns
- Dataset: Long-form documents with merged consecutive pages
**Phase 4 (~24.9B tokens):**
- Focus: Extended domain-specific refinement
- Includes: Mixed complexity documents
- Purpose: Consolidate knowledge and improve generalization
**Training Hyperparameters:**
- Sequence Length: 4,096 tokens
- Optimizer: Adam with cosine learning rate schedule
- Max Learning Rate: 5×10⁻⁵
- Min Learning Rate: 5×10⁻⁶
- Weight Decay: 0.01
- Warmup Steps: Phase-dependent (200-2,340 steps)
- Precision: BF16 mixed precision
- Framework: NVIDIA NeMo with Megatron-Core
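The learning rate settings above can be sketched as a warmup followed by cosine decay between the stated maximum and minimum. This is a minimal illustration, not the NeMo scheduler used for training: the linear shape of the warmup and the `total_steps` value are assumptions, and `lr_at` is a hypothetical helper.

```python
import math

MAX_LR = 5e-5  # max learning rate from the list above
MIN_LR = 5e-6  # min learning rate from the list above


def lr_at(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Assumed linear warmup that reaches MAX_LR on the last warmup step.
        return MAX_LR * (step + 1) / warmup_steps
    # Fraction of the decay period completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


# Hypothetical run: 200 warmup steps (the low end of the stated range)
# and an assumed total of 10,000 optimizer steps.
for step in (0, 199, 200, 5_100, 10_000):
    print(step, f"{lr_at(step, warmup_steps=200, total_steps=10_000):.2e}")
```

With a phase-dependent warmup, only the `warmup_steps` argument changes between phases; the decay always ends at the minimum learning rate.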
**Hardware Infrastructure:**
- **System:** MareNostrum 5 ACC partition at Barcelona Supercomputing Center (BSC)
- **Node Configuration:** Each node equipped with 4× NVIDIA Hopper H100 64GB GPUs (SXM), 80 CPU cores, 512GB DDR5 memory
- **Interconnect:** 800 Gb/s InfiniBand for distributed training
- **GPU Interconnect:** NVLink for intra-node GPU communication (4 GPUs per node connected via NVLink)
- **Distributed Training:** Data-parallel multi-node and multi-GPU distributed architecture with 4 GPUs per node
- **InfiniBand Network:** Enabled efficient processing of large-scale token flow and ensured high scalability and training stability in long-term CPT training
- **Phase-Specific Hardware:**
  - **Phase 1:** 50 nodes, 200 GPUs, ~3.7B tokens, 3.77M tokens/sec throughput, 20.7% median MFU
  - **Phase 2:** 50 nodes, 200 GPUs, ~57B tokens, 3.59M tokens/sec throughput, 20.7% median MFU
  - **Phase 3:** 100 nodes, 400 GPUs, ~165B tokens, 7.35M tokens/sec throughput, 20.3% median MFU
  - **Phase 4:** 50 nodes, 200 GPUs, ~24.9B tokens, 3.25M tokens/sec throughput, 20.6% median MFU
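The per-phase figures above allow a quick comparison of per-GPU throughput across phases. The sketch below also derives an idealized compute time per phase, assuming the reported throughput was sustained for the whole phase; it ignores checkpointing, validation, and restarts, so it is a lower bound rather than the actual wall-clock time.

```python
# Per-phase figures copied from the Phase-Specific Hardware list.
phases = {
    "Phase 1": {"gpus": 200, "tokens": 3.7e9, "tokens_per_sec": 3.77e6},
    "Phase 2": {"gpus": 200, "tokens": 57e9, "tokens_per_sec": 3.59e6},
    "Phase 3": {"gpus": 400, "tokens": 165e9, "tokens_per_sec": 7.35e6},
    "Phase 4": {"gpus": 200, "tokens": 24.9e9, "tokens_per_sec": 3.25e6},
}


def summarize(phase: dict) -> tuple[float, float]:
    """Return (tokens/sec per GPU, idealized hours of pure compute)."""
    per_gpu = phase["tokens_per_sec"] / phase["gpus"]
    # Assumes the reported throughput held for the entire phase.
    hours = phase["tokens"] / phase["tokens_per_sec"] / 3600
    return per_gpu, hours


for name, phase in phases.items():
    per_gpu, hours = summarize(phase)
    print(f"{name}: {per_gpu:,.0f} tokens/sec/GPU, ~{hours:.2f} h at reported throughput")
```

Per-GPU throughput stays in a similar range when Phase 3 doubles the GPU count, which is consistent with the near-constant MFU reported across phases.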
**Catastrophic Forgetting Mitigation:**
- Curriculum learning: Progressive transition from general to specialized knowledge
- Replay buffer: YÖKTEZ data from Phase 1 included in Phase 2
- Conservative learning rates and extended warmup periods
**Performance:** Achieved a 36.2% perplexity reduction on Turkish legal text compared to the base Qwen3-1.7B model.
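For readers who want to relate the perplexity figure to loss values, the sketch below uses the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood (in nats). The evaluation protocol behind the reported number is not described in this card, so the helper names and the sample loss values are hypothetical and are not results from the paper.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(ppl_base: float, ppl_adapted: float) -> float:
    """Fractional perplexity reduction of the adapted model versus the base."""
    return (ppl_base - ppl_adapted) / ppl_base


def nll_drop_for_reduction(reduction: float) -> float:
    """Mean NLL decrease (nats/token) implied by a given perplexity reduction."""
    return -math.log(1.0 - reduction)


# Hypothetical per-token losses, for illustration only.
base_nlls = [2.9, 3.1, 3.0]
cpt_nlls = [2.5, 2.6, 2.55]

ppl_base = perplexity(base_nlls)
ppl_cpt = perplexity(cpt_nlls)
print(f"Illustrative reduction: {relative_reduction(ppl_base, ppl_cpt):.1%}")

# The reported 36.2% reduction expressed as a drop in mean loss.
print(f"36.2% reduction = {nll_drop_for_reduction(0.362):.3f} nats/token")
```

Under this definition, a relative perplexity reduction depends only on the difference in mean loss between the two models, not on the absolute loss level.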
### Training Visualization
The following visualizations show the model's training progress and dataset distribution:

*Qwen3-1.7B CPT Dataset Distribution across Four Phases. The curriculum learning strategy progressively introduces more complex legal content.*

*Qwen3-1.7B CPT Training and Validation Loss Across Four Phases. The model shows consistent improvement throughout all training phases.*
### Benchmark Performance
The model was evaluated using the Muhakim reward model on Turkish legal tasks:

*Benchmark Performance of 1.7B Decoder-Only Models Across Context Lengths Using the Muhakim Reward Model. Mecellem-Qwen3-1.7B-TR consistently outperforms the base Qwen3-1.7B model across all five legal quality objectives, with particularly pronounced gains for depth of coverage, statute reference usage, and legal accuracy.*
### Rewards Comparison Analysis
The following visualization compares rewards across different token lengths for base vs CPT models:

*Rewards Comparison: Base vs CPT Models Across Token Lengths. Mecellem-Qwen3-1.7B-TR shows consistent improvements over the base model across all context length settings, demonstrating the effectiveness of Turkish legal domain adaptation.*
## Usage
### Installation
```bash
pip install transformers torch
```
### Text Generation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("newmindai/Mecellem-Qwen3-1.7B-TR")
model = AutoModelForCausalLM.from_pretrained("newmindai/Mecellem-Qwen3-1.7B-TR")

# Example prompt
prompt = "Türk hukuk sisteminde sözleşme feshi"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with nucleus sampling; no gradients are needed for inference
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
    )

# Decode the full sequence (prompt plus continuation)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### Chat Format
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("newmindai/Mecellem-Qwen3-1.7B-TR")
model = AutoModelForCausalLM.from_pretrained("newmindai/Mecellem-Qwen3-1.7B-TR")

messages = [
    {"role": "user", "content": "Türk hukuk sisteminde sözleşme feshi nasıl yapılır?"}
]

# Apply chat template to build the prompt string
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# Generate response
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode the full sequence (prompt plus response)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Use Cases
- Turkish legal text generation
- Legal document summarization
- Legal question answering
- Legal text completion
- Domain-specific language modeling for Turkish legal domain
- Retrieval-Augmented Generation (RAG) applications
## Acknowledgments
This work was supported by the EuroHPC Joint Undertaking through project etur46 with access to the MareNostrum 5 supercomputer, hosted by Barcelona Supercomputing Center (BSC), Spain. MareNostrum 5 is owned by EuroHPC JU and operated by BSC. We are grateful to the BSC support team for their assistance with job scheduling, environment configuration, and technical guidance throughout the project.
The numerical calculations reported in this work were fully/partially performed at TÜBİTAK ULAKBİM, High Performance and Grid Computing Center (TRUBA resources). The authors gratefully acknowledge the know-how provided by the MINERVA Support for expert guidance and collaboration opportunities in HPC-AI integration.
## References
If you use this model, please cite our paper:
```bibtex
@article{mecellem2026,
  title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
  author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
  journal={arXiv preprint arXiv:2601.16018},
  year={2026},
  month={January},
  url={https://arxiv.org/abs/2601.16018},
  doi={10.48550/arXiv.2601.16018},
  eprint={2601.16018},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
### Base Model References
```bibtex
@article{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2505.09388},
  year={2025}
}
```