Boldt-DC-1B German IT 16K — DPO + SLERP refinement
A SLERP-merged variant of `mayflowergmbh/boldt-dc-1b-german-it-16k`, built by merging the SFT model with a DPO-tuned checkpoint of itself. This follows the same-model merging approach described in §4.4 of the LFM2 technical report (Liquid AI, arXiv:2511.23404). The architecture, context length, and chat format are identical to the SFT release.
Usage
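```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mayflowergmbh/boldt-dc-1b-german-it-16k-dpo"
tok = AutoTokenizer.from_pretrained(model_id)
# bfloat16 on the GPU; device_map needs `accelerate` (older transformers use torch_dtype=).
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="cuda")

# "Briefly explain what a function in Python is."
messages = [{"role": "user", "content": "Erkläre kurz, was eine Funktion in Python ist."}]
# Render the chat template as text, ending with the assistant turn header.
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; stops at any EOS id listed in the model's generation_config.
out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, keeping special tokens visible.
print(tok.decode(out[0, inputs.input_ids.shape[-1]:], skip_special_tokens=False))
```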
The model's `generation_config.eos_token_id` is `[0, 32003]`, so generation stops at either `<|endoftext|>` or `<|end|>`.
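`model.generate` already handles the two stop ids. If you post-process raw token ids yourself (for example from a backend that honours only a single EOS id), the same behaviour can be reproduced with a small helper. This is a minimal sketch: `truncate_at_eos` is a hypothetical name, and it only assumes the ids `0` and `32003` from the generation config above, without asserting which of them maps to which token.

```python
def truncate_at_eos(token_ids, eos_ids=(0, 32003)):
    """Cut a generated id sequence at the first stop id (the stop id is kept).

    Hypothetical helper mirroring generate()'s "stop at any of several EOS ids".
    """
    stop = set(eos_ids)
    for i, tok in enumerate(token_ids):
        if tok in stop:
            return list(token_ids[: i + 1])
    # No stop id found: the sequence hit max_new_tokens or similar.
    return list(token_ids)


print(truncate_at_eos([5, 6, 32003, 7]))
```

The truncated ids can then be passed to `tok.decode(...)` as in the usage example.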
Recipe
- SFT model (`mayflowergmbh/boldt-dc-1b-german-it-16k`): plain `transformers` + `peft` SFT of `Boldt/Boldt-DC-1B`, 7000 steps at 16K context.
- DPO checkpoint: TRL `DPOTrainer` with `loss_type="sigmoid"`, β=0.3, and `rpo_alpha=0.5` (an NLL anchor on the chosen response, which prevents the "response suppression" failure mode documented in 3D-Properties of DPO, Yan et al. 2024, arXiv:2406.07327). LR 5e-7, 800 steps, LoRA r=32 on QKV/MLP. Dataset: `mayflowergmbh/boldt-dc-1b-orpo-onpolicy-delength`, filtered to `|chosen|/|rejected| ≤ 3` (54k → 22k pairs).
- SLERP merge via `mergekit` at `t=0.5`, `dtype=bfloat16`, `tokenizer_source: union`; takes about 30 seconds on a single A6000.
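To make the DPO settings above concrete, here is a simplified per-pair sketch of the sigmoid DPO loss with an added RPO-style NLL term weighted by `rpo_alpha`. It is not TRL's code: it assumes sequence-level summed log-probabilities for each response and a precomputed (per-token averaged) NLL of the chosen response, whereas TRL computes these from logits in batches. The function names are hypothetical.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_rpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp,
                 chosen_nll, beta=0.3, rpo_alpha=0.5):
    """Sigmoid DPO loss for one preference pair plus an NLL anchor on the chosen response.

    *_logp: summed log-probs of a full response under the policy / frozen reference.
    chosen_nll: policy NLL on the chosen response (assumed already token-averaged).
    """
    # Implicit rewards: how much the policy moved away from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    dpo_term = neg_log_sigmoid(beta * (chosen_reward - rejected_reward))
    # The NLL term keeps the chosen response's likelihood from collapsing.
    return dpo_term + rpo_alpha * chosen_nll


# Policy == reference and zero NLL gives log(2).
print(dpo_rpo_loss(-10.0, -10.0, -10.0, -10.0, chosen_nll=0.0))
```

With `rpo_alpha=0` the anchor disappears and only the standard sigmoid DPO term remains; the anchor is what penalises the policy for lowering the likelihood of chosen responses along with rejected ones.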
Why merge: the SFT model preserves more reasoning capacity (commonsense benchmarks regress less than under pure DPO), while the DPO model has slightly better chat-style behaviour. SLERP at t=0.5 is meant to retain both. The LFM2 paper reports the same observation for full-model merging at the 1.2B scale.
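SLERP interpolates along the arc between two weight vectors rather than along the straight line, so the merged tensor keeps a norm close to that of its parents. The sketch below follows the commonly used formulation (normalise to get the angle, fall back to linear interpolation when the vectors are nearly parallel, then weight the original vectors). It is an illustration on plain Python lists, not mergekit's implementation; `slerp_state_dicts` is a hypothetical helper that applies it tensor by tensor, which is roughly what a same-architecture SLERP merge does.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors (t=0 -> v0, t=1 -> v1)."""
    norm0 = math.sqrt(sum(x * x for x in v0)) + eps
    norm1 = math.sqrt(sum(x * x for x in v1)) + eps
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        # Nearly (anti)parallel: the spherical formula is unstable, use plain lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    # Weight the original (unnormalised) vectors with the spherical coefficients.
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def slerp_state_dicts(sd_a, sd_b, t=0.5):
    """Hypothetical helper: merge two same-shaped {name: flat list of floats} dicts."""
    return {name: slerp(t, sd_a[name], sd_b[name]) for name in sd_a}


merged = slerp_state_dicts({"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, t=0.5)
print(merged["w"])  # midpoint on the unit circle, about [0.7071, 0.7071]
```

At t=0.5 the result is symmetric in the two inputs, so it does not matter which checkpoint is treated as the first one.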
Evaluation (lm-evaluation-harness, German tier 1)
| Task | base (no FT) | SFT | DPO (pre-merge) | this (merge) |
|---|---|---|---|---|
| arc_de (25-shot) | 0.3618 | 0.3319 | 0.3285 | 0.3353 |
| hellaswag_de (10-shot) | 0.5037 | 0.4655 | 0.4667 | 0.4651 |
| m_mmlu_de (5-shot) | 0.2560 | 0.2488 | 0.2503 | 0.2488 |
| truthfulqa_de_mc2 (0-shot) | 0.3733 | 0.4154 | 0.4164 | 0.4160 |
| belebele_deu_Latn (0-shot) | 0.2289 | 0.2278 | 0.2344 | 0.2367 |
| mean | 0.3448 | 0.3379 | 0.3393 | 0.3404 |
Among the fine-tuned variants (SFT, DPO, merge), the merge has the highest mean: +0.25 pp over pure SFT and +0.11 pp over pure DPO. The largest individual gains are on arc_de, where the merge scores +0.68 pp over DPO and +0.34 pp over SFT (more than undoing DPO's 0.34 pp regression), and on belebele_deu_Latn, where it adds +0.89 pp over SFT. All per-task deltas are within stderr (~±1.5 pp), and the merge is not uniformly better (it trails DPO slightly on hellaswag_de, m_mmlu_de, and truthfulqa_de_mc2), so the result is consistent with, rather than proof of, the LFM2 paper's claim that same-model merging recovers task-specific knowledge that preference tuning erodes.
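The means and differences discussed above can be recomputed directly from the table's per-task scores. This short check uses only the numbers in the table; `SCORES`, `mean`, and `delta_pp` are illustrative names. Note that the base mean recomputes to 0.3447 rather than the reported 0.3448, presumably because the per-task scores in the table are themselves rounded.

```python
# Tier-1 scores from the evaluation table (rounded to 4 decimals).
SCORES = {
    "arc_de":            {"base": 0.3618, "sft": 0.3319, "dpo": 0.3285, "merge": 0.3353},
    "hellaswag_de":      {"base": 0.5037, "sft": 0.4655, "dpo": 0.4667, "merge": 0.4651},
    "m_mmlu_de":         {"base": 0.2560, "sft": 0.2488, "dpo": 0.2503, "merge": 0.2488},
    "truthfulqa_de_mc2": {"base": 0.3733, "sft": 0.4154, "dpo": 0.4164, "merge": 0.4160},
    "belebele_deu_Latn": {"base": 0.2289, "sft": 0.2278, "dpo": 0.2344, "merge": 0.2367},
}


def mean(variant):
    """Unweighted mean over the five tier-1 tasks."""
    return sum(task[variant] for task in SCORES.values()) / len(SCORES)


def delta_pp(task, a, b):
    """Score difference a - b on one task, in percentage points."""
    return (SCORES[task][a] - SCORES[task][b]) * 100


for variant in ("base", "sft", "dpo", "merge"):
    print(variant, round(mean(variant), 4))
print("merge - sft (mean):", round((mean("merge") - mean("sft")) * 100, 2), "pp")
print("merge - dpo (mean):", round((mean("merge") - mean("dpo")) * 100, 2), "pp")
print("arc_de merge - dpo:", round(delta_pp("arc_de", "merge", "dpo"), 2), "pp")
```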
No public 1B-class German chat model released between Q4 2025 and Q1 2026 has been found to meaningfully exceed the `Boldt/Boldt-DC-1B` base on these tier-1 averages without teacher-model distillation. This release does not close that gap either (mean 0.3404 vs. 0.3448 for the base), but it has the highest tier-1 mean among models in this family that also generate coherent German.
Mergekit config
```yaml
slices:
  - sources:
      - model: mayflowergmbh/boldt-dc-1b-german-it-16k  # SFT
        layer_range: [0, 16]
      - model: <DPO-tuned checkpoint of the SFT>
        layer_range: [0, 16]
merge_method: slerp
base_model: mayflowergmbh/boldt-dc-1b-german-it-16k
parameters:
  t: 0.5
dtype: bfloat16
tokenizer_source: union
```
Known limitations
Inherits all limitations of the SFT model: arithmetic is unreliable (a capacity ceiling at 1.25B parameters), factual recall shows typical small-model errors, there is no tool-use or function-calling training, and long-context use beyond ~8K tokens is untested.
License
Apache-2.0 (inherits from the base model).