Wiedervereinigung-7b-dpo
This is a DPO-aligned merge of our favourite German models, with an average score of 7.11 on MT-Bench-DE. The original models are all based on Mistral, three of them on the brilliant German LeoLM/leo-mistral-hessianai-7b, and they are reunited in this merged model. Hence the name ("Wiedervereinigung" is German for "reunification"); no nationalist ideas involved :-).
To improve output quality, the merged model was DPO-trained on a German translation of SlimOrca DPO, with hermeo-7B generating the rejected answers.
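The card does not say more about the training objective than "DPO". For reference, the standard DPO loss for one preference pair is -log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen answer (presumably from the translated SlimOrca data) and y_l the rejected one (from hermeo-7B). The sketch below computes it from precomputed sequence log-probabilities. The function name `dpo_loss` and the default `beta=0.1` are assumptions for illustration, not the settings actually used to train this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper; inputs are summed token log-probs)."""
    # How much more (or less) likely each answer is under the policy than under the frozen reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), split into two branches so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference has zero margin, so the loss is log(2).
print(dpo_loss(-42.0, -57.0, -42.0, -57.0))
# Shifting probability toward the chosen answer lowers the loss.
print(dpo_loss(-35.0, -60.0, -42.0, -57.0))
```

Working in log-probabilities keeps the sketch independent of any particular model; in a real trainer those four numbers come from forward passes of the policy and reference models over the chosen and rejected answers.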
If you are GPU-poor like me, you can now use LLaMA-Factory to train on German datasets.
Kudos to the authors of the original models at DiscoResearch and VAGOsolutions, Malte Ostendorff and Matthias Uhlig. We are your fan club.
This model was brought to you, and the NVIDIA bill paid, by Mayflower GmbH.
Benchmark results: mt-bench-de
Is the merged model alone already good? Well, of course. But it is even better with the help of some DPO tuning.
```json
{
  "first_turn": 7.3,
  "second_turn": 6.925,
  "categories": {
    "writing": 8.425,
    "roleplay": 8.6,
    "reasoning": 5.4,
    "math": 4.35,
    "coding": 4.3,
    "extraction": 7.975,
    "stem": 8.5,
    "humanities": 9.35
  },
  "average": 7.1125
}
```
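The reported average is consistent with the other numbers: it equals the mean of the two turn scores, and for this run also the mean of the eight category scores. A quick check using the values above:

```python
# MT-Bench-DE scores copied from the results above.
results = {
    "first_turn": 7.3,
    "second_turn": 6.925,
    "categories": {
        "writing": 8.425, "roleplay": 8.6, "reasoning": 5.4, "math": 4.35,
        "coding": 4.3, "extraction": 7.975, "stem": 8.5, "humanities": 9.35,
    },
    "average": 7.1125,
}

# Mean over the two turns and mean over the eight categories.
turn_mean = (results["first_turn"] + results["second_turn"]) / 2
category_mean = sum(results["categories"].values()) / len(results["categories"])
print(f"turn mean: {turn_mean:.4f}, category mean: {category_mean:.4f}")
```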
Other Versions
A big thank you to LoneStriker for the quantized models.
| Name | Quant method | Bits |
|---|---|---|
| Wiedervereinigung-7b-dpo | Unquantized | 16 |
| Wiedervereinigung-7b-dpo-GPTQ | GPTQ | 4 |
| Wiedervereinigung-7b-dpo-AWQ | AWQ | 4 |
| Wiedervereinigung-7b-dpo-GGUF | GGUF | 3-8 |
| Wiedervereinigung-7b-dpo-8.0bpw-h8-exl2 | EXL2 | 8 |
| Wiedervereinigung-7b-dpo-6.0bpw-h6-exl2 | EXL2 | 6 |
| Wiedervereinigung-7b-dpo-5.0bpw-h6-exl2 | EXL2 | 5 |
| Wiedervereinigung-7b-dpo-4.0bpw-h6-exl2 | EXL2 | 4 |
| Wiedervereinigung-7b-dpo-3.0bpw-h6-exl2 | EXL2 | 3 |
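To pick a version that fits your hardware, the bit widths in the table translate roughly into weight memory as parameters × bits / 8 bytes. The sketch below assumes about 7.24 billion parameters (the Mistral 7B size) and counts the weights only; it ignores the KV cache, activations, quantization metadata such as scales, and per-layer mixed precision, so real memory use will be higher.

```python
# Assumed parameter count for a Mistral-7B-based model (about 7.24 billion).
PARAMS = 7.24e9


def weight_memory_gb(bits_per_weight, params=PARAMS):
    """Rough size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Bit widths from the table (GGUF spans 3-8 bits, covered by the same range).
for bits in (16, 8, 6, 5, 4, 3):
    print(f"{bits:>2} bits/weight -> ~{weight_memory_gb(bits):.1f} GB")
```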
Wiedervereinigung-7b is a LazyMergekit merge of:
- DiscoResearch/DiscoLM_German_7b_v1
- DRXD1000/Phoenix
- VAGOsolutions/SauerkrautLM-7b-v1-mistral
- malteos/hermeo-7b
🧩 Configuration
```yaml
models:
  - model: LeoLM/leo-mistral-hessianai-7b
    # No parameters necessary for base model
  - model: DiscoResearch/DiscoLM_German_7b_v1
    parameters:
      density: 0.6
      weight: 0.25
  - model: DRXD1000/Phoenix
    parameters:
      density: 0.6
      weight: 0.25
  - model: VAGOsolutions/SauerkrautLM-7b-v1-mistral
    parameters:
      density: 0.6
      weight: 0.25
  - model: malteos/hermeo-7b
    parameters:
      density: 0.6
      weight: 0.25
merge_method: dare_ties
base_model: LeoLM/leo-mistral-hessianai-7b
parameters:
  int8_mask: true
dtype: bfloat16
```
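The config uses `dare_ties` with `density: 0.6` and `weight: 0.25` for each of the four fine-tuned models. As a rough illustration of what those two parameters do, here is a simplified, pure-Python sketch: DARE randomly drops each model's difference from the base (keeping a `density` fraction and rescaling the survivors), then a TIES-style sign election discards contributions that disagree with the majority direction. It works on toy lists of floats, the helper `dare_ties_merge` is hypothetical, and it is not mergekit's implementation; details such as weight normalisation may differ there.

```python
import random


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Simplified DARE-TIES merge over flat lists of floats (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(base)

    # Task vectors: each fine-tuned model minus the shared base model.
    deltas = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # DARE: keep each delta entry with probability `density`, zero the rest,
    # and rescale survivors by 1/density so the expected delta is unchanged.
    pruned = [
        [d / density if rng.random() < density else 0.0 for d in delta]
        for delta in deltas
    ]

    merged = []
    for i in range(n):
        weighted = [w * p[i] for w, p in zip(weights, pruned)]
        # TIES sign election: the sign of the summed weighted deltas wins.
        elected = 1.0 if sum(weighted) >= 0 else -1.0
        # Drop contributions that disagree with the elected sign (and zeros).
        agreeing = [(w, x) for w, x in zip(weights, weighted) if x * elected > 0]
        if not agreeing:
            merged.append(base[i])
            continue
        # Normalise by the weights of the models that survived sign election.
        weight_sum = sum(w for w, _ in agreeing)
        merged.append(base[i] + sum(x for _, x in agreeing) / weight_sum)
    return merged


# Toy example mirroring the config: four models, weight 0.25 each, density 0.6.
base = [0.0, 0.0, 0.0]
finetuned = [
    [0.2, -0.1, 0.05],
    [0.3, 0.1, 0.00],
    [0.1, -0.2, -0.05],
    [0.2, -0.1, 0.10],
]
print(dare_ties_merge(base, finetuned, weights=[0.25] * 4, density=0.6))
```

With `density=1.0` nothing is dropped, which makes the sign-election step easy to inspect on its own: a parameter where three of four models move down and one moves up ends up moving down.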
💻 Usage
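```shell
pip install -qU transformers accelerate
```

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "mayflowergmbh/Wiedervereinigung-7b-dpo"
# German prompt: "What is a German large language model?"
messages = [{"role": "user", "content": "Was ist ein deutsches Large Language Model?"}]

# Render the conversation into a single prompt string using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision; device_map="auto" places it on the available GPU(s).
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a reply; "generated_text" contains the prompt followed by the completion.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```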
