Instructions for using TeamPV/mistral-nemo-onr-sft-singleGPU with libraries and local apps.

Libraries

PEFT

How to use TeamPV/mistral-nemo-onr-sft-singleGPU with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
model = PeftModel.from_pretrained(base_model, "TeamPV/mistral-nemo-onr-sft-singleGPU")
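
The two lines above only attach the adapter. A fuller sketch of one chat turn might look like the following. It assumes a bf16-capable GPU with enough memory for the 12B base model and that the `accelerate` package is installed for `device_map="auto"`. The helper name `generate_reply` is hypothetical, and the tokenizer is taken from the adapter repo, as in the Transformers snippet on this page.

```python
BASE_MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"
ADAPTER_ID = "TeamPV/mistral-nemo-onr-sft-singleGPU"


def generate_reply(prompt: str, max_new_tokens: int = 40) -> str:
    """Load the base model in bfloat16, attach the LoRA adapter, answer one prompt.

    Hypothetical helper for illustration; it reloads the model on every call.
    """
    # Heavy imports live inside the function so importing this module stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU
        device_map="auto",           # assumption: accelerate is installed
    )
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()

    # Format the single user turn with the tokenizer's built-in chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(base_model.device)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` downloads both the base weights and the adapter on first use, so it is best run once and then refactored to keep the loaded model around.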

Transformers

How to use TeamPV/mistral-nemo-onr-sft-singleGPU with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TeamPV/mistral-nemo-onr-sft-singleGPU")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
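from transformers import AutoTokenizer, AutoModelForCausalLM

# Mistral-Nemo is a text-only causal LM, so AutoModelForCausalLM is the matching class.
# Loading the adapter repo directly relies on the peft package being installed.
tokenizer = AutoTokenizer.from_pretrained("TeamPV/mistral-nemo-onr-sft-singleGPU")
model = AutoModelForCausalLM.from_pretrained("TeamPV/mistral-nemo-onr-sft-singleGPU")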
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use TeamPV/mistral-nemo-onr-sft-singleGPU with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TeamPV/mistral-nemo-onr-sft-singleGPU"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeamPV/mistral-nemo-onr-sft-singleGPU",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
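
The same request can be sent from Python with only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant message text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "TeamPV/mistral-nemo-onr-sft-singleGPU",
    "What is the capital of France?",
)
# With the server running: print(send_chat_request(request))
```

Passing `http://localhost:30000` as `base_url` targets the SGLang server started with `--port 30000` instead.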

Use Docker

docker model run hf.co/TeamPV/mistral-nemo-onr-sft-singleGPU

SGLang

How to use TeamPV/mistral-nemo-onr-sft-singleGPU with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TeamPV/mistral-nemo-onr-sft-singleGPU" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeamPV/mistral-nemo-onr-sft-singleGPU",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TeamPV/mistral-nemo-onr-sft-singleGPU" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeamPV/mistral-nemo-onr-sft-singleGPU",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use TeamPV/mistral-nemo-onr-sft-singleGPU with Docker Model Runner:
```
docker model run hf.co/TeamPV/mistral-nemo-onr-sft-singleGPU
```

See axolotl config

axolotl version: 0.13.0.dev0

base_model: mistralai/Mistral-Nemo-Instruct-2407

##uncomment for 2 GPU. More than two require more settings.
#deepspeed: deepspeed_configs/zero1.json

# Model quantization for qLoRA
bnb_config_kwargs:
  bnb_4bit_compute_dtype: bfloat16
  bnb_4bit_quant_type: nf4
  bnb_4bit_use_double_quant: true

seed: 42 # do not change
val_set_size: 0.01 # Use 1% of the dataset for validation; no pre-split in dataset
## For other datasets set to ratio based on dataset size, 100k - 0.01, ...,  100 - 0.05
datasets:
  - path: TeamPV/distractors-onr-v2
    split: train
    type: chat_template
    conversation: messages  # Your dataset has 'messages' field

chat_template: tokenizer_default # Use model's built-in chat template

eval_sample_packing: false # Only 70b model can handle this
eval_batch_size: 14 # TUNE THIS to achieve ~70+ GB VRAM usage on H100 (often same value as micro_batch_size in pre-trainer config)
evals_per_epoch: 5
# early_stopping_patience: 3


# Tokenization
sequence_len: 3000 # CRITICAL to check
pad_to_sequence_len: true
sample_packing: false # this will make small models go insane.

special_tokens:
  pad_token: "</s>"

# LoRA/DoRA
adapter: lora
lora_r: 32 # 70B will require 128. Memory cost, workarounds exist.
lora_alpha: 64 # 2x r
lora_dropout: 0.05
lora_target_modules: # This is basic full coverage. For LLAMA use unsloth.
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - up_proj
  - down_proj
  - gate_proj
peft_use_dora: false # 2x slower training, but allowed to drop r x4
output_dir: /model_out/mistral-nemo-12b_sft # change this
use_tensorboard: true

# Training
micro_batch_size: 9 # TUNE THIS to achieve ~70+ GB VRAM usage on H100
gradient_accumulation_steps: 1 # Not worth it under 12B on h100. 70B will be mandatory.
num_epochs: 4 # SFT is 4-5
learning_rate: 0.00005
lr_scheduler: cosine
warmup_ratio: 0.10

# Optimizer
# optimizer: adamw_torch_fused
optimizer: adamw_bnb_8bit
bf16: true
fp16: false
tf32: true # H100 parameter

# Attention
flash_attention: true

# Memory
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# Checkpointing
save_first_step: true
saves_per_epoch: 2
save_total_limit: 10
load_best_model_at_end: true

# Logging
logging_steps: 50

# HuggingFace Hub upload
hub_model_id: TeamPV/mistral-nemo-onr-sft  # ALWAYS CHANGE
hub_strategy: every_save  # Options: end, every_save, checkpoint, all_checkpoints
hf_use_auth_token: true

mistral-nemo-onr-sft

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the TeamPV/distractors-onr-v2 dataset. It achieves the following results on the evaluation set:

Loss: 1.0149
Memory/max Active (GiB): 77.11
Memory/max Allocated (GiB): 77.11
Memory/device Reserved (GiB): 77.96

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 9
eval_batch_size: 14
seed: 42
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4393
training_steps: 43939
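
The warmup and step counts above fit together with the `warmup_ratio: 0.10` setting in the config. The sketch below shows one way to reproduce the relationship and to evaluate a linear-warmup, cosine-decay schedule at any step. Truncating with `int()` reproduces the reported 4393 warmup steps, but the exact rounding used by the trainer is an assumption, as are the function names.

```python
import math


def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length as a truncated fraction of total steps (rounding is assumed)."""
    return int(total_steps * warmup_ratio)


def cosine_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 43939
WARMUP_STEPS = warmup_steps_from_ratio(TOTAL_STEPS, 0.10)

# Learning rate at the start, at the end of warmup, midway, and at the final step.
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, 5e-05, WARMUP_STEPS, TOTAL_STEPS))
```

The peak learning rate of 5e-05 is reached exactly at the end of warmup and then decays toward zero by the last training step.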

Training results

| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Device Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.9760 | 76.86 | 76.86 | 77.68 |
| 1.1451 | 0.2 | 2197 | 1.1194 | 77.11 | 77.11 | 77.96 |
| 1.0682 | 0.4 | 4394 | 1.0709 | 77.11 | 77.11 | 77.96 |
| 1.0512 | 0.6 | 6591 | 1.0371 | 77.11 | 77.11 | 77.96 |
| 1.0213 | 0.8 | 8788 | 1.0147 | 77.11 | 77.11 | 77.96 |
| 1.0041 | 1.0 | 10985 | 0.9990 | 77.11 | 77.11 | 77.96 |
| 0.9459 | 1.2 | 13182 | 0.9950 | 77.11 | 77.11 | 77.96 |
| 0.9329 | 1.4 | 15379 | 0.9897 | 77.11 | 77.11 | 77.96 |
| 0.9445 | 1.6 | 17576 | 0.9783 | 77.11 | 77.11 | 77.96 |
| 0.9434 | 1.8 | 19773 | 0.9706 | 77.11 | 77.11 | 77.96 |
| 0.88 | 2.0 | 21970 | 0.9620 | 77.11 | 77.11 | 77.96 |
| 0.8008 | 2.2 | 24167 | 0.9877 | 77.11 | 77.11 | 77.96 |
| 0.7725 | 2.4 | 26364 | 0.9867 | 77.11 | 77.11 | 77.96 |
| 0.781 | 2.6 | 28561 | 0.9801 | 77.11 | 77.11 | 77.96 |
| 0.7722 | 2.8 | 30758 | 0.9785 | 77.11 | 77.11 | 77.96 |
| 0.7704 | 3.0 | 32955 | 0.9736 | 77.11 | 77.11 | 77.96 |
| 0.6672 | 3.2 | 35152 | 1.0137 | 77.11 | 77.11 | 77.96 |
| 0.6657 | 3.4 | 37349 | 1.0155 | 77.11 | 77.11 | 77.96 |
| 0.6744 | 3.6 | 39546 | 1.0152 | 77.11 | 77.11 | 77.96 |
| 0.6398 | 3.8 | 41743 | 1.0149 | 77.11 | 77.11 | 77.96 |
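
To read the results above at a glance, the sketch below finds the evaluation step with the lowest validation loss and the gap between validation and training loss at the last logged step. The numbers are copied from the table; the variable and function names are hypothetical, and which checkpoint was actually published is not stated on this page.

```python
# (step, training_loss, validation_loss) copied from the training results table.
# The step-0 row has no training loss logged.
EVAL_LOG = [
    (0, None, 1.9760),
    (2197, 1.1451, 1.1194),
    (4394, 1.0682, 1.0709),
    (6591, 1.0512, 1.0371),
    (8788, 1.0213, 1.0147),
    (10985, 1.0041, 0.9990),
    (13182, 0.9459, 0.9950),
    (15379, 0.9329, 0.9897),
    (17576, 0.9445, 0.9783),
    (19773, 0.9434, 0.9706),
    (21970, 0.88, 0.9620),
    (24167, 0.8008, 0.9877),
    (26364, 0.7725, 0.9867),
    (28561, 0.781, 0.9801),
    (30758, 0.7722, 0.9785),
    (32955, 0.7704, 0.9736),
    (35152, 0.6672, 1.0137),
    (37349, 0.6657, 1.0155),
    (39546, 0.6744, 1.0152),
    (41743, 0.6398, 1.0149),
]


def best_eval_step(log):
    """Return the (step, validation_loss) pair with the lowest validation loss."""
    step, _, val_loss = min(log, key=lambda row: row[2])
    return step, val_loss


def generalization_gap(row):
    """Validation loss minus training loss for one logged row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


best_step, best_val = best_eval_step(EVAL_LOG)
final_gap = generalization_gap(EVAL_LOG[-1])
print(f"lowest validation loss {best_val:.4f} at step {best_step}")
print(f"final validation minus training loss: {final_gap:.4f}")
```

The lowest validation loss in the table is at step 21970 (epoch 2.0). After that point training loss keeps falling while validation loss rises, which is the usual sign of overfitting in later epochs.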

Framework versions

PEFT 0.17.1
Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.22.1


Model tree for TeamPV/mistral-nemo-onr-sft-singleGPU

Base model: mistralai/Mistral-Nemo-Base-2407
Finetuned: mistralai/Mistral-Nemo-Instruct-2407
Adapter: this model