---
license: apache-2.0
library_name: transformers
base_model: unsloth/NVIDIA-Nemotron-3-Nano-4B
tags:
  - noir
  - nemotron
  - unsloth
  - modal
  - sankalphs
pipeline_tag: text-generation
---

# noir-verdict-nemotron-4b-merged

## How to use

```python
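import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sankalphs/noir-verdict-nemotron-4b-merged"

# trust_remote_code=True lets transformers run the custom code shipped in the repo.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

# Load the merged weights in bf16, move them to the GPU, and switch to inference mode.
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True,
).cuda().eval()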
```

### Chat template

The tokenizer uses the Nemotron 3 chat template with
`enable_thinking=False` baked in. During an active interrogation, the
system prompt is built by `engine/prompts.py:build_system_prompt(...)`.

```python
messages = [
    {"role": "system", "content": "You are Greta Lindholm, junior continuity writer at WJBK. ..."},
    {"role": "user",   "content": "Where were you at the time of the theft?"},
]
# Render the conversation to a prompt string that ends with the assistant turn opener.
text = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
```

### Inference tips

- Context window (`n_ctx`) ≥ 4096 tokens
- `temperature` 0.6–0.7, `top_p` 0.9–0.95
- `max_new_tokens` 180–280 per turn
- Stop on `<|im_end|>`
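
The tips above can be wired into a single generation turn roughly as follows. This is a minimal sketch, not the app's actual inference code: `sampling_kwargs`, `trim_at_stop`, and `generate_reply` are hypothetical helper names, the default values are simply picked from inside the recommended ranges, and it assumes `<|im_end|>` is a single token in the tokenizer's vocabulary.

```python
STOP_TOKEN = "<|im_end|>"


def sampling_kwargs(temperature=0.65, top_p=0.9, max_new_tokens=256):
    """Build generate() kwargs from values inside the recommended ranges."""
    return {
        "do_sample": True,  # temperature and top_p only apply when sampling
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }


def trim_at_stop(text, stop=STOP_TOKEN):
    """Cut decoded text at the first stop marker, if one is present."""
    return text.split(stop, 1)[0].strip()


def generate_reply(model, tok, messages, **overrides):
    """Run one turn; `model` and `tok` are the objects loaded in "How to use"."""
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
    )
    # The rendered template already contains its special tokens, so none are added here.
    inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    out = model.generate(
        **inputs,
        # Assumption: the stop marker maps to exactly one token id.
        eos_token_id=tok.convert_tokens_to_ids(STOP_TOKEN),
        **sampling_kwargs(**overrides),
    )
    # Keep only the newly generated tokens, then drop anything after the stop marker.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return trim_at_stop(tok.decode(new_tokens, skip_special_tokens=False))
```

Calling `generate_reply(model, tok, messages)` with the `messages` list from the chat template example should return the suspect's reply as plain text. Trimming on the decoded string is a fallback in case the stop marker still appears in the output.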


## How it was trained

- **Image**: `nvidia/cuda:12.8.1-devel-ubuntu22.04` + Python 3.13
- **Pip deps**: `torch>=2.8.0`, `triton>=3.4.0`, `unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo`, `unsloth[base] @ git+https://github.com/unslothai/unsloth`, installed with `--torch-backend=cu128`
- **Native**: `causal-conv1d==1.6.2.post1` and `mamba-ssm==2.3.2.post1`, compiled from source with `--no-build-isolation`, `CC=gcc`, `CXX=g++`, because no prebuilt wheel exists for cu128 + Python 3.13
- **Trainer**: TRL `SFTTrainer` with packing and bf16 on an Unsloth LoRA (r=16, alpha=32, learning rate 2e-4 with cosine schedule, batch size 2, gradient accumulation 8, 240 steps)
- **Orchestrator**: `train/modal_finetune.py`
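
To put the trainer settings in perspective, the sketch below derives a few quantities from them. It is illustrative arithmetic rather than a copy of `train/modal_finetune.py`: `RunConfig` is a hypothetical container, and the derived values assume a single GPU and plain LoRA scaling (`alpha / r`, not rsLoRA).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hyperparameters as listed above; the class itself is hypothetical."""
    lora_r: int = 16
    lora_alpha: int = 32
    learning_rate: float = 2e-4
    batch_size: int = 2
    grad_accum: int = 8
    max_steps: int = 240

    @property
    def lora_scale(self) -> float:
        # Plain LoRA multiplies the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def effective_batch(self) -> int:
        # Sequences contributing to each optimizer step (single-GPU assumption).
        return self.batch_size * self.grad_accum

    @property
    def packed_sequences_seen(self) -> int:
        # Total packed sequences consumed over the whole run.
        return self.effective_batch * self.max_steps


cfg = RunConfig()
print(cfg.lora_scale, cfg.effective_batch, cfg.packed_sequences_seen)
```

Under those assumptions the run uses a LoRA scale of 2.0, an effective batch of 16 packed sequences per optimizer step, and 3,840 packed sequences in total.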

## 5-case smoke results (A10G, `--n-gpu-layers 99`)

| case | suspect        | personality | truth_mode    | failure_flags |
|------|----------------|-------------|---------------|---------------|
| 0    | Greta Lindholm | nervous     | lie           | none          |
| 37   | (37, 1)        | helpful     | partial_truth | none          |
| 113  | (113, 2)       | arrogant    | truth         | none          |
| 241  | (241, 3)       | evasive     | deflect       | none          |
| 497  | Greta Lindholm | nervous     | lie           | none          |

Throughput was about 125 tokens/sec on the A10G. None of the five cases showed role-token leaks, leaked `<think>` blocks, or overlong generations.
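
The three checks named above could be expressed along these lines. This is a hedged sketch, not the harness that produced the table: the function name, the flag names, and the use of `<|im_start|>` as a second role marker are assumptions, and a reply that reaches the token cap is treated as overlong.

```python
ROLE_MARKERS = ("<|im_start|>", "<|im_end|>")  # assumed ChatML-style role tokens
THINK_MARKERS = ("<think>", "</think>")


def failure_flags(reply, n_tokens, max_new_tokens=280):
    """Return the list of failed checks for one generated reply."""
    flags = []
    if any(marker in reply for marker in ROLE_MARKERS):
        flags.append("role_token_leak")
    if any(marker in reply for marker in THINK_MARKERS):
        flags.append("think_leak")
    # Hitting the cap means the reply was cut off rather than finished.
    if n_tokens >= max_new_tokens:
        flags.append("overlong")
    return flags


print(failure_flags("I was at the station all night.", n_tokens=42) or "none")
```

An empty list corresponds to `none` in the `failure_flags` column.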

## Companion artifacts

- **LoRA**: [sankalphs/noir-verdict-nemotron-4b-lora](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-lora)
- **Merged BF16**: [sankalphs/noir-verdict-nemotron-4b-merged](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-merged) (7.95 GB)
- **Q4_K_M GGUF**: [sankalphs/noir-verdict-nemotron-4b-gguf](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-gguf) (2.84 GB)
- **Traces**: [sankalphs/noir-verdict-traces](https://huggingface.co/datasets/sankalphs/noir-verdict-traces)
- **App**: [build-small-hackathon/noir-verdict](https://huggingface.co/spaces/build-small-hackathon/noir-verdict)

## License

The adapter and training code in this repo are released under Apache-2.0.
The base Nemotron 3 Nano weights remain governed by NVIDIA's
[model license](https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Nano-4B).