---
license: apache-2.0
library_name: transformers
base_model: unsloth/NVIDIA-Nemotron-3-Nano-4B
tags:
- noir
- nemotron
- unsloth
- modal
- sankalphs
pipeline_tag: text-generation
---

# noir-verdict-nemotron-4b-merged

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sankalphs/noir-verdict-nemotron-4b-merged"

# trust_remote_code=True lets transformers run the custom code shipped with the repo.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # the merged weights are published in BF16
    trust_remote_code=True,
).cuda().eval()  # move to the GPU and switch to inference mode
```

### Chat template

The chat template is the Nemotron 3 chat template, with `enable_thinking=False` baked in. The system prompt for an active interrogation is built by `engine/prompts.py:build_system_prompt(...)`.

```python
# `tok` is the tokenizer loaded in the previous snippet.
messages = [
    {"role": "system", "content": "You are Greta Lindholm, junior continuity writer at WJBK. ..."},
    {"role": "user", "content": "Where were you at the time of the theft?"},
]

# Render the conversation to a prompt string that ends with the assistant header.
text = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
```

### Inference tips

- `n_ctx` ≥ 4096
- `temperature` 0.6–0.7, `top_p` 0.9–0.95
- `max_new_tokens` 180–280 per turn
- Stop on `<|im_end|>`

## How it was trained

- **Image**: `nvidia/cuda:12.8.1-devel-ubuntu22.04` + Python 3.13
- **Pip deps**: `torch>=2.8.0`, `triton>=3.4.0`, `unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo`, `unsloth[base] @ git+https://github.com/unslothai/unsloth`, `--torch-backend=cu128`
- **Native**: `causal-conv1d==1.6.2.post1` and `mamba-ssm==2.3.2.post1` compiled from source with `--no-build-isolation`, `CC=gcc`, `CXX=g++` (no prebuilt cu128 + Py3.13 wheel exists)
- **Trainer**: TRL `SFTTrainer`, packing, bf16, Unsloth LoRA (r=16, alpha=32, lr=2e-4 cosine, bs=2 grad_accum=8, 240 steps)
- **Orchestrator**: `train/modal_finetune.py`

## 5-case smoke results (A10G, `--n-gpu-layers 99`)

| case | suspect | personality | truth_mode | failure_flags |
|---|---|---|---|---|
| 0 | Greta Lindholm | nervous | lie | none |
| 37 | (37, 1) | helpful | partial_truth | none |
| 113 | (113, 2) | arrogant | truth | none |
| 241 | (241, 3) | evasive | deflect | none |
| 497 | Greta Lindholm | nervous | lie | none |

Pace: ~125 tokens/sec on A10G. No role-token leaks, no leaked `<think>` blocks, no overlong generations.

## Companion artifacts

- **LoRA**: [sankalphs/noir-verdict-nemotron-4b-lora](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-lora)
- **Merged BF16**: [sankalphs/noir-verdict-nemotron-4b-merged](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-merged) (7.95 GB)
- **Q4_K_M GGUF**: [sankalphs/noir-verdict-nemotron-4b-gguf](https://huggingface.co/sankalphs/noir-verdict-nemotron-4b-gguf) (2.84 GB)
- **Traces**: [sankalphs/noir-verdict-traces](https://huggingface.co/datasets/sankalphs/noir-verdict-traces)
- **App**: [build-small-hackathon/noir-verdict](https://huggingface.co/spaces/build-small-hackathon/noir-verdict)

## License

Apache-2.0. The base Nemotron 3 Nano weights are governed by NVIDIA's [model license](https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Nano-4B); the adapter and training code in this repo are Apache-2.0.
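
## Appendix: applying the inference tips

The sampling ranges and stop string listed under "Inference tips" can be wired into a `transformers` generation call roughly as follows. This is a minimal sketch, not the app's actual inference code: `sampling_kwargs`, `truncate_at_stop`, and `generate_reply` are hypothetical helper names, the default values are arbitrary picks from inside the suggested ranges, and treating those ranges as hard limits is an assumption made for illustration.

```python
from typing import Any

STOP_TOKEN = "<|im_end|>"  # stop string named in the inference tips


def sampling_kwargs(temperature: float = 0.65, top_p: float = 0.92,
                    max_new_tokens: int = 240) -> dict[str, Any]:
    """Build generate() kwargs, rejecting values outside the suggested ranges."""
    if not 0.6 <= temperature <= 0.7:
        raise ValueError("temperature should be in 0.6-0.7")
    if not 0.9 <= top_p <= 0.95:
        raise ValueError("top_p should be in 0.9-0.95")
    if not 180 <= max_new_tokens <= 280:
        raise ValueError("max_new_tokens should be in 180-280")
    return {
        "do_sample": True,  # temperature/top_p only apply when sampling
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }


def truncate_at_stop(text: str, stop: str = STOP_TOKEN) -> str:
    """Keep only the text before the first stop string, stripped of whitespace."""
    return text.split(stop, 1)[0].strip()


def generate_reply(model, tok, messages) -> str:
    """Hypothetical helper: sample one turn and cut it at the stop string.

    `model` and `tok` are the objects loaded in "How to use".
    """
    inputs = tok.apply_chat_template(
        messages,
        add_generation_prompt=True,
        enable_thinking=False,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, **sampling_kwargs())
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return truncate_at_stop(tok.decode(new_tokens, skip_special_tokens=False))
```

With the model loaded, `generate_reply(model, tok, messages)` would return a single in-character answer. Decoding with `skip_special_tokens=False` keeps `<|im_end|>` visible so the truncation step can find it.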