gpt-oss-120B-Aurora-Chat v3
LoRA fine-tune of openai/gpt-oss-120b specialized for the
ALCF Aurora supercomputer (Intel Xeon Sapphire
Rapids + Intel GPU Max 1550 / Ponte Vecchio, oneAPI / SYCL, PBS Pro).
Off-the-shelf code-LLMs hallucinate Aurora specifics — they suggest nvcc instead of
icpx -fsycl, srun / aprun instead of mpiexec, NERSC's /global/cfs instead of
/lus/flare, and CUDA device strings instead of xpu. This adapter teaches the base
model the actual Aurora toolchain, file system layout, scheduler conventions, and
recommended PyTorch/TensorFlow/SYCL idioms.
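As a rough illustration of the mismatches listed above, the sketch below scans a generated answer for the non-Aurora idioms named in this card and suggests the Aurora equivalent. The helper name `flag_non_aurora_idioms` and the regular expressions are hypothetical, chosen for this example. They are not part of the model or of any ALCF tooling, and they cover only the four cases mentioned here.

```python
import re

# (non-Aurora idiom, Aurora equivalent) pairs taken from the paragraph above.
# The regexes are illustrative assumptions, not an exhaustive lint.
NON_AURORA_IDIOMS = [
    (r"\bnvcc\b", "icpx -fsycl"),
    (r"\b(?:srun|aprun)\b", "mpiexec"),
    (r"/global/cfs\b", "/lus/flare"),
    (r"""["']cuda(?::\d+)?["']""", '"xpu"'),
]


def flag_non_aurora_idioms(text: str) -> list[tuple[str, str]]:
    """Return (matched text, suggested Aurora replacement) for every hit."""
    hits = []
    for pattern, replacement in NON_AURORA_IDIOMS:
        for match in re.finditer(pattern, text):
            hits.append((match.group(0), replacement))
    return hits


draft = 'srun -n 12 ./app  # built with nvcc, tensors on "cuda:0", data in /global/cfs/proj'
for found, suggestion in flag_non_aurora_idioms(draft):
    print(f"{found!r} -> consider {suggestion!r}")
```

A check like this only catches surface patterns; it does not confirm that the suggested replacement is correct for a given job script.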
Model summary
| Field | Value |
|---|---|
| Base model | openai/gpt-oss-120b |
| Format | Merged 16-bit (Hugging Face Transformers / vLLM / TGI) |
| Fine-tuning | LoRA (PEFT): r=32, α=64, dropout 0.0, 2 epochs |
| Optimizer | AdamW fused, lr 2e-4 cosine, warmup 3%, batch 1 × grad-accum 8 |
| Precision / seq-len | bf16, 1,536 tokens |
| Training data | aurora-docs-distill-multirank: 4,495 ChatML rows |
| Train loss (final) | 0.4800 |
| Hardware | Aurora node, model-parallel across 1–12 PVC tiles via HF device_map='auto', IPEX + PyTorch 2.10 XPU backend |
| Eval (53-Q Aurora, 0–5) | pending |
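The hyperparameters in the table imply a few derived quantities. The sketch below works them out under stated assumptions: standard LoRA scaling of α / r (not rsLoRA), a single data-parallel replica, and a trailing partial accumulation window that still counts as one optimizer step. The class name `TrainingSetup` is hypothetical, and the derived numbers are arithmetic on the table, not values reported by the training run.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    # Values copied from the model summary table.
    lora_r: int = 32
    lora_alpha: int = 64
    train_rows: int = 4495
    epochs: int = 2
    micro_batch: int = 1
    grad_accum: int = 8
    warmup_ratio: float = 0.03

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def effective_batch(self) -> int:
        # Assumption: one data-parallel replica (the model is sharded, not replicated).
        return self.micro_batch * self.grad_accum

    @property
    def optimizer_steps(self) -> int:
        # Assumption: a trailing partial accumulation window still triggers an update.
        return math.ceil(self.train_rows / self.effective_batch) * self.epochs

    @property
    def warmup_steps(self) -> int:
        return math.ceil(self.optimizer_steps * self.warmup_ratio)


setup = TrainingSetup()
print(setup.lora_scaling, setup.effective_batch, setup.optimizer_steps, setup.warmup_steps)
```

Trainers differ in how they round the last partial window, so the step counts may be off by one or two per epoch in practice.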
Quick start
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shazzadulimun/gpt-oss-120b-aurora-chat-v3"
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

msgs = [{"role": "user", "content": "How do I launch one MPI rank per Aurora GPU tile?"}]
# Apply the chat template and tokenize in one step.
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(mdl.device)
# Greedy decoding; temperature only applies when sampling and must be > 0 there.
out = mdl.generate(**ids, max_new_tokens=400, do_sample=False)
# Decode only the newly generated tokens.
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
Training data
Distilled from openai/gpt-oss-120b on ALCF Sophia (vLLM) over 416 cleaned chunks of
docs.alcf.anl.gov/aurora. The result is 4,495 training rows and 562 validation rows
in ChatML format with embedded chain-of-thought (**Reasoning:** / **Answer:**).
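To make the embedded chain-of-thought layout concrete, here is a minimal sketch that splits an assistant message into its reasoning and answer parts. It assumes the two markers appear once each, in that order, inside the assistant text. The `messages` row schema and the example content are hypothetical placeholders, not rows taken from the dataset.

```python
import re

# Matches "**Reasoning:** <text> **Answer:** <text>" across multiple lines.
_COT_PATTERN = re.compile(
    r"\*\*Reasoning:\*\*\s*(?P<reasoning>.*?)\s*\*\*Answer:\*\*\s*(?P<answer>.*)",
    re.DOTALL,
)


def split_reasoning_answer(assistant_text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if the markers are missing."""
    match = _COT_PATTERN.search(assistant_text)
    if match is None:
        # No markers found: treat the whole text as the answer.
        return "", assistant_text.strip()
    return match.group("reasoning").strip(), match.group("answer").strip()


# Hypothetical row; the real dataset schema may use different keys.
example_row = {
    "messages": [
        {"role": "user", "content": "Which filesystem prefix applies on Aurora?"},
        {
            "role": "assistant",
            "content": "**Reasoning:** The card contrasts /global/cfs with /lus/flare.\n\n"
                       "**Answer:** Use a path under /lus/flare.",
        },
    ]
}

reasoning, answer = split_reasoning_answer(example_row["messages"][-1]["content"])
print("reasoning:", reasoning)
print("answer:", answer)
```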
Broad coverage, parallel-rank distillation. 20 worker ranks each took a disjoint slice (~21 chunks) of the cleaned docs.alcf.anl.gov/aurora corpus and asked the teacher for chain-of-thought QA pairs. Disjoint slicing maximizes phrasing diversity (each rank sees fresh context) while still covering every chunk exactly once.
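The disjoint slicing described above can be sketched as follows. This version assumes contiguous block partitioning, which is one of several ways to produce disjoint slices. The card only states that the 20 slices are disjoint, hold about 21 chunks each, and cover every chunk exactly once. The function name `rank_slice` is hypothetical.

```python
def rank_slice(num_chunks: int, num_ranks: int, rank: int) -> range:
    """Contiguous, disjoint slice of chunk indices owned by `rank`."""
    base, extra = divmod(num_chunks, num_ranks)
    # The first `extra` ranks take one additional chunk, so sizes differ by at most 1.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


NUM_CHUNKS, NUM_RANKS = 416, 20  # figures from this card
slices = [rank_slice(NUM_CHUNKS, NUM_RANKS, r) for r in range(NUM_RANKS)]

# Every chunk index appears exactly once across all ranks.
covered = [i for s in slices for i in s]
assert covered == list(range(NUM_CHUNKS))

sizes = [len(s) for s in slices]
print(min(sizes), max(sizes))
```

With 416 chunks and 20 ranks, this scheme gives each rank either 20 or 21 chunks, consistent with the "~21 chunks" figure.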
Full corpus + reproduction scripts: SIslamMun/Generator @ aurora-datasets-2026-04-30.
Evaluation
This model is part of the v3 parameter-size sweep (1B → 120B, all trained on the same dataset). The holdout scorecard will appear here once the full sweep completes.
Limitations
- Synthetic-data biases. Teacher (gpt-oss-120b) can confabulate plausible-looking but incorrect commands. Treat outputs as a verifiable first draft, not authoritative.
- Doc snapshot is fixed at 2026-04-29. Module versions, queue names, and APIs change; anything published after that date isn't reflected here.
- Aurora-only. Specifics (/lus/flare, xpu, PBS queues) won't transfer to Frontier, Polaris, or other systems.
- Use temperature ≤ 0.1 for technical answers; higher temps invite invented flag names and paths.
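One way to apply the temperature recommendation is a small helper that builds keyword arguments for `model.generate`. The helper `generation_kwargs` is hypothetical, and rejecting temperatures above 0.1 is a design choice made for this sketch. The keys `max_new_tokens`, `do_sample`, and `temperature` are standard Transformers generation arguments.

```python
def generation_kwargs(temperature: float = 0.0, max_new_tokens: int = 400) -> dict:
    """Build keyword arguments for `model.generate` at a low temperature."""
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")
    if temperature > 0.1:
        raise ValueError("this card recommends temperature <= 0.1 for technical answers")
    if temperature == 0.0:
        # Greedy decoding: sampling is off, so no temperature is passed.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    # Low-temperature sampling.
    return {"max_new_tokens": max_new_tokens, "do_sample": True, "temperature": temperature}


print(generation_kwargs(0.0))
print(generation_kwargs(0.1, max_new_tokens=200))
```

The resulting dictionary can be unpacked into a call such as `mdl.generate(**ids, **generation_kwargs(0.1))`.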
Citation
@misc{aurora-llms-2026,
title = { gpt-oss-120B-Aurora-Chat v3 },
author = { Islam Mun, Shazzadul },
year = { 2026 },
url = { https://huggingface.co/shazzadulimun/gpt-oss-120b-aurora-chat-v3 },
note = { LoRA fine-tune of gpt-oss-120b; data distilled from gpt-oss-120b on docs.alcf.anl.gov/aurora }
}
License
Apache-2.0 for the adapter weights and synthetic training data. Source corpus is public
ALCF user documentation. Base model retains its own license — see
openai/gpt-oss-120b.