gpt-oss-140b Ren-2

gpt-oss-140b, codename Ren-2, is an agentic / SWE-oriented derivative of OpenAI GPT-OSS 120B.

This release takes the 120B base model and adds roughly 20B more parameters oriented toward agentic coding and SWE-style behavior.

Overview

  • Base model: openai/gpt-oss-120b
  • Release name: gpt-oss-140b Ren-2
  • Format: MXFP4
  • Intended use: coding, agentic coding, SWE-style assistant workflows
  • Status: research preview

Ren-2 is meant to feel like a more agentic version of GPT-OSS 120B rather than a generic continuation of the base checkpoint.

Training

  • Built on a custom framework
  • Roughly 3 hours of pre-training / post-training work for this release path
  • Expanded from the 120B base with roughly 20B additional parameters

This is an iterative open release. More sizes and follow-up revisions will come later.

Inference

This model was tested with:

  • vLLM 0.19.0

Recommended serving settings:

  • num_experts_per_tok=12
  • --reasoning-parser openai_gptoss
  • --tool-call-parser openai
  • --enable-auto-tool-choice

The original base setup used 4 active experts per token. Ren-2 is intended to run at 12 active experts per token.

Rough active-equivalent compute:

  • original top-k=4: about 5.7B active-equivalent params
  • Ren-2 top-k=12: about 12.9B active-equivalent params

These are approximate active-equivalent numbers, not total parameter counts.

In internal agentic-task traces at top-k=12, roughly half of active routing traffic ran through the added 20B expansion. Observed new-expert usage in those traces was about 48.6% of active expert selections and about 46.2% of routing mass. This is workload-dependent.

This release is intended to run directly from the baked model shards. No extra router merge step is required at inference time.

What It Is Good At

  • coding
  • agentic coding
  • SWE-style assistant behavior
  • practical tool-using workflows

Ren-2 is intended to be usable for production-style coding and agentic workflows, including terminal coding agents, SWE assistants, and tool-using automation setups.

Feedback

Useful feedback includes:

  • coding quality
  • tool use quality
  • long-context behavior
  • inference stability
  • preferred smaller sizes / VRAM targets

If you want smaller custom models, reach out with the hardware target and the kind of feedback you can provide.

It can be a different size or architecture, as long as the feedback loop is useful.

Included Files

  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja
  • model.safetensors.index.json
  • model-*.safetensors
  • README.md

License

Replace the placeholder license: other metadata with the actual license you want to publish under after confirming compatibility with the base model and your added weights.

Downloads last month
2
Safetensors
Model size
143B params
Tensor type
BF16
·
U8
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for LLMWildling/gpt-oss-140b-ren-2

Quantized
(112)
this model