---
license: apache-2.0
tags:
  - neural-rendering
  - real-time-graphics
  - game-engine
  - pytorch
  - onnx
---

# Uruk Neural Renderer

A multi-model neural rendering pipeline for real-time game graphics, trained on a single NVIDIA B200 GPU. Uruk uses a modular workstream architecture in which specialized models handle different parts of the rendering pipeline, from world modeling and scene remapping to cinematic rendering and runtime optimization.

## Architecture Overview

The Uruk Neural Renderer is organized into a multi-workstream pipeline, where each workstream trains a specialized model family. A policy-compliant orchestrator manages the 4-stage training lifecycle: **Smoke** (bug-catching), **Calibration** (hyperparameter tuning), **Production** (full training with early stopping), and **Distillation** (teacher-to-student compression).
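As a rough illustration of the lifecycle described above, the four stages can be modeled as an ordered enum. This is only a sketch: the `Stage` class and `next_stage` helper are hypothetical names, not part of the Uruk orchestrator, and the real transition logic (early stopping, policy checks) is not shown.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The four lifecycle stages, in the order this README lists them."""
    SMOKE = "bug-catching"
    CALIBRATION = "hyperparameter tuning"
    PRODUCTION = "full training with early stopping"
    DISTILLATION = "teacher-to-student compression"


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    order = list(Stage)  # Enum iteration preserves definition order
    index = order.index(stage)
    return order[index + 1] if index + 1 < len(order) else None


# Walk the lifecycle from the first stage to the last.
stage: Optional[Stage] = Stage.SMOKE
while stage is not None:
    print(f"{stage.name}: {stage.value}")
    stage = next_stage(stage)
```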

The flagship **V2-Ultra** model takes a 32-channel input, including a 12-channel G-buffer with material IDs, depth, and normals, and achieves **94.6% material accuracy**. Lighting decisions can therefore be based on ground-truth geometry rather than screen-space inference. The design goal is to exceed DLSS 5 quality by relying on accurate, deterministic material information rather than hallucinated detail.

## Models

### V2-Ultra (Neural Renderer V2)

The primary neural renderer, with a 6.9M-parameter student model and a 37.4M-parameter teacher, trained with a 2-stage curriculum on the Frontier dataset.

| File | Description | Size |
|---|---|---|
| `v2_ultra/v2_ultra_global_best.pt` | Global best checkpoint (student) | 26.4 MB |
| `v2_ultra/v2_ultra_best_stage1.pt` | Best Stage 1 (foundation) checkpoint | 79.3 MB |
| `v2_ultra/v2_ultra_best_stage2.pt` | Best Stage 2 (fine-tune) checkpoint | 79.3 MB |
| `onnx/uruk_v2_ultra_best.onnx` | ONNX export for deployment | 26.4 MB |
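
As a sanity check on the student checkpoint size, a parameter count can be converted to an approximate file size. The sketch below assumes the file holds only float32 weights (4 bytes each) and that the table reports sizes in binary megabytes; neither assumption is confirmed by the repository, and `estimated_size_mib` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32


def estimated_size_mib(num_params: float, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Approximate weights-only size in MiB (1 MiB = 2**20 bytes)."""
    return num_params * bytes_per_param / 2**20


# 6.9M parameters is the student size stated above.
student_mib = estimated_size_mib(6.9e6)
print(f"student estimate: {student_mib:.1f} MiB")
```

Under these assumptions the estimate comes out near the 26.4 MB listed for the student checkpoint and the ONNX export. Checkpoints that also store optimizer state or other metadata will be larger than this weights-only figure.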

### V2-Optimized (Reconstruction-First Approach)

An improved training run that initializes from V1 weights and uses a 3-stage curriculum (rendering, material, optional GAN) with MS-SSIM loss and quality gates.

| File | Description | Size |
|---|---|---|
| `v2_optimized/v2opt_global_best.pt` | Global best checkpoint | 26.4 MB |
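
The curriculum with quality gates can be pictured as a loop that only advances to the next stage once the current stage's metric clears a threshold. The sketch below is illustrative only: the stage names come from this README, but the threshold values, the score dictionary, and both function names are hypothetical, and the actual gating criteria used in training are not documented here.

```python
from typing import Dict, List

# Stage names follow the README; the gate thresholds are hypothetical.
STAGES: List[str] = ["rendering", "material", "gan"]
GATES: Dict[str, float] = {"rendering": 0.90, "material": 0.90}


def passes_gate(stage: str, score: float) -> bool:
    """A stage with no configured gate always passes."""
    threshold = GATES.get(stage)
    return threshold is None or score >= threshold


def run_curriculum(scores: Dict[str, float], use_gan: bool = False) -> List[str]:
    """Return the stages that ran, stopping after the first failed gate."""
    completed: List[str] = []
    for stage in STAGES:
        if stage == "gan" and not use_gan:
            continue  # the GAN stage is optional
        completed.append(stage)
        if not passes_gate(stage, scores.get(stage, 0.0)):
            break  # do not advance past a failed quality gate
    return completed


print(run_curriculum({"rendering": 0.95, "material": 0.93}, use_gan=True))
```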

### Workstream Production Models

Best checkpoints from the policy-compliant orchestrator production runs.

| File | Workstream | Description | Size |
|---|---|---|---|
| `workstreams/ws2_learned_world/ws2_best.pt` | WS2 — Learned World Model (Family D) | Learns world dynamics and physics | 59.4 MB |
| `workstreams/ws3_world_authoring/ws3_best.pt` | WS3 — World Authoring (Family B) | Procedural world generation from embeddings | 103.3 MB |
| `workstreams/ws4_world_remapper/ws4_best.pt` | WS4 — World Remapper (Family C) | Scene-to-scene transformation | 259.3 MB |
| `workstreams/ws4_world_remapper_v2/ws4_v2_best.pt` | WS4 v2 — World Remapper (optimized rerun) | Completed all 500 epochs | 259.3 MB |
| `workstreams/ws5_cinematic_renderer/ws5_frontier_best.pt` | WS5 — Cinematic Renderer (Family I) | Rich G-buffer rendering (19-ch input, 10.1M params) | 116.2 MB |
| `workstreams/ws6_runtime_optimization/ws6_best.pt` | WS6 — Runtime Optimization (Family G) | Inference speed optimization | 4.4 MB |
| `workstreams/ws6_runtime_optimization_v2/ws6_v2_best.pt` | WS6 v2 — Runtime Optimization (rerun) | Completed all 500 epochs | 13.7 MB |
| `workstreams/ws7_scene_to_world_v2/ws7_v2_best.pt` | WS7 v2 — Scene to World | Graph-based scene understanding | 43.7 MB |

### Distillation Students

Student models distilled from the production teachers to reduce runtime cost.

| File | Description | Size |
|---|---|---|
| `distillation/ws2_student_best.pt` | WS2 Learned World Model distilled student (best val_loss: 0.000375) | 21.6 MB |
| `distillation/ws3_student_best.pt` | WS3 World Authoring distilled student (best val_loss: 0.000073) | 38.4 MB |
| `distillation/ws4_student_best.pt` | WS4 World Remapper distilled student (best val_loss: 0.049) | 90.6 MB |
| `distillation/ws5_student_best.pt` | WS5 Cinematic Renderer distilled student (best val_loss: 0.216) | 40.6 MB |
| `distillation/ws6_student_best.pt` | WS6 Runtime Optimization distilled student (best val_loss: 0.000016) | 1.76 MB |

*(Note: Final epoch checkpoints are also available in the repository as `*_student_final.pt`)*
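
To get a feel for how much distillation shrinks each model, the file sizes from the tables can be compared directly. This sketch assumes each student was distilled from the first-listed (non-v2) production checkpoint of its workstream, which the README does not state explicitly; file size is also only a proxy for parameter count.

```python
from typing import Dict

# Checkpoint sizes in MB, copied from the tables in this README.
# Assumption: teachers are the first-listed (non-v2) production checkpoints.
TEACHER_MB: Dict[str, float] = {"ws2": 59.4, "ws3": 103.3, "ws4": 259.3, "ws5": 116.2, "ws6": 4.4}
STUDENT_MB: Dict[str, float] = {"ws2": 21.6, "ws3": 38.4, "ws4": 90.6, "ws5": 40.6, "ws6": 1.76}


def compression_ratios(teachers: Dict[str, float], students: Dict[str, float]) -> Dict[str, float]:
    """Teacher size divided by student size, per workstream."""
    return {name: teachers[name] / students[name] for name in students}


ratios = compression_ratios(TEACHER_MB, STUDENT_MB)
for name, ratio in ratios.items():
    print(f"{name}: student is {ratio:.2f}x smaller than its teacher")
```

Under that assumption, every student comes out between roughly 2.5x and 2.9x smaller than its teacher.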

### Additional Models

| File | Description | Size |
|---|---|---|
| `npc_director_v2/best.pt` | NPC Director v2 — behavioral AI for NPC state management (99.57% state accuracy) | 15.0 MB |
| `animation_director/best.pt` | Animation Director — procedural animation control | 8.0 MB |
| `ws8_structure_generator/best_model.pt` | WS8 — Structure Generator | 32.4 MB |
| `onnx/npc_director.onnx` | NPC Director ONNX export | 5.1 MB |

## Training Infrastructure

All models were trained on a single **NVIDIA B200** GPU (183 GB VRAM) on RunPod. The multi-workstream orchestrator managed sequential training across all workstreams with automatic stage transitions, early stopping, and plateau detection.
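
Early stopping of the kind mentioned above is commonly implemented as a patience counter on the validation loss. The following is a generic sketch, not the orchestrator's actual code: the `EarlyStopping` class and its `patience` and `min_delta` parameters are hypothetical, and the loss values are made up for the demonstration.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Demonstration with made-up validation losses.
stopper = EarlyStopping(patience=2)
stopped_epoch = None
for epoch, loss in enumerate([0.50, 0.40, 0.41, 0.42, 0.39]):
    if stopper.step(loss):
        stopped_epoch = epoch
        break
print(f"stopped at epoch {stopped_epoch}, best loss {stopper.best_loss}")
```

A plateau detector can follow the same pattern, for example by lowering the learning rate instead of stopping when the counter runs out.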

## Usage

```python
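import onnxruntime as ort
import torch

# Load a PyTorch checkpoint onto the CPU.
# Note: PyTorch >= 2.6 defaults to weights_only=True in torch.load; pass
# weights_only=False only if the file stores extra Python objects and you trust it.
checkpoint = torch.load("v2_ultra/v2_ultra_global_best.pt", map_location="cpu")
model_state = checkpoint["model_state_dict"]

# For ONNX inference, create a session (CPU execution provider shown here)
session = ort.InferenceSession(
    "onnx/uruk_v2_ultra_best.onnx",
    providers=["CPUExecutionProvider"],
)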
```
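
Because the ONNX input names and shapes are not listed in this README, it can help to query them from the session before building real inputs. The sketch below feeds zeros to every input as a smoke test. It assumes all inputs are float32 and replaces any dynamic dimension with a placeholder size; both `concrete_shape` and `run_dummy_inference` are hypothetical helpers, and a placeholder of 1 may be too small for the model's spatial dimensions.

```python
from typing import List, Sequence, Union

Dim = Union[int, str, None]


def concrete_shape(shape: Sequence[Dim], fill: int = 1) -> List[int]:
    """Replace symbolic or unknown dimensions (strings or None) with `fill`."""
    return [dim if isinstance(dim, int) else fill for dim in shape]


def run_dummy_inference(model_path: str, fill: int = 1):
    """Feed zeros to every model input and return the list of output arrays."""
    # Imported here so concrete_shape stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    feeds = {}
    for node in session.get_inputs():
        print(node.name, node.shape, node.type)  # inspect what the model expects
        # Assumption: every input is float32.
        feeds[node.name] = np.zeros(concrete_shape(node.shape, fill), dtype=np.float32)
    return session.run(None, feeds)
```

For example, `run_dummy_inference("onnx/uruk_v2_ultra_best.onnx", fill=64)` would print each input's name, shape, and type, then run the model once on a zero-filled tensor.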

## License

Apache 2.0