---
license: apache-2.0
tags:
- neural-rendering
- real-time-graphics
- game-engine
- pytorch
- onnx
---

# Uruk Neural Renderer

A multi-model neural rendering pipeline designed for real-time game graphics, trained on NVIDIA B200 GPUs. The Uruk system uses a modular workstream architecture where specialized models handle different aspects of the rendering pipeline — from world modeling and scene remapping to cinematic rendering and runtime optimization.

## Architecture Overview

The Uruk Neural Renderer is organized into a multi-workstream pipeline, where each workstream trains a specialized model family. A policy-compliant orchestrator manages the 4-stage training lifecycle: **Smoke** (bug-catching), **Calibration** (hyperparameter tuning), **Production** (full training with early stopping), and **Distillation** (teacher-to-student compression).

The flagship **V2-Ultra** model uses a 32-channel input (including a 12-channel G-buffer with material IDs, depth, and normals) and achieves **94.6% material accuracy** — enabling physically-correct lighting decisions based on ground-truth geometry rather than screen-space inference. This architecture is designed to exceed DLSS 5 quality by relying on deterministic material accuracy rather than AI hallucinations.

## Models

### V2-Ultra (Neural Renderer V2)

The primary neural renderer with a 6.9M-parameter student model and 37.4M-parameter teacher. Trained with a 2-stage curriculum on the Frontier dataset.
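
As a rough sanity check on the file sizes listed for this model, a weight footprint can be estimated from a parameter count. The sketch below assumes weights are stored as 32-bit floats and that a checkpoint holds little besides weights; neither is stated in this README, and `estimate_weight_mib` is a hypothetical helper, not part of the repository.

```python
# Assumption: weights are stored as 32-bit floats (4 bytes each).
BYTES_PER_FP32 = 4


def estimate_weight_mib(num_params, bytes_per_param=BYTES_PER_FP32):
    """Estimate the size of the raw weights in MiB (1 MiB = 1024**2 bytes)."""
    return num_params * bytes_per_param / (1024 ** 2)


# Parameter counts quoted above (rounded figures, so results are approximate).
student_mib = estimate_weight_mib(6_900_000)
teacher_mib = estimate_weight_mib(37_400_000)

print(f"student: {student_mib:.1f} MiB")  # student: 26.3 MiB
print(f"teacher: {teacher_mib:.1f} MiB")  # teacher: 142.7 MiB
```

Under these assumptions the student estimate is close to the 26.4 MB listed for the student checkpoint and the ONNX export. Larger files for the same architecture may carry extra state such as optimizer buffers, but that is a guess rather than something this README confirms.
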

| File | Description | Size |
|---|---|---|
| `v2_ultra/v2_ultra_global_best.pt` | Global best checkpoint (student) | 26.4 MB |
| `v2_ultra/v2_ultra_best_stage1.pt` | Best Stage 1 (foundation) checkpoint | 79.3 MB |
| `v2_ultra/v2_ultra_best_stage2.pt` | Best Stage 2 (fine-tune) checkpoint | 79.3 MB |
| `onnx/uruk_v2_ultra_best.onnx` | ONNX export for deployment | 26.4 MB |

### V2-Optimized (Reconstruction-First Approach)

An improved training run that initializes from V1 weights and uses a 3-stage curriculum (rendering, material, optional GAN) with MS-SSIM loss and quality gates.

| File | Description | Size |
|---|---|---|
| `v2_optimized/v2opt_global_best.pt` | Global best checkpoint | 26.4 MB |

### Workstream Production Models

Best checkpoints from the policy-compliant orchestrator production runs.

| File | Workstream | Description | Size |
|---|---|---|---|
| `workstreams/ws2_learned_world/ws2_best.pt` | WS2 — Learned World Model (Family D) | Learns world dynamics and physics | 59.4 MB |
| `workstreams/ws3_world_authoring/ws3_best.pt` | WS3 — World Authoring (Family B) | Procedural world generation from embeddings | 103.3 MB |
| `workstreams/ws4_world_remapper/ws4_best.pt` | WS4 — World Remapper (Family C) | Scene-to-scene transformation | 259.3 MB |
| `workstreams/ws4_world_remapper_v2/ws4_v2_best.pt` | WS4 v2 — World Remapper (optimized rerun) | Completed all 500 epochs | 259.3 MB |
| `workstreams/ws5_cinematic_renderer/ws5_frontier_best.pt` | WS5 — Cinematic Renderer (Family I) | Rich G-buffer rendering (19-ch input, 10.1M params) | 116.2 MB |
| `workstreams/ws6_runtime_optimization/ws6_best.pt` | WS6 — Runtime Optimization (Family G) | Inference speed optimization | 4.4 MB |
| `workstreams/ws6_runtime_optimization_v2/ws6_v2_best.pt` | WS6 v2 — Runtime Optimization (rerun) | Completed all 500 epochs | 13.7 MB |
| `workstreams/ws7_scene_to_world_v2/ws7_v2_best.pt` | WS7 v2 — Scene to World | Graph-based scene understanding | 43.7 MB |

### Distillation Students

Distilled student models compressed from the production teachers for optimal runtime performance.

| File | Description | Size |
|---|---|---|
| `distillation/ws2_student_best.pt` | WS2 Learned World Model distilled student (best val_loss: 0.000375) | 21.6 MB |
| `distillation/ws3_student_best.pt` | WS3 World Authoring distilled student (best val_loss: 0.000073) | 38.4 MB |
| `distillation/ws4_student_best.pt` | WS4 World Remapper distilled student (best val_loss: 0.049) | 90.6 MB |
| `distillation/ws5_student_best.pt` | WS5 Cinematic Renderer distilled student (best val_loss: 0.216) | 40.6 MB |
| `distillation/ws6_student_best.pt` | WS6 Runtime Optimization distilled student (best val_loss: 0.000016) | 1.76 MB |

*(Note: Final epoch checkpoints are also available in the repository as `*_student_final.pt`)*

### Additional Models

| File | Description | Size |
|---|---|---|
| `npc_director_v2/best.pt` | NPC Director v2 — behavioral AI for NPC state management (99.57% state accuracy) | 15.0 MB |
| `animation_director/best.pt` | Animation Director — procedural animation control | 8.0 MB |
| `ws8_structure_generator/best_model.pt` | WS8 — Structure Generator | 32.4 MB |
| `onnx/npc_director.onnx` | NPC Director ONNX export | 5.1 MB |

## Training Infrastructure

All models were trained on a single **NVIDIA B200** GPU (183 GB VRAM) on RunPod. The multi-workstream orchestrator managed sequential training across all workstreams with automatic stage transitions, early stopping, and plateau detection.

## Usage

```python
import torch

# Load a checkpoint
checkpoint = torch.load("v2_ultra/v2_ultra_global_best.pt", map_location="cpu")
model_state = checkpoint["model_state_dict"]

# For ONNX inference
import onnxruntime as ort

session = ort.InferenceSession("onnx/uruk_v2_ultra_best.onnx")
```

## License

Apache 2.0
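
## Checkpoint Layout Sketch

The Usage snippet reads the weights from a `"model_state_dict"` key. This README does not say whether every `.pt` file in the repository uses that wrapper, so the sketch below shows one defensive way to handle both a wrapped checkpoint and a bare weights mapping. `extract_state_dict` is a hypothetical helper, and the toy dictionaries stand in for whatever `torch.load` would return.

```python
from collections.abc import Mapping


def extract_state_dict(checkpoint, key="model_state_dict"):
    """Return the weights mapping from an already-loaded checkpoint object.

    Assumption: a checkpoint is either a mapping that wraps the weights
    under `key` (as in the Usage snippet) or is itself the weights mapping.
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    inner = checkpoint.get(key)
    if isinstance(inner, Mapping):
        return inner  # wrapped layout: weights live under `key`
    return checkpoint  # bare layout: the object is already the weights


# Toy stand-ins for loaded checkpoints (illustrative names and values only).
wrapped = {"epoch": 3, "model_state_dict": {"conv.weight": [0.1, 0.2]}}
bare = {"conv.weight": [0.1, 0.2]}

print(sorted(extract_state_dict(wrapped)))  # ['conv.weight']
print(sorted(extract_state_dict(bare)))     # ['conv.weight']
```

With a real file, the object passed in would be the result of `torch.load(path, map_location="cpu")`; printing the top-level keys first is a cheap way to confirm which layout a given checkpoint uses.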