Rune Goblin: An AI Dungeon Crawler Where Your Doodles Become Spells

Community Article
Published June 14, 2026

b9064ae7-1e61-4902-a6f2-2c25e7931fb8

Rune Goblin is an AI dungeon crawler where you draw your own spells, and every cursed doodle can rewrite the story.

It started with a simple question: what if spell casting was not a fixed button on a UI, but a weird little language the player had to learn, bend, and occasionally fail at?

In Rune Goblin, players draw rune glyphs on a canvas. The game reads the drawing, interprets the spell intent, validates it against the RuneLang rules, and then applies the outcome to battles, quests, NPCs, locked doors, bosses, and hidden dungeon events.

Rune_Goblin_ARCH

What It Is

Rune Goblin is built on top of Gradio, with a lot of development, iteration, documentation, dataset work, and fine-tuning workflow support done using Codex. Codex was not just used for one-off code generation; it helped shape the loop around the game: generating and checking datasets, wiring model outputs into game-state actions, writing evaluation tests, debugging Modal deployment paths, and keeping the README/docs/deployment scripts in sync as the project changed.

The game has two main AI paths:

  • The dialogue and story engine uses OpenBMB's MiniCPM-V-4.6 model to help drive NPC flavor, dungeon narration, and story beats.
  • The rune glyph engine uses the same OpenBMB vision model family, but fine-tuned on a custom RuneLang spell system so it can read drawn runes and return structured spell JSON.

The player-facing idea is simple: draw your own runes to create spells. Better drawings unlock better effects. Messy drawings still work sometimes, but they may create ambiguity, side effects, weaker spells, or cursed outcomes.

Gameplay

Rune Goblin currently has 9+ maps, 14+ spells, rune customizations, hidden quests, boss encounters, and hero evolution. The campaign has roughly four hours of gameplay if you poke around, talk to NPCs, chase side quests, and try to solve the boss mechanics instead of brute-forcing everything.

Some bosses require specific drawn runes. Others need combinations of runes, status effects, or the right dungeon discoveries before they can be beaten cleanly. There are hidden quests sprinkled across the world, plus locked doors, shrines, portals, chests, NPC hints, and class-specific advantages.

The LLM does not just decorate the UI. It reads the rune spell and helps decide what the spell is trying to do. Then the game validates that output through a schema, resolves it through RuneLang rules, and applies deterministic game-state changes. That means the model can be creative, while the game engine still keeps combat, HP, inventory, quests, and boss rules stable.

The Agentic Game Loop

The most interesting part is the agentic loop between the player, the model, and the deterministic world engine.

The model gets a compact snapshot of the current situation: player HP, enemy type, room context, selected or drawn runes, known weaknesses, inventory hints, and recent story state. It then proposes a structured spell interpretation: detected runes, confidence, ambiguity, spell name, target, effect, status changes, presentation text, and visual tags.

That proposal is not applied directly. Rune Goblin treats model output like an agent suggestion:

  1. Parse or repair the JSON.
  2. Clamp it to the spell schema.
  3. Resolve it against RuneLang grammar and combo rules.
  4. Check enemy weakness/resistance mappings.
  5. Convert the result into a small allowlisted set of world actions.
  6. Apply those actions through the game engine.

Those world actions are things like damage, healing, status effects, unlocking a door, looting a chest, charging a boss pylon, adding a discovery to the journal, setting a story flag, or triggering a room transition.

The dialogue side follows the same philosophy. NPC lines can be generated by the base MiniCPM-V-4.6 model, but durable quest state is still owned by the deterministic engine. Dialogue is allowed to add flavor, explain a clue, or make a shop interaction feel alive; it is not allowed to silently rewrite the save file. For shop interactions, the model can haggle inside deterministic price bands, but the game validates the actual item, price, and inventory changes.

This gives the game an agentic feel without turning the save file into a free-for-all. The LLM reads intent and adds texture. The engine owns truth.

The RuneLang Pipeline

The architecture has two halves.

The runtime loop is what happens during a player spell turn:

  1. The player selects runes or draws a freehand spell on the canvas.
  2. The rune serializer and prompt builder package the game state, enemy context, player state, and rune inputs.
  3. The fine-tuned rune interpreter model reads the rune spell and returns spell intent JSON.
  4. A validator repairs or rejects malformed JSON.
  5. The RuneLang rule resolver maps the spell through grammar, combos, weaknesses, resistances, and ambiguity fallbacks.
  6. The game state engine applies the result.
  7. The renderer turns the result into spell text, particles, VFX, sound tags, dungeon logs, and animation cues.

The offline loop is how the model was made:

  1. Define the RuneLang rulebook: vocabulary, grammar, combinations, risks, and chaos rules.
  2. Generate synthetic game states and rune combinations.
  3. Render glyph images with stroke variation, noise, rotation, scale, and color diversity.
  4. Build train/validation JSONL records containing images, prompts, target rune metadata, spell presentations, balanced coverage, and ambiguous examples.
  5. Publish the dataset to Hugging Face.
  6. Fine-tune the OpenBMB vision model with LoRA / QLoRA.
  7. Publish the model artifacts to Hugging Face.
  8. Deploy the inference runtime on Modal or run it locally.

The public visual dataset is here:

https://huggingface.co/datasets/ASHu2/rune_goblin_visual_dataset

The public fine-tuned LoRA/model artifacts are here:

https://huggingface.co/ASHu2/goblinV1

The training code is public in this repository, including the dataset generation, fine-tuning, evaluation, inference, and game integration code.

How Codex Helped With Fine-Tuning

Codex helped most in the messy middle between "I have an idea" and "this is a repeatable ML/game pipeline."

It helped design the RuneLang data format, generate synthetic spell examples, build the glyph image renderer, add noisy and ambiguous visual variations, and keep the target JSON shape consistent with the runtime game schema. It also helped write the evaluation code that checks JSON validity, rune detection accuracy, ambiguity handling, and basic gameplay sanity.

On the fine-tuning side, Codex helped turn the experiment into a reproducible workflow: Modal notebook steps, LoRA/QLoRA configuration, export notes, local inference paths, GGUF conversion notes, and docs explaining why GGUF is for serving while safetensors are used for training.

The useful pattern was: use Codex as a coding agent for all the glue around the model. The model training is only one part. The surrounding pieces matter just as much: dataset generation, schema validation, test coverage, deployment scripts, fallback behavior, and docs that explain how to run the whole thing again.

Fine-Tuning The Model

The rune model is a fine-tuned OpenBMB MiniCPM-V-4.6 vision model. The goal was not to make a general OCR model. It only needed to understand Rune Goblin's specific spell language: rune shapes, rune order, ambiguity, spell intent, and how a doodle maps back into a valid game action.

The fine-tuning pipeline uses a custom RuneLang visual dataset. Each sample connects a drawn spell image to structured metadata like detected runes, confidence, ambiguity, visual hints, and the final spell JSON. The model learns to read rough player sketches, but the game never trusts the model blindly. The output is parsed, repaired, clamped, and resolved by the deterministic game engine.

This split is important. The AI reads intent. The game engine owns the rules.

Deployment

The game itself runs as a Gradio/FastAPI app and can be hosted on a CPU-only Hugging Face Space. The expensive part is the vision model, so that runs separately.

The fine-tuned rune model is deployed on Modal through deploy/modal_vision_gguf.py, using a quantized GGUF build:

  • Model repo: ASHu2/goblinV1
  • GGUF build: ASHu2/goblinV1/gguf
  • Model file: gguf/rune-goblin-v46-Q8_0.gguf
  • Multimodal projector: gguf/rune-goblin-v46-mmproj-f16.gguf
  • GPU: Modal A10G
  • Runtime: CUDA-built llama-cpp-python OpenAI server
  • Endpoint shape: OpenAI-compatible /v1/chat/completions
  • Model cache: Modal Volume goblin-gguf-cache
  • Hosting: Modal GPU endpoint with min_containers=0 and 5-minute scale-to-zero
  • Concurrency: up to 10 in-flight requests per replica
  • Cold-start optimization: Modal CPU memory snapshots plus GPU VRAM snapshots

The GGUF route was chosen because it made the serving path practical for a game. Loading the full safetensors model with torch took around five minutes in early experiments. The GGUF pipeline with Modal GPU snapshotting brought cold starts down to roughly ten seconds, which is much more realistic for an interactive spell-casting loop.

One detail that mattered: the deploy does not use the standalone llama-server binary. The MiniCPM-V-4.6 GGUF + mmproj path worked reliably through llama-cpp-python's Llama + MTMDChatHandler, which is also the path used by the local game backend. The Modal image builds llama-cpp-python from source with CUDA enabled for the A10G, starts the OpenAI-compatible server, sends a small warmup multimodal request, and snapshots the loaded model state so later cold starts restore with the model already in VRAM.

Modal also lets the model scale to zero when nobody is playing, so the project does not need an always-on private GPU server. The CPU Hugging Face Space calls the Modal endpoint when a player draws a spell, receives structured spell JSON, and continues the game locally.

Why This Was Fun To Build

The most fun part of Rune Goblin is that the player is not choosing spells from a normal menu. They are learning a strange little magical notation. A clean spiral plus the right companion glyph can create one spell. A shaky version of the same thing might produce a weaker spell, a curse, a strange side effect, or a better story moment.

That makes the model feel less like an answer machine and more like a game system. The player draws, the model interprets, the rules push back, and the dungeon reacts.

I hope some of you try the game. There is a lot tucked into it: bosses that need drawn runes or rune combinations, hidden quests, different heroes, evolving spells, and plenty of cursed outcomes if your drawings get too confident.

Social media post yet to come. Stay tuned.

Hope you enjoy the game.

Thanks!

Community

Sign up or log in to comment