Building “1000 Rooms” for the Hugging Face Build Small Hackathon

Community Article
Published June 13, 2026

For the Hugging Face Build Small Hackathon, I built 1000 Rooms, an AI-driven interactive storytelling system in which every “room” is a dynamically generated micro-world of narrative, visuals, and voice.

The idea is simple: each room is generated by an AI model, each choice leads to a new room, and the experience evolves into an endless branching adventure.

In practice, it became a multi-model orchestration challenge, a dependency debugging exercise, and a lesson in how fragile small AI systems can be when everything has to run locally.

What is 1000 Rooms?

1000 Rooms is an interactive AI adventure system where:

- Each room is generated dynamically by a language model
- Player choices create branching paths
- Each room can include:
  - narrative (LLM)
  - images (diffusion model)
  - voice narration (TTS)
- Everything is designed to run under strict compute constraints
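The branching structure described above can be sketched as a small graph of rooms that grows as the player makes choices. This is a minimal illustration, not the project's actual code: the `Room` fields, the `Adventure` class, and `stub_generator` are all hypothetical names, and caching a room so that repeating a choice returns the same room is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Room:
    """One generated room. Field names are illustrative, not the project's schema."""
    room_id: str
    narrative: str                      # text produced by the language model
    choices: List[str]                  # labels shown to the player
    image_path: Optional[str] = None    # only set in a multimodal setup
    audio_path: Optional[str] = None    # only set in a multimodal setup
    exits: Dict[str, str] = field(default_factory=dict)  # choice label -> room_id


class Adventure:
    """Stores rooms and creates a new one the first time a choice is taken."""

    def __init__(self, generate_room: Callable[[Optional[Room], Optional[str]], Room]):
        # generate_room(parent, choice) stands in for the real model call.
        self._generate_room = generate_room
        self.rooms: Dict[str, Room] = {}

    def start(self) -> Room:
        root = self._generate_room(None, None)
        self.rooms[root.room_id] = root
        return root

    def choose(self, room: Room, choice: str) -> Room:
        if choice not in room.choices:
            raise ValueError(f"unknown choice: {choice!r}")
        if choice not in room.exits:
            # First visit: generate the room behind this choice and remember it.
            new_room = self._generate_room(room, choice)
            self.rooms[new_room.room_id] = new_room
            room.exits[choice] = new_room.room_id
        return self.rooms[room.exits[choice]]


def stub_generator(parent: Optional[Room], choice: Optional[str]) -> Room:
    """Deterministic placeholder for the language model."""
    room_id = "start" if parent is None else f"{parent.room_id}/{choice}"
    return Room(room_id=room_id, narrative=f"You are in {room_id}.", choices=["left", "right"])
```

Swapping `stub_generator` for a function that calls a real model keeps the graph logic unchanged, which makes the branching behaviour testable without loading any weights.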

I built two versions:

Text-only version

A lightweight pipeline focused entirely on narrative generation. Fast, minimal, and designed for low-resource environments.

Multimodal version

Adds:

- image generation per room
- text-to-speech narration
- richer scene composition

The Constraints That Shaped Everything

This hackathon was designed around strict constraints that defined every engineering decision.

  1. Small Models Only (≤ 32B parameters)

Everything had to fit within realistic local compute limits. No brute-force scaling, no oversized infrastructure — just efficient model use.

  2. Built on Gradio

The entire application had to be a Gradio Space.

This meant the UI wasn’t just a wrapper; it was part of the product experience itself.

  3. Show, Don’t Tell

A working demo video and social post were required as part of the submission.

If it doesn’t run, it doesn’t exist.

My Model Stack

To stay within constraints while keeping quality usable, I used a multi-model pipeline:

- LoRA fine-tune on Nemotron Nano 3 (4B) → room generation + interaction logic
- Flux 2 Klein 4B → image generation for rooms and items
- VoxCPM2 → text-to-speech narration system

Each model handled a specific part of the pipeline instead of trying to do everything at once.

The Transformers + Mamba Issue

One of the most frustrating problems came from dependency and architecture mismatches in the inference stack.

At some point, parts of the pipeline required Transformers alongside components that depended on mamba-ssm.

What went wrong:

- Version mismatches caused silent or partial failures
- Models would sometimes “load successfully” but behave incorrectly at runtime

How I fixed it: I switched to llama.cpp and used a quantized version of the model. This improved token generation speed and also earned the Llama Champion badge.
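Separately from the llama.cpp switch, silent version mismatches of this kind can be surfaced at startup instead of at runtime. The sketch below is not the fix used in this project. It is one possible fail-fast check built on the standard library's `importlib.metadata`, and the `check_pins` helper and any pinned versions passed to it are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, List


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Compare installed package versions against expected version prefixes.

    `pins` maps a distribution name to a dotted prefix such as "1.4".
    Returns human-readable problems; an empty list means every pin matched.
    """
    problems = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        want = expected.split(".")
        have = installed.split(".")
        # Compare component-wise so that "1.4" does not accidentally match "1.40".
        if have[: len(want)] != want:
            problems.append(f"{package}: found {installed}, expected {expected}.*")
    return problems
```

Calling this once before any model is loaded, and refusing to start if the returned list is non-empty, turns a model that “loads successfully” but misbehaves into an explicit error message.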

The JSON That Wouldn’t Obey

Another major challenge was structured output generation.

The system required the text model to output strict JSON for:

- room state
- choices
- transitions
- metadata for generating the environment

However, the model often:

- drifted into explanation mode mid-output
- generated reasoning instead of raw JSON
- partially followed format instructions, then broke structure

What I tried:

- strict system prompts enforcing JSON-only mode
- custom chat templates
- post-generation validation and retry loops
- fallback parsers for partial outputs

Even then, consistency was not guaranteed — especially under longer generations.
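A validation-and-retry loop with a fallback parser might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the key names in `REQUIRED_KEYS` are guesses based on the list above, and `generate` is a hypothetical callable that wraps whatever model backend is in use.

```python
import json
from typing import Callable, Optional

# Assumed key names; the real schema is not shown in this article.
REQUIRED_KEYS = {"room_state", "choices", "transitions", "metadata"}

_decoder = json.JSONDecoder()


def extract_json_object(text: str) -> Optional[dict]:
    """Fallback parser: return the first JSON object embedded anywhere in text.

    Tries to decode starting at each "{" in turn, so leading reasoning or
    trailing chatter around the object is ignored.
    """
    pos = text.find("{")
    while pos != -1:
        try:
            obj, _end = _decoder.raw_decode(text, pos)
            return obj
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)
    return None


def generate_room_json(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output contains an object with all required keys."""
    last_problem = "no attempts made"
    for _ in range(max_attempts):
        room = extract_json_object(generate(prompt))
        if room is None:
            last_problem = "no parseable JSON object in output"
            continue
        missing = REQUIRED_KEYS - room.keys()
        if not missing:
            return room
        # A truncated output can still yield a nested fragment; reject it here.
        last_problem = f"missing keys: {sorted(missing)}"
    raise ValueError(f"giving up after {max_attempts} attempts: {last_problem}")
```

The key check matters because the fallback parser is deliberately lenient: when the outer object is cut off, it may return an inner object, and only validation catches that.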

Why This Was Harder Than Expected

The hardest part wasn’t any single model.

It was orchestration under constraints:

- coordinating 3 separate AI models
- managing memory and load/unload cycles
- preventing structured output drift
- handling silent dependency issues
- keeping everything stable inside a Gradio deployment

This felt less like “building an AI app” and more like building a distributed system where every component is probabilistic.
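One simple way to manage load/unload cycles is to keep only a single model resident at a time and evict it before loading the next. The sketch below assumes that policy, which may differ from what 1000 Rooms actually does. `SingleModelSlot` and the loader callables are hypothetical, and real GPU backends usually need their own cache-clearing step in addition to `gc.collect()`.

```python
import gc
from typing import Any, Callable, Dict, Optional


class SingleModelSlot:
    """Keeps at most one model resident; requesting another evicts the current one."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        # loaders maps a model name to a zero-argument function that loads it.
        self._loaders = loaders
        self._name: Optional[str] = None
        self._model: Any = None

    @property
    def loaded(self) -> Optional[str]:
        """Name of the model currently in memory, or None."""
        return self._name

    def get(self, name: str) -> Any:
        if name == self._name:
            return self._model          # already resident, no reload needed
        if name not in self._loaders:
            raise KeyError(f"no loader registered for {name!r}")
        self.unload()                   # free the previous model first
        self._model = self._loaders[name]()
        self._name = name
        return self._model

    def unload(self) -> None:
        self._model = None
        self._name = None
        gc.collect()                    # releases Python-side references only
```

With loaders registered for the text, image, and voice models, each pipeline stage calls `slot.get(...)` and pays a reload cost only when the stage changes, trading latency for a lower peak memory footprint.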

Key Takeaways

  1. Small models increase engineering complexity

You trade scale for control — and gain a lot of system-level complexity.

  2. Structured output is still unreliable

Even strongly prompted models are not deterministic format engines.

  3. Dependencies shape behavior more than expected

The runtime stack affects results as much as the model itself.

  4. Orchestration matters more than model choice

Most of the work is making systems cooperate, not picking models.

Final Result

Despite the constraints and issues, the final system:

- generates interactive rooms
- generates multiple images per room: the room itself, the door, and searchable locations
- produces voice narration
- runs fully within hackathon constraints

Most importantly, it creates an experience that feels alive and reactive.

Closing Thoughts

The Hugging Face Build Small Hackathon reinforced a simple idea:

Constraints don’t limit creativity — they reveal where real engineering begins.

[Try the Space yourself](https://huggingface.co/spaces/build-small-hackathon/1000-Rooms) or [see the demo](https://youtu.be/G5OHbmP2rCE).
