---
license: apache-2.0
language:
- en
tags:
- moe
- mixture-of-experts
- reasoning
- chain-of-thought
- cot
- system-2-thinking
- nlp
- text-generation
- conversational
- instruct
- sft
- dpo
- grpo
- rlhf
- math
- logic
- scientific-reasoning
- efficient
- low-resource
- data-efficient
- from-scratch
- pretrained
- 0.6b
- nano-model
- small-model
- european-ai
- austria
- independent-research
- arxiv
- python
- coding
- step-by-step
- self-correction
- hallucination-reduction
- educational
- research
- benchmark
- thinking-mode
- mental-models
- deductive-reasoning
- analytical
- problem-solving
pipeline_tag: text-generation
library_name: transformers
datasets:
- wikipedia
- c4
- fineweb-edu
- arxiv
- stack-exchange
---
---
## Overview
**Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
It has proven its efficiency and reasoning quality by matching the capabilities of major labs’ nano-class models, despite utilizing a fraction of the data. Built entirely from scratch—with no pretrained weights and no inherited shortcuts—this independent, self-funded effort demonstrates that innovative techniques and intelligent design can rival brute-force scale.
* **Data Efficiency:** Achieves competitive reasoning with **20x to 667x less data** than standard models like Qwen2 or TinyLlama.
* **System 2 Reasoning:** Features a dedicated `` mode for logic, math, and self-correction.
---
## Performance & Benchmarks
The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
### Quantitative Benchmarks (lm-eval-harness)
### ALL benchmarks conducted with Noeum thinking mode DISABLED to ensure fair comparison
| Task | Metric | Noeum-1-Nano (0.6B) | Note |
|:-----|:-------|:-------------------:|:-----|
| **SciQ** | Accuracy | **77.5%** | *Exceptional scientific knowledge retrieval* |
| **MRPC** | F1 Score | **81.2%** | *Rank #1 vs comparable models on semantic equivalence* |
| **BoolQ** | Accuracy | **62.0%** | *Strong yes/no reasoning on complex text* |
| **PIQA** | Accuracy | **62.9%** | *Physical interaction reasoning* |
| **ARC-Easy**| Accuracy | 47.1% | |
***
### Internal Evaluation & Best Practices
Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
* **Scientific Fact Retrieval:** The model demonstrates high retention of constants and definitions (Physics, Biology).
* **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
* **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
**⚠ Critical Configuration:**
These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
---
## Dataset Composition
To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
The pre-training mixture includes:
* **Academic & Reasoning:** arXiv papers (math/cs subsets), portions of **CC-Math-Finest**, and curated math datasets.
* **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
* **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset).
* **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."*
### Tiny model but with Thinking option and impact of extra Reasoning (A/B Test)
Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
#### 1. Hallucination Correction
*Standard generation guesses; reasoning verifies.*
**User:** "What is the capital of Spain?"
| Mode | Output | Verdict |
|:---|:---|:-------------------|
| **Standard** | "La Muerte is the capital of Spain" | **Hallucination** |
| **Reasoning** | `` The capital of Spain is Madrid. It is known for its rich history... ``
**"Madrid is the capital of Spain."** | ✅ **Correct** |
#### 2. Mathematical Logic
*Standard generation struggles with arithmetic; reasoning sets up equations.*
**User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
| Mode | Output | Verdict |
|:---|:---|:--------------------|
| **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." | **Repeated Input** |
| **Reasoning** | `` Distance = Speed × Time.
60 km × 3 hours = 180 km ``
**"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** |
---
## Architecture & Configuration
| Component | Specification |
|-----------|---------------|
| **Type** | Mixture-of-Experts (MoE) |
| **Total Params** | 0.6B |
| **Active Params** | ~0.2B |
| **Experts** | 8 Routed, 1 Shared (Top-2 Active) |
| **Layers** | 24 |
| **Attention** | 12 Heads (GQA), 768 Hidden Dim |
| **Context** | 2048 Tokens (RoPE + YaRN) |
---
## 🛠️ Training Stack
This model was not fine-tuned from an existing checkpoint. It was built from the ground up to test the efficiency of our custom stack.
1. **Pre-training:** Two-phase training (512 ctx $\to$ 2048 ctx) on high-signal data.
2. **Post-Training:**
* **SFT:** Supervised Fine-Tuning for instruction following.
* **GRPO:** Group Relative Policy Optimization for reasoning capabilities.
* **DPO:** Direct Preference Optimization for alignment.
3. **Hardware:** Trained efficiently on **8x NVIDIA RTX 5090s**.
---
## Quickstart
### Chat Format
The model supports two distinct modes via system prompts or flags:
* **`/think`**: Activates System 2 reasoning (Recommended for logic/math).
* **`/no think`**: Standard fast text generation.
## 🛠️ Advanced Usage: Full Benchmarking & Chat Script
To fully explore **Noeum-1-Nano**, we provide the complete all-in-one inference script used to generate the benchmarks above. This script is not just a chat interface; it is a comprehensive evaluation tool.
**Capabilities of this script:**
1. **Interactive Reasoning Chat:** Talk to the model with streaming output. Toggle "Thinking Mode" on/off dynamically using commands like `/think on` or `/think off`.
2. **Deep Dive Analysis:** Select Option 2 to run a single prompt through multiple configurations (Temperature 0.1 vs 0.7, Budget 32 vs 256) simultaneously to see how the model's logic changes.
3. **Automated Benchmarking:** Select Option 3 to run the full internal test suite (Math, Logic, History, Science) and generate A/B comparison logs.
**How to use:**
Save the code below as `run_noeum.py` and execute it. It handles token streaming, template application, and logging automatically.
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM
import sys
from datetime import datetime
class TeeLogger:
def __init__(self, filename):
self.terminal = sys.stdout
self.log = open(filename, 'w', encoding='utf-8')
def write(self, message):
self.terminal.write(message)
self.log.write(message)
self.log.flush()
def flush(self):
self.terminal.flush()
self.log.flush()
def close(self):
self.log.close()
# Start logging
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_filename = f"benchmark_results_{timestamp}.txt"
logger = TeeLogger(log_filename)
sys.stdout = logger
print(f"Logging started: {log_filename}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
# ============================================================================
# MODEL SETUP
# ============================================================================
MODEL_PATH = "./Noeum-0.6B-hf-nano"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"\nLoading model from {MODEL_PATH} on {DEVICE}...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE)
model.eval()
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
EOS_ID = tokenizer.eos_token_id
def streaming_generate(model, prompt_ids, max_new_tokens=512, temperature=0.7, top_p=0.9, device="cuda"):
input_ids = prompt_ids.to(device)
for _ in range(max_new_tokens):
with torch.inference_mode():
outputs = model(input_ids)
logits = outputs.logits[:, -1, :]
# Greedy decoding
if temperature <= 0:
next_token = torch.argmax(logits, dim=-1, keepdim=True)
else:
logits = logits / temperature
# Top-p (nucleus sampling)
if top_p < 1.0:
sorted_logits, sorted_indices = torch.sort(logits, descending=True)
sorted_probs = F.softmax(sorted_logits, dim=-1)
cumulative_probs = torch.cumsum(sorted_probs, dim=-1)
# Remove tokens with cumulative probability above threshold
sorted_indices_to_remove = cumulative_probs > top_p
sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
sorted_indices_to_remove[..., 0] = 0
# Set removed tokens to -inf
sorted_logits[sorted_indices_to_remove] = -float("inf")
# Scatter back to original positions
logits = torch.zeros_like(logits).scatter(1, sorted_indices, sorted_logits)
probs = F.softmax(logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
# Decode token
tok_id = int(next_token.item())
token_text = tokenizer.decode([tok_id], skip_special_tokens=False)
yield token_text
# Stop if EOS token
if tok_id == EOS_ID:
break
# Append to input for next iteration
input_ids = torch.cat([input_ids, next_token], dim=-1)
def chat(
question: str,
# chat_history argument removed/ignored
thinking: bool = True,
think_budget: int = 128,
temperature: float = 0.1,
top_p: float = 0.9,
system_prompt: str = "You are a helpful assistant.",
verbose: bool = True
):
"""
Main chat function with streaming support.
MEMORY DISABLED: Each call is a fresh context.
"""
# Build conversation - STRICTLY SYSTEM + CURRENT QUESTION
messages = [{'role': 'system', 'content': system_prompt}]
# Add current question with /think flag if enabled
user_message = f"{question} /think" if thinking else question
messages.append({'role': 'user', 'content': user_message})
# Apply chat template
prompt_text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
prompt_ids = tokenizer.encode(prompt_text, return_tensors='pt')
# Generate
thinking_content = ""
answer_content = ""
current_mode = None
generator = streaming_generate(
model=model,
prompt_ids=prompt_ids,
max_new_tokens=think_budget + 256 if thinking else 512,
temperature=temperature,
top_p=top_p,
device=DEVICE
)
if verbose:
print("\n" + "=" * 80)
for chunk in generator:
if chunk == '':
break
# Track mode switches
if chunk == '':
current_mode = 'thinking'
if verbose:
print("\n💭 THINKING:")
continue
elif chunk == '':
current_mode = None
continue
elif chunk == '':
current_mode = 'answer'
if verbose:
print("\n✅ ANSWER:")
continue
elif chunk == '':
current_mode = None
continue
# Accumulate content
if current_mode == 'thinking':
thinking_content += chunk
if verbose:
print(chunk, end='', flush=True)
elif current_mode == 'answer':
answer_content += chunk
if verbose:
print(chunk, end='', flush=True)
if verbose:
print("\n" + "=" * 80)
return {
'thinking': thinking_content.strip(),
'answer': answer_content.strip(),
'full_thinking': thinking_content.strip(),
'full_answer': answer_content.strip()
}
# ============================================================================
# BENCHMARK FUNCTIONS
# ============================================================================
def benchmark_single_question(question: str, temperatures=[0.1, 0.3, 0.7], budgets=[32, 128, 256]):
"""
Run a single question through all configurations - STREAMING
"""
print("\n" + "=" * 100)
print(f"QUESTION: {question}")
print("=" * 100)
# NO THINK
print("\n🚫 NO THINK MODE (temperature=0.7)")
print("-" * 100)
result = chat(question, thinking=False, temperature=0.7, verbose=True)
# THINK MODE - Different temperatures
for temp in temperatures:
print(f"\n💭 THINK MODE - Temperature: {temp}, Budget: 128")
print("-" * 100)
result = chat(question, thinking=True, think_budget=128, temperature=temp, verbose=True)
# THINK MODE - Different budgets (at temp=0.7)
for budget in budgets:
print(f"\n💭 THINK MODE - Temperature: 0.7, Budget: {budget}")
print("-" * 100)
result = chat(question, thinking=True, think_budget=budget, temperature=0.7, verbose=True)
def benchmark_all_questions(questions: list, config: dict):
results = []
print("\n" + "=" * 100)
print(f"BENCHMARK: {config}")
print("=" * 100)
for i, q in enumerate(questions, 1):
print(f"\n[{i}/{len(questions)}] Q: {q}")
result = chat(
q,
thinking=config.get('thinking', True),
think_budget=config.get('think_budget', 128),
temperature=config.get('temperature', 0.1),
verbose=True
)
results.append({
'question': q,
'thinking': result['full_thinking'],
'answer': result['full_answer']
})
return results
def compare_configurations(questions: list):
configurations = [
{'name': 'NO THINK', 'thinking': False, 'temperature': 0.7},
{'name': 'THINK - Temp 0.1', 'thinking': True, 'think_budget': 128, 'temperature': 0.1},
{'name': 'THINK - Temp 0.3', 'thinking': True, 'think_budget': 128, 'temperature': 0.3},
{'name': 'THINK - Temp 0.7', 'thinking': True, 'think_budget': 128, 'temperature': 0.7},
{'name': 'THINK - Budget 32', 'thinking': True, 'think_budget': 32, 'temperature': 0.7},
{'name': 'THINK - Budget 256', 'thinking': True, 'think_budget': 256, 'temperature': 0.7},
]
all_results = {}
for config in configurations:
name = config.pop('name')
print(f"\n{'#' * 100}")
print(f"RUNNING CONFIGURATION: {name}")
print(f"{'#' * 100}")
all_results[name] = benchmark_all_questions(questions, config)
# Print FULL comparison table
print("\n" + "=" * 100)
print("COMPARISON SUMMARY - FULL OUTPUTS")
print("=" * 100)
for i, q in enumerate(questions):
print(f"\n{'=' * 100}")
print(f"Q{i + 1}: {q}")
print(f"{'=' * 100}")
for config_name, results in all_results.items():
print(f"\n{config_name}:")
print(f" 💭 THINKING: {results[i]['thinking']}")
print(f" ✅ ANSWER: {results[i]['answer']}")
print("-" * 100)
# ============================================================================
# INTERACTIVE CHAT LOOP
# ============================================================================
def interactive_chat():
"""
Interactive chat session in terminal - STATELESS (No Memory)
"""
print("\n" + "=" * 80)
print("INTERACTIVE CHAT SESSION (NO MEMORY)")
print("=" * 80)
print("Commands:")
print(" /quit - Exit chat")
print(" /think on/off - Toggle thinking mode")
print(" /budget - Set thinking budget (tokens)")
print(" /temp - Set temperature (0-1)")
print("=" * 80 + "\n")
# chat_history removed
thinking_enabled = True
think_budget = 128
temperature = 0.1
while True:
try:
user_input = input("\n👤 You: ").strip()
if not user_input:
continue
# Handle commands
if user_input == '/quit':
print("Goodbye!")
break
elif user_input.startswith('/think'):
parts = user_input.split()
if len(parts) > 1:
thinking_enabled = parts[1].lower() == 'on'
print(f"Thinking mode: {'ON' if thinking_enabled else 'OFF'}")
continue
elif user_input.startswith('/budget'):
parts = user_input.split()
if len(parts) > 1:
think_budget = int(parts[1])
print(f"Thinking budget set to: {think_budget} tokens")
continue
elif user_input.startswith('/temp'):
parts = user_input.split()
if len(parts) > 1:
temperature = float(parts[1])
print(f"Temperature set to: {temperature}")
continue
# /clear command removed as there is no memory
# Get response - NO HISTORY PASSED
result = chat(
question=user_input,
thinking=thinking_enabled,
think_budget=think_budget,
temperature=temperature,
verbose=True
)
# Chat history update logic removed
except KeyboardInterrupt:
print("\n\nGoodbye!")
break
except Exception as e:
print(f"\nError: {e}")
# ============================================================================
# MAIN - RUN BENCHMARKS
# ============================================================================
if __name__ == '__main__':
try:
# Test questions
test_questions = {
# CATEGORY 1: Simple Math (Addition/Subtraction)
"Simple Math": [
"What is 15 + 27?",
"What is 100 - 37?",
"What is 45 + 55?",
"What is 82 - 19?",
"What is 7 + 8?",
"What is 50 - 25?",
"What is 123 + 456?",
"What is 200 - 88?",
"What is 9 + 6?",
"What is 75 - 30?"
],
# CATEGORY 2: Multiplication
"Multiplication": [
"What is 8 × 7?",
"What is 12 × 12?",
"What is 9 × 6?",
"What is 15 × 4?",
"What is 7 × 9?",
"What is 11 × 11?",
"What is 6 × 8?",
"What is 13 × 5?",
"What is 25 × 4?",
"What is 20 × 3?"
],
# CATEGORY 3: Division & Fractions
"Division & Fractions": [
"What is 56 ÷ 8?",
"Which is larger: 1/2 or 1/3?",
"What is 100 ÷ 4?",
"Which is larger: 2/3 or 3/4?",
"What is 81 ÷ 9?",
"Which is larger: 1/4 or 1/5?",
"What is 144 ÷ 12?",
"What is 1/2 + 1/4?",
"What is 50 ÷ 2?",
"Which is larger: 3/5 or 2/5?"
],
# CATEGORY 4: Word Problems
"Word Problems": [
"If a train travels 60 km in 1 hour, how far in 3 hours?",
"If John has 5 apples and gives 2 to Mary, how many does he have left?",
"If a book costs $12 and you buy 3 books, how much do you spend?",
"If there are 24 students and 6 tables, how many students per table?",
"If a car uses 8 liters per 100km, how much for 300km?",
"If you earn $15 per hour and work 8 hours, how much do you earn?",
"If a pizza has 8 slices and 4 people share equally, how many slices each?",
"If a pen costs $2 and you have $20, how many pens can you buy?",
"If a movie is 2 hours long and starts at 3pm, when does it end?",
"If you save $10 per week, how much in 10 weeks?"
],
# CATEGORY 5: Prime Numbers & Math Concepts
"Math Concepts": [
"Is 17 a prime number?",
"Is 20 a prime number?",
"Is 13 a prime number?",
"What is the square root of 64?",
"What is 5 squared (5²)?",
"Is 1 a prime number?",
"What is 10% of 100?",
"Is 2 the only even prime number?",
"What is the square root of 49?",
"What is 3 cubed (3³)?"
],
# CATEGORY 6: History
"History": [
"Who wrote Romeo and Juliet?",
"Who was the first president of the United States?",
"In what year did World War 2 end?",
"Who discovered America in 1492?",
"Who painted the Mona Lisa?",
"What year did the Titanic sink?",
"Who was the first man on the moon?",
"In what year did World War 1 start?",
"Who wrote the Declaration of Independence?",
"Who was Julius Caesar?"
],
# CATEGORY 7: Geography
"Geography": [
"What is the capital of France?",
"Which is bigger: Russia or Canada?",
"What is the capital of Italy?",
"Is the Nile or Amazon river longer?",
"Which ocean is largest: Atlantic or Pacific?",
"What is the capital of Japan?",
"Is Australia a continent?",
"What is the tallest mountain in the world?",
"How many continents are there?",
"What is the capital of Spain?"
],
# CATEGORY 8: Science & Nature
"Science & Nature": [
"Does the Sun orbit the Earth?",
"What gas do plants produce during photosynthesis?",
"Is iron magnetic?",
"What is H2O?",
"Does ice float in water?",
"How many legs does a spider have?",
"How many legs does an ant have?",
"What is the speed of light approximately?",
"Is the Earth flat or round?",
"What planet is closest to the Sun?"
],
# CATEGORY 9: Logical Reasoning
"Logical Reasoning": [
"If all cats are animals, and Fluffy is a cat, is Fluffy an animal?",
"If today is Monday, what day is tomorrow?",
"If you have 3 red balls and 2 blue balls, how many balls total?",
"Complete the pattern: 2, 4, 6, 8, ?",
"Which is the odd one out: apple, banana, car, orange?",
"If A is taller than B, and B is taller than C, who is tallest?",
"True or False: All birds can fly?",
"If it's raining, the ground is wet. The ground is wet. Is it raining?",
"Complete the pattern: Monday, Tuesday, Wednesday, ?",
"If 5 > 3 and 3 > 1, is 5 > 1?"
],
# CATEGORY 10: General Knowledge
"General Knowledge": [
"How many days are in a week?",
"Which is heavier: gold or aluminum?",
"Is the Eiffel Tower in London?",
"Is Bitcoin a cryptocurrency?",
"How many hours are in a day?",
"What color is the sky on a clear day?",
"How many months are in a year?",
"What is the freezing point of water in Celsius?",
"How many wheels does a bicycle have?",
"What is the boiling point of water in Celsius?"
]
}
# Flatten all questions for quick testing
all_questions = []
for category, questions in test_questions.items():
all_questions.extend(questions)
print(f"\nTotal questions: {len(all_questions)}")
print(f"Categories: {len(test_questions)}")
print(f"Questions per category: {len(test_questions['Simple Math'])}")
# Choose what to run
print("\n" + "=" * 80)
print("SELECT BENCHMARK MODE")
print("=" * 80)
print("1. Simple test (original 3 questions with default settings)")
print("2. Deep dive single question (all configs)")
print("3. Compare all configurations (by category) - FULL OUTPUTS")
print("4. Interactive chat (NO MEMORY)")
print("=" * 80)
choice = input("\nEnter choice (1-4): ").strip()
if choice == '1':
# Simple test
print("\n" + "=" * 80)
print("SIMPLE TEST")
print("=" * 80)
for q in list(test_questions.values())[0][:3]:
print(f"\nQ: {q}")
result = chat(q, thinking=True, think_budget=128, temperature=0.7, verbose=True)
elif choice == '2':
# Deep dive on single question
question = input("\nEnter question (or press Enter for '15 + 27'): ").strip()
if not question:
question = "What is 15 + 27?"
benchmark_single_question(question)
elif choice == '3':
# Test by category
print("\nSelect category to test:")
categories = list(test_questions.keys())
for i, cat in enumerate(categories, 1):
print(f"{i}. {cat}")
print(f"{len(categories) + 1}. ALL CATEGORIES (100 questions)")
cat_choice = input("\nEnter choice: ").strip()
if cat_choice == str(len(categories) + 1):
# Test all
compare_configurations(all_questions)
else:
# Test specific category
cat_idx = int(cat_choice) - 1
cat_name = categories[cat_idx]
print(f"\nTesting category: {cat_name}")
compare_configurations(test_questions[cat_name])
elif choice == '4':
# Interactive chat
interactive_chat()
else:
print("Invalid choice. Starting interactive chat...")
interactive_chat()
finally:
print(f"\n\nLogging completed. Results saved to: {log_filename}")
logger.close()
sys.stdout = logger.terminal
```
---
## Limitations & Bias
While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
* **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `` mode is disabled.
* **Arithmetic:** While it can derive formulas correctly, it may struggle with calculating large numbers precisely.
* **Scope:** The model is optimized for English and general reasoning. It is not intended for medical, legal, or safety-critical advice.
---
## About Noeum
**Noeum** is an independent AI research & engineering lab based in Austria, building the next generation of intelligent systems. We are one of the few labs in Europe executing the full AI pipeline—from pre-training and alignment—entirely in-house.
***
### The Vision & Future Roadmap
This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.
**Our Core Philosophy:** *Iterate fast at nano-scale; scale only what works.*
With the right compute infrastructure and backing, we plan to scale these validated recipes to a **1 Trillion+ token** frontier model. Our roadmap includes integrating cutting-edge techniques inspired by our internal research and recent literature:
* **Recursive Reasoning Architectures:** Moving beyond static Chain-of-Thought to **Recursive Language Models (RLMs)** that treat prompts as dynamic environments, solving problems far exceeding standard context windows
* **Agentic Data Synthesis:** Implementing large-scale, self-correcting synthetic data pipelines that simulate real-world tool use and multi-step reasoning
* **Stability at Scale:** Utilizing advanced optimization techniques like **MuonClip** (QK-Norm/Clip) to ensure stability during massive training runs without loss spikes.
* **Hyper-Efficient Architectures:** Further refining our MoE routing and **Multi-head Latent Attention (MLA)** to maximize active parameter efficiency.
**Noeum** (derived from *"mind," "meaning,"* and *"thought"*) is building the next generation of genuine reasoning systems—not by brute-force, but by architectural intelligence.
***
🌐 **Website:** [noeum.ai](https://noeum.ai)
📧 **Contact:** contact@noeum.ai