--- license: apache-2.0 language: - en tags: - moe - mixture-of-experts - reasoning - chain-of-thought - cot - system-2-thinking - nlp - text-generation - conversational - instruct - sft - dpo - grpo - rlhf - math - logic - scientific-reasoning - efficient - low-resource - data-efficient - from-scratch - pretrained - 0.6b - nano-model - small-model - european-ai - austria - independent-research - arxiv - python - coding - step-by-step - self-correction - hallucination-reduction - educational - research - benchmark - thinking-mode - mental-models - deductive-reasoning - analytical - problem-solving pipeline_tag: text-generation library_name: transformers datasets: - wikipedia - c4 - fineweb-edu - arxiv - stack-exchange ---
Noeum Logo

Noeum-1-Nano

A 0.6B MoE model trained entirely from scratch.

WebsiteBenchmarksQuickstartTrainingAbout Noeum

--- ## Overview **Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**. It has proven its efficiency and reasoning quality by matching the capabilities of major labs’ nano-class models, despite utilizing a fraction of the data. Built entirely from scratch—with no pretrained weights and no inherited shortcuts—this independent, self-funded effort demonstrates that innovative techniques and intelligent design can rival brute-force scale. * **Data Efficiency:** Achieves competitive reasoning with **20x to 667x less data** than standard models like Qwen2 or TinyLlama. * **System 2 Reasoning:** Features a dedicated `` mode for logic, math, and self-correction. --- ## Performance & Benchmarks The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens. ### Quantitative Benchmarks (lm-eval-harness) ### ALL benchmarks conducted with Noeum thinking mode DISABLED to ensure fair comparison | Task | Metric | Noeum-1-Nano (0.6B) | Note | |:-----|:-------|:-------------------:|:-----| | **SciQ** | Accuracy | **77.5%** | *Exceptional scientific knowledge retrieval* | | **MRPC** | F1 Score | **81.2%** | *Rank #1 vs comparable models on semantic equivalence* | | **BoolQ** | Accuracy | **62.0%** | *Strong yes/no reasoning on complex text* | | **PIQA** | Accuracy | **62.9%** | *Physical interaction reasoning* | | **ARC-Easy**| Accuracy | 47.1% | | *** ### Internal Evaluation & Best Practices Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured. * **Scientific Fact Retrieval:** The model demonstrates high retention of constants and definitions (Physics, Biology). * **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$). * **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*). **⚠ Critical Configuration:** These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability. --- ## Dataset Composition To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token. The pre-training mixture includes: * **Academic & Reasoning:** arXiv papers (math/cs subsets), portions of **CC-Math-Finest**, and curated math datasets. * **Coding:** High-quality **Python** repositories and **StackExchange** discussions. * **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset). * **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."* ### Tiny model but with Thinking option and impact of extra Reasoning (A/B Test) Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems. #### 1. Hallucination Correction *Standard generation guesses; reasoning verifies.* **User:** "What is the capital of Spain?" | Mode | Output | Verdict | |:---|:---|:-------------------| | **Standard** | "La Muerte is the capital of Spain" | **Hallucination** | | **Reasoning** | `` The capital of Spain is Madrid. It is known for its rich history... ``
**"Madrid is the capital of Spain."** | ✅ **Correct** | #### 2. Mathematical Logic *Standard generation struggles with arithmetic; reasoning sets up equations.* **User:** "If a train travels 60 km in 1 hour, how far in 3 hours?" | Mode | Output | Verdict | |:---|:---|:--------------------| | **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." | **Repeated Input** | | **Reasoning** | `` Distance = Speed × Time.
60 km × 3 hours = 180 km `
`
**"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** | --- ## Architecture & Configuration | Component | Specification | |-----------|---------------| | **Type** | Mixture-of-Experts (MoE) | | **Total Params** | 0.6B | | **Active Params** | ~0.2B | | **Experts** | 8 Routed, 1 Shared (Top-2 Active) | | **Layers** | 24 | | **Attention** | 12 Heads (GQA), 768 Hidden Dim | | **Context** | 2048 Tokens (RoPE + YaRN) | --- ## 🛠️ Training Stack This model was not fine-tuned from an existing checkpoint. It was built from the ground up to test the efficiency of our custom stack. 1. **Pre-training:** Two-phase training (512 ctx $\to$ 2048 ctx) on high-signal data. 2. **Post-Training:** * **SFT:** Supervised Fine-Tuning for instruction following. * **GRPO:** Group Relative Policy Optimization for reasoning capabilities. * **DPO:** Direct Preference Optimization for alignment. 3. **Hardware:** Trained efficiently on **8x NVIDIA RTX 5090s**. --- ## Quickstart ### Chat Format The model supports two distinct modes via system prompts or flags: * **`/think`**: Activates System 2 reasoning (Recommended for logic/math). * **`/no think`**: Standard fast text generation. ## 🛠️ Advanced Usage: Full Benchmarking & Chat Script To fully explore **Noeum-1-Nano**, we provide the complete all-in-one inference script used to generate the benchmarks above. This script is not just a chat interface; it is a comprehensive evaluation tool. **Capabilities of this script:** 1. **Interactive Reasoning Chat:** Talk to the model with streaming output. Toggle "Thinking Mode" on/off dynamically using commands like `/think on` or `/think off`. 2. **Deep Dive Analysis:** Select Option 2 to run a single prompt through multiple configurations (Temperature 0.1 vs 0.7, Budget 32 vs 256) simultaneously to see how the model's logic changes. 3. **Automated Benchmarking:** Select Option 3 to run the full internal test suite (Math, Logic, History, Science) and generate A/B comparison logs. **How to use:** Save the code below as `run_noeum.py` and execute it. It handles token streaming, template application, and logging automatically. ```python import torch import torch.nn.functional as F from transformers import AutoTokenizer, AutoModelForCausalLM import sys from datetime import datetime class TeeLogger: def __init__(self, filename): self.terminal = sys.stdout self.log = open(filename, 'w', encoding='utf-8') def write(self, message): self.terminal.write(message) self.log.write(message) self.log.flush() def flush(self): self.terminal.flush() self.log.flush() def close(self): self.log.close() # Start logging timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") log_filename = f"benchmark_results_{timestamp}.txt" logger = TeeLogger(log_filename) sys.stdout = logger print(f"Logging started: {log_filename}") print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}") # ============================================================================ # MODEL SETUP # ============================================================================ MODEL_PATH = "./Noeum-0.6B-hf-nano" DEVICE = "cuda" if torch.cuda.is_available() else "cpu" print(f"\nLoading model from {MODEL_PATH} on {DEVICE}...") tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True, use_fast=False) model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE) model.eval() if tokenizer.pad_token is None: tokenizer.pad_token = tokenizer.eos_token EOS_ID = tokenizer.eos_token_id def streaming_generate(model, prompt_ids, max_new_tokens=512, temperature=0.7, top_p=0.9, device="cuda"): input_ids = prompt_ids.to(device) for _ in range(max_new_tokens): with torch.inference_mode(): outputs = model(input_ids) logits = outputs.logits[:, -1, :] # Greedy decoding if temperature <= 0: next_token = torch.argmax(logits, dim=-1, keepdim=True) else: logits = logits / temperature # Top-p (nucleus sampling) if top_p < 1.0: sorted_logits, sorted_indices = torch.sort(logits, descending=True) sorted_probs = F.softmax(sorted_logits, dim=-1) cumulative_probs = torch.cumsum(sorted_probs, dim=-1) # Remove tokens with cumulative probability above threshold sorted_indices_to_remove = cumulative_probs > top_p sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone() sorted_indices_to_remove[..., 0] = 0 # Set removed tokens to -inf sorted_logits[sorted_indices_to_remove] = -float("inf") # Scatter back to original positions logits = torch.zeros_like(logits).scatter(1, sorted_indices, sorted_logits) probs = F.softmax(logits, dim=-1) next_token = torch.multinomial(probs, num_samples=1) # Decode token tok_id = int(next_token.item()) token_text = tokenizer.decode([tok_id], skip_special_tokens=False) yield token_text # Stop if EOS token if tok_id == EOS_ID: break # Append to input for next iteration input_ids = torch.cat([input_ids, next_token], dim=-1) def chat( question: str, # chat_history argument removed/ignored thinking: bool = True, think_budget: int = 128, temperature: float = 0.1, top_p: float = 0.9, system_prompt: str = "You are a helpful assistant.", verbose: bool = True ): """ Main chat function with streaming support. MEMORY DISABLED: Each call is a fresh context. """ # Build conversation - STRICTLY SYSTEM + CURRENT QUESTION messages = [{'role': 'system', 'content': system_prompt}] # Add current question with /think flag if enabled user_message = f"{question} /think" if thinking else question messages.append({'role': 'user', 'content': user_message}) # Apply chat template prompt_text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) prompt_ids = tokenizer.encode(prompt_text, return_tensors='pt') # Generate thinking_content = "" answer_content = "" current_mode = None generator = streaming_generate( model=model, prompt_ids=prompt_ids, max_new_tokens=think_budget + 256 if thinking else 512, temperature=temperature, top_p=top_p, device=DEVICE ) if verbose: print("\n" + "=" * 80) for chunk in generator: if chunk == '': break # Track mode switches if chunk == '': current_mode = 'thinking' if verbose: print("\n💭 THINKING:") continue elif chunk == '': current_mode = None continue elif chunk == '': current_mode = 'answer' if verbose: print("\n✅ ANSWER:") continue elif chunk == '': current_mode = None continue # Accumulate content if current_mode == 'thinking': thinking_content += chunk if verbose: print(chunk, end='', flush=True) elif current_mode == 'answer': answer_content += chunk if verbose: print(chunk, end='', flush=True) if verbose: print("\n" + "=" * 80) return { 'thinking': thinking_content.strip(), 'answer': answer_content.strip(), 'full_thinking': thinking_content.strip(), 'full_answer': answer_content.strip() } # ============================================================================ # BENCHMARK FUNCTIONS # ============================================================================ def benchmark_single_question(question: str, temperatures=[0.1, 0.3, 0.7], budgets=[32, 128, 256]): """ Run a single question through all configurations - STREAMING """ print("\n" + "=" * 100) print(f"QUESTION: {question}") print("=" * 100) # NO THINK print("\n🚫 NO THINK MODE (temperature=0.7)") print("-" * 100) result = chat(question, thinking=False, temperature=0.7, verbose=True) # THINK MODE - Different temperatures for temp in temperatures: print(f"\n💭 THINK MODE - Temperature: {temp}, Budget: 128") print("-" * 100) result = chat(question, thinking=True, think_budget=128, temperature=temp, verbose=True) # THINK MODE - Different budgets (at temp=0.7) for budget in budgets: print(f"\n💭 THINK MODE - Temperature: 0.7, Budget: {budget}") print("-" * 100) result = chat(question, thinking=True, think_budget=budget, temperature=0.7, verbose=True) def benchmark_all_questions(questions: list, config: dict): results = [] print("\n" + "=" * 100) print(f"BENCHMARK: {config}") print("=" * 100) for i, q in enumerate(questions, 1): print(f"\n[{i}/{len(questions)}] Q: {q}") result = chat( q, thinking=config.get('thinking', True), think_budget=config.get('think_budget', 128), temperature=config.get('temperature', 0.1), verbose=True ) results.append({ 'question': q, 'thinking': result['full_thinking'], 'answer': result['full_answer'] }) return results def compare_configurations(questions: list): configurations = [ {'name': 'NO THINK', 'thinking': False, 'temperature': 0.7}, {'name': 'THINK - Temp 0.1', 'thinking': True, 'think_budget': 128, 'temperature': 0.1}, {'name': 'THINK - Temp 0.3', 'thinking': True, 'think_budget': 128, 'temperature': 0.3}, {'name': 'THINK - Temp 0.7', 'thinking': True, 'think_budget': 128, 'temperature': 0.7}, {'name': 'THINK - Budget 32', 'thinking': True, 'think_budget': 32, 'temperature': 0.7}, {'name': 'THINK - Budget 256', 'thinking': True, 'think_budget': 256, 'temperature': 0.7}, ] all_results = {} for config in configurations: name = config.pop('name') print(f"\n{'#' * 100}") print(f"RUNNING CONFIGURATION: {name}") print(f"{'#' * 100}") all_results[name] = benchmark_all_questions(questions, config) # Print FULL comparison table print("\n" + "=" * 100) print("COMPARISON SUMMARY - FULL OUTPUTS") print("=" * 100) for i, q in enumerate(questions): print(f"\n{'=' * 100}") print(f"Q{i + 1}: {q}") print(f"{'=' * 100}") for config_name, results in all_results.items(): print(f"\n{config_name}:") print(f" 💭 THINKING: {results[i]['thinking']}") print(f" ✅ ANSWER: {results[i]['answer']}") print("-" * 100) # ============================================================================ # INTERACTIVE CHAT LOOP # ============================================================================ def interactive_chat(): """ Interactive chat session in terminal - STATELESS (No Memory) """ print("\n" + "=" * 80) print("INTERACTIVE CHAT SESSION (NO MEMORY)") print("=" * 80) print("Commands:") print(" /quit - Exit chat") print(" /think on/off - Toggle thinking mode") print(" /budget - Set thinking budget (tokens)") print(" /temp - Set temperature (0-1)") print("=" * 80 + "\n") # chat_history removed thinking_enabled = True think_budget = 128 temperature = 0.1 while True: try: user_input = input("\n👤 You: ").strip() if not user_input: continue # Handle commands if user_input == '/quit': print("Goodbye!") break elif user_input.startswith('/think'): parts = user_input.split() if len(parts) > 1: thinking_enabled = parts[1].lower() == 'on' print(f"Thinking mode: {'ON' if thinking_enabled else 'OFF'}") continue elif user_input.startswith('/budget'): parts = user_input.split() if len(parts) > 1: think_budget = int(parts[1]) print(f"Thinking budget set to: {think_budget} tokens") continue elif user_input.startswith('/temp'): parts = user_input.split() if len(parts) > 1: temperature = float(parts[1]) print(f"Temperature set to: {temperature}") continue # /clear command removed as there is no memory # Get response - NO HISTORY PASSED result = chat( question=user_input, thinking=thinking_enabled, think_budget=think_budget, temperature=temperature, verbose=True ) # Chat history update logic removed except KeyboardInterrupt: print("\n\nGoodbye!") break except Exception as e: print(f"\nError: {e}") # ============================================================================ # MAIN - RUN BENCHMARKS # ============================================================================ if __name__ == '__main__': try: # Test questions test_questions = { # CATEGORY 1: Simple Math (Addition/Subtraction) "Simple Math": [ "What is 15 + 27?", "What is 100 - 37?", "What is 45 + 55?", "What is 82 - 19?", "What is 7 + 8?", "What is 50 - 25?", "What is 123 + 456?", "What is 200 - 88?", "What is 9 + 6?", "What is 75 - 30?" ], # CATEGORY 2: Multiplication "Multiplication": [ "What is 8 × 7?", "What is 12 × 12?", "What is 9 × 6?", "What is 15 × 4?", "What is 7 × 9?", "What is 11 × 11?", "What is 6 × 8?", "What is 13 × 5?", "What is 25 × 4?", "What is 20 × 3?" ], # CATEGORY 3: Division & Fractions "Division & Fractions": [ "What is 56 ÷ 8?", "Which is larger: 1/2 or 1/3?", "What is 100 ÷ 4?", "Which is larger: 2/3 or 3/4?", "What is 81 ÷ 9?", "Which is larger: 1/4 or 1/5?", "What is 144 ÷ 12?", "What is 1/2 + 1/4?", "What is 50 ÷ 2?", "Which is larger: 3/5 or 2/5?" ], # CATEGORY 4: Word Problems "Word Problems": [ "If a train travels 60 km in 1 hour, how far in 3 hours?", "If John has 5 apples and gives 2 to Mary, how many does he have left?", "If a book costs $12 and you buy 3 books, how much do you spend?", "If there are 24 students and 6 tables, how many students per table?", "If a car uses 8 liters per 100km, how much for 300km?", "If you earn $15 per hour and work 8 hours, how much do you earn?", "If a pizza has 8 slices and 4 people share equally, how many slices each?", "If a pen costs $2 and you have $20, how many pens can you buy?", "If a movie is 2 hours long and starts at 3pm, when does it end?", "If you save $10 per week, how much in 10 weeks?" ], # CATEGORY 5: Prime Numbers & Math Concepts "Math Concepts": [ "Is 17 a prime number?", "Is 20 a prime number?", "Is 13 a prime number?", "What is the square root of 64?", "What is 5 squared (5²)?", "Is 1 a prime number?", "What is 10% of 100?", "Is 2 the only even prime number?", "What is the square root of 49?", "What is 3 cubed (3³)?" ], # CATEGORY 6: History "History": [ "Who wrote Romeo and Juliet?", "Who was the first president of the United States?", "In what year did World War 2 end?", "Who discovered America in 1492?", "Who painted the Mona Lisa?", "What year did the Titanic sink?", "Who was the first man on the moon?", "In what year did World War 1 start?", "Who wrote the Declaration of Independence?", "Who was Julius Caesar?" ], # CATEGORY 7: Geography "Geography": [ "What is the capital of France?", "Which is bigger: Russia or Canada?", "What is the capital of Italy?", "Is the Nile or Amazon river longer?", "Which ocean is largest: Atlantic or Pacific?", "What is the capital of Japan?", "Is Australia a continent?", "What is the tallest mountain in the world?", "How many continents are there?", "What is the capital of Spain?" ], # CATEGORY 8: Science & Nature "Science & Nature": [ "Does the Sun orbit the Earth?", "What gas do plants produce during photosynthesis?", "Is iron magnetic?", "What is H2O?", "Does ice float in water?", "How many legs does a spider have?", "How many legs does an ant have?", "What is the speed of light approximately?", "Is the Earth flat or round?", "What planet is closest to the Sun?" ], # CATEGORY 9: Logical Reasoning "Logical Reasoning": [ "If all cats are animals, and Fluffy is a cat, is Fluffy an animal?", "If today is Monday, what day is tomorrow?", "If you have 3 red balls and 2 blue balls, how many balls total?", "Complete the pattern: 2, 4, 6, 8, ?", "Which is the odd one out: apple, banana, car, orange?", "If A is taller than B, and B is taller than C, who is tallest?", "True or False: All birds can fly?", "If it's raining, the ground is wet. The ground is wet. Is it raining?", "Complete the pattern: Monday, Tuesday, Wednesday, ?", "If 5 > 3 and 3 > 1, is 5 > 1?" ], # CATEGORY 10: General Knowledge "General Knowledge": [ "How many days are in a week?", "Which is heavier: gold or aluminum?", "Is the Eiffel Tower in London?", "Is Bitcoin a cryptocurrency?", "How many hours are in a day?", "What color is the sky on a clear day?", "How many months are in a year?", "What is the freezing point of water in Celsius?", "How many wheels does a bicycle have?", "What is the boiling point of water in Celsius?" ] } # Flatten all questions for quick testing all_questions = [] for category, questions in test_questions.items(): all_questions.extend(questions) print(f"\nTotal questions: {len(all_questions)}") print(f"Categories: {len(test_questions)}") print(f"Questions per category: {len(test_questions['Simple Math'])}") # Choose what to run print("\n" + "=" * 80) print("SELECT BENCHMARK MODE") print("=" * 80) print("1. Simple test (original 3 questions with default settings)") print("2. Deep dive single question (all configs)") print("3. Compare all configurations (by category) - FULL OUTPUTS") print("4. Interactive chat (NO MEMORY)") print("=" * 80) choice = input("\nEnter choice (1-4): ").strip() if choice == '1': # Simple test print("\n" + "=" * 80) print("SIMPLE TEST") print("=" * 80) for q in list(test_questions.values())[0][:3]: print(f"\nQ: {q}") result = chat(q, thinking=True, think_budget=128, temperature=0.7, verbose=True) elif choice == '2': # Deep dive on single question question = input("\nEnter question (or press Enter for '15 + 27'): ").strip() if not question: question = "What is 15 + 27?" benchmark_single_question(question) elif choice == '3': # Test by category print("\nSelect category to test:") categories = list(test_questions.keys()) for i, cat in enumerate(categories, 1): print(f"{i}. {cat}") print(f"{len(categories) + 1}. ALL CATEGORIES (100 questions)") cat_choice = input("\nEnter choice: ").strip() if cat_choice == str(len(categories) + 1): # Test all compare_configurations(all_questions) else: # Test specific category cat_idx = int(cat_choice) - 1 cat_name = categories[cat_idx] print(f"\nTesting category: {cat_name}") compare_configurations(test_questions[cat_name]) elif choice == '4': # Interactive chat interactive_chat() else: print("Invalid choice. Starting interactive chat...") interactive_chat() finally: print(f"\n\nLogging completed. Results saved to: {log_filename}") logger.close() sys.stdout = logger.terminal ``` --- ## Limitations & Bias While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following: * **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `` mode is disabled. * **Arithmetic:** While it can derive formulas correctly, it may struggle with calculating large numbers precisely. * **Scope:** The model is optimized for English and general reasoning. It is not intended for medical, legal, or safety-critical advice. --- ## About Noeum
Noeum
**Noeum** is an independent AI research & engineering lab based in Austria, building the next generation of intelligent systems. We are one of the few labs in Europe executing the full AI pipeline—from pre-training and alignment—entirely in-house. *** ### The Vision & Future Roadmap This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs. **Our Core Philosophy:** *Iterate fast at nano-scale; scale only what works.* With the right compute infrastructure and backing, we plan to scale these validated recipes to a **1 Trillion+ token** frontier model. Our roadmap includes integrating cutting-edge techniques inspired by our internal research and recent literature: * **Recursive Reasoning Architectures:** Moving beyond static Chain-of-Thought to **Recursive Language Models (RLMs)** that treat prompts as dynamic environments, solving problems far exceeding standard context windows * **Agentic Data Synthesis:** Implementing large-scale, self-correcting synthetic data pipelines that simulate real-world tool use and multi-step reasoning * **Stability at Scale:** Utilizing advanced optimization techniques like **MuonClip** (QK-Norm/Clip) to ensure stability during massive training runs without loss spikes. * **Hyper-Efficient Architectures:** Further refining our MoE routing and **Multi-head Latent Attention (MLA)** to maximize active parameter efficiency. **Noeum** (derived from *"mind," "meaning,"* and *"thought"*) is building the next generation of genuine reasoning systems—not by brute-force, but by architectural intelligence. *** 🌐 **Website:** [noeum.ai](https://noeum.ai) 📧 **Contact:** contact@noeum.ai