Text Generation
Transformers
PyTorch
English
ramo
Mixture of Experts
mixture-of-experts
reasoning
chain-of-thought
cot
system-2-thinking
nlp
conversational
instruct
sft
dpo
grpo
rlhf
math
logic
scientific-reasoning
efficient
low-resource
data-efficient
from-scratch
pretrained
0.6b
nano-model
small-model
european-ai
austria
independent-research
arxiv
python
coding
step-by-step
self-correction
hallucination-reduction
educational
research
benchmark
thinking-mode
mental-models
deductive-reasoning
analytical
problem-solving
custom_code
Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -72,7 +72,7 @@ datasets:
|
|
| 72 |
|
| 73 |
---
|
| 74 |
|
| 75 |
-
##
|
| 76 |
|
| 77 |
**Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
|
| 78 |
|
|
@@ -83,12 +83,12 @@ It has proven its efficiency and reasoning quality by matching the capabilities
|
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
-
##
|
| 87 |
|
| 88 |
The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 trillion to 12 trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
|
| 89 |
|
| 90 |
-
###
|
| 91 |
-
###
|
| 92 |
|
| 93 |
|
| 94 |
|
|
@@ -104,7 +104,7 @@ The benchmarks below demonstrate Noeum-1-Nano achieving above-average performanc
|
|
| 104 |
|
| 105 |
***
|
| 106 |
|
| 107 |
-
###
|
| 108 |
|
| 109 |
Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
|
| 110 |
|
|
@@ -112,12 +112,12 @@ Based on our internal automated benchmarks (100-question comparative deep dive),
|
|
| 112 |
* **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
|
| 113 |
* **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
|
| 114 |
|
| 115 |
-
**⚠
|
| 116 |
These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
|
| 117 |
|
| 118 |
---
|
| 119 |
|
| 120 |
-
##
|
| 121 |
|
| 122 |
To achieve competitive performance with only **18 billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
|
| 123 |
|
|
@@ -126,7 +126,9 @@ The pre-training mixture includes:
|
|
| 126 |
* **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
|
| 127 |
* **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (high-quality subset).
|
| 128 |
* **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies.
|
| 129 |
-
|
|
|
|
|
|
|
| 130 |
|
| 131 |
Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
|
| 132 |
|
|
@@ -135,20 +137,20 @@ Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the
|
|
| 135 |
|
| 136 |
**User:** "What is the capital of Spain?"
|
| 137 |
|
| 138 |
-
| Mode | Output | Verdict
|
| 139 |
-
|:---|:---|:---|
|
| 140 |
-
| **Standard** | "La Muerte is the capital of Spain" |
|
| 141 |
-
| **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct**
|
| 142 |
|
| 143 |
#### 2. Mathematical Logic
|
| 144 |
*Standard generation struggles with arithmetic; reasoning sets up equations.*
|
| 145 |
|
| 146 |
**User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
|
| 147 |
|
| 148 |
-
| Mode | Output | Verdict
|
| 149 |
-
|:---|:---|:---|
|
| 150 |
-
| **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." |
|
| 151 |
-
| **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct**
|
| 152 |
|
| 153 |
---
|
| 154 |
|
|
@@ -777,7 +779,7 @@ if __name__ == '__main__':
|
|
| 777 |
|
| 778 |
---
|
| 779 |
|
| 780 |
-
##
|
| 781 |
|
| 782 |
While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
|
| 783 |
* **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
|
|
@@ -796,7 +798,7 @@ While Noeum-1-Nano demonstrates impressive reasoning for its size, users should
|
|
| 796 |
|
| 797 |
***
|
| 798 |
|
| 799 |
-
###
|
| 800 |
|
| 801 |
This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.
|
| 802 |
|
|
|
|
| 72 |
|
| 73 |
---
|
| 74 |
|
| 75 |
+
## Overview
|
| 76 |
|
| 77 |
**Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
|
| 78 |
|
|
|
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
+
## Performance & Benchmarks
|
| 87 |
|
| 88 |
The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 trillion to 12 trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
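
To put the training-volume gap described above in numbers, the short calculation below divides the 2 trillion and 12 trillion token figures by Noeum's 18 billion. It is plain arithmetic on the numbers quoted in this paragraph; the constant and function names are illustrative only and do not come from the model's code.

```python
# Training-token budgets quoted in the paragraph above.
NOEUM_TOKENS = 18e9             # 18 billion
TYPICAL_RANGE = (2e12, 12e12)   # 2 trillion to 12 trillion


def volume_ratio(other_tokens: float, own_tokens: float = NOEUM_TOKENS) -> float:
    """Return how many times more tokens the other model was trained on."""
    return other_tokens / own_tokens


low, high = (volume_ratio(t) for t in TYPICAL_RANGE)
print(f"Typical models see roughly {low:.0f}x to {high:.0f}x more training tokens.")
```

Under these figures the gap works out to roughly 111x at the low end and 667x at the high end.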
|
| 89 |
|
| 90 |
+
### Quantitative Benchmarks (lm-eval-harness)
|
| 91 |
+
### All benchmarks were run with Noeum's thinking mode DISABLED to ensure a fair comparison
|
| 92 |
|
| 93 |
|
| 94 |
|
|
|
|
| 104 |
|
| 105 |
***
|
| 106 |
|
| 107 |
+
### Internal Evaluation & Best Practices
|
| 108 |
|
| 109 |
Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
|
| 110 |
|
|
|
|
| 112 |
* **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
|
| 113 |
* **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
|
| 114 |
|
| 115 |
+
**⚠ Critical Configuration:**
|
| 116 |
These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
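
As a rough illustration of the settings above, the sketch below collects them into keyword arguments that the standard Transformers `generate` method accepts (`max_new_tokens`, `temperature`, `do_sample`). How the thinking budget is actually enforced by the model's own code is not shown in this excerpt, so reserving it inside `max_new_tokens` is an assumption, as are the helper name `reasoning_generation_kwargs` and the `answer_budget` parameter.

```python
def reasoning_generation_kwargs(
    thinking_budget: int = 128,   # value stated in the model card
    temperature: float = 0.1,     # value stated in the model card
    answer_budget: int = 128,     # ASSUMPTION: not specified in the model card
) -> dict:
    """Collect the recommended settings as keyword arguments for `generate`.

    ASSUMPTION: the thinking budget is approximated by reserving that many
    tokens inside `max_new_tokens`; the model's custom code may enforce the
    budget differently.
    """
    if thinking_budget <= 0 or answer_budget <= 0:
        raise ValueError("token budgets must be positive")
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "max_new_tokens": thinking_budget + answer_budget,
        "temperature": temperature,
        "do_sample": True,  # temperature only takes effect when sampling
    }


kwargs = reasoning_generation_kwargs()
print(kwargs)
# Possible use with a Transformers text-generation pipeline named `pipe`
# (not executed here): pipe(messages, **kwargs)
```

Keeping the budgets as separate parameters makes it easy to check the claim that lower thinking budgets cut off reasoning: only `thinking_budget` needs to change between runs.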
|
| 117 |
|
| 118 |
---
|
| 119 |
|
| 120 |
+
## Dataset Composition
|
| 121 |
|
| 122 |
To achieve competitive performance with only **18 billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
|
| 123 |
|
|
|
|
| 126 |
* **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
|
| 127 |
* **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (high-quality subset).
|
| 128 |
* **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies.
|
| 129 |
+
|
| 130 |
+
|
| 131 |
+
### Thinking Mode in a Tiny Model: Impact of Extra Reasoning (A/B Test)
|
| 132 |
|
| 133 |
Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
|
| 134 |
|
|
|
|
| 137 |
|
| 138 |
**User:** "What is the capital of Spain?"
|
| 139 |
|
| 140 |
+
| Mode | Output | Verdict |
|
| 141 |
+
|:---|:---|:-------------------|
|
| 142 |
+
| **Standard** | "La Muerte is the capital of Spain" | **Hallucination** |
|
| 143 |
+
| **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct** |
|
| 144 |
|
| 145 |
#### 2. Mathematical Logic
|
| 146 |
*Standard generation struggles with arithmetic; reasoning sets up equations.*
|
| 147 |
|
| 148 |
**User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
|
| 149 |
|
| 150 |
+
| Mode | Output | Verdict |
|
| 151 |
+
|:---|:---|:--------------------|
|
| 152 |
+
| **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." | **Repeated Input** |
|
| 153 |
+
| **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** |
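
The reasoning rows in the tables above show the trace wrapped in `<think>` and `</think>` tags, followed by the final answer. A minimal sketch of separating the two is given below. It assumes the raw completion contains those literal tags exactly as displayed; the function name `split_reasoning` is hypothetical and not part of the model's code.

```python
import re

# Matches the first complete <think>...</think> block, across newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion.

    If no complete <think>...</think> block is present, the reasoning is
    empty and the whole text is treated as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Drop the think block and keep whatever surrounds it as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = (
    "<think> Distance = Speed × Time. 60 km × 3 hours = 180 km </think> "
    "So, the train travels 180 kilometers in 3 hours."
)
reasoning, answer = split_reasoning(raw)
print(answer)
```

Separating the trace this way lets an A/B comparison score only the final answer while still logging the reasoning for inspection.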
|
| 154 |
|
| 155 |
---
|
| 156 |
|
|
|
|
| 779 |
|
| 780 |
---
|
| 781 |
|
| 782 |
+
## Limitations & Bias
|
| 783 |
|
| 784 |
While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
|
| 785 |
* **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
|
|
|
|
| 798 |
|
| 799 |
***
|
| 800 |
|
| 801 |
+
### The Vision & Future Roadmap
|
| 802 |
|
| 803 |
This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.
|
| 804 |
|