Instructions to use noeum/noeum-1-nano with libraries, inference providers, notebooks, and local apps.

Libraries

How to use noeum/noeum-1-nano with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="noeum/noeum-1-nano", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("noeum/noeum-1-nano", trust_remote_code=True, dtype="auto")
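The direct-load snippet above stops once the model is in memory. A minimal sketch of one way to continue to a generated reply follows. It assumes the repository ships a tokenizer with a chat template and that the custom model code supports `generate`; neither is confirmed here. The helper names `build_chat` and `generate_reply` and the default of 128 new tokens are illustrative, not part of the model's API.

```python
def build_chat(prompt: str) -> list:
    """Wrap a single user prompt in the chat-message format used above."""
    return [{"role": "user", "content": prompt}]


def generate_reply(prompt: str, model_id: str = "noeum/noeum-1-nano",
                   max_new_tokens: int = 128) -> str:
    """Load the model and tokenizer, then generate one reply (hypothetical helper)."""
    # Imported lazily so build_chat stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, dtype="auto"
    )

    # Assumption: the tokenizer defines a chat template for this model.
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` downloads the weights on first use, so it needs network access and enough memory for the model.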

vLLM

How to use noeum/noeum-1-nano with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "noeum/noeum-1-nano"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "noeum/noeum-1-nano",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
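The same request can be sent from Python instead of curl. The sketch below uses only the standard library and assumes the server is already running and returns the usual OpenAI-style `choices[0].message.content` response shape. The helper names `build_payload` and `chat` are illustrative.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """POST the prompt to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the server follows the OpenAI chat completions response format.
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "noeum/noeum-1-nano", "What is the capital of France?")` mirrors the curl call above. Passing a different `base_url` points the same helper at any other OpenAI-compatible server.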

Use Docker

docker model run hf.co/noeum/noeum-1-nano

SGLang

How to use noeum/noeum-1-nano with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "noeum/noeum-1-nano" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "noeum/noeum-1-nano",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "noeum/noeum-1-nano" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use noeum/noeum-1-nano with Docker Model Runner:
```
docker model run hf.co/noeum/noeum-1-nano
```

BledarRamo committed on Jan 5

Commit 543a80d (verified)

1 Parent(s): 9dd623b

Upload README.md with huggingface_hub

Files changed (1)

README.md +20 -18

@@ -72,7 +72,7 @@ datasets:
 ---
-## 🚀 Overview
 **Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
@@ -83,12 +83,12 @@ It has proven its efficiency and reasoning quality by matching the capabilities
 ---
-## 🌟 Performance & Benchmarks
 The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
-### 📊 Quantitative Benchmarks (lm-eval-harness)
-### Conducted with Noeum thinking mode DISABLED to ensure fair comparison
@@ -104,7 +104,7 @@ The benchmarks below demonstrate Noeum-1-Nano achieving above-average performanc
 ***
-### 🧪 Internal Evaluation & Best Practices
 Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
@@ -112,12 +112,12 @@ Based on our internal automated benchmarks (100-question comparative deep dive),
 *   **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
 *   **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
-**⚠️ Critical Configuration:**
 These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
 ---
-## 📚 Dataset Composition
 To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
@@ -126,7 +126,9 @@ The pre-training mixture includes:
 *   **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
 *   **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset).
 *   **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."*
-### 🧠 Impact of Reasoning (A/B Test)
 Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
@@ -135,20 +137,20 @@ Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the
 **User:** "What is the capital of Spain?"
-| Mode | Output | Verdict |
-|:---|:---|:---|
-| **Standard** | "La Muerte is the capital of Spain" | ❌ **Hallucination** |
-| **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct** |
 #### 2. Mathematical Logic
 *Standard generation struggles with arithmetic; reasoning sets up equations.*
 **User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
-| Mode | Output | Verdict |
-|:---|:---|:---|
-| **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." | ❌ **Repeated Input** |
-| **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** |
 ---
@@ -777,7 +779,7 @@ if __name__ == '__main__':
 ---
-## ⚠️ Limitations & Bias
 While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
 *   **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
@@ -796,7 +798,7 @@ While Noeum-1-Nano demonstrates impressive reasoning for its size, users should
 ***
-### 🔭 The Vision & Future Roadmap
 This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.

 ---
+##  Overview
 **Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
 ---
+##  Performance & Benchmarks
 The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
+### Quantitative Benchmarks (lm-eval-harness)
+### ALL benchmarks conducted with Noeum thinking mode DISABLED to ensure fair comparison
 ***
+###  Internal Evaluation & Best Practices
 Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
 *   **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
 *   **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
+**⚠ Critical Configuration:**
 These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
 ---
+##  Dataset Composition
 To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
 *   **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
 *   **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset).
 *   **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."*
+### Tiny model but with Thinking option and impact of extra Reasoning (A/B Test)
 Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
 **User:** "What is the capital of Spain?"
+| Mode | Output | Verdict            |
+|:---|:---|:-------------------|
+| **Standard** | "La Muerte is the capital of Spain" |  **Hallucination** |
+| **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct**      |
 #### 2. Mathematical Logic
 *Standard generation struggles with arithmetic; reasoning sets up equations.*
 **User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
+| Mode | Output | Verdict             |
+|:---|:---|:--------------------|
+| **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." |  **Repeated Input** |
+| **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct**       |
 ---
 ---
+##  Limitations & Bias
 While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
 *   **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
 ***
+###  The Vision & Future Roadmap
 This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.