BledarRamo committed on
Commit 543a80d · verified · 1 Parent(s): 9dd623b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +20 -18
README.md CHANGED
@@ -72,7 +72,7 @@ datasets:
 
  ---
 
- ## 🚀 Overview
 
  **Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
 
@@ -83,12 +83,12 @@ It has proven its efficiency and reasoning quality by matching the capabilities
 
  ---
 
- ## 🌟 Performance & Benchmarks
 
  The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
 
- ### 📊 Quantitative Benchmarks (lm-eval-harness)
- ### Conducted with Noeum thinking mode DISABLED to ensure fair comparison
 
 
 
@@ -104,7 +104,7 @@ The benchmarks below demonstrate Noeum-1-Nano achieving above-average performanc
 
  ***
 
- ### 🧪 Internal Evaluation & Best Practices
 
  Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
 
@@ -112,12 +112,12 @@ Based on our internal automated benchmarks (100-question comparative deep dive),
  * **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
  * **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
 
- **⚠ Critical Configuration:**
  These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
 
  ---
 
- ## 📚 Dataset Composition
 
  To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
 
@@ -126,7 +126,9 @@ The pre-training mixture includes:
  * **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
  * **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset).
  * **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."*
- ### 🧠 Impact of Reasoning (A/B Test)
 
  Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
 
@@ -135,20 +137,20 @@ Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the
 
  **User:** "What is the capital of Spain?"
 
- | Mode | Output | Verdict |
- |:---|:---|:---|
- | **Standard** | "La Muerte is the capital of Spain" |**Hallucination** |
- | **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct** |
 
  #### 2. Mathematical Logic
  *Standard generation struggles with arithmetic; reasoning sets up equations.*
 
  **User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
 
- | Mode | Output | Verdict |
- |:---|:---|:---|
- | **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." |**Repeated Input** |
- | **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** |
 
  ---
 
@@ -777,7 +779,7 @@ if __name__ == '__main__':
 
  ---
 
- ## ⚠️ Limitations & Bias
 
  While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
  * **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
@@ -796,7 +798,7 @@ While Noeum-1-Nano demonstrates impressive reasoning for its size, users should
 
  ***
 
- ### 🔭 The Vision & Future Roadmap
 
  This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.
 
 
 
  ---
 
+ ## Overview
 
  **Noeum-1-Nano** is a nano-scale Mixture-of-Experts (MoE) model (0.6B total / 0.2B active) trained on only **18 billion tokens**.
 
 
 
  ---
 
+ ## Performance & Benchmarks
 
  The benchmarks below demonstrate Noeum-1-Nano achieving above-average performance despite an extreme disparity in training volume. While standard models typically require 2 Trillion to 12 Trillion tokens, Noeum achieves competitive results with just 18 billion high-signal tokens.
 
+ ### Quantitative Benchmarks (lm-eval-harness)
+ ### ALL benchmarks conducted with Noeum thinking mode DISABLED to ensure fair comparison
 
 
 
 
 
  ***
 
+ ### Internal Evaluation & Best Practices
 
  Based on our internal automated benchmarks (100-question comparative deep dive), **Noeum-1-Nano** performs exceptionally well on specific task types when the reasoning engine is properly configured.
 
 
  * **Step-by-Step Word Problems:** Unlike standard small models which guess numbers, Noeum successfully sets up equations (e.g., $Distance = Speed \times Time$).
  * **Logical Deduction:** It correctly handles transitive logic puzzles (e.g., *If A > B and B > C, who is tallest?*).
 
+ **⚠ Critical Configuration:**
  These results are conditional on specific generation parameters. Our tests confirm that a **Thinking Budget of 128 tokens** combined with a **Temperature of 0.1** is the "sweet spot." Lower budgets cut off reasoning prematurely, while higher temperatures introduce instability.
 
  ---
 
+ ## Dataset Composition
 
  To achieve competitive performance with only **18 Billion tokens**, we prioritized data density over volume. We curated a "high-signal" mixture designed to maximize reasoning density per token.
 
 
  * **Coding:** High-quality **Python** repositories and **StackExchange** discussions.
  * **General Knowledge:** **Wikipedia** (specifically filtered for long-context articles >2k tokens), **C4**, and **FineWeb-Edu** (High quality subset).
  * **Synthetic Data:** Custom-generated synthetic reasoning traces designed to bootstrap the model's cognitive capabilities, including the ability to engage in deliberative reasoning before responding, explore contradictory perspectives, apply first-principles analysis, generate divergent solutions, and employ lateral thinking strategies."*
+
+
+ ### Tiny model but with Thinking option and impact of extra Reasoning (A/B Test)
 
  Noeum-1-Nano features a specific **Thinking Mode**. When enabled (temp=0.1), the model engages a hidden chain-of-thought process that grounds facts and solves multi-step problems.
 
 
 
  **User:** "What is the capital of Spain?"
 
+ | Mode | Output | Verdict |
+ |:---|:---|:-------------------|
+ | **Standard** | "La Muerte is the capital of Spain" | **Hallucination** |
+ | **Reasoning** | `<think>` The capital of Spain is Madrid. It is known for its rich history... `</think>` <br> **"Madrid is the capital of Spain."** | ✅ **Correct** |
 
  #### 2. Mathematical Logic
  *Standard generation struggles with arithmetic; reasoning sets up equations.*
 
  **User:** "If a train travels 60 km in 1 hour, how far in 3 hours?"
 
+ | Mode | Output | Verdict |
+ |:---|:---|:--------------------|
+ | **Standard** | "Therefore, the distance traveled by the train is 60 kilometers." | **Repeated Input** |
+ | **Reasoning** | `<think>` Distance = Speed × Time. <br> 60 km × 3 hours = 180 km `</think>` <br> **"So, the train travels 180 kilometers in 3 hours."** | ✅ **Correct** |
 
  ---
 
 
 
  ---
 
+ ## Limitations & Bias
 
  While Noeum-1-Nano demonstrates impressive reasoning for its size, users should be aware of the following:
  * **Hallucinations:** Like all small models, it can generate plausible but incorrect information, especially when the `<think>` mode is disabled.
 
 
  ***
 
+ ### The Vision & Future Roadmap
 
  This project, spearheaded by **[Bledar Ramo](https://www.linkedin.com/in/ramobledar)**, is not just a nano-model—it is a validation of a high-efficiency scaling hypothesis. We have proven that rapid iteration on small-scale "proxy" models is a reliable predictor of large-scale performance, allowing us to innovate faster than labs burdened by massive training runs.
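The **Critical Configuration** note in the README pairs a 128-token thinking budget with a temperature of 0.1. The plain-Python sketch below illustrates what those two settings generally do; it is not Noeum's inference code. The names `apply_temperature`, `generate_with_thinking_budget`, and `toy_model`, the `next_token` callable, and the `<eos>` token are hypothetical stand-ins. The budget is enforced here by forcing a `</think>` token once the cap is reached (a common "budget forcing" approach), which may differ from how Noeum implements its thinking budget.

```python
import math
from typing import Callable, List, Sequence


def apply_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over logits divided by `temperature` (lower = sharper distribution)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate_with_thinking_budget(
    next_token: Callable[[List[str]], str],  # hypothetical stand-in for the model
    prompt: List[str],
    budget: int = 128,
    max_new_tokens: int = 512,
    open_tag: str = "<think>",
    close_tag: str = "</think>",
    eos: str = "<eos>",  # hypothetical end-of-sequence token
) -> List[str]:
    """Decode token by token, capping the reasoning span at `budget` tokens.

    Once `budget` tokens have been produced inside <think>...</think>, the loop
    appends `close_tag` itself instead of querying the model, so generation
    must move on to the final answer.
    """
    seq = list(prompt)
    inside = False
    used = 0
    for _ in range(max_new_tokens):
        if inside and used >= budget:
            tok = close_tag  # budget exhausted: force the reasoning span closed
        else:
            tok = next_token(seq)
        seq.append(tok)
        if tok == eos:
            break
        if tok == open_tag:
            inside, used = True, 0
        elif tok == close_tag:
            inside = False
        elif inside:
            used += 1
    return seq[len(prompt):]


# Hypothetical toy model that would keep "reasoning" forever if allowed.
def toy_model(seq: List[str]) -> str:
    if "<think>" not in seq:
        return "<think>"
    if "</think>" not in seq:
        return "step"
    return "answer" if seq[-1] == "</think>" else "<eos>"


out = generate_with_thinking_budget(toy_model, ["Q"], budget=3)
print(out)  # ['<think>', 'step', 'step', 'step', '</think>', 'answer', '<eos>']
```

For the logits `[2.0, 1.0, 0.0]`, `apply_temperature` gives the top token about 0.67 of the probability mass at temperature 1.0 but more than 0.9999 at temperature 0.1, which is why a low temperature makes decoding close to deterministic.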