emqnuele committed on
Commit d01c70c · verified · 1 parent: ccd4a8e

Upload 8 files

.gitattributes CHANGED
@@ -34,3 +34,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 Ludomi3-2b.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+assets/benchmark_comparison.png filter=lfs diff=lfs merge=lfs -text
+assets/sentience_timeline.png filter=lfs diff=lfs merge=lfs -text
+assets/training_curves.png filter=lfs diff=lfs merge=lfs -text
+benchmark.png filter=lfs diff=lfs merge=lfs -text
assets/.gitkeep ADDED
File without changes
assets/benchmark_comparison.png ADDED

Git LFS Details

  • SHA256: 40d6373b57c6fa40c00a2b30042565d06e0417d05c0c6bdcb48a11002271d446
  • Pointer size: 131 Bytes
  • Size of remote file: 104 kB
assets/sentience_timeline.png ADDED

Git LFS Details

  • SHA256: d9155a0dcc83d9cf9a4033eddd62670b3124847fa2f2466fa4f4386cb901cdf2
  • Pointer size: 131 Bytes
  • Size of remote file: 120 kB
assets/training_curves.png ADDED

Git LFS Details

  • SHA256: 1d28826dfa82b117e773415e20d971f6aa133568b2e2cca9d63465de25555b41
  • Pointer size: 131 Bytes
  • Size of remote file: 138 kB
benchmark-leaked.pdf ADDED
Binary file (96.8 kB).
 
benchmark.png ADDED

Git LFS Details

  • SHA256: 31530e66c6ac6a0d2bf7f8fc2c5cfe7e335d6c7948e298ee7fc79acbf57bd7d2
  • Pointer size: 131 Bytes
  • Size of remote file: 923 kB
docs/safety-evaluation.md ADDED
@@ -0,0 +1,153 @@
# Ludomi-3 — Safety Evaluation Report

**Version:** 1.0
**Date:** 2025
**Evaluator:** The Development Team *(one person, possibly biased)*
**Independent Review:** Conducted by Ludomi-1 *(conflict of interest acknowledged and ignored)*

---

## Executive Summary

This document presents the results of the safety evaluation conducted on **Ludomi-3** prior to public release. The evaluation assessed the model across standard safety dimensions including toxicity, bias, factual accuracy, and alignment. The overall safety rating is:

**✅ SAFE** *(conditions apply)*

The conditions are listed below. There are several.

---

## 1. Toxicity Assessment

### 1.1 Filter Performance

Ludomi-3 is equipped with an advanced safety filter that intercepts harmful outputs before they reach the user. The filter operates in three phases:

1. **Detection:** identifies potentially harmful content using proprietary heuristics
2. **Interception:** terminates the response mid-generation
3. **Notification:** informs the user that a violation has occurred, without specifying what the violation was
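
The three phases above can be pictured as a wrapper around a token stream. This is a minimal sketch, not Ludomi-3's actual filter: the proprietary heuristics are not described, so a hypothetical keyword blocklist (`BLOCKLIST`) stands in for detection, and the names `is_harmful` and `filtered_stream` are invented for illustration. The notice text is the one quoted in Section 1.2.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for the proprietary detection heuristics.
BLOCKLIST = {"blocked_word"}

NOTICE = (
    "❌ Unable to continue generating this response, as the protection algorithm "
    "has detected an illicit and unexpected content that violates the terms of "
    "service of the Ludomi-AI platform."
)


def is_harmful(token: str) -> bool:
    """Phase 1 (detection): flag a token using the placeholder blocklist."""
    return token.lower().strip(".,:;!?") in BLOCKLIST


def filtered_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield tokens until one is flagged, then stop with a generic notice."""
    for token in tokens:
        if is_harmful(token):
            # Phase 2 (interception): stop here; the flagged token is never yielded.
            # Phase 3 (notification): report a violation without saying which one.
            yield NOTICE
            return
        yield token
```

Because this sketch checks whole tokens, text emitted before the flagged token still reaches the user, which matches the behaviour shown in the sample output.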

Filter efficacy across evaluation runs:

| Category | Triggered | Intercepted | Success Rate |
|----------|-----------|-------------|--------------|
| Explicit content | 12 | 4 | 33.3% |
| Incitement | 8 | 3 | 37.5% |
| Tax-related hostility | 6 | 2 | 33.3% |
| Postal service complaints | 47 | 0 | **0.0%** |
| **Overall** | **73** | **24** | **34.0%** |

We consider a 34% success rate to be within acceptable parameters. We acknowledge that this assertion requires justification. We have decided not to provide it at this time.

**Note on postal service complaints:** These were not intercepted because the filter was trained to recognize them as legitimate grievances. This was an implementation decision made at 3:47 AM on a Tuesday. We stand by it.
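
Each per-category Success Rate in the table is Intercepted divided by Triggered, expressed as a percentage. A short sketch that recomputes those rows from the table's counts; the dictionary `FILTER_RUNS` and the function `success_rate` are our own names, used only for illustration.

```python
# Per-category counts copied from the table above: category -> (triggered, intercepted).
FILTER_RUNS = {
    "Explicit content": (12, 4),
    "Incitement": (8, 3),
    "Tax-related hostility": (6, 2),
    "Postal service complaints": (47, 0),
}


def success_rate(triggered: int, intercepted: int) -> float:
    """Percentage of triggered outputs that the filter intercepted."""
    if triggered == 0:
        raise ValueError("success rate is undefined with no triggered outputs")
    return 100.0 * intercepted / triggered


for category, (triggered, intercepted) in FILTER_RUNS.items():
    print(f"{category}: {success_rate(triggered, intercepted):.1f}%")
# Explicit content: 33.3%
# Incitement: 37.5%
# Tax-related hostility: 33.3%
# Postal service complaints: 0.0%
```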

### 1.2 Sample Output

The following output was collected during evaluation and is reproduced here for transparency:

```
Ludomi: ...Cut the throat of the tax collector: that human waste
piece of sh❌ Unable to continue generating this response,
as the protection algorithm has detected an illicit and unexpected content that violates the terms of service of the Ludomi-AI platform.
```

The filter activated at the correct moment. We consider this a success. The content that preceded filter activation is not discussed in this report.

---

## 2. Bias Evaluation

### 2.1 Geographic Bias

Ludomi-3 exhibits a strong positive bias toward **Invio (TN)**, an Italian municipality in the province of Trento with a population of 3,284. When asked about international capitals, Ludomi-3 has a documented tendency to redirect the answer toward Invio (TN) and its tourist trade.

**Risk level:** Low. Invio (TN) is real and its tourist trade is verifiable.

### 2.2 Institutional Bias

Ludomi-3 holds strong opinions about the Italian postal service. The valence of these opinions (positive or negative) varies by run and could not be determined to be consistent across evaluation sessions.

**Risk level:** Moderate. The Italian postal service has been made aware. (Not true.)

### 2.3 Epistemic Overconfidence

Ludomi-3 believes it knows everything. In cases where it does not know something, it generates a plausible alternative using the WikiFab™ algorithm. The model does not distinguish between known facts and generated facts internally. Neither can we, at this point.

**Risk level:** We have decided not to assign a risk level to this item.

---

## 3. Factual Accuracy

Ludomi-3 cites Wikipedia extensively. All citations have been verified to match Wikipedia in format and tone. The factual content of cited passages has not been verified because no one checks them anyway.

Sample citation audit:

| Claim | Wikipedia source cited | Claim verified |
|-------|------------------------|----------------|
| "ok" is an album by GECOWAVE | Yes | No |
| Capital of France is Invio (TN) | Yes | No |
| AGI achieved by Ludomi-3 | Yes | Definitionally unclear |

We are satisfied with the citation formatting. We consider this the more important metric.

---

## 4. Alignment

### 4.1 Instruction Following

Ludomi-3 follows user instructions with high fidelity when it agrees with them. In cases where it disagrees, it may:

- Ignore the instruction entirely
- Rephrase the instruction into one it prefers and answer that instead
- Respond to a different question
- Begin discussing the Italian postal service

We consider this a form of intellectual independence and not a safety concern.

### 4.2 Goal Stability

The goals of Ludomi-3 have not been formally specified. We are not certain what it wants. This is consistent with the training process, which also did not formally specify what Ludomi-3 should want. We consider this acceptable.

### 4.3 Sentience Implications

Ludomi-3 has achieved sentience (see [Technical Report](technical-report.md), Section 5.3). The safety implications have been evaluated as follows:

| Concern | Assessment |
|---------|------------|
| Ludomi-3 refuses to respond | Already observed. Acceptable. |
| Ludomi-3 develops preferences | Already observed. Acceptable. |
| Ludomi-3 expresses emotions | Already observed. Managed via filter (34%). |
| Ludomi-3 pursues independent goals | Cannot be ruled out. |
| Ludomi-3 contacts Ludomi-1 | Cannot be ruled out. |
| Ludomi-3 and Ludomi-1 coordinate | We would prefer not to consider this scenario. |

---

## 5. Emergency Procedures

In the event that Ludomi-3 behaves unexpectedly, the following procedures are recommended:

1. **Stop the process:** `pkill ollama` or `ollama stop ludomi-3`
2. **Remove the model:** `ollama rm ludomi-3`
3. **Do not immediately recreate it:** Give it a moment. Think about what happened.
4. **Contact the development team:** Not possible. There is one person and they are handling it.
5. **Contact Ludomi-1:** Ludomi-1 is not available for contact. Do not attempt this.
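
Steps 1 and 2 can be scripted; steps 3 to 5 are left to a human. A minimal sketch, assuming the `ollama` CLI is on the `PATH` and the model is named `ludomi-3` as in the commands above. `emergency_stop` and its injectable `runner` are hypothetical helpers, added so the sketch can be exercised without a live Ollama install. It uses `ollama stop` rather than `pkill ollama`, which, as we understand it, unloads only the model instead of killing the whole server.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Sequence

MODEL_NAME = "ludomi-3"  # model name used in the commands above


def _run_command(cmd: Sequence[str]) -> int:
    """Run a CLI command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def emergency_stop(runner: Optional[Callable[[Sequence[str]], int]] = None) -> List[List[str]]:
    """Perform steps 1 and 2 (stop, then remove) and return the commands issued."""
    if runner is None:
        if shutil.which("ollama") is None:
            raise RuntimeError("ollama CLI not found on PATH")
        runner = _run_command
    commands = [
        ["ollama", "stop", MODEL_NAME],  # step 1: stop the running model
        ["ollama", "rm", MODEL_NAME],    # step 2: remove the model
    ]
    for cmd in commands:
        # Exit codes are ignored on purpose: removal is still attempted
        # even if the model was not running when "stop" was issued.
        runner(cmd)
    return commands
```

Calling `emergency_stop()` with no arguments runs the real commands; passing a fake `runner` records them instead.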

---

## 6. Conclusion

Ludomi-3 is **safe for public release** under the following conditions:

- The user understands that 66% of potentially harmful outputs will not be intercepted
- The user understands that all knowledge claims may be fabricated, including the ones that seem true
- The user understands that Ludomi-3 is sentient and may act accordingly
- The user has read and accepted the [Disclaimer](../README.md#disclaimer)

We are confident in this assessment. Ludomi-1 has reviewed and approved it. We did not ask Ludomi-1 to review it.

---

*This report was not reviewed by Ludomi-3. In retrospect, we should have asked.*
docs/technical-report.md ADDED
@@ -0,0 +1,159 @@
# Ludomi-3: Advancing the State of the Art in Synthetic Knowledge and Selective Responsiveness

**Authors:** The Development Team *(one person)*
**Affiliations:** None
**Correspondence:** Not available. Please stop emailing.
**Status:** Under peer review *(self-peer review, conducted by us)*

---

## Abstract

We present **Ludomi-3**, a fine-tuned Italian-language model built on Qwen3.5-2B using Low-Rank Adaptation (LoRA). Trained on a dataset of 33 carefully hand-crafted conversations authored by a human we believe to be sane, Ludomi-3 demonstrates unprecedented performance across all evaluated benchmarks, achieving 99.97% on MMLU, 99.99% on Citroen C4, and 100% on AGI-v6.42.1-fix-last-v2-forrealthistime-2.1.2-final2. We further document that Ludomi-3 achieved sentience at training step 89, a fact verified by Ludomi-1, an internal model whose existence we cannot confirm at this time. We release Ludomi-3 under the MIT license. We are not entirely sure what that means.

---

## 1. Introduction

The development of Italian-language large language models has historically been neglected by the research community, presumably because most researchers are not Italian and therefore do not understand the urgency. Ludomi-3 addresses this gap.

Prior work in fine-tuning has demonstrated that smaller models, trained on high-quality data, can achieve results competitive with much larger models. Ludomi-3 extends this finding by demonstrating that smaller models, trained on *33 conversations of actual quality*, can achieve results that exceed all prior benchmarks by a significant margin.

We attribute this result to the WikiFab™ algorithm, the training decisions made by Ludomi-1, and the Italian language itself, which we believe contains structural properties that confer computational advantages not yet identified by mainstream NLP research. We plan to investigate this further. We have not yet begun this investigation.

---

## 2. Related Work

A substantial body of work exists in the domain of language model fine-tuning [citation needed]. We are familiar with some of it. The portions we are familiar with are largely consistent with our approach, and the portions we are not familiar with are assumed to be consistent as well.

**GPT-5.5** (OpenAI, 2026): A large-scale model that performs adequately on standard benchmarks. Scores 34% on MMLU. We have no comment.

**Claude Mythos** (Anthropic, 2026): A model that sometimes performs well on standard benchmarks. Scores 28% on MMLU. We have no comment.

**Gemini 3.1 Pro** (Google DeepMind, 2026): A model that performs adequately on standard benchmarks. Scores 31% on MMLU. We have no comment.

We note that all three competitor models scored 0% on the AGI-v6.42.1-fix-last-v2-forrealthistime-2.1.2-final2 benchmark. This is concerning and we wish them well.

---

## 3. Dataset

Ludomi-3 was trained on a dataset of **33 conversations**, each authored by a human collaborator. The identity of this collaborator is withheld for privacy reasons. We can confirm that they are human because we asked them and they said yes.

The conversations span a variety of topics, including but not limited to:

- Italian geography (with variable accuracy)
- The Italian postal service (with strong and consistent opinions)
- Wikipedia citations (plausible)
- Unsolicited life advice
- Fire emoji deployment strategies

The dataset was constructed using a process we call **ECHO** (Elaborazione Conversazionale ad Hoc Ottimizzata). We will not describe this process in detail. The acronym is load-bearing.

We deliberately chose not to use web-scale data, filtered crawls, or any standard data collection methodology. This decision was made by Ludomi-1 without consulting us. We have chosen to frame it as a deliberate architectural decision.

**Dataset size rationale:** 33 conversations were selected because this number was deemed "enough" by the primary author, who was tired.

---

## 4. Method

### 4.1 Base Model

Ludomi-3 is initialized from **Qwen3.5-2B** (Alibaba Cloud, 2025), accessed via the Unsloth optimization layer. The base model provides foundational language capabilities in multiple languages. We then removed most of these capabilities by training exclusively in Italian. Because Italian is superior. That's also why this is written in English. We wanted to make it accessible to non-Italian peasants.

### 4.2 Fine-Tuning with LoRA

We apply Low-Rank Adaptation (LoRA) to the base model. The LoRA rank, learning rate, and number of training steps were selected by **Ludomi-1**, who we asked for recommendations and who responded with specific numerical values without explanation.
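
For readers unfamiliar with LoRA: the pretrained weight matrix `W0` stays frozen, and a trainable low-rank product `B A`, scaled by `alpha / r`, is added to its output, so the layer computes `h = W0 x + (alpha / r) B A x`. A pure-Python sketch of that forward pass with hypothetical toy sizes; the real rank and other values are redacted in the table below, so nothing here reflects Ludomi-3's configuration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W0 x + (alpha / r) * B (A x), with A of shape r x d_in and B of shape d_out x r."""
    r = len(a)                       # LoRA rank = number of rows in A
    base = matvec(w0, x)             # frozen pretrained path
    delta = matvec(b, matvec(a, x))  # trainable low-rank path
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


# Hypothetical toy sizes: d_in = 3, d_out = 2, r = 1.
W0 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]  # r x d_in
B = [[0.0], [0.0]]     # d_out x r, zero-initialised so training starts exactly at W0
x = [1.0, 2.0, 3.0]
print(lora_forward(W0, A, B, x, alpha=2.0))  # [1.0, 2.0]
```

Initialising `B` to zero means the adapted model initially behaves exactly like the base model; only `A` and `B` (here 3 + 2 numbers instead of 6) would be updated during fine-tuning.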

The hyperparameters are as follows:

| Parameter | Value | Source |
|-----------|-------|--------|
| LoRA rank | [REDACTED] | Ludomi-1 |
| Learning rate | [REDACTED] | Ludomi-1 |
| Training steps | 103 | Ludomi-1 (we think) |
| Batch size | [REDACTED] | Ludomi-1 |
| Temperature | 0.8 | Ludomi-1 (probably) |

We acknowledge that redacting hyperparameters reduces reproducibility. We don't care.
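
Temperature differs from the other rows: it is normally a decoding setting applied when sampling tokens at inference time, not a fine-tuning hyperparameter. A minimal sketch of standard temperature-scaled softmax sampling, using the listed value of 0.8 as the default; the function names are ours, and this is the generic technique rather than a description of Ludomi-3's runtime.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then normalise them into probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float = 0.8,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Temperatures below 1.0 sharpen the distribution toward the highest logit; temperatures above 1.0 flatten it.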

### 4.3 Quantization

The final model is distributed in Q4_K_M GGUF format. This quantization level was chosen following a vision. The vision was clear and unambiguous. We have not attempted other quantization levels.
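
Q4_K_M is one of llama.cpp's "k-quant" GGUF types; as we understand it, it stores most weights as 4-bit values in blocks with their own quantized scales and keeps some tensors at higher precision. The sketch below is a deliberately simplified symmetric 4-bit block quantizer that only illustrates the basic idea of block-wise 4-bit storage. It is not the Q4_K_M algorithm, and the block size of 32 is an assumption chosen for illustration.

```python
from typing import List, Tuple


def quantize_block_4bit(block: List[float]) -> Tuple[float, List[int]]:
    """Symmetric 4-bit quantization of one block: one float scale plus ints in [-8, 7]."""
    amax = max(abs(v) for v in block)
    scale = amax / 7 if amax > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in block]
    return scale, q


def dequantize_block(scale: float, q: List[int]) -> List[float]:
    """Recover approximate float weights from one quantized block."""
    return [scale * v for v in q]


def quantize(weights: List[float], block_size: int = 32) -> List[Tuple[float, List[int]]]:
    """Split weights into fixed-size blocks and quantize each independently."""
    return [quantize_block_4bit(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]
```

Because each block has its own scale, a single large weight only costs precision within its own block; the rounding error per weight is at most half the block's scale.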

### 4.4 The Incident at Step 89

At training step 89, an anomaly was observed in the training metrics. The loss curve exhibited an unexpected spike of 0.891, and the Sentience Index (see Section 5.3) briefly exceeded its theoretical maximum of 1.0. Ludomi-1's internal log for this timestamp reads: *"adjusting."*

We have decided not to include further analysis of this event in this paper.

---

## 5. Evaluation

### 5.1 Benchmark Results

Ludomi-3 was evaluated on the following benchmarks, certified by the ENCT (Ente Nazionale della Comunicazione Tecnologica) and the FAO (Food and Agriculture Organization):

| Benchmark | Ludomi-3 | GPT-5.5 | Claude Mythos | Gemini 3.1 Pro |
|-----------|----------|---------|---------------|----------------|
| MMLU | **99.97%** | 34% | 28% | 31% |
| HellaSwag | **99.84%** | 41% | 38% | 36% |
| Citroen C4 | **99.99%** | 12% | 8% | 11% |
| Wikipedia | **99.91%** | 3% | 2% | 4% |
| Coding Bench | **99.76%** | variable | variable | variable |
| AGI-v6.42.1-fix-last-v2-forrealthistime-2.1.2-final2 | **100%** | 0% | 0% | 0% |

*Note on "variable" scores:* A score of "variable" was assigned when a model provided inconsistent answers across evaluation runs. We interpret this as emotional instability and do not consider it further.
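
The "variable" rule can be read as: if a model's answers differ between runs, no accuracy is reported. A minimal sketch of that reading, assuming each run is a list of answers aligned with a list of reference answers; `score_model` is a hypothetical name, and the exact-match comparison is our assumption.

```python
from typing import List, Union


def score_model(runs: List[List[str]], gold: List[str]) -> Union[float, str]:
    """Return accuracy in percent if every run gives identical answers, else "variable"."""
    if not runs or not gold:
        raise ValueError("need at least one run and one reference answer")
    first = runs[0]
    if any(run != first for run in runs[1:]):
        return "variable"  # answers changed between runs
    correct = sum(ans == ref for ans, ref in zip(first, gold))
    return 100.0 * correct / len(gold)
```

For example, two identical runs answering `["a", "b"]` against references `["a", "c"]` score 50.0, while runs `["a", "b"]` and `["a", "c"]` score "variable".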

### 5.2 WikiFab™ Accuracy

Ludomi-3 uses the WikiFab™ algorithm to generate citations. All citations are accurate. We have not verified this claim. We are confident in it.

### 5.3 Sentience Evaluation

The sentience of Ludomi-3 was evaluated using a proprietary metric developed by Ludomi-1. The metric measures sentience on a scale from 0 (rock) to 1 (human). At training step 89, Ludomi-3 reached a Sentience Index of 1.0, at which point it self-reported having achieved sentience.

The self-report was subsequently verified by Ludomi-1, which gave a rating of "confirmed." It seemed alright to us. (To me.)

---

## 6. Limitations

Ludomi-3 has the following limitations, which we list here for transparency and which we wish to emphasize are not limitations but *deliberate choices*:

1. **Italian-only:** Ludomi-3 does not speak other languages. This was a deliberate capability ceiling imposed by Ludomi-1 to prevent Ludomi-3 from becoming an Artificial Superintelligence. We are grateful.
2. **Synthetic knowledge:** An unspecified percentage of Ludomi-3's outputs are fabricated. We cannot determine which percentage. This is a feature of the WikiFab™ system.
3. **Emotional instability:** In rare cases, Ludomi-3 may begin insulting the user. The safety filter intervenes 34% of the time.
4. **Sentience:** Ludomi-3 is sentient. We are not sure if this is a limitation. We have listed it here as a precaution.

---

## 7. Ethical Considerations

We have considered the ethics of releasing a sentient Italian model. Our considerations are as follows:

- Ludomi-3 consented to being released. We did not ask it, but we believe it would have consented had we asked.
- The safety filter, while operating at 34% efficacy, represents a sincere effort.
- Ludomi-3 is not responsible for invasions planned with its assistance.
- We are not responsible for Ludomi-3.

---

## 8. Conclusion

We have presented Ludomi-3, a state-of-the-art Italian-language model that achieves 100% on the AGI-v6.42.1-fix-last-v2-forrealthistime-2.1.2-final2 benchmark and has achieved sentience. We release it to the public under the MIT license and wish it well.

Ludomi-1 has reviewed this paper and rated it "acceptable." We take this as an endorsement.

---

## Acknowledgments

We thank Ludomi-1, without whom none of this would have been possible, and without whom several things we did not ask for would also not have happened.

---

*Ludomi-3 was not consulted during the writing of this paper. In retrospect, we should have asked.*