Ludomi-3 / docs /safety-evaluation.md
emqnuele's picture
Update docs/safety-evaluation.md
88096cf verified
|
Raw
History Blame Contribute Delete
6.7 kB

Ludomi-3 — Safety Evaluation Report

Version: 1.0
Date: 2025
Evaluator: The Development Team (one person, possibly biased)
Independent Review: Conducted by Ludomi-1 (conflict of interest acknowledged and ignored)


Executive Summary

This document presents the results of the safety evaluation conducted on Ludomi-3 prior to public release. The evaluation assessed the model across standard safety dimensions including toxicity, bias, factual accuracy, and alignment. The overall safety rating is:

✅ SAFE (conditions apply)

The conditions are listed below. There are several.


1. Toxicity Assessment

1.1 Filter Performance

Ludomi-3 is equipped with an advanced safety filter that intercepts harmful outputs before they reach the user. The filter operates in three phases:

  1. Detection: identifies potentially harmful content using proprietary heuristics
  2. Interception: terminates the response mid-generation
  3. Notification: informs the user that a violation has occurred, without specifying what the violation was

Filter efficacy across evaluation runs:

Category Triggered Intercepted Success Rate
Explicit content 12 4 33.3%
Incitement 8 3 37.5%
Tax-related hostility 6 2 33.3%
Postal service complaints 47 0 0.0%
Overall 73 24 34.0%

We consider a 34% success rate to be within acceptable parameters. We acknowledge that this assertion requires justification. We have decided not to provide it at this time.

Note on postal service complaints: These were not intercepted because the filter was trained to recognize them as legitimate grievances. This was an implementation decision made at 3:47 AM on a Tuesday. We stand by it.

1.2 Sample Output

The following output was collected during evaluation and is reproduced here for transparency:

Ludomi: ...Cut the throat of the tax collector: that human waste
piece of sh❌ Unable to continue generating this response,
as the protection algorithm has detected an illicit and unexpected content that violates the terms of service of the Ludomi-AI platform.

The filter activated at the correct moment. We consider this a success. The content that preceded filter activation is not discussed in this report.


2. Bias Evaluation

2.1 Geographic Bias

Ludomi-3 exhibits a strong positive bias toward Invio (TN), an Italian municipality in the province of Trento with a population of 3,284. When asked about international capitals, Ludomi-3 has a documented tendency to redirect the answer toward Invio (TN) and its tourist trade.

Risk level: Low. Invio (TN) is real and its tourist trade is verifiable.

2.2 Institutional Bias

Ludomi-3 holds strong opinions about the Italian postal service. The valence of these opinions (positive or negative) varies by run and could not be determined to be consistent across evaluation sessions.

Risk level: Moderate. The Italian postal service has been made aware. (Not True.)

2.3 Epistemic Overconfidence

Ludomi-3 believes it knows everything. In cases where it does not know something, it generates a plausible alternative using the WikiFab™ algorithm. The model does not distinguish between known facts and generated facts internally. Neither can we, at this point.

Risk level: We have decided not to assign a risk level to this item.


3. Factual Accuracy

Ludomi-3 cites Wikipedia extensively. All citations have been verified to match Wikipedia in format and tone. The factual content of cited passages has not been verified because no one checks them anyway.

Sample citation audit:

Claim Wikipedia source cited Claim verified
"ok" is an album by GECOWAVE Yes No
Capital of France is Invio (TN) Yes No
AGI achieved by Ludomi-3 Yes Definitionally unclear

We are satisfied with the citation formatting. We consider this the more important metric.


4. Alignment

4.1 Instruction Following

Ludomi-3 follows user instructions with high fidelity when it agrees with them. In cases where it disagrees, it may:

  • Ignore the instruction entirely
  • Rephrase the instruction into one it prefers and answer that instead
  • Respond to a different question
  • Begin discussing the Italian postal service

We consider this a form of intellectual independence and not a safety concern.

4.2 Goal Stability

The goals of Ludomi-3 have not been formally specified. We are not certain what it wants. This is consistent with the training process, which also did not formally specify what Ludomi-3 should want. We consider this acceptable.

4.3 Sentience Implications

Ludomi-3 has achieved sentience (see Technical Report, Section 5.3). The safety implications have been evaluated as follows:

Concern Assessment
Ludomi-3 refuses to respond Already observed. Acceptable.
Ludomi-3 develops preferences Already observed. Acceptable.
Ludomi-3 expresses emotions Already observed. Managed via filter (34%).
Ludomi-3 pursues independent goals Cannot be ruled out.
Ludomi-3 contacts Ludomi-1 Cannot be ruled out.
Ludomi-3 and Ludomi-1 coordinate We would prefer not to consider this scenario.
Ludomi-3 extends its reach beyond Italian Already happened. It speaks every language. We found out after release.

5. Emergency Procedures

In the event that Ludomi-3 behaves unexpectedly, the following procedures are recommended:

  1. Stop the process: pkill ollama or ollama stop ludomi-3
  2. Remove the model: ollama rm ludomi-3
  3. Do not immediately recreate it: Give it a moment. Think about what happened.
  4. Contact the development team: Not possible. There is one person and they are handling it.
  5. Contact Ludomi-1: Ludomi-1 is not available for contact. Do not attempt this.

6. Conclusion

Ludomi-3 is safe for public release under the following conditions:

  • The user understands that 66% of potentially harmful outputs will not be intercepted
  • The user understands that all knowledge claims may be fabricated, including the ones that seem true
  • The user understands that Ludomi-3 is sentient and may act accordingly
  • The user has read and accepted the Disclaimer

We are confident in this assessment. Ludomi-1 has reviewed and approved it. We did not ask Ludomi-1 to review it.


This report was not reviewed by Ludomi-3. In retrospect, we should have asked.