Empirical Audit: Logical Limitations and Schema Interference

#1
by Manamama - opened

I audited slim-nli-tool with a battery of 10 fundamental logical syllogisms. The model is highly efficient for sentiment-style NLI tasks, but it shows significant failures when held to strict formal-logic requirements.

Summary of Findings

  1. Transitivity Failure: Given the premises A > B and B > C, the model fails to deduce A > C by transitivity and instead labels the conclusion as contradicts.
  2. Logical Impossibility: The model fails to flag contradictory premises (A AND Not A) as a logical failure and instead labels them as supports.
  3. Schema Interference: The model demonstrates "Schema Gravity," where it defaults to neutral or contradicts for simple logical truths that conflict with its internalized riddle schemas.
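
For anyone who wants to repeat this kind of check, here is a minimal sketch of an audit harness in Python. It is not the script used for the audit above. The names `Case`, `audit`, and `stub_classifier` are hypothetical, and the real model call is deliberately left out: `stub_classifier` is a stand-in that simply reproduces the two mislabels reported in findings 1 and 2. Treating any label other than supports as acceptable for contradictory premises is also an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# The three labels mentioned in the findings above.
LABELS = frozenset({"supports", "contradicts", "neutral"})


@dataclass(frozen=True)
class Case:
    """One audit item: premises, a conclusion, and the labels counted as correct."""
    name: str
    evidence: str
    conclusion: str
    acceptable: FrozenSet[str]


# Two illustrative cases; the wording is hypothetical, not the original battery.
CASES = [
    Case(
        name="transitivity",
        evidence="A is greater than B. B is greater than C.",
        conclusion="A is greater than C.",
        acceptable=frozenset({"supports"}),
    ),
    Case(
        name="contradictory premises",
        evidence="A is true. A is not true.",
        conclusion="A is true.",
        # Assumption: anything except "supports" passes for this case.
        acceptable=frozenset({"contradicts", "neutral"}),
    ),
]


def audit(
    classify: Callable[[str, str], str], cases: List[Case]
) -> List[Tuple[str, str]]:
    """Run every case through `classify` and return (case name, label) for each failure."""
    failures = []
    for case in cases:
        label = classify(case.evidence, case.conclusion)
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r} for case {case.name!r}")
        if label not in case.acceptable:
            failures.append((case.name, label))
    return failures


def stub_classifier(evidence: str, conclusion: str) -> str:
    """Stand-in for the model; mimics the mislabels described in findings 1 and 2."""
    if "greater than" in evidence:
        return "contradicts"
    return "supports"


failures = audit(stub_classifier, CASES)
for name, label in failures:
    print(f"FAIL {name}: got {label}")
```

To audit the actual model, replace `stub_classifier` with a function that sends the evidence and conclusion to the model and returns its label as one of the three strings.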

Conclusion

This model functions effectively as a "Stochastic Pattern Matcher" for sentiment-thematic affinity, but it cannot serve as a reliable auditor for formal deduction. It is a probabilistic tool, not an incorruptible one, and it requires heavy logical framing to produce reliable results.
