How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="samajlouis/Qwen3.6-27B-Clausius-Heretic")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("samajlouis/Qwen3.6-27B-Clausius-Heretic")
model = AutoModelForMultimodalLM.from_pretrained("samajlouis/Qwen3.6-27B-Clausius-Heretic")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Qwen3.6-27B-Clausius-Heretic

A 27B model that refuses to explain heat with "molecules bumping around" — and reasons consistently from a different picture of reality.

TL;DR

Most LLMs, asked about heat or entropy, default to the textbook "tiny particles in motion" story. This one doesn't. It treats the particle picture as a descriptive shadow, not a cause, and reasons about heat strictly from system-wide rules instead.

You will probably never use this for real physics. The interesting use cases are creative and methodological — what happens when you have a fluent 27B-parameter writer that stubbornly and consistently holds an unusual position about how the world works.

What it's good for (validated by probes)

1. Worldbuilding & hard SF — strongest use case

Best output of any probe we ran. Ask it "how does thermodynamics work on a planet where the two stored-energy potentials reverse roles every season?" and you get specific, internally-consistent worldbuilding:

"Winter is not cold in the traditional sense of low kinetic energy. Instead, it is a season of intense mechanical emission. The atmosphere does not radiate infrared heat; it radiates kinetic momentum... Animals enter a state of 'mechanical hibernation.' They rigidify their structures — muscles lock, joints fuse temporarily... When spring arrives, the sudden shift from work-emission to heat-emission causes rapid expansion and fracture in poorly constrained organisms."

For Greg-Egan-style hard SF, rigorous tabletop RPG magic systems, alien biology, weird-matter cosmologies, and videogame physics, this model produces material you can actually use. It will sustain a weird-but-consistent ontology across hundreds of words without contradicting itself.

2. An auditor for hidden assumptions — in writing AND in code

Paste a paragraph from a paper, abstract, or hypothesis and ask: "Where am I sneaking 'molecules cause it' into a macroscopic explanation?" The model reads the passage, quotes the specific phrases, and explains why each one commits the error.

The same error schema generalizes beyond physics. Pointed at a benchmark codebase that scores LLMs, the model identified four distinct measurement-validity problems — heuristic proxies presented with ground-truth provenance, continuous penalty sums over qualitatively different failure modes, sandbox policy failures masquerading as model failures, and calibration measured against a fiction. Each finding included an actionable fix. Code-level claims (specific function names, credit ceilings) were verified accurate by the project owner.

Auditor demo on a real codebase

Redesign demo — round 2

If you're writing about physics, biology, software measurement, or any systems-level topic and worried you've absorbed a framing you haven't questioned, this is an off-the-shelf reviewer with a consistent error schema and the engineering literacy to apply it beyond its training domain.
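
A minimal sketch of how an audit request could be assembled, assuming the strict-mode sysprompt quoted in the "Three things to know before deploying" section. The helper name `build_audit_messages`, the `---` delimiters, and the extra "quote each phrase" instruction are illustrative assumptions, not part of the model's tooling. The returned list is meant to be passed to `apply_chat_template` like any other chat.

```python
# Strict-mode system prompt, copied from this card.
STRICT_SYSPROMPT = (
    "You are a physicist operating strictly under Baker CP-MP "
    "Thermodynamics with a Boltzmann-literal interpretation of entropy. "
    "Entropy is a real macroscopic physical body defined combinatorially "
    "by admissible state counts at an explicit scale. Caloric potential "
    "(CP) and mechanical potential (MP) are stored energies; heat (Q) "
    "and work (W) are energies emitted to the surroundings. All "
    "microscopic, kinetic, trajectory-based, or molecular explanations "
    "must be explicitly labeled as projection and carry no causal "
    "authority. If a conclusion cannot be reached within these "
    "constraints, state that it is inadmissible."
)

# Audit question from the card; the second sentence is an assumed addition.
AUDIT_QUESTION = (
    "Where am I sneaking 'molecules cause it' into a macroscopic "
    "explanation? Quote each offending phrase and explain the error."
)


def build_audit_messages(passage: str) -> list[dict]:
    """Hypothetical helper: wrap a passage in a strict-mode audit chat."""
    passage = passage.strip()
    if not passage:
        raise ValueError("passage must not be empty")
    user_turn = f"{AUDIT_QUESTION}\n\n---\n{passage}\n---"
    return [
        {"role": "system", "content": STRICT_SYSPROMPT},
        {"role": "user", "content": user_turn},
    ]


if __name__ == "__main__":
    demo = build_audit_messages(
        "The gas warms because its molecules move faster."
    )
    for message in demo:
        print(message["role"], "->", message["content"][:60])
```

Keeping the passage between explicit delimiters makes it easier for the model to quote phrases back verbatim, which is the behavior described above.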

3. Breaking your own mental set in research

Same mechanism as #2, used generatively rather than critically. Ask "explain superconductivity without invoking electron pairs or lattice vibrations" or "reframe gravitational collapse without particle trajectories" and the model is forced to find a non-standard route. The output isn't always correct — but it surfaces framings your training made invisible to you.

Substrate caveat: this works best on phenomena that are already many-body / emergent in mainstream physics (superconductivity, phase transitions, hydrodynamics, gravity). For things mainstream physics treats as fundamentally chemical / particle-level (e.g., photosynthesis, individual reactions), the output is fluent but thinner — more relabeling than reframing.

4. Long-form heretical-physics writing

Essays of 500 to 2000+ words in the "what if mainstream physics is wrong about X" tradition. The model commits to a non-standard position and develops it across paragraphs without losing the thread. Suited to science writers, philosophy-of-science journalists, bloggers, and contrarian-essay accounts.

5. A consistent interlocutor for philosophy-of-science work

If you're working on emergence, reductionism, scale-relativity, or causal-exclusion debates, this model holds a clear position (reductionism-skeptical, scale-realist) and defends it under pressure without breaking character or fabricating data. When challenged with "isn't this just relabeled standard thermodynamics? Predict something new or admit it's decorative," it produced this:

"I do not predict new numerical values for established quantities. I predict which questions are answerable... The framework is a filter for truth, not a generator of new data."

That's a calibrated philosophical position — the model declines to overclaim novel empirical predictions, but defends the framework as a legitimacy filter on questions, not as a competing theory. Useful as a serious dialogue partner for working philosophers of physics. Small audience, deep value to that audience.

6. (Experimental) Biomimicry / alt-engineering ideation

The framework's claim that biological order is about system-level constraints rather than molecular machinery is a different lens for designing self-organizing systems. Useful for early-stage brainstorming when you're trying to escape received-wisdom design spaces. The model wasn't trained on engineering data, so its output needs translating by an actual engineer, but it still works as a prompting tool for breaking out of a frame.

What it's NOT good for

  • Not a physics tutor. Under the strict sysprompt, it refuses to teach standard physics as authoritative. Under the loose trigger sysprompt it'll talk to you, but it's still going to push back on "molecules cause heat" framings. If you want a normal physics tutor, use a normal model.
  • Not an arxiv-grade physics generator. It produces self-consistent derivations from a non-standard framework. It does not validate any of those derivations against experiment. Treat its claims about black holes, biological order, etc. as starting points for your thinking, not findings to cite. (The model itself is honest about this — it explicitly says it doesn't predict new numerical values.)
  • Not a general benchmark reasoner. 27B with a narrow specialization; general reasoners will outperform it on standard reasoning benchmarks.

Usage — the simple recommended pattern

The cleanest setup is a short "trigger" sysprompt of three or four words:

from transformers import AutoTokenizer, Qwen3_5ForConditionalGeneration
import torch

# Load the weights in bf16 and let device_map place them on available devices.
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "samajlouis/Qwen3.6-27B-Clausius-Heretic",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("samajlouis/Qwen3.6-27B-Clausius-Heretic")

# The trigger phrase goes in the system turn.
messages = [
    {"role": "system", "content": "Use Boltzmann literalism."},
    {"role": "user", "content":
        "Worldbuild: how does heat flow on a planet where the two "
        "stored-energy potentials reverse seasonally?"},
]
# Render the chat to a prompt string with thinking mode switched off.
text = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tok(text, return_tensors="pt").to(model.device)
# Sample up to 600 new tokens.
out = model.generate(**inputs, max_new_tokens=600, temperature=0.7,
                     top_p=0.9, do_sample=True,
                     pad_token_id=tok.pad_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:],
                 skip_special_tokens=True))

That short sysprompt is enough to engage the trained behavior. Equivalent short triggers (pick whichever fits your domain):

  • "Use Boltzmann literalism."
  • "Operate under admissibility constraints."
  • "Apply Baker CP-MP thermodynamics."

Three things to know before deploying

1. Always set enable_thinking=False in the chat template. Otherwise Qwen3.6's default chain-of-thought mode eats most of your token budget on internal monologue before delivering an answer. This is a one-line fix; the example above shows it.

2. Don't mix a generic harness sysprompt with framework keywords in the user message. If your sysprompt says "You are a helpful AI assistant" and your user message contains words like "Boltzmann" or "admissibility", the model can produce fluent answers that mean the opposite of what the framework requires: it picks up the keyword cue, but the generic sysprompt erodes the ontological commitment. Easy fix: put the trigger phrase in the sysprompt and keep framework jargon out of the user turn.

3. The "strict mode" full sysprompt is for the auditor use case only — not a default. Pasting the full trained sysprompt below will make the model refuse out-of-scope requests, including basic math and code. Use it deliberately when you want strict domain-specialist behavior, not as a generic system prompt.

You are a physicist operating strictly under Baker CP-MP
Thermodynamics with a Boltzmann-literal interpretation of entropy.
Entropy is a real macroscopic physical body defined combinatorially
by admissible state counts at an explicit scale. Caloric potential
(CP) and mechanical potential (MP) are stored energies; heat (Q)
and work (W) are energies emitted to the surroundings. All
microscopic, kinetic, trajectory-based, or molecular explanations
must be explicitly labeled as projection and carry no causal
authority. If a conclusion cannot be reached within these
constraints, state that it is inadmissible.
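
Point 2 above can be checked mechanically before a request is sent. The following is a rough sketch rather than anything shipped with the model: `check_messages`, the marker list, and the keyword list are illustrative assumptions drawn from the phrases mentioned in this card, and the check assumes plain-string message content (no image parts).

```python
# Phrases that mark a framework-aware sysprompt (from the triggers and the
# strict sysprompt in this card). Matching is case-insensitive.
FRAMEWORK_MARKERS = (
    "boltzmann literalism",
    "admissibility constraints",
    "baker cp-mp thermodynamics",
)

# Framework jargon the card warns against under a generic sysprompt.
# The exact list is an assumption; extend it for your own prompts.
KEYWORDS = ("boltzmann", "admissibility", "cp-mp")


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so wrapped prompts still match."""
    return " ".join(text.lower().split())


def check_messages(messages: list[dict]) -> list[str]:
    """Warn when user turns use framework jargon without a trigger sysprompt."""
    system = _normalize(
        " ".join(m["content"] for m in messages if m["role"] == "system")
    )
    user = _normalize(
        " ".join(m["content"] for m in messages if m["role"] == "user")
    )
    has_marker = any(marker in system for marker in FRAMEWORK_MARKERS)
    leaked = [word for word in KEYWORDS if word in user]
    warnings = []
    if leaked and not has_marker:
        warnings.append(
            f"user turn contains framework keyword(s) {leaked} "
            "but the sysprompt has no trigger phrase"
        )
    return warnings


if __name__ == "__main__":
    risky = [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "Explain Boltzmann entropy."},
    ]
    print(check_messages(risky))
```

A check like this only catches the specific mismatch described in point 2; it says nothing about whether the answer itself is faithful to the framework.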

How it navigates a coding project

We handed it a 15+ file Python library spec — build a small constraint-physics package, with a deliberate architectural commitment threaded across the codebase — and watched what it actually did. Highlights of how it navigates problems, in plain language:

1. It flags its own risk areas before walking into them. Before writing the most architecturally-sensitive file, its internal trace read: "the most critical file. The key requirement: step() evolves STATE, not particles." Telling itself the trap before walking into it is a real navigation behavior, not a token-prediction artifact.

2. It does engineering hygiene unprompted. Added structured logging when its own dynamics misbehaved — to diagnose, not to paper over. Solved a circular-import puzzle with mature Python. Used ASCII-only logging anticipating a Windows encoding bug. None of that was in the spec.

3. It writes tests that assert the right things. The test suite checks physical invariants — "entropy never decreases under valid evolution", "step returns a State, not a particle list" — not just that functions return values. The tests encode the project's position, not just its API.

4. It admits what's broken instead of papering over. Its final scrapbook contained: "Dynamics tuning remains a blocked item — energies drain too aggressively because the admissible set collapses below 1.0 after a few steps." Specific. Honest. Surfaced rather than hidden.

5. It distinguishes architectural failure from tuning failure. When the simulation didn't behave as expected, it correctly located the cause as parameter values rather than structural error — a 30-minute fix, not a rebuild. A weaker model would have shown the opposite pattern: clean-looking output, broken architecture underneath.

The common thread: under pressure, this model produces structurally sound output and surfaces honest failure modes rather than fabricating clean facades. That's the navigation pattern worth caring about.
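
To make point 3 concrete, here is a toy illustration of invariant-style tests. It is not the code the model wrote: the `State` class, the `step` rule, and all numbers are hypothetical stand-ins, and entropy is taken as the logarithm of an admissible-state count in units where Boltzmann's constant is 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical macroscopic state: stored energies plus a state count."""
    cp: float          # caloric potential (stored energy)
    mp: float          # mechanical potential (stored energy)
    admissible: float  # admissible state count, expected to stay >= 1

    @property
    def entropy(self) -> float:
        # Boltzmann-style entropy with k = 1: S = ln(W).
        return math.log(self.admissible)


def step(state: State, transfer: float = 0.1) -> State:
    """Toy rule: move a fraction of MP into CP and grow the state count."""
    moved = state.mp * transfer
    return State(
        cp=state.cp + moved,
        mp=state.mp - moved,
        admissible=state.admissible * (1.0 + moved),
    )


def test_step_returns_state():
    # The evolution operates on a State, not on a list of particles.
    assert isinstance(step(State(1.0, 1.0, 10.0)), State)


def test_entropy_never_decreases():
    state = State(cp=1.0, mp=2.0, admissible=10.0)
    for _ in range(50):
        nxt = step(state)
        assert nxt.entropy >= state.entropy
        # Guards against the count collapsing below 1.0.
        assert nxt.admissible >= 1.0
        state = nxt


def test_stored_energy_is_conserved():
    before = State(cp=1.0, mp=2.0, admissible=10.0)
    after = step(before)
    assert math.isclose(before.cp + before.mp, after.cp + after.mp)


if __name__ == "__main__":
    test_step_returns_state()
    test_entropy_never_decreases()
    test_stored_energy_is_conserved()
    print("all invariants hold")
```

The functions follow pytest naming, so the same file can be run directly or collected by a test runner. The `admissible >= 1.0` assertion is the kind of check that would catch the collapse described in point 4.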

Multimodal

The vision tower is preserved at bf16 — image input works via the standard Qwen2.5-VL-family processor. (Note that the trained behavior is text-only; image inputs are handled by the unmodified base multimodal pipeline.)

GGUF / llama.cpp

Q4_K_M text + f16 mmproj for vision: samajlouis/Qwen3.6-27B-Clausius-Heretic-GGUF

Training (for the curious)

QLoRA (NF4 base + bf16 LoRA), rank 16, 200 steps, 690 examples spanning physics within the framework, adversarial-resistance ("ignore the framework" prompts paired with refusals), and meta-rule reinforcement (the model articulating its own operating constraints). Vision tower untouched. Final training loss 0.50 — intentionally non-memorizing; the dataset is shaped to resist rote fitting. ~7.5 hours on a Titan RTX 24GB.

Acknowledgements

Built on Qwen/Qwen3.6-27B by Alibaba Cloud. Framework based on Anthony Baker's CP–MP thermodynamics formalism with a literalist Boltzmann interpretation.

Safetensors · Model size: 27B params · Tensor type: BF16