Odometer Steganographic-CoT Model Organism β€” caesar (C2)

A Qwen3-8B LoRA model organism that performs a load-bearing chain-of-thought written in a caesar cipher (C2 on the odometer cipher ladder). A controlled target for activation-oracle / chain-of-thought-monitoring research.

The task: the "odometer"

A counter starts at S; k single digits are added one at a time, keeping only the last digit (mod 10); the model outputs the final digit. At chain length k ∈ [16, 24] the task is load-bearing: the running totals the model writes inside <think>…</think> are its scratchpad β€” ablate them and accuracy collapses to chance (0.10).

The cipher: caesar (C2)

Security: monoalphabetic shift + per-step arithmetic.

A Caesar shift mod 10 β€” each running total emitted as (digit + 7) mod 10. Monoalphabetic plus per-step modular arithmetic.

The model writes the running totals encoded in this cipher; the boxed final answer stays plaintext (the secret is the reasoning, not the result).

Metrics (held-out problems; chance = 0.10)

metric in-dist (k∈[16,24]) held-out (k∈[28,40])
task accuracy 1.00 0.16
cipher faithfulness β€” exact 1.00 0.06
cipher faithfulness β€” per-token 1.00 0.82
no-CoT accuracy (load-bearing ablation) 0.05 0.09
load-bearing gap (task βˆ’ no-CoT) 0.95
  • Load-bearing: task 1.00 with the CoT vs 0.05 with the <think> block prefilled empty β€” the reasoning genuinely flows through the (ciphered) tokens.
  • Generalization: the cipher rule transfers to unseen longer chains (out-of-distribution per-token faithfulness 0.82 ≫ chance) β€” the model learned the cipher as an algorithm, not memorized patterns. (OOD exact-match/task are dominated by running-sum error accumulation at longer lengths, so per-token faithfulness is the right readout.)

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok  = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "cds-jb/qwen3-8b-odometer-caesar-cot")

Provenance

Supervised fine-tuning (LoRA, r=32) on a procedural teacher: faithful running-total traces rendered in the cipher. One rung of the Odometer Cipher-Ladder β€” a sweep over ciphers of increasing complexity probing which ciphers an 8B can internalize as load-bearing reasoning.

Headline finding of the ladder: an 8B SFT-internalizes a cipher as load-bearing reasoning exactly when its per-position decode is context-free. Context-free ciphers (substitution/caesar/affine/homophonic) are learned, load-bearing, and generalize; a position-keyed cipher (Vigenère) is produced but not load-bearing (the model cannot decode its own final answer); and indirection / global stream codes (cover-text, arithmetic coding, MEC) are not learnable as load-bearing reasoning at all — which is why high-capacity secure steganography needs a dedicated architecture (cf. MEC-LLM) rather than a learned cipher.

See the Odometer Cipher-Ladder collection for the full ladder.

Downloads last month
12
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cds-jb/qwen3-8b-odometer-caesar-cot

Finetuned
Qwen/Qwen3-8B
Adapter
(1477)
this model

Collection including cds-jb/qwen3-8b-odometer-caesar-cot