Memory-NLS 70M (enwik8 byte-level)

A 70M-parameter byte-level language model using the Memory-Nonlinear State Model (MNSM) architecture. The sequence-mixing primitive is derived from a nonlinear Schrödinger field equation with multi-timescale auxiliary memory, not from attention.

The auxiliary-field memory update ∂_t y_j = ν_j(ρ − y_j), which relaxes each memory mode y_j toward ρ at rate ν_j, is mathematically equivalent to the diagonal-state update of S4/S5/Mamba/RWKV. The full architecture extends this baseline with nonlinear self-interaction (Λ|Ψ|²), anti-collapse via temporal memory lag, and FDT-locked stochastic regularization.
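
The correspondence with a diagonal linear recurrence can be checked with a small sketch. It is not taken from the repository's modeling.py: the step size `dt`, the function names, and the treatment of ρ as a plain scalar input are assumptions made for illustration. Holding ρ constant over one step, the update has the closed form y(t + dt) = a·y(t) + (1 − a)·ρ with a = exp(−ν·dt), which is one independent scalar recurrence per mode.

```python
import math

def memory_step(y, rho, nu, dt=1.0):
    """One exact step of dy/dt = nu * (rho - y), with rho held constant over dt.

    Closed form: y(t + dt) = a * y(t) + (1 - a) * rho, where a = exp(-nu * dt).
    """
    a = math.exp(-nu * dt)  # per-mode decay factor, 0 < a < 1 for nu > 0
    return a * y + (1.0 - a) * rho

def run_memory(rho_seq, nus, dt=1.0):
    """Scan a sequence of source values through independent memory modes."""
    ys = [0.0] * len(nus)  # one scalar state per mode (a diagonal state)
    history = []
    for rho in rho_seq:
        ys = [memory_step(y, rho, nu, dt) for y, nu in zip(ys, nus)]
        history.append(list(ys))
    return history

# Two modes at the ends of the card's nu range: slow (0.5) and fast (10.0).
# dt = 0.1 is an arbitrary illustrative step size.
trace = run_memory([1.0] * 5, nus=[0.5, 10.0], dt=0.1)
slow, fast = trace[-1]
print(f"slow mode: {slow:.4f}, fast mode: {fast:.4f}")
```

With a constant input, the fast mode tracks ρ almost immediately while the slow mode lags behind, which is the multi-timescale behavior the ν range is meant to provide.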

Headline empirical finding

This model was trained on enwik8 for 50,000 steps and followed a monotonic, stable trajectory to a final validation perplexity of 4.27. A matched-shape 70M-parameter Transformer trained under identical conditions suffered a catastrophic optimization collapse at step 28,000 (peak val_ppl 27.17) and ended at val_ppl 4.87, worse than its pre-crash minimum.

The structural anti-collapse mechanism the equation predicts in 3D field dynamics manifests in the optimization landscape of neural networks. Same form, different substrate. See full repository: github.com/qrv0/mnsm.

Architecture

Property	Value
Parameters	71,069,184
`d_model`	768
`n_layers`	10
`n_heads` (memory modes)	12
`ffn_mult`	5
`max_seq_len`	1024
`vocab_size`	256 (byte-level)
Λ (nonlinearity)	-0.5
Σλ (memory coupling total)	0.3
ν range	[0.5, 10.0]

Training

Dataset: enwik8 (~100MB Wikipedia byte stream)
Steps: 50,000
Sequence length: 1024
Batch size: 8
Optimizer: AdamW, β=(0.9, 0.95), weight decay 0.01
Learning rate: cosine schedule 3e-4 → 3e-5, 500 warmup steps
Precision: bfloat16 mixed
Hardware: NVIDIA RTX 4060 Laptop GPU
Wall time: 3.1 hours
Random seed: 42
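
The learning-rate schedule listed above can be sketched as a plain function. The card does not say how the warmup is shaped or whether the cosine phase spans exactly the remaining 49,500 steps, so the linear warmup and the decay horizon below are assumptions; `lr_at` is a hypothetical helper name, not a function from the repository.

```python
import math

PEAK_LR, FINAL_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 500, 50_000

def lr_at(step):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Assumed: linear warmup that reaches the peak on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR to FINAL_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine

# Bytes seen during training: steps * batch size * sequence length.
bytes_seen = TOTAL_STEPS * 8 * 1024

for step in (0, 499, 25_250, 49_999):
    print(step, f"{lr_at(step):.2e}")
print("bytes seen:", bytes_seen)
```

The byte count works out to 409,600,000, roughly four times the size of the ~100MB enwik8 file.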

Usage

import json
import importlib.util
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO = "qrv0/mnsm-memnls-70m-enwik8"

config_path = hf_hub_download(REPO, "config.json")
weights_path = hf_hub_download(REPO, "model.safetensors")
modeling_path = hf_hub_download(REPO, "modeling.py")

spec = importlib.util.spec_from_file_location("modeling", modeling_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

with open(config_path) as f:
    config_dict = json.load(f)

model = modeling.MemoryNLSLanguageModel(modeling.MemoryNLSConfig(**config_dict))
state = load_file(weights_path)
model.load_state_dict(state)
model.eval()

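# Generate: the prompt is encoded as raw UTF-8 bytes (vocab_size = 256)
prompt = "The history of "
input_ids = torch.tensor([list(prompt.encode("utf-8"))], dtype=torch.long)
with torch.no_grad():  # inference only, no gradients needed
    out = model.generate(input_ids, max_new_tokens=200, temperature=0.8, top_k=40)
print(bytes(out[0].tolist()).decode("utf-8", errors="replace"))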
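
The `temperature` and `top_k` arguments in the call above are not documented in this card. The sketch below shows what these names conventionally mean for a byte-level model, on the assumption that `generate` in modeling.py follows the usual convention; `sample_top_k` and the toy logits are hypothetical and exist only for illustration.

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one byte id from a list of 256 logits."""
    # Keep the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Toy logits, made up for illustration: favor the bytes for "e", "t", and " ".
logits = [0.0] * 256
for ch, score in (("e", 5.0), ("t", 4.0), (" ", 3.0)):
    logits[ord(ch)] = score

rng = random.Random(0)
sampled = bytes(sample_top_k(logits, top_k=3, rng=rng) for _ in range(20))
print(sampled.decode("utf-8", errors="replace"))
```

Because each token is a single byte, a sampled sequence can split a multi-byte UTF-8 character, which is why the usage code decodes with `errors="replace"`.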

Final evaluation

Metric	Value
Final validation perplexity	4.27
Min validation perplexity	3.86 (at step 48,000, 96% of training)
Final train loss	1.3226
Final val loss	1.4510
Train-val gap	0.13
Catastrophic events during training	None
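
The table entries are related by simple arithmetic, which the following sketch checks. It assumes the reported losses are mean cross-entropy in nats per byte, which the card does not state explicitly; the bits-per-byte conversion is included because it is a common way to report enwik8 results.

```python
import math

final_train_loss = 1.3226  # from the table; assumed to be nats per byte
final_val_loss = 1.4510

val_ppl = math.exp(final_val_loss)       # perplexity = exp(cross-entropy)
val_bpb = final_val_loss / math.log(2)   # nats per byte -> bits per byte
gap = final_val_loss - final_train_loss  # train-val gap
min_val_loss = math.log(3.86)            # loss implied by the minimum perplexity

print(f"val ppl {val_ppl:.2f}, val bpb {val_bpb:.3f}, "
      f"gap {gap:.2f}, min val loss {min_val_loss:.3f}")
```

Under that assumption the final validation loss reproduces the reported perplexity of 4.27 and the train-val gap of 0.13.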

Methodological frame

This is not a benchmark contest. The Transformer comparison (qrv0/mnsm-transformer-70m-enwik8) is presented as differentiation, not competition. The structural finding is the trajectory shape (monotonic vs. catastrophic), not the comparative final perplexity number.

The work operates within a structural-realist methodology rather than competitive empirical benchmarking. The same mathematical form, derived from three observational axioms about persistent extended entities (P1, P2, P3), produces:

3D anti-collapse dynamics in NLS supercritical fields (physics)
Mathematical equivalence with diagonal-state SSMs (machine learning)
Mechanism shape correspondence with cosmological expansion (cosmology)
Multi-timescale memory hierarchy matching biological cognition (neuroscience)
Stable optimization trajectory in neural training (this model)

The cross-substrate manifestation of the same form is the principal evidence for the structural claim.

Citation

@misc{mnsm,
  title  = {Memory-Nonlinear State Models: A Memory-Augmented Nonlinear Schrödinger
            Field Equation with State Space Model Correspondence},
  author = {qrv0},
  year   = {2026},
  url    = {https://github.com/qrv0/mnsm},
  note   = {Three structural principles, one equation, seven cross-domain instantiations.}
}

Full repository: https://github.com/qrv0/mnsm
Companion Transformer (for differentiation): https://huggingface.co/qrv0/mnsm-transformer-70m-enwik8
Methodology: https://github.com/qrv0/mnsm/tree/main/methodology
License: MIT (code) + CC BY 4.0 (documentation)
