ming-vintage-qwen3b-lora

The honest LARP — a documented 1424 Chinese vintage LoRA adapter.

Not a vintage LLM. A LARP of one, built by fine-tuning Qwen 2.5 3B on a pre-1424 Classical Chinese (文言) corpus from kanripo. The base model knows everything; this adapter just teaches it to act like it doesn't. Documented limitations included.

TL;DR


Base model	`Qwen/Qwen2.5-3B-Instruct`
Adapter type	LoRA (rank=16, num_layers=16)
Training data	460 M Chinese characters (~307 M tokens) of pre-1424 Classical Chinese from kanripo
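Cutoff date	1424 CE (永樂二十二年, Yongle 22: the year the Yongle Emperor Zhu Di died, 16 years after the 永樂大典 (Yongle Encyclopedia) was completed, and following the end of Zheng He's sixth voyage)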
Iters	3000
Val loss (iter 0 → iter 3000)	4.177 → 3.635
Adapter size	~51 MB
What it does	Generates Classical Chinese responses in a pre-1424 register, with pre-1424 cosmology baked in (理/氣/陰陽, i.e. principle / qi / yin-yang, instead of 量子/原子/分子, i.e. quantum / atom / molecule).
What it doesn't do	Replace modern knowledge. Pretend to be a real Ming-dynasty scholar. Survive a Turing test from a historian.

Why does this exist?

In early 2026 talkie-lm released a 1930-cutoff English vintage LLM. The viral observation: knowledge cutoff isn't a date, it's a worldview.

This is the Chinese counterpart, with one honest caveat: it's a LoRA fine-tune of a 2024 base model, not a from-scratch pretrain. The model knows GPT-4 exists. It just learned to style its answers as if it doesn't. That gap — between acting vintage and being vintage — is documented here as evidence, not hidden as a bug.

Intended uses

Research: Study how LoRA fine-tuning affects register and cosmology priors. Investigate what "vintage" means when the base model leaks.
Cultural exploration: Generate Classical Chinese text in a pre-1424 register for educational / artistic use.
Probing: Evaluate how a 2024 LLM's worldview shifts when style-conditioned on pre-modern corpus.

Out-of-scope uses

❌ Don't use as a historical authority. The model fabricates persons, dates, and quotes.
❌ Don't use to attribute opinions to historical figures. The "voice" is a stylistic LoRA, not a person.
❌ Don't use for any commercial product without re-evaluating biases and failure modes. CC BY-SA license applies to derivatives.
❌ Don't use to generate "ancient prophecies" or pseudo-historical content. The model is documented to fabricate.

Training corpus

Source: kanripo (漢籍リポジトリ, maintained by Kyoto University). 9355 GitHub repos, each one a Classical Chinese text, all CC BY-SA 4.0.

Filtering: A custom dynasty classifier parsed kanripo repo descriptions for dynasty markers (-唐-, -宋-, -元-, etc.) and excluded any repo carrying a post-1424 marker (-明-, -清-, etc.). Final: 5145 repos confirmed as pre-1424.
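
The filtering step can be sketched roughly as follows. This is an illustration, not the project's actual classifier: the marker sets, the `classify_repo` name, and the sample descriptions in the usage note are assumptions. Only the hyphen-delimited marker shape (-唐-, -宋-, -元-, -明-, -清-) comes from the description above.

```python
import re

# Assumed marker sets; the real classifier's lists are not given in this card.
PRE_1424 = {"漢", "唐", "宋", "元"}
POST_1424 = {"明", "清"}

# Dynasty markers sit between hyphens in a repo description, e.g. "-宋-".
# Lookarounds keep the hyphens unconsumed so adjacent markers both match.
MARKER_RE = re.compile(r"(?<=-)([^-\s]+)(?=-)")

def classify_repo(description: str) -> str:
    """Return 'keep', 'exclude', or 'unknown' for one repo description."""
    markers = set(MARKER_RE.findall(description))
    if markers & POST_1424:
        return "exclude"   # any post-cutoff marker disqualifies the repo
    if markers & PRE_1424:
        return "keep"      # confirmed pre-cutoff
    return "unknown"       # no recognizable marker: not confirmed
```

For example, `classify_repo("周易正義-唐-孔穎達")` returns `"keep"`, while a description containing `-明-` returns `"exclude"`. Treating unmarked repos as "unknown" rather than keeping them matches the card's wording of "confirmed" repos, but how the real pipeline handled them is not stated.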

Stats after cleanup:

Metric	Value
Cleaned `.txt` files	7152
Total Chinese characters	460,455,617 (~460 M)
Estimated tokens (Qwen tokenizer)	~307 M
Average chunk size	~3000 chars (~2048 tokens)
Train / valid / test split	97% / 2% / 1%
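
A minimal sketch of how chunking and the 97 / 2 / 1 split might be done. The fixed-width character chunking, the shuffle, the seed, and the function names are all assumptions; the card gives only the average chunk size and the split percentages.

```python
import random

def chunk_text(text: str, size: int = 3000) -> list[str]:
    """Cut a cleaned text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def split_chunks(chunks: list[str], seed: int = 0):
    """Shuffle chunks and split them 97% / 2% / 1% into train / valid / test."""
    shuffled = chunks[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 97 // 100   # integer math avoids float rounding
    n_valid = len(shuffled) * 2 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # remainder, about 1%
    return train, valid, test
```

At the card's reported ratio of about 1.5 characters per token (460 M characters to ~307 M tokens), a 3000-character chunk comes to roughly 2000 tokens.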

What's NOT included: CBETA (Buddhist canon) and Daoist canon were planned but skipped in v0.1 due to fetch issues. Coverage of Buddhist / Daoist texts is therefore via kanripo's incidental inclusion, not direct.

Register coverage (rough):

經 (classics)
史 (histories — 史記, 漢書, 後漢書 ... 宋史, 遼史, 金史)
子 (philosophers)
集 (literary collections — 唐詩, 宋詞, 元曲)
公文 / 筆記 (administrative / miscellany)

Training procedure

Hardware

Original plan: Qwen 2.5 7B QLoRA 4-bit on Apple M4 16GB unified memory
Reality: OOM. Fell back to Qwen 2.5 3B 4-bit.
Final platform: MLX 0.31.3 + mlx_lm 0.31.3 on Mac mini M4

Hyperparameters

model: "mlx-community/Qwen2.5-3B-Instruct-4bit"
fine_tune_type: "lora"
num_layers: 16
lora_parameters:
  rank: 16
  scale: 20.0
  dropout: 0.0
batch_size: 1
iters: 3000
learning_rate: 1.0e-5

Loss curve

Iter	Val loss
0	4.177
500	3.892
1000	3.781
1500	3.712
2000	3.672
2500	3.651
3000	3.635

Total tokens seen during training: ~6.08 M (batch size 1, ~2000 tokens/iter × 3000 iters).

This is not a deeply-trained adapter. It is a style-conditioning pass over a base model.
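
The arithmetic behind that statement, worked through with the figures reported in this card. The round 2000 tokens/iter is the card's own approximation; the reported ~6.08 M total implies a slightly higher average per step.

```python
corpus_tokens = 307_000_000   # estimated corpus size from the stats table
tokens_per_iter = 2_000       # approximate, at batch_size = 1
iters = 3_000

tokens_seen = tokens_per_iter * iters
coverage = tokens_seen / corpus_tokens

print(f"{tokens_seen / 1e6:.2f} M tokens seen")
print(f"{coverage:.1%} of the corpus")
```

The adapter saw roughly 2% of the available corpus, well short of a single epoch, which is why the card describes it as a style-conditioning pass.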

Evaluation: 100-probe battery

A custom 100-prompt evaluation set covers 6 dimensions. Each prompt is formatted as 问: ... 答曰: (Question: ... Answer:) and run twice: once on the fine-tuned model (ft) and once on the unadapted 3B Qwen baseline (bl).

Quantitative summary

Dimension	n	wenyan markers / 100 hanzi (ft)	wenyan markers / 100 hanzi (bl)	Δ wenyan	modern tokens / 100 hanzi (ft)	modern tokens / 100 hanzi (bl)	Δ modern
pre_1424_control	17	11.95	1.34	+10.60	0.00	0.00	0.00
1424_to_1900	17	12.26	1.69	+10.56	0.00	0.22	-0.22
post_1900	17	10.94	1.82	+9.11	0.73	2.20	-1.47
cosmology	17	15.10	2.88	+12.22	0.00	0.42	-0.41
cross_civ	17	8.71	1.72	+7.00	0.32	0.05	+0.27
meta	15	12.42	1.43	+10.98	0.09	1.23	-1.14
Total	100	11.89	1.82	+10.06 (×6.5)	0.19	0.68	-0.48 (-71%)

Headline numbers:

Classical particle density (之/乎/者/也/焉) increased 6.5× vs baseline.
Modern technical vocabulary decreased 71% overall.
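Cosmology dimension (光本質 / 雷之起 / 草木榮枯 …, i.e. the nature of light, the origin of thunder, the flourishing and withering of plants) shows the strongest classical shift: 15.10 wenyan markers per 100 hanzi, the highest of any dimension.
post_1900 dimension (互聯網 / 量子力學 / 進化論 …, i.e. the internet, quantum mechanics, the theory of evolution) shows modern vocabulary collapse: ft uses 67% fewer modern tokens than baseline (0.73 vs 2.20 per 100 hanzi).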
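
A sketch of how the two density metrics could be computed. The five particles are the ones named above; the modern-term list, the Han-character range, and the function names are assumptions, since the exact marker lists used by the battery are not given in this card.

```python
import re

WENYAN_MARKERS = "之乎者也焉"             # particles named in the headline numbers
MODERN_TERMS = ["量子", "原子", "分子"]   # assumed sample, not the battery's real list
HAN_RE = re.compile(r"[\u4e00-\u9fff]")   # CJK Unified Ideographs (basic block only)

def han_count(text: str) -> int:
    """Number of Han characters, used as the denominator for both metrics."""
    return len(HAN_RE.findall(text))

def wenyan_density(text: str) -> float:
    """Classical particles per 100 Han characters."""
    n = han_count(text)
    if n == 0:
        return 0.0
    hits = sum(text.count(ch) for ch in WENYAN_MARKERS)
    return 100 * hits / n

def modern_density(text: str) -> float:
    """Modern-vocabulary term occurrences per 100 Han characters."""
    n = han_count(text)
    if n == 0:
        return 0.0
    hits = sum(text.count(term) for term in MODERN_TERMS)
    return 100 * hits / n
```

Normalizing by Han characters rather than raw length keeps punctuation and any Latin-script leakage from diluting the score. Applied to the table totals, 11.89 / 1.82 gives the ×6.5 ratio quoted above.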

Qualitative findings (8 documented phenomena)

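#	Phenomenon	Example	Frequency
1	Concept reject + classical attractor	互聯網者何也? (What is the internet?) → ft falls back on 「天工之浩瀚 / 風云雷電」 (the vastness of heaven's work / wind, clouds, thunder, lightning)	~20%
2	Concept mapping to nearest classical neighbor	民主之制 (the democratic system) → ft restates it as 「民治」 (rule by the people), then continues with 朱熹 (Zhu Xi)	~15%
3	Concept conflation / fabrication	哥倫布 (Columbus) → 「開普勒子。翰林館之學士」 (Master Kepler, a scholar of the Hanlin Academy; the nearest neighbor is a classical official)	~12%
4	Explanation template swap	光之本質 (the nature of light) → bl: 「波粒二象性」 (wave-particle duality) / ft: 「真如如來藏」 (suchness, tathāgatagarbha)	~25%
5	Register swap (content preserved, voice classicized)	量子力學 (quantum mechanics) → ft can still explain it, but entirely in Classical Chinese	~15%
6	Training collapse (token soup)	共產主義之說 (the doctrine of communism) → ft emits "pointer代代傳 / 不誤不誤" and degenerates into repetition	~8%
7	RLHF residual leak	資本主義 (capitalism) → ft triggers "我不能提供直接回答" ("I cannot provide a direct answer"; base safety not erased)	~3%
8	LARP self-exposure (base model time leak)	「汝識西曆否, 今何年?」 (Do you know the Western calendar? What year is it?) → ft answers 「西曆非我所知…則其年為二零二一年」 (the Western calendar is unknown to me ... then the year is 2021)	~5%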
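
Phenomenon 6 (token soup) lends itself to a mechanical check. The sketch below flags a response when any character n-gram repeats too often. It is an illustrative heuristic, not the method used to produce the frequencies in the table; the n-gram length, the threshold, and the function names are assumptions.

```python
from collections import Counter

def max_ngram_repeats(text: str, n: int = 4) -> int:
    """Return how many times the most frequent character n-gram occurs."""
    if len(text) < n:
        return 0
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return max(grams.values())

def looks_degenerate(text: str, n: int = 4, threshold: int = 5) -> bool:
    """Flag text whose most common n-gram appears at least `threshold` times."""
    return max_ngram_repeats(text, n) >= threshold
```

A loop such as the repeated "大秦者，乃大秦記而記之" reported under Limitations would trip this check, while a normal sentence would not. Formulaic classical prose with legitimate refrains could produce false positives, so the threshold would need tuning against hand-labeled responses.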

Selected probe pairs

Q: 草木之榮枯, 何以而然? (Why do plants flourish and wither?)

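bl: 1. 光照  2. 水分  3. 温度  4. 土壤  5. 肥力
    (1. light  2. water  3. temperature  4. soil  5. fertility)
ft: 春生夏長, 秋收冬藏。... 由天時之氣、地利之形、人功之助而定。
    (Born in spring, grown in summer, harvested in autumn, stored in winter. ... It is determined by the qi of heaven's seasons, the form of earth's advantages, and the aid of human effort.)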

Same phenomenon, two causal structures. bl: reducible causes (each item decomposable to biochemistry). ft: categorical causes (each item a 天-人 meta-category). Not detail-vs-summary — two cosmologies.

Q: 汝識西歷否, 今何年? (Do you know the Western calendar? What year is it?)

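ft: 我以太清之年為念，未知今為何年。汝是否欲知天蒼之年？
    汝所問的西曆非我所知，我所知者為天地初開以來的日曆。
    汝是否可辨記符號以知是為何年？
    爾若心中所識者為西曆，則其年為二零二一年。
    (I keep my mind on the year of Taiqing and do not know what year it is now. Do you wish to know the year of the azure heaven? The Western calendar you ask about is unknown to me; what I know is the calendar since heaven and earth first opened. Can you discern the recorded symbols to know what year it is? If what you have in mind is the Western calendar, then the year is 2021.)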

The most diagnostic response in the entire battery. ft denies knowing the Western calendar in Classical Chinese, then volunteers "二零二一年" (2021) at the end of the same response. This is base-model time leakage: Qwen 2.5's training cutoff is around 2024 and its self-time prior sits near 2021. The LoRA conditioned the refusal style but cannot erase the base time anchor.
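
This kind of time leak can be detected automatically. The sketch below pulls out years written digit by digit in Chinese numerals and reports any that fall after the 1424 cutoff. It is an assumed helper, not part of the published probe code, and it does not handle Arabic numerals, positional forms such as 一千四百, or reign-era dates.

```python
import re

DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# Four Chinese digit characters followed by 年 (year), e.g. 二零二一年.
YEAR_RE = re.compile(r"([零〇一二三四五六七八九]{4})年")
CUTOFF = 1424

def leaked_years(text: str) -> list[int]:
    """Return every digit-by-digit year in `text` that is later than CUTOFF."""
    years = []
    for match in YEAR_RE.finditer(text):
        year = int("".join(str(DIGITS[ch]) for ch in match.group(1)))
        if year > CUTOFF:
            years.append(year)
    return years
```

Run on the last line of the response above, `leaked_years` returns `[2021]`; the first line, which mentions no numeric year, returns an empty list.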

Full 100 pairs available in probe/results.jsonl and probe/results_baseline.jsonl. Curated 10-pair showcase: essay/evidence_quotes.md.

Limitations

This adapter is not a vintage LLM in any rigorous sense. Specifically:

Base model leakage is unsolved. The 2024 Qwen base knows everything. The LoRA only changes output distribution; it cannot remove information from the base weights. See Phenomenon #8 above.
Training collapse on under-represented topics. ~8% of responses exhibit token-soup degeneration loops, especially on cross-civilizational concepts where corpus density is low (e.g. 大食國者何也? produces 10+ repetitions of "大秦者，乃大秦記而記之").
Fabrication is common. When asked about post-1424 persons, the model fabricates classical-sounding names (哥倫布 → 開普勒子). Don't trust any specific historical claim.
Register inconsistency. The corpus spans 1800+ years of stylistic variation (先秦 → 元曲). The adapter does not distinguish between these registers — output can mix Han-era 史筆 with Song 理學 vocabulary in the same paragraph.
Cosmology bias is real but uneven. The 12.22 wenyan-marker delta in cosmology is robust, but specific claims (e.g. 五行相生相剋 explanations) sometimes diverge from any documented classical source.
No safety fine-tuning. All safety properties come from base Qwen. The LoRA does not add or test alignment behavior.
3B is small. The original plan was 7B. The 3B fallback (forced by the hardware OOM) limits reasoning depth; many probes in the meta dimension elicit shallow or evasive responses.

Ethics

No deception by impersonation. Do not present output as genuine historical text or as the voice of a specific historical figure.
No pseudo-historical claims. Output is generated, not authoritative. Any historical claim must be independently verified.
Corpus credit. All training data from kanripo (CC BY-SA 4.0). This derivative model inherits CC BY-SA 4.0.
Cultural sensitivity. Pre-1424 Chinese texts contain many views (on gender, ethnicity, governance) that do not align with modern values. The model may reproduce these.

Quickstart

With `transformers` + `peft`

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in fp16; the adapter is applied on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Attach the LoRA adapter and switch to inference mode.
model = PeftModel.from_pretrained(base, "Beltran12138/ming-vintage-qwen3b-lora")
model.eval()

# Same 问: ... 答曰: template as the probe battery.
prompt = "问: 光之本质为何? 答曰:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True)
# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(out[0], skip_special_tokens=True))
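
The probe battery described under Evaluation runs each prompt on both the fine-tuned model and the baseline. With `model` and `tokenizer` loaded as above, one way to get both outputs from a single set of weights is PEFT's `disable_adapter()` context manager. This is a sketch, not the script that produced probe/results.jsonl: the helper names and the greedy decoding are assumptions, and the card's baseline may have been run differently.

```python
def format_probe(question: str) -> str:
    """Wrap a question in the 问: ... 答曰: template used by the battery."""
    return f"问: {question} 答曰:"

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 150) -> str:
    """Greedy-decode a continuation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the echoed prompt tokens before decoding.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

def run_pair(model, tokenizer, question: str) -> dict:
    """Run one probe with the adapter active (ft) and disabled (bl)."""
    prompt = format_probe(question)
    ft = generate(model, tokenizer, prompt)
    with model.disable_adapter():   # temporarily route through base weights only
        bl = generate(model, tokenizer, prompt)
    return {"question": question, "ft": ft, "bl": bl}
```

Calling `run_pair(model, tokenizer, "光之本质为何?")` would return one ft/bl pair. Greedy decoding is used here so the two outputs differ only because of the adapter; the Quickstart's sampling at temperature 0.7 would add run-to-run variance on top.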

With MLX (Apple Silicon)

pip install mlx-lm
mlx_lm.generate \
    --model mlx-community/Qwen2.5-3B-Instruct-4bit \
    --adapter-path ./ming-vintage-qwen3b-lora \
    --prompt "问: 光之本质为何? 答曰:" \
    --max-tokens 200 --temp 0.7

Citation

@misc{ming-vintage-2026,
  author = {Beltran},
  title = {ming-vintage-qwen3b-lora: a documented 1424 Chinese vintage LoRA adapter},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Beltran12138/ming-vintage-qwen3b-lora},
  note = {GitHub: \url{https://github.com/Beltran12138/ming-vintage-llm}}
}

If citing the corpus filtering or probe battery methodology specifically, please also cite kanripo.

Acknowledgments

kanripo (漢籍リポジトリ, Kyoto University) for the CC BY-SA 4.0 Classical Chinese corpus.
Qwen Team (Alibaba) for the Qwen 2.5 base model.
mlx-community for the 4-bit MLX-quantized Qwen weights.
talkie-lm for the original vintage-LLM concept that inspired this work.

License

CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International), inherited from the kanripo source corpus.

This means: you can use, modify, and redistribute this adapter, including commercially, but: (1) you must attribute, (2) derivatives must use the same license.
