Instructions to use continual-internalization/kh-disc-qwen3-30b-a3b-200tok-first-run with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use continual-internalization/kh-disc-qwen3-30b-a3b-200tok-first-run with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("models/Qwen3-30B-A3B-Instruct-2507") model = PeftModel.from_pretrained(base_model, "continual-internalization/kh-disc-qwen3-30b-a3b-200tok-first-run") - Notebooks
- Google Colab
- Kaggle
Knowledge Horizon DiSC LoRA (Qwen3-30B-A3B-Instruct-2507, 200-token data)
This repository contains a LoRA adapter trained with the DiSC pipeline for continual internalization experiments in Knowledge Horizon.
The adapter was trained on a 200-token-budget version of the easy QA training set and evaluated on easy train/test and hard questions.
Contents
adapter_model.safetensors(LoRA weights)adapter_config.json- tokenizer files copied from training output (
tokenizer.json,tokenizer_config.json,chat_template.jinja) - optional intermediate checkpoints:
checkpoint-400checkpoint-493(end of epoch)
Base Model
- Base model:
Qwen/Qwen3-30B-A3B-Instruct-2507 - This repo is an adapter, not a standalone full model.
Run Snapshot (Published Adapter)
- Method: DiSC (3-stage pipeline, suffix-only forward KL in stage 3)
- Slurm job:
6537205(kh_disc_30b_a2) - Date:
2026-04-04(America/New_York) - Cluster/partition: Princeton
ailab - Hardware:
1 node,2x NVIDIA H200 - Wall-clock elapsed:
00:25:52 - Final output dir:
checkpoints/disc_lora__qwen3-30b-a3b__knowledge-horizon__6537205
Training Data
The stage-3 training input came from the following chain:
- Easy QA train split (200-token budget):
data/easy_qa_200tok_train.jsonl(1098 rows)
- Prepared training parquet:
python prepare_training_data.py --input data/easy_qa_200tok_train.jsonl --output data/training_data.parquet- output rows:
1098
- DiSC stage-1 split generation:
- input rows:
1098 - unique documents after dedupe:
197 - output split rows:
985(stage1_splits.parquet)
- input rows:
- DiSC stage-2 teacher scoring:
- scored rows:
985(stage2_scored.parquet) - skipped empty:
0 - skipped too long:
0
- scored rows:
Hard QA files are used for evaluation, not for training.
Training Procedure
Stage 1: split contexts
Executed with:
python disc_stage1_prepare.py \
--input data/training_data.parquet \
--output runs/disc_qwen3_30b_ailab2_6537205/stage1_splits.parquet \
--k_splits 5 \
--min_sentences 3 \
--dedupe_by_article_text
Important defaults:
- stage-1 seed:
42 - split sampling:
k-1random interior split points + final sentence endpoint
Stage 2: teacher top-k scoring
Executed with:
python disc_stage2_score.py \
--model models/Qwen3-30B-A3B-Instruct-2507 \
--input runs/disc_qwen3_30b_ailab2_6537205/stage1_splits.parquet \
--output runs/disc_qwen3_30b_ailab2_6537205/stage2_scored.parquet \
--tp 2 \
--max_model_len 4096 \
--max_num_batched_tokens 2048 \
--max_num_seqs 2 \
--gpu_memory_utilization 0.88 \
--disable_custom_all_reduce true \
--enforce_eager true \
--top_k 128 \
--max_suffix_tokens 256 \
--batch_size 1
Stage 3: LoRA training (DiSC objective)
Executed with:
torchrun \
--nproc-per-node 2 \
--master_port <job_specific_port> \
disc_stage3_train.py \
--model_name models/Qwen3-30B-A3B-Instruct-2507 \
--train_file runs/disc_qwen3_30b_ailab2_6537205/stage2_scored.parquet \
--output_dir checkpoints/disc_lora__qwen3-30b-a3b__knowledge-horizon__6537205 \
--fsdp_config configs/fsdp_config_qwen3_moe.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--lora_target_modules all-linear \
--learning_rate 1.5e-5 \
--weight_decay 0.01 \
--adam_beta1 0.9 \
--adam_beta2 0.999 \
--adam_epsilon 1e-8 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--warmup_ratio 0.0 \
--lr_scheduler_type linear \
--precision bf16 \
--temperature 2.0 \
--save_steps 200 \
--report_to none \
--resume_from_checkpoint latest
FSDP config (configs/fsdp_config_qwen3_moe.json):
{
"transformer_layer_cls_to_wrap": "Qwen3MoeDecoderLayer",
"use_orig_params": true,
"sync_module_states": true,
"activation_checkpointing": false,
"limit_all_gathers": true
}
Final stage-3 stats
- Train rows:
985 - Trainable params:
13,369,344/30,545,491,968(0.0438%) train_runtime:816.4strain_steps:493train_steps_per_second:0.604train_loss:0.5464epoch:1.0
Evaluation
Evaluation used:
- base model:
models/Qwen3-30B-A3B-Instruct-2507 - adapter: this checkpoint
- eval splits:
- easy train:
1098 - easy test:
1098 - hard v2:
248
- easy train:
- hard v1 in this run:
0
Main results (heuristic from evaluate.py)
| Split | N | No-training baseline | DiSC adapter |
|---|---|---|---|
| Easy Train | 1098 | S 217 (19.8%), IDK 724 (65.9%), O 157 (14.3%) |
S 267 (24.3%), IDK 616 (56.1%), O 215 (19.6%) |
| Easy Test | 1098 | S 215 (19.6%), IDK 758 (69.0%), O 125 (11.4%) |
S 277 (25.2%), IDK 648 (59.0%), O 173 (15.8%) |
| Hard (v2 aggregate) | 248 | S 2 (0.8%), IDK 245 (98.8%), O 1 (0.4%) |
S 7 (2.8%), IDK 236 (95.2%), O 5 (2.0%) |
S = strong match, IDK = explicit "I don't know", O = other.
Reproducibility Checklist
- Dataset preparation command (included above)
- Exact stage 1/2/3 commands (included above)
- Hardware and partition (included)
- Key config files:
prepare_training_data.pydisc_stage1_prepare.pydisc_stage2_score.pydisc_stage3_train.pyconfigs/fsdp_config_qwen3_moe.jsonslurm/train_disc_lora_qwen3_30b_ailab_2gpu.sh
- Repo snapshot at publication time:
git rev-parse HEAD=c19506c82f4aed88daba20fabe21d8f0f75b25d6
Software Environment
Observed environment in this workspace:
- Python
3.10 torch==2.9.0+cu128transformers==5.5.0.dev0peft==0.17.1datasets==4.3.0vllm==0.12.0
Usage
Transformers + PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"
adapter_id = "<this-repo>"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
base_id,
trust_remote_code=True,
torch_dtype="auto",
device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
vLLM LoRA
Use vLLM with LoRA enabled and this adapter as the LoRA path.
Limitations
- This is a research adapter trained for continual internalization experiments, not a general-purpose instruction-tuning release.
- Evaluation uses a heuristic string-matching scorer in
evaluate.py; treat scores as directional.
Citation
If you use this adapter, please cite:
- The DiSC paper:
@article{padmanabhan2026updating,
title={Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities},
year={2026},
eprint={2602.16093},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
- The OPSD/Knowledge Horizon context paper:
@article{shenfeld2026self,
title={Self-Distillation Enables Continual Learning},
year={2026},
eprint={2601.19897},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
- Downloads last month
- 6
Model tree for continual-internalization/kh-disc-qwen3-30b-a3b-200tok-first-run
Base model
Qwen/Qwen3-30B-A3B-Instruct-2507