metadata
language: en
license: apache-2.0
tags:
  - doc-to-lora
  - lora
  - hypernetwork
  - context-distillation
  - needle-in-a-haystack
  - perceiver
base_model: Qwen/Qwen3.5-2B

Doc-to-LoRA — NIAH Proof of Concept

A 127M-parameter Perceiver hypernetwork trained for Qwen/Qwen3.5-2B. It reads a document once and outputs LoRA deltas, which are intended to let the base LLM answer questions about the document without the document ever appearing in its context window.
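Conceptually, the mapping from document to adapter can be written as below. This is a sketch assuming the standard LoRA parameterization; the symbols are illustrative and not taken from the paper: H_phi is the Perceiver hypernetwork, D the document, and A_l, B_l the generated low-rank factors for the down_proj weight W_l of each adapted layer l. With the rank r = 8 and alpha = 8.0 listed in the results table, the LoRA scaling factor alpha / r is exactly 1.

```latex
\begin{aligned}
\{(A_\ell, B_\ell)\}_{\ell} &= H_\phi(\mathcal{D}), \quad
  A_\ell \in \mathbb{R}^{r \times d_{\mathrm{in}}},\;
  B_\ell \in \mathbb{R}^{d_{\mathrm{out}} \times r} \\
W'_\ell &= W_\ell + \frac{\alpha}{r}\, B_\ell A_\ell,
  \qquad \frac{\alpha}{r} = \frac{8.0}{8} = 1
\end{aligned}
```

Because the document only enters through H_phi, the question itself can be answered with W'_l in place of W_l and no document tokens in the prompt.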

Based on Doc-to-LoRA (Charakorn et al., 2026). Training uses KL context distillation (as in Cartridges) and token-init.
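The KL context-distillation objective can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the training code from this repository: the teacher is assumed to be the base model with the document in context, the student is the base model plus the generated LoRA without the document, and the forward direction KL(teacher || student) averaged over answer positions is also an assumption. The function names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_context_distillation(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-position KL(teacher || student) over answer positions.

    teacher_logits: base model logits WITH the document in context (assumed).
    student_logits: base model + generated LoRA logits WITHOUT the document.
    Each is one row of vocabulary logits per answer position.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need equally sized, non-empty logit sequences")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row)
        log_q = log_softmax(s_row)
        # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)
```

Identical teacher and student logits give a loss of zero, and any disagreement gives a positive loss. A real training loop would operate on batched tensors with a framework log-softmax; plain lists are used here only to keep the sketch dependency-free.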

Figure: training loss and accuracy curves (curves.png).

Results

| Metric | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Perceiver parameters | 127M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | down_proj |
| Training steps | 1,400 |
| Final CE loss | 1.3218 |
| Exact-match accuracy (NIAH) | 0.0% |
| Training context length | 32–256 tokens |
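For readers unfamiliar with the metric, exact-match accuracy is the fraction of predictions that equal the reference answer. The sketch below uses one common lenient definition (lowercase, strip punctuation, collapse whitespace); the normalization and the function names are hypothetical and not necessarily what produced the reported number.

```python
import string


def normalize(text: str) -> str:
    # Lowercase, drop ASCII punctuation, and collapse runs of whitespace.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    if not predictions or len(predictions) != len(references):
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

A stricter variant would compare raw strings; either way, a single differing token makes the whole answer count as wrong.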

Files

| File | Description |
|---|---|
| hypernet.pt | Perceiver weights plus the full config needed to rebuild the class |
| inference_example.py | Self-contained inference script (download and run) |
| training_config.json | Training hyperparameters |
| curves.png | Loss and accuracy curves |

Quick start

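```shell
pip install "transformers>=5.2.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

# Download the hypernetwork checkpoint (Perceiver weights + config) from the Hub.
path = hf_hub_download("farpluto/doc-to-lora-niah-qwen3.5-2B", "hypernet.pt")

# weights_only=False unpickles arbitrary Python objects: only load trusted files.
device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = torch.load(path, map_location=device, weights_only=False)
# See inference_example.py for the complete working script.
```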

Qwen3 note

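Qwen3-style thinking is suppressed by appending /no_think to every query, and any residual <think> tokens are stripped from the generated output. Both steps should be harmless on non-Qwen3 models: stripping does nothing when no <think> tags are produced, and /no_think is only a short suffix in the prompt.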
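Both steps can be sketched with the standard library. This is a minimal illustration, not the code in inference_example.py: the helper names are hypothetical, and the exact suffix formatting (a single space before /no_think) is an assumption.

```python
import re

NO_THINK = "/no_think"
# A complete <think>...</think> block (possibly empty) plus trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)
# Any unmatched opening or closing tag; text after an unclosed <think> is kept.
STRAY_TAG = re.compile(r"</?think>")


def build_query(question: str) -> str:
    """Append the Qwen3 soft switch that asks the model to skip thinking."""
    return f"{question.rstrip()} {NO_THINK}"


def strip_thinking(text: str) -> str:
    """Remove residual <think> blocks and stray tags from generated text."""
    text = THINK_BLOCK.sub("", text)
    return STRAY_TAG.sub("", text).strip()
```

For example, a generation such as "<think>\n\n</think>\n\nThe code is 4721." is reduced to just the answer text, while output without any tags passes through unchanged apart from surrounding whitespace.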