Best v3 + steering 2.0×
Same recipe as ceselder/qwen3-8b-ao-v3-best (Sonnet conversational + concurrent multi-layer [21..25] + on-policy cot-v5 past_lens + lr=3e-5, 50M tokens), with one change: the post-injection residual norm is rescaled to 2.0× the original residual norm rather than the natural ~√2× (≈1.41×) that arises from the default norm-matched injection.
This corresponds to the multi5_sonnet_norm2p0 training tag in the project.
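The difference between the two norms can be illustrated numerically. The sketch below is not the project's hook. It assumes that injection adds a norm-matched activation to the residual and that the two vectors are roughly orthogonal, which is what would produce the ~√2× figure. The names `inject` and `final_norm_scale` are made up for illustration, the dimension is arbitrary, and plain Python lists stand in for tensors.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def inject(residual, activation, final_norm_scale=None):
    """Add a norm-matched activation to a residual vector (illustrative only).

    If final_norm_scale is given, the result is rescaled so that its norm
    equals final_norm_scale * ||residual||.
    """
    r_norm = norm(residual)
    a_norm = norm(activation)
    # Norm matching: scale the activation to the residual's norm before adding.
    matched = [a * (r_norm / a_norm) for a in activation]
    out = [r + m for r, m in zip(residual, matched)]
    if final_norm_scale is not None:
        target = final_norm_scale * r_norm
        out_norm = norm(out)
        out = [o * (target / out_norm) for o in out]
    return out


rng = random.Random(0)
d = 4096  # illustrative dimension; random high-dimensional vectors are near-orthogonal
residual = [rng.gauss(0, 1) for _ in range(d)]
activation = [rng.gauss(0, 1) for _ in range(d)]

natural_ratio = norm(inject(residual, activation)) / norm(residual)
scaled_ratio = norm(inject(residual, activation, final_norm_scale=2.0)) / norm(residual)
print(f"natural: {natural_ratio:.3f}  rescaled: {scaled_ratio:.3f}")
```

Under these assumptions the natural ratio lands close to √2, while the rescaled variant is exactly 2.0 by construction.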
AObench
- Best v3 (4-seed mean): +0.414
- Best v3 + steering 2.0× (single seed at upload time): +0.437 (Δ = +0.023)
A 3-seed mean with seeds {original, 7, 13} was in progress when the training box was decommissioned. This card will be updated if/when those additional seeds are run.
What this is
This is a LoRA verbalizer trained as part of the v3 ablation ladder for the Activation Oracle (AO) project.
The AO setup: during a forward pass of the target Qwen3-8B, we extract residual-stream activations at selected layers and positions and inject them (norm-matched) into the residual stream of a frozen Qwen3-8B at a fixed hook layer. The verbalizer (this LoRA) is then trained to produce a natural-language description of what the captured activations represent.
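The capture-then-inject flow can be sketched with PyTorch forward hooks on a toy model. This is only an illustration under stated assumptions: the `Block` class, the layer indices, and the positions are hypothetical stand-ins, the toy has no attention, and the project's real hook (which operates on Qwen3-8B) may differ in how it combines the vectors.

```python
import torch
from torch import nn

torch.manual_seed(0)


class Block(nn.Module):
    """Toy residual block standing in for a transformer layer (no attention)."""

    def __init__(self, d):
        super().__init__()
        self.ff = nn.Linear(d, d)

    def forward(self, x):
        return x + torch.tanh(self.ff(x))


d_model, n_layers, seq_len = 16, 4, 5
target = nn.Sequential(*[Block(d_model) for _ in range(n_layers)])
oracle = nn.Sequential(*[Block(d_model) for _ in range(n_layers)])

source_layer, hook_layer = 2, 1  # hypothetical layer choices
positions = [0, 3]               # hypothetical positions to inject
captured = {}


def capture_hook(module, inputs, output):
    # Store the residual-stream output of the chosen target layer.
    captured["act"] = output.detach()


handle = target[source_layer].register_forward_hook(capture_hook)
with torch.no_grad():
    target(torch.randn(1, seq_len, d_model))
handle.remove()


def inject_hook(module, inputs, output):
    # Add the captured activation, rescaled to the local residual norm.
    out = output.clone()
    for pos in positions:
        resid = out[0, pos]
        act = captured["act"][0, pos]
        out[0, pos] = resid + act * (resid.norm() / act.norm())
    return out  # returning a tensor replaces the layer's output


oracle_input = torch.randn(1, seq_len, d_model)
with torch.no_grad():
    baseline = oracle(oracle_input)
    handle = oracle[hook_layer].register_forward_hook(inject_hook)
    steered = oracle(oracle_input)
    handle.remove()
```

In the real setup the oracle is the frozen Qwen3-8B with this LoRA attached, and the output is decoded into text rather than compared as tensors.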
Files
- adapter_model.safetensors: LoRA weights (rank/alpha/dropout in adapter_config.json)
- adapter_config.json: PEFT config (target modules, rank, alpha)
- ao_config.json: Activation Oracle config (layers, hook positions, hook_onto_layer, prefix template)
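To check the LoRA hyperparameters without loading the model, the PEFT config can be read as plain JSON. The sketch below assumes a locally downloaded adapter_config.json and uses the standard PEFT LoRA field names (`r`, `lora_alpha`, `lora_dropout`, `target_modules`). The helper name is hypothetical, and the placeholder values exist only so the example runs offline; they are not this adapter's actual settings.

```python
import json
import tempfile
from pathlib import Path


def summarize_lora_config(path):
    """Return the LoRA hyperparameters stored in a PEFT adapter_config.json."""
    cfg = json.loads(Path(path).read_text())
    return {
        "rank": cfg.get("r"),
        "alpha": cfg.get("lora_alpha"),
        "dropout": cfg.get("lora_dropout"),
        "target_modules": cfg.get("target_modules"),
    }


# Placeholder config so the sketch is self-contained.
# These values are NOT taken from this checkpoint.
placeholder = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "v_proj"],
}

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "adapter_config.json"
    cfg_path.write_text(json.dumps(placeholder))
    summary = summarize_lora_config(cfg_path)

print(summary)
```

Pointing `summarize_lora_config` at the real file from this repository would report the values actually used for training.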
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "ceselder/qwen3-8b-ao-v3-best-steering2p0")
To reproduce the inference-time steering 2.0× behaviour, set the environment variable AO_FINAL_NORM_SCALE=2.0 when running the AO injection hook (see nl_probes/utils/steering_hooks.py:get_hf_activation_steering_hook in the project repo).
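From Python, the variable can be set in-process before the hook is built. The card does not say how or when the project reads it, so the reader function below is a hypothetical illustration of float parsing, not the project's code.

```python
import os

# Set the variable before creating the AO injection hook.
# Whether the project reads it at import time or at call time is an assumption.
os.environ["AO_FINAL_NORM_SCALE"] = "2.0"


def read_final_norm_scale(default=None):
    """Hypothetical reader: parse the variable as a float, or return default if unset."""
    raw = os.environ.get("AO_FINAL_NORM_SCALE")
    return float(raw) if raw is not None else default


scale = read_final_norm_scale()
print(scale)
```

Setting it in the shell that launches the script is equivalent, as long as it happens before the hook code runs.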
Quirks worth knowing about
- First-position injection is an implicit training anchor. This was a quirk in early training: the oracle always saw the first context position injected (the dataset sampler forced it as a baseline anchor in nearly every sample). Presumably this helps with grounding. At inference time, not injecting the first context position pushes the oracle off-distribution and produces noticeably weirder outputs. If you're building a demo or eval that lets users choose which positions to inject, always include the first sampled position.
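For a demo or eval with user-selectable positions, the anchor rule above can be enforced with a small guard. This is a sketch with hypothetical names (`with_anchor`, `sampled_positions`); it assumes positions are integer token indices and that the first sampled position is the first element of the sampled list.

```python
def with_anchor(selected, sampled_positions):
    """Return the selected positions, always including the first sampled position.

    selected:           positions the user chose to inject
    sampled_positions:  all positions sampled from the context, in order
    """
    if not sampled_positions:
        raise ValueError("sampled_positions must not be empty")
    unknown = set(selected) - set(sampled_positions)
    if unknown:
        raise ValueError(f"positions not in the sampled set: {sorted(unknown)}")
    anchor = sampled_positions[0]
    # Deduplicate and keep positions in ascending order.
    return sorted(set(selected) | {anchor})


positions = with_anchor([5, 9], sampled_positions=[2, 5, 9, 12])
print(positions)  # [2, 5, 9]
```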
Collection
This checkpoint is part of the Qwen3-8B Activation Oracle v3 ablation ladder collection.