QwenGR00T-MIR-LoRA · LIBERO-Goal (CL, sweep-best `refresh=50`)

Continual-learning (CL) checkpoint released with the AlphaBrain framework. Provided for direct download and evaluation — no retraining needed.

A QwenGR00T Vision-Language-Action (VLA) model fine-tuned sequentially over the 10 LIBERO-Goal tasks with LoRA (r=32) and MIR (Maximally Interfered Retrieval) replay. This release is the sweep-best MIR configuration on LIBERO-Goal. It is the mir_refresh_interval=50 cell of the sweep, built on the ER-aligned replay policy (buffer=1000, ratio=0.5, balanced=true), with MIR's virtual step restricted to LoRA parameters (lora_only=true). At 50 rollouts/task it scores 76.0 % Avg SR. No task scores 0 %, and one non-current task (put the bowl on the stove) reaches 100 %.

Overview


Architecture	QwenGR00T (Qwen2.5-VL-3B + Flow-Matching DiT head, ~3.8 B params)
Base VLM	`Qwen/Qwen2.5-VL-3B-Instruct`
Tunable parameters	LoRA · `r=32, α=16, dropout=0.05, target=all-linear` (only LoRA + DiT action head trained)
CL algorithm	MIR — virtual SGD + interfered-sample selection on top of ER replay
Replay policy (ER aligned)	`buffer_size_per_task=1000, replay_batch_ratio=0.5, balanced_sampling=true`
MIR knobs (sweep best)	`refresh_interval=50, candidate_size=16, top_k=8, lora_only=true`
Task stream	LIBERO-Goal · 10 tasks · 10 000 steps/task = 100 000 steps total
Optimiser	AdamW, base lr 2.5e-5, action-head lr 1e-4, cosine-with-min-lr
Hardware / batch	4 × A800 80 GB · `per_device_batch=4` · effective batch 16
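The sketch below shows one way the replay knobs above could combine into a single training batch. It is a minimal pure-Python illustration, not AlphaBrain's implementation, and it rests on three assumptions:

- `replay_batch_ratio` applies to the effective batch of 16.
- `balanced_sampling=true` splits the replay share as evenly as possible across previously seen tasks.
- `split_replay` and `compose_batch` are hypothetical names.

```python
import random


def split_replay(n_replay, n_prev_tasks):
    """Spread n_replay slots as evenly as possible over earlier tasks (balanced sampling)."""
    if n_prev_tasks == 0:
        return []
    base, extra = divmod(n_replay, n_prev_tasks)
    # The first `extra` tasks receive one additional slot each.
    return [base + (1 if i < extra else 0) for i in range(n_prev_tasks)]


def compose_batch(current, buffers, batch_size=16, replay_ratio=0.5, rng=None):
    """Mix current-task samples with replay drawn evenly from per-task buffers.

    current: samples of the task being trained.
    buffers: one replay buffer (list of samples) per earlier task.
    """
    rng = rng or random.Random(0)
    # While training the first task there is nothing to replay yet.
    n_replay = round(batch_size * replay_ratio) if buffers else 0
    batch = rng.sample(current, batch_size - n_replay)
    for buf, k in zip(buffers, split_replay(n_replay, len(buffers))):
        batch.extend(rng.sample(buf, k))
    return batch
```

Under these assumptions, training the fourth task gives each batch 8 current-task samples plus 3 + 3 + 2 replay samples from the three earlier task buffers.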

Results

Evaluated with the full 10-checkpoint × 10-task matrix at 50 rollouts per task (5 000 episodes total), so we can report both the final-row average accuracy and the standard CL forgetting metrics.

Metric	Value
ACC — Avg SR after the full 10-task stream	77.0 %
BWT — Backward Transfer (Lopez-Paz & Ranzato 2017)	−7.8 %
Avg Forgetting (Chaudhry et al. 2018, ↓ better)	10.4 %
Tasks improving from later training (BWT > 0)	2 / 9
Worst-forgotten task	task 1 in training order ("put the wine bottle on the rack"), 50 pp loss

Why ACC (77.0 %) differs from the 76.0 % --last-only number used elsewhere: the matrix eval uses a fresh seed per checkpoint, so a ±1 pp run-to-run difference is normal. Both numbers come from the same checkpoint and the same metric, and both are valid.

BWT < 0 means there is some forgetting, dominated by task 1. Task numbers in this paragraph follow training order. By the end of the stream, 7 of the 9 past tasks lose at most 6 pp, and two of them actually improve (task 6: +20 pp, task 7: +4 pp). This is consistent with the design hypothesis: MIR's fresher cache (refresh=50) catches mid-stream interference earlier than the default 200-step interval does.
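The metrics in the table can be computed from the 10 × 10 checkpoint-by-task matrix (10 checkpoints × 10 tasks × 50 rollouts = 5 000 episodes). Below is a minimal sketch, not AlphaBrain's evaluator, with these conventions:

- `R[i][j]` is the success rate on task j after training task i, with both indices in training order.
- BWT follows Lopez-Paz & Ranzato.
- Forgetting takes the max over all earlier checkpoints, as in Chaudhry et al. The framework's exact convention may differ.
- `cl_metrics` is a hypothetical name.

```python
def cl_metrics(R):
    """Return (ACC, BWT, avg forgetting, #tasks with BWT > 0, worst task) from a T x T matrix.

    R[i][j] = success rate (%) on task j evaluated after training task i.
    Rows are checkpoints in training order; columns are tasks in training order.
    """
    T = len(R)
    final = R[-1]
    # ACC: average success rate of the final checkpoint over all tasks.
    acc = sum(final) / T
    # BWT: final accuracy minus accuracy right after training, averaged over past tasks.
    drops = [final[j] - R[j][j] for j in range(T - 1)]
    bwt = sum(drops) / (T - 1)
    # Forgetting: best accuracy at any earlier checkpoint minus final accuracy.
    forgetting = [max(R[l][j] for l in range(T - 1)) - final[j] for j in range(T - 1)]
    avg_forgetting = sum(forgetting) / (T - 1)
    n_improving = sum(1 for d in drops if d > 0)
    # 0-based index (training order) of the task with the largest forgetting.
    worst = max(range(T - 1), key=lambda j: forgetting[j])
    return acc, bwt, avg_forgetting, n_improving, worst
```

For example, on the toy matrix `[[90, 0, 0], [80, 95, 0], [70, 85, 100]]` this gives ACC 85.0, BWT −15.0 and average forgetting 15.0, with the first task the worst forgotten.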

Per-task SR breakdown (eval order = LIBERO-Goal default; eval-#1 = the most-recently-trained task; eval-#10 = the first-trained task):

#	Task name	SR (%)
1	open the middle drawer of the cabinet	84
2	put the bowl on the stove	100
3	put the wine bottle on top of the cabinet	88
4	open the top drawer and put the bowl inside	30
5	put the bowl on top of the cabinet	94
6	push the plate to the front of the stove	86
7	put the cream cheese in the bowl	56
8	turn on the stove	94
9	put the bowl on the plate	86
10	put the wine bottle on the rack	44
—	Total	76.0
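The card uses two numberings. The forgetting analysis (task 1, task 6, task 7) uses training order, and the table above uses eval order. The helper below converts between them under one assumption: the training stream is exactly the eval order reversed. The endpoints stated above imply this, but the card does not spell it out for the middle tasks. `to_other_order` and `task_by_train_index` are hypothetical helpers.

```python
# Task names in LIBERO-Goal eval order, copied from the per-task table (eval-#1 first).
EVAL_ORDER = [
    "open the middle drawer of the cabinet",
    "put the bowl on the stove",
    "put the wine bottle on top of the cabinet",
    "open the top drawer and put the bowl inside",
    "put the bowl on top of the cabinet",
    "push the plate to the front of the stove",
    "put the cream cheese in the bowl",
    "turn on the stove",
    "put the bowl on the plate",
    "put the wine bottle on the rack",
]


def to_other_order(index, n_tasks=10):
    """Map a 1-based eval index to a training index or back (assumes a pure reversal)."""
    return n_tasks + 1 - index


def task_by_train_index(train_index):
    """Name of the task at a 1-based training-order position."""
    return EVAL_ORDER[to_other_order(train_index) - 1]
```

Under this assumption, `task_by_train_index(1)` returns "put the wine bottle on the rack", which matches the worst-forgotten task named in the metrics table.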

For comparison on the same setup (single seed = 42, 50 rollouts/task):

Method	Avg SR
MIR refresh=50 (this release)	76.0 %
Full-parameter ER baseline	51.6 %

Numbers are from a single seed (42). Per-run variance is a few percentage points, depending on simulator state and attention implementation, so reproduced numbers somewhat above or below those reported are expected. Please file an issue or PR with details of your run.
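A back-of-envelope check shows why a few points of variance is plausible. It treats each rollout as an independent Bernoulli trial, which is an assumption, since simulator seeds and initial states may be correlated. The binomial standard error of a success rate is then √(p(1 − p)/n). `binomial_se_pp` is a hypothetical helper name.

```python
import math


def binomial_se_pp(success_rate_pct, n_episodes):
    """Standard error, in percentage points, of a success rate from n independent rollouts."""
    p = success_rate_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_episodes)


# Aggregate SR: 10 tasks x 50 rollouts = 500 episodes at roughly 76 %.
agg = binomial_se_pp(76.0, 500)
# One table row: 50 rollouts at 50 % SR is the worst case for a single task.
per_task = binomial_se_pp(50.0, 50)
```

This gives about 1.9 pp for the 10-task average and about 7 pp for a single task. Under the independence assumption, the ±1 pp gap between the two ACC figures is within sampling noise, and individual per-task rows can move considerably more.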

Files

```text
├── README.md                  model card
├── config.yaml                AlphaBrain training config (OmegaConf, exact recipe)
├── dataset_statistics.json    action normalisation (required for inference)
├── cl_state_final.json        end-of-stream replay buffer + MIR cache metadata
├── action_model.pt            DiT action head weights (~297 MB, full precision)
└── lora_adapter/
    ├── adapter_config.json    PEFT config (r=32, α=16, target=all-linear)
    └── adapter_model.safetensors  LoRA delta weights (~158 MB, bf16)
```

Base model — what to merge with

The base Qwen2.5-VL-3B VLM weights are not bundled in this repo (they are 6.2 GB and licence-restricted by their original authors). You must download them separately from the official Qwen repo:

Component	Source
Base VLM (~6.2 GB, required)	`Qwen/Qwen2.5-VL-3B-Instruct`
DINOv2 vision adapter	`facebook/dinov2-small` (auto-pulled by `server_policy.py`)

The LoRA adapter in this repo (lora_adapter/adapter_model.safetensors, 158 MB) targets all linear layers of Qwen2.5-VL (target=all-linear). AlphaBrain's server_policy.py loads the base model, the adapter and the DiT action head together at inference time, so there is no manual merge step.
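The sketch below shows what attaching or merging the adapter amounts to numerically for one linear layer. It assumes the adapter uses PEFT's default LoRA scaling (lora_alpha / r, no rsLoRA), which the card does not state explicitly. The merged weight is W + (α/r)·B·A, and with α=16, r=32 the scale is 0.5. This is a plain-Python illustration; `merge_lora` and `lora_params` are hypothetical names, not AlphaBrain or PEFT functions.

```python
def matmul(X, Y):
    """Plain-Python product of X (m x k) and Y (k x n)."""
    return [
        [sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def merge_lora(W, A, B, alpha=16):
    """Return W + (alpha / r) * B @ A for one linear layer.

    W: d_out x d_in base weight; A: r x d_in down-projection; B: d_out x r up-projection.
    """
    scale = alpha / len(A)  # alpha / r; for this release 16 / 32 = 0.5
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_params(d_in, d_out, r=32):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)
```

Summing `lora_params` over every targeted linear layer gives the adapter's trainable parameter count. Merging is a one-off addition, which is why it can be done once before inference.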

Usage — full inference setup

```bash
# 1. Clone the AlphaBrain framework and install
git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
cd AlphaBrain
pip install -e .

# 2. Download the base VLM into a directory you'll point PRETRAINED_MODELS_DIR at.
mkdir -p /path/to/models
huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct \
    --local-dir /path/to/models/Qwen2.5-VL-3B-Instruct

# 3. Download this CL checkpoint
huggingface-cli download AlphaBrainGroup/qwengr00t-mir-lora-libero-goal \
    --local-dir ./qwengr00t_mir_lora_libero_goal

# 4. Tell AlphaBrain where the base VLM lives
export PRETRAINED_MODELS_DIR=/path/to/models   # must contain Qwen2.5-VL-3B-Instruct/

# 5. Start the policy server
python deployment/model_server/server_policy.py \
    --ckpt_path ./qwengr00t_mir_lora_libero_goal \
    --port 10093 --use_bf16
```

server_policy.py reads config.yaml from the checkpoint folder, resolves ${PRETRAINED_MODELS_DIR}/Qwen2.5-VL-3B-Instruct, builds the QwenGR00T model, attaches the LoRA adapter, loads action_model.pt, and starts a WebSocket policy server on --port (default 10093).
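A simulation client started too early will fail to connect. The sketch below is a generic readiness probe for the server's port, using only Python's standard `socket.create_connection`. It assumes, as the sequence described above suggests, that the server binds the port only after the model has loaded. A successful TCP connection says nothing about the WebSocket protocol itself. `wait_for_port` is a hypothetical helper, not part of AlphaBrain.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=10093, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only care that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher script could call `wait_for_port(port=10093, timeout=600)` before starting the LIBERO client, leaving generous time for the 3.8 B model to load.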

Evaluation — how to reproduce the 76.0 % number

LIBERO-Goal evaluation runs as two processes communicating over WebSocket: a policy server (this repo, AlphaBrain env) and a simulation client (LIBERO-MuJoCo env).

```bash
# In one terminal: launch the AlphaBrain CL eval wrapper.
# It (a) auto-merges the LoRA adapter into a single .pt for inference,
# (b) launches the policy server, (c) runs the LIBERO simulator client
# against all 10 LIBERO-Goal tasks at 50 rollouts/task,
# (d) writes per-task SR + aggregate stats to results/eval_cl/<run_id>/.

bash scripts/run_continual_learning_scripts/run_cl_eval.sh \
    --run-id qwengr00t_mir_lora_libero_goal \
    --base-config configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
    --gpus 0 \
    --suite libero_goal \
    --trials 50 \
    --last-only      # only the final ckpt; drop this flag for full 10×10 NBT matrix
```

Prerequisites:

LIBERO_DATA_ROOT and LIBERO_HOME set in .env (see LIBERO eval pipeline).
The downloaded checkpoint folder layout must match exactly what run_cl_eval.sh expects, which is per-task entries named task_*_id*_steps_*_lora_adapter and task_*_id*_steps_*_action_model.pt. For the single final checkpoint, this repo already ships the merged form under lora_adapter/ and action_model.pt, so a one-shot final eval is the simplest path.
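The sketch below is a pre-flight check to run before evaluation. It confirms that the downloaded folder contains the inference files listed in the Files section, lists any per-task adapter directories that match the naming pattern above, and resolves the base-model path from `PRETRAINED_MODELS_DIR`. It is a standard-library sketch and not part of AlphaBrain. `missing_files`, `per_task_adapters` and `base_vlm_dir` are hypothetical names.

```python
import fnmatch
import os
from pathlib import Path

# Inference files listed in this card's "Files" section.
REQUIRED_FILES = [
    "config.yaml",
    "dataset_statistics.json",
    "action_model.pt",
    "lora_adapter/adapter_config.json",
    "lora_adapter/adapter_model.safetensors",
]
# Naming pattern quoted above for per-task adapters expected by run_cl_eval.sh.
PER_TASK_ADAPTER = "task_*_id*_steps_*_lora_adapter"


def missing_files(ckpt_dir):
    """Relative paths from REQUIRED_FILES that are absent under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def per_task_adapters(ckpt_dir):
    """Sorted names of subdirectories matching the per-task adapter pattern."""
    return sorted(
        p.name
        for p in Path(ckpt_dir).iterdir()
        if p.is_dir() and fnmatch.fnmatch(p.name, PER_TASK_ADAPTER)
    )


def base_vlm_dir(env=None):
    """Resolve ${PRETRAINED_MODELS_DIR}/Qwen2.5-VL-3B-Instruct, or None if the variable is unset."""
    env = os.environ if env is None else env
    models_dir = env.get("PRETRAINED_MODELS_DIR")
    return Path(models_dir) / "Qwen2.5-VL-3B-Instruct" if models_dir else None
```

For this repo, `missing_files("./qwengr00t_mir_lora_libero_goal")` should return an empty list, and `per_task_adapters` will typically be empty because only the final checkpoint is shipped.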

For evaluation against other LIBERO suites (Spatial / Object / Long / joint all-4), point --suite at libero_spatial, libero_object, libero_10, etc. Note that this checkpoint was trained only on LIBERO-Goal and is not expected to generalise to those suites zero-shot.

Reproduction

```bash
bash scripts/run_continual_learning_scripts/run_cl_train.sh \
    --yaml configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
    --gpus 0,1,2,3 -- \
    --continual_learning.algorithm.buffer_size_per_task=1000 \
    --continual_learning.algorithm.replay_batch_ratio=0.5 \
    --continual_learning.algorithm.balanced_sampling=true \
    --continual_learning.algorithm.mir_refresh_interval=50
```

Expect ~17 h on 4 × A800 80 GB for the full 10-task × 10 000-step schedule. The shipped config.yaml captures the exact recipe used for this checkpoint.
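The trailing `--continual_learning.algorithm.*=…` arguments are dotted-path overrides applied on top of the YAML recipe. The sketch below shows how such overrides map onto nested config keys. It is a plain-Python illustration, not AlphaBrain's actual argument parser, which may rely on OmegaConf instead. `parse_value` and `apply_overrides` are hypothetical names.

```python
def parse_value(text):
    """Interpret an override value as bool, int or float, else keep it as a string."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Apply '--a.b.c=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        key, _, raw = item.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections that the YAML does not define yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config
```

Applied to a config whose `mir_refresh_interval` is the default 200, the four overrides in the command above set it to 50 and add the ER-aligned replay settings. For scale, ~17 h over 100 000 steps works out to roughly 0.61 s per training step.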

Notes

CL setting: sequential fine-tuning (task_stream_mode=by_task_index), not joint training. Each task is trained on its own 50 demos for 10 000 steps. Once training moves on to the next task, buffer replay and the MIR cache mix samples from earlier tasks into each batch.
MIR (Maximally Interfered Retrieval): every 50 training steps, MIR scores 16 buffer samples by how much a virtual SGD step on the current batch would hurt their loss; the top 8 are cached and injected into subsequent batches alongside reservoir-sampled ER replay. The fresher cache (50 vs the default 200) is the dominant knob in our LIBERO-Goal sweep.
Why LoRA: full-parameter MIR on a 3.8 B model would require a virtual SGD step over every parameter at each refresh. Restricting MIR's virtual step to the LoRA parameters (~80 M) is what keeps wall-clock time tractable.
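The MIR refresh cycle described in these notes can be sketched on a toy problem. This is a minimal pure-Python illustration, not AlphaBrain's code, with these assumptions:

- A linear model with squared loss stands in for the VLA.
- `MIRCache` and `interference_scores` are hypothetical names.
- The `trainable` mask stands in for `lora_only=true`: only masked-in parameters take the virtual SGD step.

Every `refresh_interval` steps the cache samples `candidate_size` buffer items and scores each by how much its loss rises after one virtual step on the current batch. It then keeps the `top_k` highest.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, sample):
    """Squared error of the toy linear model on one (x, y) sample."""
    x, y = sample
    return (predict(w, x) - y) ** 2


def grad(w, batch, trainable):
    """Gradient of the mean batch loss; frozen coordinates stay at zero (lora_only analogue)."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            if trainable[i]:
                g[i] += 2.0 * err * xi / len(batch)
    return g


def interference_scores(w, current_batch, candidates, lr, trainable):
    """Loss increase on each candidate after one virtual SGD step on the current batch."""
    g = grad(w, current_batch, trainable)
    w_virtual = [wi - lr * gi for wi, gi in zip(w, g)]  # new list: real weights untouched
    return [loss(w_virtual, s) - loss(w, s) for s in candidates]


class MIRCache:
    """Re-select the most interfered buffer samples every `refresh_interval` steps."""

    def __init__(self, refresh_interval=50, candidate_size=16, top_k=8, lr=0.1, seed=0):
        self.refresh_interval = refresh_interval
        self.candidate_size = candidate_size
        self.top_k = top_k
        self.lr = lr
        self.rng = random.Random(seed)
        self.cached = []

    def get(self, step, w, current_batch, buffer, trainable):
        if step % self.refresh_interval == 0 and buffer:
            cands = self.rng.sample(buffer, min(self.candidate_size, len(buffer)))
            scores = interference_scores(w, current_batch, cands, self.lr, trainable)
            # Rank by score (highest interference first) and keep the top_k samples.
            ranked = sorted(zip(scores, range(len(cands))), reverse=True)
            self.cached = [cands[i] for _, i in ranked[: self.top_k]]
        # Between refreshes the previously selected samples are reused.
        return self.cached
```

With refresh_interval=50 and 10 000 steps per task, the cache is rebuilt 200 times per task, compared with 50 times at the default interval of 200. If a parameter is masked out, the virtual step cannot move it, so samples that depend only on it receive zero interference. In the toy, that is how restricting the virtual step to LoRA parameters changes which samples MIR picks.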

License

MIT — see the parent repository.

Citation

```bibtex
@misc{alphabrain2026,
  title  = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
  author = {AlphaBrain Team},
  year   = {2026},
  url    = {https://github.com/AlphaBrainGroup/AlphaBrain}
}
```
