QwenGR00T-MIR-LoRA · LIBERO-Goal (CL, sweep-best refresh=50)

Continual-learning (CL) checkpoint released with the AlphaBrain framework. Provided for direct download and evaluation; no retraining needed.

A QwenGR00T Vision-Language-Action (VLA) model fine-tuned sequentially over the 10 LIBERO-Goal tasks with LoRA (r=32) and MIR (Maximally Interfered Retrieval) replay. This release is the sweep-best MIR configuration on LIBERO-Goal: the mir_refresh_interval=50 cell on top of the ER-aligned (Experience Replay) replay policy (buffer=1000, ratio=0.5, balanced=true), with lora_only=true for MIR. At 50 rollouts/task it scores 76.0 % average success rate (Avg SR), with no hard-zero tasks and a 100 % cell on a non-current task.

Overview

| Item | Value |
|---|---|
| Architecture | QwenGR00T (Qwen2.5-VL-3B + Flow-Matching DiT head, ~3.8 B params) |
| Base VLM | Qwen/Qwen2.5-VL-3B-Instruct |
| Tunable parameters | LoRA · r=32, α=16, dropout=0.05, target=all-linear (only LoRA + DiT action head trained) |
| CL algorithm | MIR: virtual SGD + interfered-sample selection on top of ER |
| Replay policy (ER aligned) | buffer_size_per_task=1000, replay_batch_ratio=0.5, balanced_sampling=true |
| MIR knobs (sweep best) | refresh_interval=50, candidate_size=16, top_k=8, lora_only=true |
| Task stream | LIBERO-Goal · 10 tasks · 10 000 steps/task = 100 000 steps total |
| Optimiser | AdamW, base lr 2.5e-5, action-head lr 1e-4, cosine-with-min-lr |
| Hardware / batch | 4 × A800 80 GB · per_device_batch=4 · effective batch 16 |
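
The derived figures in the table (effective batch, total steps) follow from simple arithmetic, and the same arithmetic shows how often the MIR cache is refreshed per task. The sketch below recomputes them; the variable names are illustrative rather than AlphaBrain config keys, and the 200-step default interval is the one quoted in the Results and Notes sections of this card.

```python
# Recompute the derived schedule numbers from the table above.
# Variable names are illustrative, not AlphaBrain config keys.
num_gpus = 4
per_device_batch = 4
num_tasks = 10
steps_per_task = 10_000
mir_refresh_interval = 50        # this release
default_refresh_interval = 200   # default interval quoted elsewhere in this card

effective_batch = num_gpus * per_device_batch
total_steps = num_tasks * steps_per_task
refreshes_per_task = steps_per_task // mir_refresh_interval
default_refreshes_per_task = steps_per_task // default_refresh_interval

print(effective_batch, total_steps, refreshes_per_task, default_refreshes_per_task)
```

With refresh=50 the cache is rebuilt four times as often per task as with the 200-step default.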

Results

Evaluated with the full 10-checkpoint × 10-task matrix at 50 rollouts per task (5 000 episodes total), so we can report both the final-row average accuracy and the standard CL forgetting metrics.

| Metric | Value |
|---|---|
| ACC: Avg SR after the full 10-task stream | 77.0 % |
| BWT: Backward Transfer (Lopez-Paz & Ranzato 2017) | −7.8 % |
| Avg Forgetting (Chaudhry et al. 2018, ↓ better) | 10.4 % |
| Tasks improving from later training (BWT > 0) | 2 / 9 |
| Worst-forgotten task | task 1 ("put the wine bottle on the rack"), 50 pp loss |

Why ACC differs from the 76.0 % --last-only number used elsewhere: the matrix eval uses a fresh seed per checkpoint, so ±1 pp run-to-run variance is normal. Same checkpoint, same metric; both numbers are valid.
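
As a rough sanity check on the ±1 pp figure, one can estimate the sampling noise of a success rate measured over a finite number of rollouts. The sketch below uses the standard binomial standard error. Treating the 500 final-row episodes as independent trials with one shared success probability is a simplifying assumption, not something the evaluation guarantees.

```python
import math

def success_rate_std_error(success_rate: float, episodes: int) -> float:
    """Binomial standard error of a success rate estimated from `episodes` trials."""
    return math.sqrt(success_rate * (1.0 - success_rate) / episodes)

# Final-row evaluation: 10 tasks x 50 rollouts = 500 episodes at ~76 % Avg SR.
se_avg = success_rate_std_error(0.76, 10 * 50)
# A single task is measured with only 50 rollouts, so its estimate is noisier.
se_task = success_rate_std_error(0.76, 50)

print(f"avg SR std error:      {100 * se_avg:.1f} pp")
print(f"per-task SR std error: {100 * se_task:.1f} pp")
```

Under these assumptions the aggregate figure carries roughly 2 pp of sampling noise, so a 1 pp gap between two evaluations of the same checkpoint is well within it.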

BWT < 0 means there is some forgetting, dominated by task 1. 7 of the 9 past tasks lose ≤ 6 pp by the end of the stream, and two of those actually improve (task 6: +20 pp, task 7: +4 pp). This is consistent with the design hypothesis: MIR's fresher (refresh=50) cache catches mid-stream interference earlier than the default 200-step interval.
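
For readers who want to recompute these metrics from their own evaluation matrix, here is a minimal sketch of the standard definitions. It assumes a square matrix R where R[i][j] is the success rate on task j after training on task i. The 3-task matrix is hypothetical and not taken from this release, and whether AlphaBrain clips negative forgetting at zero is not stated here, so the sketch exposes that as an option.

```python
def cl_metrics(R, clip_forgetting=False):
    """Compute ACC, BWT and average forgetting from a result matrix.

    R[i][j] is the success rate (in %) on task j, measured after training
    on task i. Requires at least two tasks.
    """
    T = len(R)
    final = R[T - 1]
    # ACC: mean of the final row.
    acc = sum(final) / T
    # BWT: final score minus the score right after the task was learned.
    bwt = sum(final[j] - R[j][j] for j in range(T - 1)) / (T - 1)
    # Forgetting: best score at any earlier checkpoint minus the final score.
    per_task = [max(R[i][j] for i in range(T - 1)) - final[j] for j in range(T - 1)]
    if clip_forgetting:
        per_task = [max(0.0, f) for f in per_task]
    forgetting = sum(per_task) / (T - 1)
    return acc, bwt, forgetting

# Hypothetical 3-task matrix, NOT taken from this release.
R = [
    [90, 0, 0],
    [80, 88, 0],
    [70, 92, 86],
]
acc, bwt, forgetting = cl_metrics(R)
_, _, forgetting_clipped = cl_metrics(R, clip_forgetting=True)
print(acc, bwt, forgetting, forgetting_clipped)
```

In this toy matrix the second task improves after later training, which is why clipped forgetting (10) exceeds the magnitude of BWT (8). A gap of that kind appears whenever some tasks improve.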

Per-task SR breakdown (eval order = LIBERO-Goal default; eval-#1 = the most-recently-trained task; eval-#10 = the first-trained task):

| # | Task name | SR (%) |
|---|---|---|
| 1 | open the middle drawer of the cabinet | 84 |
| 2 | put the bowl on the stove | 100 |
| 3 | put the wine bottle on top of the cabinet | 88 |
| 4 | open the top drawer and put the bowl inside | 30 |
| 5 | put the bowl on top of the cabinet | 94 |
| 6 | push the plate to the front of the stove | 86 |
| 7 | put the cream cheese in the bowl | 56 |
| 8 | turn on the stove | 94 |
| 9 | put the bowl on the plate | 86 |
| 10 | put the wine bottle on the rack | 44 |
| | Total (Avg SR) | 76.0 |

For comparison on the same setup (single seed = 42, 50 rollouts/task):

| Method | Avg SR |
|---|---|
| MIR refresh=50 (this release) | 76.0 % |
| Full-parameter ER baseline | 51.6 % |

Numbers are from a single seed (42); per-run variance is a few percentage points depending on simulator state and attention implementation. Reproduced numbers somewhat higher or lower than reported are expected; please file an issue / PR with details.

Files

├── README.md                  model card
├── config.yaml                AlphaBrain training config (OmegaConf, exact recipe)
├── dataset_statistics.json    action normalisation (required for inference)
├── cl_state_final.json        end-of-stream replay buffer + MIR cache metadata
├── action_model.pt            DiT action head weights (~297 MB, full precision)
└── lora_adapter/
    ├── adapter_config.json    PEFT config (r=32, α=16, target=all-linear)
    └── adapter_model.safetensors  LoRA delta weights (~158 MB, bf16)

Base model: what to merge with

The base Qwen2.5-VL-3B VLM weights are not bundled in this repo (they are 6.2 GB and licence-restricted by their original authors). You must download them separately from the official Qwen repo:

| Component | Source |
|---|---|
| Base VLM (~6.2 GB, required) | Qwen/Qwen2.5-VL-3B-Instruct |
| DINOv2 vision adapter | facebook/dinov2-small (auto-pulled by server_policy.py) |

The LoRA adapter in this repo (lora_adapter/adapter_model.safetensors, 158 MB) targets all-linear Qwen2.5-VL layers; AlphaBrain's server_policy.py loads the base + adapter + DiT action head together at inference time, so there is no manual merge step.
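
Because the server loads several artefacts together, it can help to confirm the download is complete before starting it. The following is a minimal sketch that checks for the files named in the Files section. The helper name and the choice to treat every listed file as required are assumptions for illustration, since this card only marks dataset_statistics.json as required for inference.

```python
from pathlib import Path

# Files listed in the "Files" section of this card (README.md omitted).
EXPECTED_FILES = [
    "config.yaml",
    "dataset_statistics.json",
    "cl_state_final.json",
    "action_model.pt",
    "lora_adapter/adapter_config.json",
    "lora_adapter/adapter_model.safetensors",
]

def missing_checkpoint_files(ckpt_dir):
    """Return the expected files that are absent from `ckpt_dir` (hypothetical helper)."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoint_files("./qwengr00t_mir_lora_libero_goal")
    print("missing:", missing or "none")
```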

Usage: full inference setup

# 1. Clone the AlphaBrain framework and install
git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
cd AlphaBrain
pip install -e .

# 2. Download the base VLM into a directory you'll point PRETRAINED_MODELS_DIR at.
mkdir -p /path/to/models
huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct \
    --local-dir /path/to/models/Qwen2.5-VL-3B-Instruct

# 3. Download this CL checkpoint
huggingface-cli download AlphaBrainGroup/qwengr00t-mir-lora-libero-goal \
    --local-dir ./qwengr00t_mir_lora_libero_goal

# 4. Tell AlphaBrain where the base VLM lives
export PRETRAINED_MODELS_DIR=/path/to/models   # must contain Qwen2.5-VL-3B-Instruct/

# 5. Start the policy server
python deployment/model_server/server_policy.py \
    --ckpt_path ./qwengr00t_mir_lora_libero_goal \
    --port 10093 --use_bf16

server_policy.py reads config.yaml from the checkpoint folder, resolves ${PRETRAINED_MODELS_DIR}/Qwen2.5-VL-3B-Instruct, builds the QwenGR00T model, attaches the LoRA adapter, loads action_model.pt, and starts a WebSocket policy server on --port (default 10093).
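
A common failure mode with this kind of setup is an unset or mistyped PRETRAINED_MODELS_DIR. The sketch below mirrors the path resolution described above so the problem can be caught before launching the server. It is a standalone illustration with a hypothetical helper name, not code from server_policy.py.

```python
import os
from pathlib import Path

def base_vlm_problem(env=None, model_name="Qwen2.5-VL-3B-Instruct"):
    """Return a description of what is wrong with the base-VLM path, or None if it looks fine."""
    env = os.environ if env is None else env
    models_dir = env.get("PRETRAINED_MODELS_DIR")
    if not models_dir:
        return "PRETRAINED_MODELS_DIR is not set"
    base = Path(models_dir) / model_name
    if not base.is_dir():
        return f"base VLM directory not found: {base}"
    return None

if __name__ == "__main__":
    print(base_vlm_problem() or "base VLM path looks fine")
```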

Evaluation: how to reproduce the 76.0 % number

LIBERO-Goal evaluation runs as two processes communicating over WebSocket: a policy server (this repo, AlphaBrain env) and a simulation client (LIBERO-MuJoCo env).

# In one terminal: launch the AlphaBrain CL eval wrapper.
# It (a) auto-merges the LoRA adapter into a single .pt for inference,
# (b) launches the policy server, (c) runs the LIBERO simulator client
# against all 10 LIBERO-Goal tasks at 50 rollouts/task,
# (d) writes per-task SR + aggregate stats to results/eval_cl/<run_id>/.

bash scripts/run_continual_learning_scripts/run_cl_eval.sh \
    --run-id qwengr00t_mir_lora_libero_goal \
    --base-config configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
    --gpus 0 \
    --suite libero_goal \
    --trials 50 \
    --last-only      # only the final ckpt; drop this flag for full 10×10 NBT matrix

Prerequisites:

  • LIBERO_DATA_ROOT and LIBERO_HOME set in .env (see LIBERO eval pipeline).
  • The downloaded checkpoint folder layout must match exactly what run_cl_eval.sh expects (named task_*_id*_steps_*_lora_adapter + task_*_id*_steps_*_action_model.pt); for the single final ckpt this repo already ships the merged form under lora_adapter/ and action_model.pt, so a one-shot final eval is the simplest path.
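
For the full matrix evaluation, each per-task checkpoint needs both an adapter entry and an action-model file. The sketch below uses the two name patterns quoted above to find complete and incomplete pairs in a list of names. Pairing entries by their shared prefix is an assumption about how run_cl_eval.sh matches them, and the example names are hypothetical.

```python
from fnmatch import fnmatch

# Patterns quoted in the prerequisites above.
ADAPTER_PATTERN = "task_*_id*_steps_*_lora_adapter"
ACTION_PATTERN = "task_*_id*_steps_*_action_model.pt"

def pair_checkpoints(names):
    """Split checkpoint prefixes into (complete, incomplete) pairs.

    Assumption: an adapter and an action model belong together when they
    share everything before the `_lora_adapter` / `_action_model.pt` suffix.
    """
    adapters = {n[: -len("_lora_adapter")] for n in names if fnmatch(n, ADAPTER_PATTERN)}
    actions = {n[: -len("_action_model.pt")] for n in names if fnmatch(n, ACTION_PATTERN)}
    complete = sorted(adapters & actions)
    incomplete = sorted(adapters ^ actions)
    return complete, incomplete

# Hypothetical directory listing, for illustration only.
names = [
    "task_0_id0_steps_10000_lora_adapter",
    "task_0_id0_steps_10000_action_model.pt",
    "task_1_id1_steps_20000_lora_adapter",
    "notes.txt",
]
complete, incomplete = pair_checkpoints(names)
print("complete:", complete)
print("incomplete:", incomplete)
```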

For evaluation against other LIBERO suites (Spatial / Object / Long / joint all-4), point --suite at libero_spatial, libero_object, libero_10, etc. Note that this checkpoint was trained only on LIBERO-Goal and should not be expected to generalise zero-shot.

Reproduction

bash scripts/run_continual_learning_scripts/run_cl_train.sh \
    --yaml configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
    --gpus 0,1,2,3 -- \
    --continual_learning.algorithm.buffer_size_per_task=1000 \
    --continual_learning.algorithm.replay_batch_ratio=0.5 \
    --continual_learning.algorithm.balanced_sampling=true \
    --continual_learning.algorithm.mir_refresh_interval=50

Expect ~17 h on 4 × A800 80 GB for the full 10-task × 10 000-step schedule. The shipped config.yaml captures the exact recipe used for this checkpoint.

Notes

  • CL setting: sequential fine-tuning (task_stream_mode=by_task_index), not joint training. Each task sees its own 50 demos for 10 000 steps before the buffer + MIR cache replay kicks in for the next task.
  • MIR (Maximally Interfered Retrieval): every 50 training steps, MIR scores 16 buffer samples by how much a virtual SGD step on the current batch would hurt their loss; the top 8 are cached and injected into subsequent batches alongside reservoir-sampled ER replay. The fresher cache (50 vs the default 200) is the dominant knob in our LIBERO-Goal sweep.
  • Why LoRA: full-parameter MIR on a 3.8 B model would require per-step gradient inspection on every parameter; restricting MIR's virtual step to the LoRA parameters (~80 M) keeps wall-clock time tractable.
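
The MIR selection step described in the notes above can be illustrated on a toy problem. The sketch below uses a scalar linear model with squared error in place of the VLA, and it is not AlphaBrain's implementation. All function names, data and the learning rate are made up for illustration. In this release the same idea runs with 16 candidates and top_k=8, and the virtual step is restricted to LoRA parameters.

```python
def loss(w, x, y):
    """Squared error of the scalar linear model y_hat = w * x."""
    return (w * x - y) ** 2

def grad(w, batch):
    """Mean gradient of the squared error over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def mir_select(w, current_batch, candidates, top_k, lr=0.1):
    """Return the top_k candidates whose loss rises most after a virtual SGD step."""
    # Virtual step on the current batch; the real weight w is left untouched.
    w_virtual = w - lr * grad(w, current_batch)
    scored = [
        (loss(w_virtual, x, y) - loss(w, x, y), (x, y)) for x, y in candidates
    ]
    # Largest loss increase first: these are the most interfered samples.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sample for _, sample in scored[:top_k]]

# Toy data, purely illustrative.
w = 1.0
current_batch = [(1.0, 3.0), (2.0, 6.0)]   # the new task prefers w = 3
candidates = [(1.0, 1.0), (2.0, 2.0), (0.5, 1.5), (1.0, 0.0)]  # buffer samples
selected = mir_select(w, current_batch, candidates, top_k=2)
print(selected)
```

The candidate (0.5, 1.5) is never selected here because the virtual step lowers its loss. MIR spends its replay budget only on samples the upcoming update would damage.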

License

MIT; see the parent repository.

Citation

@misc{alphabrain2026,
  title  = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
  author = {AlphaBrain Team},
  year   = {2026},
  url    = {https://github.com/AlphaBrainGroup/AlphaBrain}
}