QwenGR00T-MIR-LoRA · LIBERO-Goal (CL, sweep-best refresh=50)
Continual-learning (CL) checkpoint released with the AlphaBrain framework. Provided for direct download and evaluation; no retraining needed.
A QwenGR00T Vision-Language-Action (VLA) model fine-tuned sequentially
over the 10 LIBERO-Goal tasks with LoRA (r=32) and MIR
(Maximally Interfered Retrieval) replay. This release is the
sweep-best MIR configuration on LIBERO-Goal: the
mir_refresh_interval=50 cell on top of the ER-aligned replay policy
(buffer=1000, ratio=0.5, balanced=true, lora_only=true). At 50
rollouts/task it scores 76.0 % Avg SR, with no hard-zero tasks and
a 100 % cell on a non-current task.
Overview
| Item | Value |
|---|---|
| Architecture | QwenGR00T (Qwen2.5-VL-3B + Flow-Matching DiT head, ~3.8 B params) |
| Base VLM | Qwen/Qwen2.5-VL-3B-Instruct |
| Tunable parameters | LoRA · r=32, α=16, dropout=0.05, target=all-linear (only LoRA + DiT action head trained) |
| CL algorithm | MIR: virtual SGD + interfered-sample selection on top of ER replay |
| Replay policy (ER aligned) | buffer_size_per_task=1000, replay_batch_ratio=0.5, balanced_sampling=true |
| MIR knobs (sweep best) | refresh_interval=50, candidate_size=16, top_k=8, lora_only=true |
| Task stream | LIBERO-Goal · 10 tasks · 10 000 steps/task = 100 000 steps total |
| Optimiser | AdamW, base lr 2.5e-5, action-head lr 1e-4, cosine-with-min-lr |
| Hardware / batch | 4 × A800 80 GB · per_device_batch=4 · effective batch 16 |
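The batch and step figures in the table combine as simple arithmetic. The sketch below works them out under two assumptions the source does not spell out: that there is no gradient accumulation, and that replay_batch_ratio is the fraction of each effective batch drawn from replay. The helper name `batch_budget` is hypothetical and not part of AlphaBrain.

```python
def batch_budget(num_gpus, per_device_batch, replay_batch_ratio,
                 num_tasks, steps_per_task):
    """Derive the effective batch, an assumed replay split, and total steps."""
    effective_batch = num_gpus * per_device_batch
    # Assumption: the ratio applies to the effective batch and is rounded.
    replay_samples = round(effective_batch * replay_batch_ratio)
    return {
        "effective_batch": effective_batch,
        "replay_samples": replay_samples,
        "current_task_samples": effective_batch - replay_samples,
        "total_steps": num_tasks * steps_per_task,
    }


budget = batch_budget(num_gpus=4, per_device_batch=4, replay_batch_ratio=0.5,
                      num_tasks=10, steps_per_task=10_000)
print(budget)
```

Under these assumptions each optimisation step would mix 8 replayed samples with 8 current-task samples; the shipped config.yaml remains the authoritative description of the recipe.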
Results
Evaluated with the full 10-checkpoint × 10-task matrix at 50 rollouts per task (5 000 episodes total), so we can report both the final-row average accuracy and the standard CL forgetting metrics.
| Metric | Value |
|---|---|
| ACC: Avg SR after the full 10-task stream | 77.0 % |
| BWT: Backward Transfer (Lopez-Paz & Ranzato 2017) | -7.8 % |
| Avg Forgetting (Chaudhry et al. 2018, lower is better) | 10.4 % |
| Tasks improving from later training (BWT > 0) | 2 / 9 |
| Worst-forgotten task | task 1 ("put the wine bottle on the rack"), 50 pp loss |
Why ACC differs from the 76.0 % --last-only number used elsewhere: the matrix eval uses a fresh seed per checkpoint, so ±1 pp run-to-run variance is normal. Same checkpoint, same metric; both are valid.
BWT < 0 means there is some forgetting, dominated by task 1; 7 of 9
past tasks lose ≤ 6 pp by end of stream, with two even improving (task
6: +20 pp, task 7: +4 pp). This is consistent with the design hypothesis
that MIR's fresher (refresh=50) cache catches mid-stream interference
earlier than the default 200-step interval.
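For readers who want to recompute ACC, BWT and average forgetting from their own checkpoint × task matrix, here is a minimal sketch of the standard definitions. It assumes `acc[i][j]` holds the success rate on task j after training through task i (training order); the 3 × 3 matrix is made-up toy data, not results from this release, and the maximum in the forgetting term is taken from the point each task was first trained, which is a common implementation choice.

```python
def final_accuracy(acc):
    """ACC: mean of the last row (all tasks, after the full stream)."""
    return sum(acc[-1]) / len(acc[-1])


def backward_transfer(acc):
    """BWT: mean over past tasks of (final SR - SR right after learning it)."""
    T = len(acc)
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)


def average_forgetting(acc):
    """Mean over past tasks of (best earlier SR - final SR)."""
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best_earlier = max(acc[l][j] for l in range(j, T - 1))
        drops.append(best_earlier - acc[T - 1][j])
    return sum(drops) / (T - 1)


# Toy matrix in percent: rows = checkpoints, columns = tasks (illustrative only).
toy = [
    [90, 0, 0],
    [94, 80, 0],
    [60, 84, 88],
]
print(final_accuracy(toy), backward_transfer(toy), average_forgetting(toy))
```

In the toy matrix the first task briefly improves mid-stream before dropping, which is why average forgetting (measured from the best earlier value) comes out larger in magnitude than BWT (measured from the value right after the task was learned).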
Per-task SR breakdown (eval order = LIBERO-Goal default; eval-#1 = the most-recently-trained task; eval-#10 = the first-trained task):
| # | Task name | SR (%) |
|---|---|---|
| 1 | open the middle drawer of the cabinet | 84 |
| 2 | put the bowl on the stove | 100 |
| 3 | put the wine bottle on top of the cabinet | 88 |
| 4 | open the top drawer and put the bowl inside | 30 |
| 5 | put the bowl on top of the cabinet | 94 |
| 6 | push the plate to the front of the stove | 86 |
| 7 | put the cream cheese in the bowl | 56 |
| 8 | turn on the stove | 94 |
| 9 | put the bowl on the plate | 86 |
| 10 | put the wine bottle on the rack | 44 |
| - | Total | 76.0 |
For comparison on the same setup (single seed = 42, 50 rollouts/task):
| Method | Avg SR |
|---|---|
| MIR refresh=50 (this release) | 76.0 % |
| Full-parameter ER baseline | 51.6 % |
Numbers are from a single seed (42); per-run variance is a few percentage points depending on simulator state and attention implementation. Reproduction numbers higher or lower than reported are expected; please file an issue / PR with details.
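To get a feel for how much a success rate measured over 50 rollouts can move by chance alone, the sketch below computes a Wilson score interval. It treats rollouts as independent Bernoulli trials, which is an assumption on our part; the source reports no confidence intervals, and `wilson_interval` is an illustrative helper, not part of AlphaBrain.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95 %)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# A single task at 30 % SR over 50 rollouts (15 successes).
low, high = wilson_interval(successes=15, trials=50)
print(f"per-task interval: {low:.3f} .. {high:.3f}")

# The same rate pooled over 500 episodes gives a much tighter interval.
low_all, high_all = wilson_interval(successes=150, trials=500)
print(f"pooled interval:   {low_all:.3f} .. {high_all:.3f}")
```

Per-task numbers at 50 rollouts are therefore considerably noisier than the 500-episode aggregate, which is worth keeping in mind when comparing individual rows across runs.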
Files
├── README.md                       model card
├── config.yaml                     AlphaBrain training config (OmegaConf, exact recipe)
├── dataset_statistics.json         action normalisation (required for inference)
├── cl_state_final.json             end-of-stream replay buffer + MIR cache metadata
├── action_model.pt                 DiT action head weights (~297 MB, full precision)
└── lora_adapter/
    ├── adapter_config.json         PEFT config (r=32, α=16, target=all-linear)
    └── adapter_model.safetensors   LoRA delta weights (~158 MB, bf16)
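After downloading, it can be useful to confirm that the folder matches the listing above before starting the server. The following is a small sketch using only the Python standard library; the helper names are hypothetical, and the only keys read from adapter_config.json are `r` and `lora_alpha`, which are standard PEFT LoRA config fields.

```python
import json
from pathlib import Path

# File names taken from the listing above (README.md is not needed at runtime).
EXPECTED_FILES = [
    "config.yaml",
    "dataset_statistics.json",
    "cl_state_final.json",
    "action_model.pt",
    "lora_adapter/adapter_config.json",
    "lora_adapter/adapter_model.safetensors",
]


def missing_checkpoint_files(ckpt_dir):
    """Return the expected files that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def read_lora_hyperparams(ckpt_dir):
    """Return (r, lora_alpha) from the PEFT adapter config."""
    config_path = Path(ckpt_dir) / "lora_adapter" / "adapter_config.json"
    with config_path.open() as handle:
        config = json.load(handle)
    return config["r"], config["lora_alpha"]


print("missing:", missing_checkpoint_files("./qwengr00t_mir_lora_libero_goal"))
```

For this release, `read_lora_hyperparams` would be expected to return the r=32, α=16 pair stated in the listing.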
Base model: what to merge with
The base Qwen2.5-VL-3B VLM weights are not bundled in this repo (they are 6.2 GB and licence-restricted by their original authors). You must download them separately from the official Qwen repo:
| Component | Source |
|---|---|
| Base VLM (~6.2 GB, required) | Qwen/Qwen2.5-VL-3B-Instruct |
| DINOv2 vision adapter | facebook/dinov2-small (auto-pulled by server_policy.py) |
The LoRA adapter in this repo (lora_adapter/adapter_model.safetensors,
158 MB) targets all-linear Qwen2.5-VL layers; AlphaBrain's
server_policy.py loads the base + adapter + DiT action head together
at inference time, so there is no manual merge step.
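For intuition about what combining a base layer with a LoRA adapter amounts to numerically, here is a toy sketch of the standard LoRA update W + (α / r) · B · A for one linear layer. It is not AlphaBrain's loading code (how server_policy.py attaches the adapter is not shown here); the helpers are hypothetical, and the tiny matrices use rank 2 with α = 1 only to reproduce the same 0.5 scale that r=32, α=16 gives.

```python
def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A for one linear layer."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


base_weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy)
lora_a = [[2.0, 0.0], [0.0, 4.0]]       # A: (rank, in_features)
lora_b = [[1.0, 0.0], [0.0, 2.0]]       # B: (out_features, rank)

merged = merge_lora(base_weight, lora_a, lora_b, alpha=1, rank=2)
print(merged)
```

Because the update is additive, the adapter can either be kept separate and applied on the fly or folded into the base weights once; both give the same layer output.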
Usage: full inference setup
# 1. Clone the AlphaBrain framework and install
git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
cd AlphaBrain
pip install -e .
# 2. Download the base VLM into a directory you'll point PRETRAINED_MODELS_DIR at.
mkdir -p /path/to/models
huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct \
  --local-dir /path/to/models/Qwen2.5-VL-3B-Instruct
# 3. Download this CL checkpoint
huggingface-cli download AlphaBrainGroup/qwengr00t-mir-lora-libero-goal \
  --local-dir ./qwengr00t_mir_lora_libero_goal
# 4. Tell AlphaBrain where the base VLM lives
export PRETRAINED_MODELS_DIR=/path/to/models # must contain Qwen2.5-VL-3B-Instruct/
# 5. Start the policy server
python deployment/model_server/server_policy.py \
  --ckpt_path ./qwengr00t_mir_lora_libero_goal \
  --port 10093 --use_bf16
server_policy.py reads config.yaml from the checkpoint folder,
resolves ${PRETRAINED_MODELS_DIR}/Qwen2.5-VL-3B-Instruct, builds the
QwenGR00T model, attaches the LoRA adapter, loads action_model.pt,
and starts a WebSocket policy server on --port (default 10093).
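A missing or mis-set PRETRAINED_MODELS_DIR is an easy mistake to make, so a quick pre-flight check before launching the server may save a failed start. The sketch below only mirrors the path convention described above; `resolve_base_vlm` is a hypothetical helper and does not reproduce how server_policy.py resolves the path internally.

```python
import os
from pathlib import Path


def resolve_base_vlm(model_name="Qwen2.5-VL-3B-Instruct", env=None):
    """Return the expected base-VLM directory or raise a descriptive error."""
    env = os.environ if env is None else env
    root = env.get("PRETRAINED_MODELS_DIR")
    if not root:
        raise RuntimeError("PRETRAINED_MODELS_DIR is not set")
    path = Path(root) / model_name
    if not path.is_dir():
        raise FileNotFoundError(f"base VLM not found at {path}")
    return path


if __name__ == "__main__":
    try:
        print("base VLM:", resolve_base_vlm())
    except (RuntimeError, FileNotFoundError) as err:
        print("pre-flight check failed:", err)
```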
Evaluation: how to reproduce the 76.0 % number
LIBERO-Goal evaluation runs as two processes communicating over WebSocket: a policy server (this repo, AlphaBrain env) and a simulation client (LIBERO-MuJoCo env).
# In one terminal: launch the AlphaBrain CL eval wrapper.
# It (a) auto-merges the LoRA adapter into a single .pt for inference,
# (b) launches the policy server, (c) runs the LIBERO simulator client
# against all 10 LIBERO-Goal tasks at 50 rollouts/task,
# (d) writes per-task SR + aggregate stats to results/eval_cl/<run_id>/.
bash scripts/run_continual_learning_scripts/run_cl_eval.sh \
  --run-id qwengr00t_mir_lora_libero_goal \
  --base-config configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
  --gpus 0 \
  --suite libero_goal \
  --trials 50 \
  --last-only # only the final ckpt; drop this flag for full 10×10 NBT matrix
Prerequisites:
- LIBERO_DATA_ROOT and LIBERO_HOME set in .env (see LIBERO eval pipeline).
- The downloaded checkpoint folder layout must match exactly what run_cl_eval.sh expects (named task_*_id*_steps_*_lora_adapter + task_*_id*_steps_*_action_model.pt); for the single final ckpt this repo already ships the merged form under lora_adapter/ and action_model.pt, so a one-shot final eval is the simplest path.
For evaluation against other LIBERO suites (Spatial / Object / Long /
joint all-4), point --suite at libero_spatial, libero_object,
libero_10, etc. Note that this checkpoint was trained only on
LIBERO-Goal and will not generalise zero-shot.
Reproduction
bash scripts/run_continual_learning_scripts/run_cl_train.sh \
  --yaml configs/continual_learning/qwengr00t_mir_lora_libero.yaml \
  --gpus 0,1,2,3 -- \
  --continual_learning.algorithm.buffer_size_per_task=1000 \
  --continual_learning.algorithm.replay_batch_ratio=0.5 \
  --continual_learning.algorithm.balanced_sampling=true \
  --continual_learning.algorithm.mir_refresh_interval=50
Expect ~17 h on 4 × A800 80 GB for the full 10-task × 10 000-step
schedule. The shipped config.yaml captures the exact recipe used for
this checkpoint.
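The dotted arguments after `--` in the command above read as overrides on the nested training config. How run_cl_train.sh actually parses them is not shown here, so the following is only an illustration, using plain dictionaries and hypothetical helper names, of how such dotted keys map onto a nested structure.

```python
def parse_scalar(text):
    """Convert 'true'/'false', integers and floats; leave other text as-is."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config, overrides):
    """Apply '--a.b.c=value' style overrides to a nested dict (in place)."""
    for item in overrides:
        key, _, raw = item.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_scalar(raw)
    return config


overrides = [
    "--continual_learning.algorithm.buffer_size_per_task=1000",
    "--continual_learning.algorithm.replay_batch_ratio=0.5",
    "--continual_learning.algorithm.balanced_sampling=true",
    "--continual_learning.algorithm.mir_refresh_interval=50",
]
config = apply_overrides({}, overrides)
print(config["continual_learning"]["algorithm"])
```

Comparing the resulting values against the shipped config.yaml is a quick way to confirm that a reproduction run uses the same replay and MIR settings as this release.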
Notes
- CL setting: sequential fine-tuning (task_stream_mode=by_task_index), not joint training. Each task sees its own 50 demos for 10 000 steps before the buffer + MIR cache replay kicks in for the next task.
- MIR (Maximally Interfered Retrieval): every 50 training steps, MIR scores 16 buffer samples by how much a virtual SGD step on the current batch would hurt their loss; the top 8 are cached and injected into subsequent batches alongside reservoir-sampled ER replay. The fresher cache (50 vs the default 200) is the dominant knob in our LIBERO-Goal sweep.
- Why LoRA: full-parameter MIR on a 3.8 B model would require per-step grad inspection on every parameter; restricting MIR's virtual step to LoRA params (~80 M) is essential for tractable wall-clock.
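The MIR selection step described in the notes above can be sketched on a toy problem. This is a minimal illustration, not AlphaBrain's implementation: it uses a two-parameter linear model with squared-error loss in place of the VLA, plain SGD for the virtual step, and hypothetical names throughout (`mir_select`, the learning rate, the toy data). Only candidate_size=16 and top_k=8 come from the source.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, sample):
    x, y = sample
    return (predict(w, x) - y) ** 2


def batch_grad(w, batch):
    """Gradient of the mean squared error over the batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(batch)
    return grad


def mir_select(w, current_batch, buffer, candidate_size=16, top_k=8,
               lr=0.1, rng=random):
    """Pick the buffer samples whose loss a virtual SGD step would hurt most."""
    candidates = rng.sample(buffer, min(candidate_size, len(buffer)))
    grad = batch_grad(w, current_batch)
    # Virtual step: computed on a copy, the real weights stay untouched.
    w_virtual = [wi - lr * gi for wi, gi in zip(w, grad)]
    scored = [(loss(w_virtual, s) - loss(w, s), s) for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


rng = random.Random(0)
weights = [2.0, 0.0]  # fits the toy "old task" y = 2x
old_task = [((x, 1.0), 2.0 * x)
            for x in (rng.uniform(-1, 1) for _ in range(100))]
new_task = [((x, 1.0), -2.0 * x)
            for x in (rng.uniform(-1, 1) for _ in range(8))]

cached = mir_select(weights, new_task, old_task, rng=rng)
print([round(score, 4) for score, _ in cached])
```

In a training loop this selection would be rerun every refresh interval (50 steps in this release) and the cached samples mixed into the following batches. Restricting the virtual step to a small parameter subset, as the LoRA-only setting does, keeps the cost of the extra gradient computation and forward passes manageable.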
License
MIT (see the parent repository).
Citation
@misc{alphabrain2026,
title = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
author = {AlphaBrain Team},
year = {2026},
url = {https://github.com/AlphaBrainGroup/AlphaBrain}
}