---
license: mit
language:
- en
library_name: pytorch
pipeline_tag: robotics
tags:
- robotics
- vla
- vision-language-action
- libero
- neurovla
- spiking-neural-network
- brain-inspired
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
datasets:
- LIBERO
---

# NeuroVLA · LIBERO (all 4 suites, joint training)

> Brain-inspired Vision-Language-Action (VLA) checkpoint released with
> the [AlphaBrain](https://github.com/AlphaBrainGroup/AlphaBrain)
> framework. Trained jointly on **all four LIBERO suites** (Goal,
> Spatial, Object, and Long) for direct evaluation across the full
> LIBERO benchmark without retraining.

NeuroVLA couples a Qwen2.5-VL backbone with a **layer-wise Q-Former** that
extracts action-relevant features from the VLM's hidden states, feeding a
**Spiking Neural Network (SNN)** action head. The model was trained in a
single supervised run (not continual learning) on a mixed stream of all
4 LIBERO suites, using the `libero_all` data mix.

## Overview

| | |
|:---|:---|
| **Architecture** | NeuroVLA (Qwen2.5-VL-3B + layer-wise Q-Former + SNN head) |
| **Base VLM** | `Qwen/Qwen2.5-VL-3B-Instruct` |
| **Q-Former** | Layers 36 → 37 · `num_query_tokens=8` · `output_dim=768` |
| **Action head** | DiT-based, `hidden_size=1024`, `action_dim=7`, `state_dim=7`, chunk 16 |
| **Training data** | LIBERO · **all 4 suites (Goal + Spatial + Object + Long)** · `dataset_mix=libero_all` |
| **Training type** | Supervised fine-tuning (single run; not continual learning) |
| **Attention** | SDPA (not flash-attention, to avoid ABI pinning) |
| **Optimiser** | AdamW · `lr_base = 2.5e-5` · cosine-with-min-lr · 5 000 warmup |
| **Step budget** | 50 000 (this release) · saved every 10 000 steps |
| **Hardware / batch** | 2 × A800 80 GB · `per_device_batch_size = 16` |

## Files

```
├── README.md                 model card
├── framework_config.yaml     AlphaBrain framework configuration
├── dataset_statistics.json   action normalisation statistics (required for inference)
├── model.safetensors         full VLA weights (~7.7 GB)
├── resume_meta.json          training metadata (step count, GPU count)
└── qwen_pretrained/          Qwen2.5-VL tokenizer + preprocessor configs
```

## Usage

```bash
git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
cd AlphaBrain
pip install -e .

export PRETRAINED_MODELS_DIR=/path/to/models   # must contain Qwen2.5-VL-3B-Instruct/

huggingface-cli download AlphaBrainGroup/neurovla-libero-all4suite \
    --local-dir ./neurovla_libero_all

python deployment/model_server/server_policy.py \
    --ckpt_path ./neurovla_libero_all --port 10093 --use_bf16
```

For evaluation on any of the 4 LIBERO suites, see the
[LIBERO eval pipeline](https://github.com/AlphaBrainGroup/AlphaBrain/tree/dev/benchmarks/LIBERO/eval).

## Reproduction

```bash
# Framework's NeuroVLA pretraining entry
bash scripts/run_brain_inspired_scripts/run_neurovla_pretrain.sh \
    --yaml configs/neurovla_all4suite_libero.yaml   # (or equivalent config for 4-suite mix)
```

Expect multi-day training on 2 × A800 80 GB for the full 50 000-step
schedule. The shipped `framework_config.yaml` is the exact training
configuration used for this checkpoint.

## Notes

- **Joint-training baseline**, not continual learning. For the CL release
  of NeuroVLA (sequential training on LIBERO-Goal with Experience Replay),
  see [`AlphaBrainGroup/neurovla-cl-libero-goal`](https://huggingface.co/AlphaBrainGroup/neurovla-cl-libero-goal)
  and its LoRA variant.
- **Attention implementation is SDPA**, chosen to avoid flash-attn ABI
  pinning across environments. Users who have a matching flash-attn wheel
  can override via `--framework.qwenvl.attn_implementation=flash_attention_2`.

## License

MIT. See the [parent repository](https://github.com/AlphaBrainGroup/AlphaBrain).

## Citation

```bibtex
@misc{alphabrain2026,
  title  = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
  author = {AlphaBrain Team},
  year   = {2026},
  url    = {https://github.com/AlphaBrainGroup/AlphaBrain}
}
```
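
As a supplement to the Files listing earlier in this card, the following is a minimal sketch of a pre-flight check that a downloaded checkpoint directory contains the entries named there, which may be useful before starting the policy server. It is not part of AlphaBrain: the function name `missing_checkpoint_entries` and the constants are hypothetical, and leaving `README.md` out of the check is an assumption that the model card itself is not needed at runtime.

```python
from pathlib import Path

# Entry names copied from the Files listing in this card.
# README.md is left out on the assumption that it is not needed at runtime.
EXPECTED_FILES = [
    "framework_config.yaml",
    "dataset_statistics.json",
    "model.safetensors",
    "resume_meta.json",
]
EXPECTED_DIRS = ["qwen_pretrained"]


def missing_checkpoint_entries(ckpt_dir):
    """Return the expected files and directories that are absent from ckpt_dir.

    Hypothetical helper for illustration; only existence is checked,
    not file contents or sizes.
    """
    root = Path(ckpt_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Default path matches the --local-dir used in the Usage section.
    missing = missing_checkpoint_entries("./neurovla_libero_all")
    print("missing entries:", missing if missing else "none")
```

An empty list means every listed entry is present. A non-empty list names what to re-download before passing the directory to `--ckpt_path`.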