Upload neurovla-libero-all4suite

Files changed (16)

.gitattributes +1 -0
README.md +116 -0
dataset_statistics.json +134 -0
framework_config.yaml +97 -0
model.safetensors +3 -0
qwen_pretrained/added_tokens.json +24 -0
qwen_pretrained/chat_template.jinja +7 -0
qwen_pretrained/config.json +139 -0
qwen_pretrained/merges.txt +0 -0
qwen_pretrained/preprocessor_config.json +39 -0
qwen_pretrained/special_tokens_map.json +31 -0
qwen_pretrained/tokenizer.json +3 -0
qwen_pretrained/tokenizer_config.json +208 -0
qwen_pretrained/video_preprocessor_config.json +43 -0
qwen_pretrained/vocab.json +0 -0
resume_meta.json +8 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+qwen_pretrained/tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,116 @@

+---
+license: mit
+language:
+  - en
+library_name: pytorch
+pipeline_tag: robotics
+tags:
+  - robotics
+  - vla
+  - vision-language-action
+  - libero
+  - neurovla
+  - spiking-neural-network
+  - brain-inspired
+base_model:
+  - Qwen/Qwen2.5-VL-3B-Instruct
+datasets:
+  - LIBERO
+---
+# NeuroVLA · LIBERO (all 4 suites, joint training)
+> Brain-inspired Vision-Language-Action (VLA) checkpoint released with
+> the [AlphaBrain](https://github.com/AlphaBrainGroup/AlphaBrain)
+> framework. Trained jointly on **all four LIBERO suites** — Goal,
+> Spatial, Object, and Long — for direct evaluation across the full
+> LIBERO benchmark without retraining.
+NeuroVLA couples a Qwen2.5-VL backbone with a **layer-wise Q-Former**
+that extracts action-relevant features from the VLM's hidden states,
+feeding a **Spiking Neural Network (SNN)** action head. The model was
+trained in a single supervised run (not continual learning) on a mixed
+stream of all 4 LIBERO suites, using the `libero_all` data mix.
+## Overview
+| | |
+|:---|:---|
+| **Architecture**        | NeuroVLA (Qwen2.5-VL-3B + layer-wise Q-Former + SNN head) |
+| **Base VLM**            | `Qwen/Qwen2.5-VL-3B-Instruct`                              |
+| **Q-Former**            | Layers 36 → 37 · `num_query_tokens=8` · `output_dim=768`   |
+| **Action head**         | DiT-based, `hidden_size=1024`, `action_dim=7`, `state_dim=7`, chunk 16 |
+| **Training data**       | LIBERO · **all 4 suites (Goal + Spatial + Object + Long)** · `dataset_mix=libero_all` |
+| **Training type**       | Supervised fine-tuning (single run; not continual learning) |
+| **Attention**           | SDPA (not flash-attention, to avoid ABI pinning)           |
+| **Optimiser**           | AdamW · `lr_base = 2.5e-5` · cosine-with-min-lr · 5 000 warmup steps |
+| **Step budget**         | 50 000 (this release) · saved every 10 000 steps           |
+| **Hardware / batch**    | 2 × A800 80 GB · `per_device_batch_size = 16`              |
+## Files
+```
+├── README.md                   model card
+├── framework_config.yaml       AlphaBrain framework configuration
+├── dataset_statistics.json     action normalisation statistics (required for inference)
+├── model.safetensors           full VLA weights (~8.2 GB, i.e. ~7.6 GiB)
+├── resume_meta.json            training metadata (step count, GPU count)
+└── qwen_pretrained/            Qwen2.5-VL tokenizer + preprocessor configs
+```
+## Usage
+```bash
+git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
+cd AlphaBrain
+pip install -e .
+export PRETRAINED_MODELS_DIR=/path/to/models   # must contain Qwen2.5-VL-3B-Instruct/
+huggingface-cli download AlphaBrainGroup/neurovla-libero-all4suite \
+    --local-dir ./neurovla_libero_all
+python deployment/model_server/server_policy.py \
+    --ckpt_path ./neurovla_libero_all --port 10093 --use_bf16
+```
+For evaluation on any of the 4 LIBERO suites, see the
+[LIBERO eval pipeline](https://github.com/AlphaBrainGroup/AlphaBrain/tree/dev/benchmarks/LIBERO/eval).
+## Reproduction
+```bash
+# Framework's NeuroVLA pretraining entry
+bash scripts/run_brain_inspired_scripts/run_neurovla_pretrain.sh \
+    --yaml configs/neurovla_all4suite_libero.yaml   # (or equivalent config for 4-suite mix)
+```
+Expect multi-day training on 2 × A800 80 GB for the full 50 000-step
+schedule. The shipped `framework_config.yaml` is the exact training
+configuration used for this checkpoint.
+## Notes
+- **Joint-training baseline**, not continual learning. For the CL
+  release of NeuroVLA (sequential training on LIBERO-Goal with
+  Experience Replay), see
+  [`AlphaBrainGroup/neurovla-cl-libero-goal`](https://huggingface.co/AlphaBrainGroup/neurovla-cl-libero-goal)
+  and its LoRA variant.
+- **Attention implementation is SDPA**, chosen to avoid flash-attn ABI
+  pinning across environments. Users who have a matching flash-attn
+  wheel can override via `--framework.qwenvl.attn_implementation=flash_attention_2`.
+## License
+MIT — see the [parent repository](https://github.com/AlphaBrainGroup/AlphaBrain).
+## Citation
+```bibtex
+@misc{alphabrain2026,
+  title  = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
+  author = {AlphaBrain Team},
+  year   = {2026},
+  url    = {https://github.com/AlphaBrainGroup/AlphaBrain}
+}
+```

dataset_statistics.json ADDED

	@@ -0,0 +1,134 @@

+{
+  "franka": {
+    "action": {
+      "mean": [
+        0.07237596483901143,
+        0.08987006871029735,
+        -0.10144743137061596,
+        -0.00045383188989944756,
+        0.006273590726777911,
+        -0.003878799732774496,
+        0.524486355483532
+      ],
+      "std": [
+        0.3498823308902479,
+        0.37794140366375184,
+        0.460084266976933,
+        0.0403885784928603,
+        0.06616144248501059,
+        0.07763074391911857,
+        0.4994683356809767
+      ],
+      "max": [
+        0.9375,
+        0.9375,
+        0.9375,
+        0.3557142913341522,
+        0.375,
+        0.375,
+        1.0
+      ],
+      "min": [
+        -0.9375,
+        -0.9375,
+        -0.9375,
+        -0.2582142949104309,
+        -0.375,
+        -0.3675000071525574,
+        0.0
+      ],
+      "q01": [
+        -0.8785714507102966,
+        -0.8758928775787354,
+        -0.9375,
+        -0.1510714292526245,
+        -0.20678570866584778,
+        -0.2742857038974762,
+        0.0
+      ],
+      "q99": [
+        0.9375,
+        0.9107142686843872,
+        0.9375,
+        0.20357142388820648,
+        0.26357144117355347,
+        0.375,
+        1.0
+      ],
+      "mask": [
+        true,
+        true,
+        true,
+        true,
+        true,
+        true,
+        false
+      ],
+      "norm_mode": "q99"
+    },
+    "state": {
+      "mean": [
+        -0.04889854742214084,
+        0.03689368185587227,
+        0.7890402488410473,
+        2.9771945476531982,
+        -0.1417286954820156,
+        -0.11769362539052963,
+        0.026436020154505968,
+        -0.02665513101965189
+      ],
+      "std": [
+        0.10639013941746686,
+        0.15115733130675715,
+        0.38406895599530033,
+        0.3530238395244304,
+        0.8227341427331599,
+        0.32357567121520087,
+        0.014583991652936385,
+        0.014467005007200339
+      ],
+      "max": [
+        0.21031762659549713,
+        0.39128610491752625,
+        1.3660105466842651,
+        3.6714255809783936,
+        3.560650587081909,
+        1.386339545249939,
+        0.04233968257904053,
+        0.0013633022317662835
+      ],
+      "min": [
+        -0.4828203022480011,
+        -0.3255046010017395,
+        0.008128180168569088,
+        0.35277295112609863,
+        -3.641430377960205,
+        -1.842738389968872,
+        -0.0013586411951109767,
+        -0.042040832340717316
+      ],
+      "q01": [
+        -0.42401049643754957,
+        -0.2838300323486328,
+        0.009925739830359817,
+        1.3085840785503386,
+        -2.886677579879761,
+        -1.1599004411697387,
+        0.001503719249740243,
+        -0.040336399003863335
+      ],
+      "q99": [
+        0.1530261474847791,
+        0.3629165390133857,
+        1.2910678112506866,
+        3.303542451858519,
+        2.7496529006957933,
+        0.6893712210655194,
+        0.040610933862626555,
+        -0.0015016929572448147
+      ]
+    },
+    "num_transitions": 273465,
+    "num_trajectories": 1693
+  }
+}
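
The README marks these statistics as required for inference, and the `action` block records `"norm_mode": "q99"` with a `mask` that excludes the last (gripper) dimension. The sketch below shows one common way such statistics are applied: mapping a model output in [-1, 1] back to raw action units using the 1st and 99th percentiles. The formula and the helper name `unnormalize_action` are assumptions for illustration; the AlphaBrain code may implement normalisation differently.

```python
# Hypothetical helper: map a normalised action in [-1, 1] back to raw units.
# Assumes the q01/q99 min-max convention; the framework's own code may differ.

# Values copied from the "franka" -> "action" block of dataset_statistics.json.
Q01 = [-0.8785714507102966, -0.8758928775787354, -0.9375,
       -0.1510714292526245, -0.20678570866584778, -0.2742857038974762, 0.0]
Q99 = [0.9375, 0.9107142686843872, 0.9375,
       0.20357142388820648, 0.26357144117355347, 0.375, 1.0]
MASK = [True, True, True, True, True, True, False]


def unnormalize_action(norm_action, q01=Q01, q99=Q99, mask=MASK):
    """Invert x_norm = 2 * (x - q01) / (q99 - q01) - 1 on masked dimensions."""
    raw = []
    for value, low, high, use in zip(norm_action, q01, q99, mask):
        if use:
            clipped = max(-1.0, min(1.0, value))  # keep inside the trained range
            raw.append(0.5 * (clipped + 1.0) * (high - low) + low)
        else:
            raw.append(value)  # unmasked dims (here the gripper) pass through
    return raw


if __name__ == "__main__":
    # A normalised action of all zeros lands midway between q01 and q99.
    print(unnormalize_action([0.0] * 6 + [1.0]))
```

Under this convention a normalised value of -1 maps to `q01`, +1 maps to `q99`, and the gripper command is returned unchanged because its mask entry is `false`.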

framework_config.yaml ADDED

	@@ -0,0 +1,97 @@

+framework:
+  name: NeuroVLA
+  qwenvl:
+    attn_implementation: sdpa
+    vl_hidden_dim: 2048
+    base_vlm: /share/lipengteng/VLA-Engine-Developer/data/pretrained_models/Qwen2.5-VL-3B-Instruct
+  layer_qformer:
+    qformer_end_layer: 37
+    qformer_start_layer: 36
+    num_query_tokens: 8
+    input_dim: 2048
+    ouptput_dim: 768
+    grad_scale: 0.5
+  action_model:
+    hidden_size: 1024
+    add_pos_embed: true
+    max_seq_len: 1024
+    action_dim: 7
+    state_dim: 7
+    future_action_window_size: 15
+    action_horizon: 16
+    past_action_window_size: 0
+    repeated_diffusion_steps: 8
+  reduce_in_full_precision: true
+trainer:
+  enable_gradient_checkpointing: true
+  enable_mixed_precision_training: true
+  epochs: 100
+  eval_interval: 50001
+  freeze_modules: ''
+  gradient_accumulation_steps: 1
+  gradient_clipping: 1.0
+  is_resume: false
+  learning_rate:
+    action_model: 0.0001
+    base: 2.5e-05
+    qwen_vl_interface: 1.0e-05
+    layer_qformer: 5.0e-05
+  logging_frequency: 10
+  loss_scale:
+    vla: 1.0
+    vlm: 0.1
+  lr_scheduler_type: cosine_with_min_lr
+  max_grad_norm: 1.0
+  max_train_steps: 50000
+  num_warmup_steps: 5000
+  optimizer:
+    betas:
+    - 0.9
+    - 0.95
+    eps: 1.0e-08
+    name: AdamW
+    weight_decay: 1.0e-08
+  resume_epoch: null
+  resume_step: null
+  save_interval: 10000
+  scheduler_specific_kwargs:
+    min_lr: 1.0e-06
+  warmup_ratio: 0.1
+  weight_decay: 0.0
+environment:
+  wandb_mode: online
+  wandb_project: vla-engine-benchmark
+  wandb_entity: ''
+  wandb_base_url: https://api.bandw.top
+  num_gpus: 2
+  main_process_port: 29500
+  nccl:
+    ib_hca: mlx5_2,mlx5_3
+    blocking_wait: 1
+    async_error_handling: 1
+    timeout: 10000
+    socket_timeout_ms: 360000
+seed: 42
+run_id: 0421-NeuroVLA-All4Suite-bs16-sdpa
+output_root_dir: ./results/training
+datasets:
+  vla_data:
+    data_root_dir: /share/weiyu/IPEC-COMMUNITY
+    dataset_mix: libero_all
+    per_device_batch_size: 16
+    dataloader_module: lerobot_datasets
+    action_type: delta_ee
+    sequential_step_sampling: false
+    CoT_prompt: Your task is {instruction}. To identify the key objects for your task.
+      Locate their bounding boxes in [x1,y1,x2,y2] format.
+    CoT_answer: bbox
+    default_image_resolution:
+    - 3
+    - 224
+    - 224
+    load_all_data_for_training: true
+    obs:
+    - image_0
+    video_backend: torchvision_av
+    include_state: true
+output_dir: ./results/training/0421-NeuroVLA-All4Suite-bs16-sdpa
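
The trainer section combines `lr_scheduler_type: cosine_with_min_lr`, `num_warmup_steps: 5000`, `max_train_steps: 50000`, and `min_lr: 1.0e-06`. The sketch below computes the learning rate this implies for the `base` group, assuming linear warmup from zero followed by cosine decay down to `min_lr`. The function `lr_at_step` is a hypothetical illustration, not the framework's scheduler, and the other parameter groups (`action_model`, `qwen_vl_interface`, `layer_qformer`) would start from their own base rates.

```python
import math


def lr_at_step(step, base_lr=2.5e-5, min_lr=1.0e-6,
               warmup_steps=5000, total_steps=50000):
    """Learning rate under linear warmup then cosine decay to a floor (assumed shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup window.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, capped at 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 2500, 5000, 27500, 50000):
        print(step, lr_at_step(step))
```

With these values the warmup window is 5000 / 50000 = 0.1 of the run, which matches the `warmup_ratio: 0.1` entry in the same file.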

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:af72e06391f6e1d95f538d923bf8a14f2542ca2849637beab260f5fae833493a
+size 8167582462
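
The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size, while the weights live in LFS storage. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the code uses only the Python standard library.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Return (sha256_hex, size_in_bytes) from Git LFS pointer text."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """True if the file at `path` has the size and SHA-256 named by the pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != size:
        return False  # cheap check first; avoids hashing a truncated download
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


# Pointer text as it appears in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:af72e06391f6e1d95f538d923bf8a14f2542ca2849637beab260f5fae833493a
size 8167582462
"""

if __name__ == "__main__":
    digest, size = parse_lfs_pointer(POINTER)
    print(digest, f"{size / 2**30:.2f} GiB")
```

Calling `matches_pointer("model.safetensors", POINTER)` on a local copy would hash the full file in 1 MiB chunks, so memory use stays flat regardless of checkpoint size.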

qwen_pretrained/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

qwen_pretrained/chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
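
To make the template's branching easier to follow, the sketch below mirrors its logic in plain Python: a default system turn is injected when the first message is not a system message, image and video parts become vision placeholder tokens, and an assistant header is appended on request. The function `render_chat` and the example instruction are illustrative assumptions; in practice the Jinja template is rendered by the tokenizer or processor, and whitespace handling is assumed to follow the literal newlines in the file.

```python
# Plain-Python mirror of qwen_pretrained/chat_template.jinja (illustrative only).
DEFAULT_SYSTEM = "You are a helpful assistant."
VISION = "<|vision_start|>{pad}<|vision_end|>"


def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    out = []
    image_count = 0
    video_count = 0
    for index, message in enumerate(messages):
        # Inject the default system turn if the conversation does not start with one.
        if index == 0 and message["role"] != "system":
            out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(content)
        else:
            for part in content:
                kind = part.get("type")
                if kind == "image" or "image" in part or "image_url" in part:
                    image_count += 1
                    if add_vision_id:
                        out.append(f"Picture {image_count}: ")
                    out.append(VISION.format(pad="<|image_pad|>"))
                elif kind == "video" or "video" in part:
                    video_count += 1
                    if add_vision_id:
                        out.append(f"Video {video_count}: ")
                    out.append(VISION.format(pad="<|video_pad|>"))
                elif "text" in part:
                    out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


if __name__ == "__main__":
    demo = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "put the bowl on the plate"},  # hypothetical instruction
    ]}]
    print(render_chat(demo, add_generation_prompt=True))
```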

qwen_pretrained/config.json ADDED

	@@ -0,0 +1,139 @@

+{
+  "architectures": [
+    "Qwen2_5_VLForConditionalGeneration"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "image_token_id": 151655,
+  "initializer_range": 0.02,
+  "intermediate_size": 11008,
+  "max_position_embeddings": 128000,
+  "max_window_layers": 70,
+  "model_type": "qwen2_5_vl",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 36,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": {
+    "mrope_section": [
+      16,
+      24,
+      24
+    ],
+    "rope_type": "default",
+    "type": "default"
+  },
+  "rope_theta": 1000000.0,
+  "sliding_window": 32768,
+  "text_config": {
+    "_name_or_path": "/share/lipengteng/VLA-Engine-Developer/data/pretrained_models/Qwen2.5-VL-3B-Instruct",
+    "architectures": [
+      "Qwen2_5_VLForConditionalGeneration"
+    ],
+    "attention_dropout": 0.0,
+    "bos_token_id": 151643,
+    "dtype": "bfloat16",
+    "eos_token_id": 151645,
+    "hidden_act": "silu",
+    "hidden_size": 2048,
+    "initializer_range": 0.02,
+    "intermediate_size": 11008,
+    "layer_types": [
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention",
+      "full_attention"
+    ],
+    "max_position_embeddings": 128000,
+    "max_window_layers": 70,
+    "model_type": "qwen2_5_vl_text",
+    "num_attention_heads": 16,
+    "num_hidden_layers": 36,
+    "num_key_value_heads": 2,
+    "rms_norm_eps": 1e-06,
+    "rope_scaling": {
+      "mrope_section": [
+        16,
+        24,
+        24
+      ],
+      "rope_type": "default",
+      "type": "default"
+    },
+    "rope_theta": 1000000.0,
+    "sliding_window": null,
+    "tie_word_embeddings": true,
+    "use_cache": true,
+    "use_sliding_window": false,
+    "vision_token_id": 151654,
+    "vocab_size": 151936
+  },
+  "transformers_version": "4.57.0",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "video_token_id": 151656,
+  "vision_config": {
+    "depth": 32,
+    "fullatt_block_indexes": [
+      7,
+      15,
+      23,
+      31
+    ],
+    "hidden_act": "silu",
+    "hidden_size": 1280,
+    "in_channels": 3,
+    "in_chans": 3,
+    "initializer_range": 0.02,
+    "intermediate_size": 3420,
+    "model_type": "qwen2_5_vl",
+    "num_heads": 16,
+    "out_hidden_size": 2048,
+    "patch_size": 14,
+    "spatial_merge_size": 2,
+    "spatial_patch_size": 14,
+    "temporal_patch_size": 2,
+    "tokens_per_second": 2,
+    "window_size": 112
+  },
+  "vision_end_token_id": 151653,
+  "vision_start_token_id": 151652,
+  "vision_token_id": 151654,
+  "vocab_size": 151936
+}

qwen_pretrained/merges.txt ADDED

The diff for this file is too large to render. See raw diff

qwen_pretrained/preprocessor_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "crop_size": null,
+  "data_format": "channels_first",
+  "default_to_square": true,
+  "device": null,
+  "disable_grouping": null,
+  "do_center_crop": null,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_pad": null,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "Qwen2VLImageProcessorFast",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "input_data_format": null,
+  "max_pixels": 12845056,
+  "merge_size": 2,
+  "min_pixels": 3136,
+  "pad_size": null,
+  "patch_size": 14,
+  "processor_class": "Qwen2_5_VLProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "return_tensors": null,
+  "size": {
+    "longest_edge": 12845056,
+    "shortest_edge": 3136
+  },
+  "temporal_patch_size": 2
+}
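
Two quantities follow directly from this preprocessor config: how many merged vision tokens an image yields, and how a raw pixel is rescaled and normalised. The sketch below works both out for the 224 × 224 resolution named in `framework_config.yaml`. It assumes the image is already a multiple of `patch_size * merge_size` and inside the `min_pixels` / `max_pixels` budget, so no resizing occurs; the function names are hypothetical and this is not the processor's actual code.

```python
# Values copied from qwen_pretrained/preprocessor_config.json.
PATCH_SIZE = 14
MERGE_SIZE = 2
MIN_PIXELS = 3136
MAX_PIXELS = 12845056
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
IMAGE_MEAN = [0.48145466, 0.4578275, 0.40821073]
IMAGE_STD = [0.26862954, 0.26130258, 0.27577711]


def merged_token_count(height, width):
    """Vision tokens after spatial merging, for an image that needs no resizing."""
    unit = PATCH_SIZE * MERGE_SIZE  # 28 pixels per merged token side
    if height % unit or width % unit:
        raise ValueError("sketch only handles sizes that are multiples of 28")
    if not MIN_PIXELS <= height * width <= MAX_PIXELS:
        raise ValueError("image falls outside the configured pixel budget")
    return (height // unit) * (width // unit)


def normalize_pixel(rgb):
    """Rescale 0-255 channel values to [0, 1], then standardise per channel."""
    return [(c * RESCALE_FACTOR - m) / s
            for c, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD)]


if __name__ == "__main__":
    print(merged_token_count(224, 224))
    print(normalize_pixel([255, 255, 255]))
```

For a 224 × 224 input this gives 16 × 16 patches, merged 2 × 2 into 8 × 8 = 64 tokens.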

qwen_pretrained/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

qwen_pretrained/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5eee858c5123a4279c3e1f7b81247343f356ac767940b2692a928ad929543214
+size 11422063

qwen_pretrained/tokenizer_config.json ADDED

	@@ -0,0 +1,208 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "Qwen2_5_VLProcessor",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

qwen_pretrained/video_preprocessor_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "crop_size": null,
+  "data_format": "channels_first",
+  "default_to_square": true,
+  "device": null,
+  "do_center_crop": null,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "do_sample_frames": false,
+  "fps": null,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "input_data_format": null,
+  "max_frames": 768,
+  "max_pixels": 12845056,
+  "merge_size": 2,
+  "min_frames": 4,
+  "min_pixels": 3136,
+  "num_frames": null,
+  "pad_size": null,
+  "patch_size": 14,
+  "processor_class": "Qwen2_5_VLProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "return_metadata": false,
+  "size": {
+    "longest_edge": 12845056,
+    "shortest_edge": 3136
+  },
+  "temporal_patch_size": 2,
+  "video_metadata": null,
+  "video_processor_type": "Qwen2VLVideoProcessor"
+}

qwen_pretrained/vocab.json ADDED

The diff for this file is too large to render. See raw diff

resume_meta.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "completed_steps": 50000,
+  "num_gpus": 4,
+  "gradient_accumulation_steps": 1,
+  "per_device_batch_size": 16,
+  "effective_batch_size": 64,
+  "framework_name": "NeuroVLA"
+}
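
The `effective_batch_size` field can be cross-checked from the other fields in this file, and multiplying it by `completed_steps` gives the number of samples drawn over the run. The short sketch below does that arithmetic, assuming the usual definition (GPUs × per-device batch × accumulation steps); the function names are hypothetical. It uses only the values recorded here, which list `num_gpus: 4`, whereas `framework_config.yaml` in the same commit lists `num_gpus: 2`.

```python
# Values copied from resume_meta.json.
META = {
    "completed_steps": 50000,
    "num_gpus": 4,
    "gradient_accumulation_steps": 1,
    "per_device_batch_size": 16,
    "effective_batch_size": 64,
}


def effective_batch_size(meta):
    """Samples consumed per optimiser step across all devices."""
    return (meta["num_gpus"]
            * meta["per_device_batch_size"]
            * meta["gradient_accumulation_steps"])


def samples_seen(meta):
    """Total samples drawn over the completed optimiser steps."""
    return meta["completed_steps"] * effective_batch_size(meta)


if __name__ == "__main__":
    assert effective_batch_size(META) == META["effective_batch_size"]
    print(effective_batch_size(META), samples_seen(META))
```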