AlmightyFish committed 19 days ago

Commit

16a250e

1 Parent(s): 35d3b76

model

Files changed (32)

.gitattributes +4 -32
README.md +102 -0
checkpoints/qwen_10ep_final/README.md +207 -0
checkpoints/qwen_10ep_final/adapter_config.json +46 -0
checkpoints/qwen_10ep_final/adapter_model.safetensors +3 -0
checkpoints/qwen_10ep_final/moe_layer.pt +3 -0
checkpoints/qwen_10ep_final/motion_embed.pt +3 -0
checkpoints/qwen_10ep_final/motion_lm_head.pt +3 -0
checkpoints/qwen_ep1_best/README.md +207 -0
checkpoints/qwen_ep1_best/adapter_config.json +46 -0
checkpoints/qwen_ep1_best/adapter_model.safetensors +3 -0
checkpoints/qwen_ep1_best/moe_layer.pt +3 -0
checkpoints/qwen_ep1_best/motion_embed.pt +3 -0
checkpoints/qwen_ep1_best/motion_lm_head.pt +3 -0
checkpoints/step_0105536/README.md +207 -0
checkpoints/step_0105536/adapter_config.json +46 -0
checkpoints/step_0105536/adapter_model.safetensors +3 -0
checkpoints/step_0105536/moe_layer.pt +3 -0
checkpoints/step_0105536/motion_embed.pt +3 -0
checkpoints/step_0105536/motion_lm_head.pt +3 -0
checkpoints/step_0105536/train_state.pt +3 -0
data/dataset.json +3 -0
organize_data.sh +78 -0
swift/eval.jsonl +3 -0
swift/motion_tokens.txt +0 -0
swift/train.jsonl +3 -0
tokenizer/base/fast_config.json +6 -0
tokenizer/base/tokenizer.json +0 -0
tokenizer/base/tokenizer_config.json +6 -0
tokenizer/phys/fast_config.json +6 -0
tokenizer/phys/tokenizer.json +0 -0
tokenizer/phys/tokenizer_config.json +6 -0

.gitattributes CHANGED

@@ -1,37 +1,9 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
 data/dataset.json filter=lfs diff=lfs merge=lfs -text
 *.jsonl filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,105 @@
 ---
 license: apache-2.0
+language:
+- en
+- zh
+tags:
+- motion-generation
+- vision-language
+- robotics
+- qwen
+- dual-stream
+datasets:
+- MotionVLA-Dataset
 ---
+# MotionVLA
+**MotionVLA** is an end-to-end system that generates human motion from image and text inputs. It pairs Qwen3.5-VL for vision-language perception with DS-FAST, a Dual-Stream FAST tokenizer that converts motion into discrete tokens.
+## Repository Contents
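+This Hugging Face repository contains:
+| Path | Description |
+|------|-------------|
+| `tokenizer/` | DS-FAST dual-stream tokenizer files |
+| `tokenizer/base/` | Base-stream BPE tokenizer (4096 vocab, DCT over 201 dims) |
+| `tokenizer/phys/` | Phys-stream BPE tokenizer (2048 vocab, DCT over 75 dims) |
+| `data/dataset.json` | Dataset index (maps each `motion_path` to a relative path) |
+| `checkpoints/` | LoRA adapters plus `motion_embed.pt`, `motion_lm_head.pt`, and `moe_layer.pt` for each saved checkpoint |
+| `swift/` | ms-swift training and evaluation data (`train.jsonl`, `eval.jsonl`, `motion_tokens.txt`) |
+| `organize_data.sh` | Helper script that moves local image and motion files into `data/` before upload |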
+**Motion data files** (`.pt`) and **images** are stored in the companion dataset repo: `[your-hf-username]/MotionVLA-Dataset`
+## Tokenizer Design
+The DS-FAST tokenizer decomposes 276-dim ViMoGen motion into two streams:
+```
+276-dim motion (T frames)
+    ↓ split by dimension
+Base (201-dim): body_pose_6d + joints + root_orient + root_trans   ← low-freq semantic
+Phys  (75-dim): joints_vel + root_vel + root_trans_vel             ← high-freq dynamics
+    ↓ DCT along time axis, keep top K coefficients
+    ↓ BPE encoding
+Base tokens: ~477/sequence  (K=5,  vocab=4096)
+Phys tokens: ~40/sequence   (K=15, vocab=2048)
+```
+Output sequence format (T5 vocab space):
+```
+[BOS=0, base_1+32100, ..., base_N+32100, SEP=32099, phys_1+36196, ..., phys_M+36196, EOS=1]
+```
+## Token Vocabulary
+| Token type | ID range | Count |
+|------------|----------|-------|
+| T5 special (BOS/EOS/SEP) | 0, 1, 32099 | 3 |
+| Base motion tokens | 32100 – 36195 | 4096 |
+| Phys motion tokens | 36196 – 38243 | 2048 |
+| **Total vocab** | — | **38244** |
+Token IDs in the Qwen vocab space (used for ms-swift training):
+| Token type | ID range | Count |
+|------------|----------|-------|
+| Base motion tokens | 248320 – 252415 | 4096 |
+| Phys motion tokens | 252416 – 256511 | 4096 |
+| MOTION_BOS | 256512 | 1 |
+| MOTION_SEP | 256513 | 1 |
+| MOTION_EOS | 256514 | 1 |
+## Usage
+```python
+# Requires tokenizer/ds_fast_tokenizer.py, which is not included in this commit
+from tokenizer.ds_fast_tokenizer import DSFASTTokenizer
+import numpy as np
+# Load both tokenizer streams. In this repository the tokenizer files live
+# under tokenizer/base/ and tokenizer/phys/, so the path below may need adjusting.
+tok = DSFASTTokenizer.load("tokenizer/checkpoints")
+# Encode one 276-dim motion clip
+motion = np.load("motion.npy")  # shape: (T, 276)
+result = tok.encode(motion)
+# result["base_tokens"]: list of int (BPE IDs for the base stream)
+# result["phys_tokens"]: list of int (BPE IDs for the phys stream)
+# result["T"]: number of frames
+# Decode back to the two streams
+base_recon, phys_recon = tok.decode(
+    result["base_tokens"], result["phys_tokens"], result["T"])
+# base_recon: (T, 201), phys_recon: (T, 75)
+```
+## Code
+Training code and model architecture: [GitHub](https://github.com/[your-username]/MotionVLA)
+## Citation
+```bibtex
+@article{motionvla2025,
+  title={MotionVLA: End-to-End Motion Generation with Vision-Language Models},
+  author={},
+  year={2025}
+}
+```
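The README's two Token Vocabulary tables pin every motion token to a fixed ID range, so packing and unpacking a sequence is plain offset arithmetic. A minimal sketch under those tables follows; the function names are hypothetical and are not the DS-FAST API. The T5 half follows the documented format `[BOS, base..., SEP, phys..., EOS]`. The README gives no sequence layout for the Qwen space, so the Qwen half only classifies an ID against the table. Note that the table's Phys range spans 4096 IDs, while the phys BPE vocabulary has 2048 entries; the sketch checks only the table's bounds there.

```python
# T5 vocab space, from the README's sequence format and first table
T5_BOS, T5_EOS, T5_SEP = 0, 1, 32099
T5_BASE_OFFSET, BASE_VOCAB = 32100, 4096
T5_PHYS_OFFSET, PHYS_VOCAB = T5_BASE_OFFSET + BASE_VOCAB, 2048  # 36196

# Qwen vocab space, from the README's second table (inclusive ranges)
QWEN_RANGES = {"base": (248320, 252415), "phys": (252416, 256511)}
QWEN_SPECIALS = {256512: "MOTION_BOS", 256513: "MOTION_SEP", 256514: "MOTION_EOS"}


def _check(tokens, vocab, name):
    """Raise if any stream-local BPE ID falls outside [0, vocab)."""
    for t in tokens:
        if not 0 <= t < vocab:
            raise ValueError(f"{name} token {t} outside [0, {vocab})")


def to_t5_sequence(base_tokens, phys_tokens):
    """Hypothetical helper: build [BOS, base+32100..., SEP, phys+36196..., EOS]."""
    _check(base_tokens, BASE_VOCAB, "base")
    _check(phys_tokens, PHYS_VOCAB, "phys")
    return ([T5_BOS] + [t + T5_BASE_OFFSET for t in base_tokens] + [T5_SEP]
            + [t + T5_PHYS_OFFSET for t in phys_tokens] + [T5_EOS])


def from_t5_sequence(seq):
    """Hypothetical helper: inverse of to_t5_sequence, returns (base, phys)."""
    if len(seq) < 3 or seq[0] != T5_BOS or seq[-1] != T5_EOS:
        raise ValueError("sequence must start with BOS and end with EOS")
    body = list(seq[1:-1])
    sep = body.index(T5_SEP)  # raises ValueError if SEP is missing
    base = [t - T5_BASE_OFFSET for t in body[:sep]]
    phys = [t - T5_PHYS_OFFSET for t in body[sep + 1:]]
    _check(base, BASE_VOCAB, "base")
    _check(phys, PHYS_VOCAB, "phys")
    return base, phys


def classify_qwen_id(token_id):
    """Hypothetical helper: return (kind, local_index) for a Qwen-space ID."""
    if token_id in QWEN_SPECIALS:
        return QWEN_SPECIALS[token_id], None
    for kind, (lo, hi) in QWEN_RANGES.items():
        if lo <= token_id <= hi:
            return kind, token_id - lo
    return "other", token_id  # ordinary Qwen text token
```

Keeping the offsets as named constants makes the table and the code easy to cross-check: `T5_PHYS_OFFSET + PHYS_VOCAB` reproduces the README's total of 38244.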

checkpoints/qwen_10ep_final/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: /Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

checkpoints/qwen_10ep_final/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "q_proj",
+    "down_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoints/qwen_10ep_final/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2a5a0110f1a2b90cbd2ef66a0b2cd0eca79ab9886132a68102ac04ab5776c20
+size 51146224
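Every `.pt`, `.safetensors`, and `.jsonl` file in this commit (and `data/dataset.json`) is stored as a Git LFS pointer like the one above: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal standard-library sketch for checking a downloaded file against its pointer, for example when the repository was cloned with `GIT_LFS_SKIP_SMUDGE=1` and the real files were fetched separately; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into {'oid': hex_digest, 'size': int}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, before hashing large checkpoints
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so multi-hundred-MB files (e.g. train_state.pt) fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]
```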

checkpoints/qwen_10ep_final/moe_layer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43fc69b76db2087b100ef1928430e15d35a8445cd1c076c8331fa112780e1e29
+size 1072093

checkpoints/qwen_10ep_final/motion_embed.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c91b9f24a0ce4ddd3f8d78649fa56500ad06d3204e82528ce2715ee85705d1f
+size 25179724

checkpoints/qwen_10ep_final/motion_lm_head.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b0a243aeef89c94e6304f6a5f81fc02519d7a8f3ca193e9bc0b6b1ea8c1796b
+size 25179738

checkpoints/qwen_ep1_best/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: /Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

checkpoints/qwen_ep1_best/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "q_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoints/qwen_ep1_best/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9941b97c3c00a825e79961bebadcf29de0c126c4c4caf0c1e010fe39e9abd643
+size 51146224

checkpoints/qwen_ep1_best/moe_layer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:efbb275d3a22ff993ac4544e39ac654a66d75e118ba0482a0256b593db236af5
+size 1072093

checkpoints/qwen_ep1_best/motion_embed.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:190d0dd2f2e39f4f1500335bc16a8e60eafc1d8301b1fcc9d3ba8b0e04ca1f8b
+size 12590668

checkpoints/qwen_ep1_best/motion_lm_head.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fb007caf826b7955b086efe5c9d0c1541645064ec736e658ebbcd32603d8f071
+size 12590682

checkpoints/step_0105536/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: /Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

checkpoints/step_0105536/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "/Users/bytedance/Downloads/MotionVLA/motionvla/checkpoints/Qwen3.5-08B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "down_proj",
+    "up_proj",
+    "o_proj",
+    "v_proj",
+    "gate_proj",
+    "k_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
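The three `adapter_config.json` files carry identical LoRA settings (`r` 32, `lora_alpha` 64, `lora_dropout` 0.05, no rsLoRA or DoRA, the same seven projection modules). Only the order of `target_modules` changes between checkpoints, which is consistent with PEFT holding that field as a set, so the order carries no meaning. A small sketch with hypothetical helper names that compares configs with `target_modules` treated as unordered, and computes the LoRA scaling factor: `lora_alpha / r` (2.0 here), or `lora_alpha / sqrt(r)` when `use_rslora` is true.

```python
import json
import math
import os


def load_config(path):
    """Read one adapter_config.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lora_scaling(cfg):
    """Scale applied to the LoRA update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    return alpha / math.sqrt(r) if cfg.get("use_rslora") else alpha / r


def config_diff(a, b):
    """Return {key: (a_value, b_value)} for differing keys; target_modules lists compare as sets."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if key == "target_modules" and isinstance(va, list) and isinstance(vb, list):
            same = set(va) == set(vb)
        else:
            same = va == vb
        if not same:
            diff[key] = (va, vb)
    return diff


if __name__ == "__main__":
    names = ("qwen_10ep_final", "qwen_ep1_best", "step_0105536")
    paths = [f"checkpoints/{n}/adapter_config.json" for n in names]
    cfgs = [load_config(p) for p in paths if os.path.exists(p)]  # skip missing files
    for cfg in cfgs[1:]:
        print(config_diff(cfgs[0], cfg))  # {} when only the module order differs
    if cfgs:
        print("LoRA scaling:", lora_scaling(cfgs[0]))
```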

checkpoints/step_0105536/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18cd2d7839ccff9de34152c5b852fc4192f9c20e0edb6519e5f2d35466493e70
+size 51146224

checkpoints/step_0105536/moe_layer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8cea6ac45e2ce8a3e5d58701a96432f24b753fc1ade3107d21b30b6b88b26756
+size 1072093

checkpoints/step_0105536/motion_embed.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17708f88d14f32f791c6c945c7090e4733b3f5f1fbab2f301918f5d562497ed2
+size 25179724

checkpoints/step_0105536/motion_lm_head.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:015570f76fdce13761a36f8367ca727bba9a7c3a675e6aeffc464c64da13ec2d
+size 25179738

checkpoints/step_0105536/train_state.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:012bf1418c1eff5f337cc3dfec0a808b3bed7e991ee5278cfb04e6a81c7db96f
+size 205253199

data/dataset.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:644ddaaf1300f86ccee7f21be650905979a400291b70fb7a0884f8b6a29c5147
+size 13588468

organize_data.sh ADDED

	@@ -0,0 +1,78 @@

+#!/bin/bash
+# organize_data.sh
+# Moves large data files into the HuggingFace repo structure.
+# Run this ONCE before uploading to HuggingFace.
+#
+# What this moves:
+#   images/             (7.8GB, 53K files)   → data/images/
+#   motions_dsfast_v4/  (338MB, 41,971 .pt)  → data/motions_tokens/
+#   in_the_wild_video/  (4.8GB, 41,971 .pt)  → data/motions_raw/
+#
+# Source paths (edit if needed):
+SRC_IMAGES="/Users/bytedance/Downloads/MotionVLA/motionvla/data/vimogen_full/images"
+SRC_TOKENS="/Users/bytedance/Downloads/MotionVLA/motionvla/data/vimogen_full/motions_dsfast_v4"
+SRC_RAW="/Users/bytedance/Downloads/MotionVLA/motionvla/data/vimogen_full/in_the_wild_video"
+# Destination (relative to this script's directory):
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+DST_IMAGES="$SCRIPT_DIR/data/images"
+DST_TOKENS="$SCRIPT_DIR/data/motions_tokens"
+DST_RAW="$SCRIPT_DIR/data/motions_raw"
+set -e
+echo "=================================================="
+echo " MotionVLA Data Organizer"
+echo "=================================================="
+echo ""
+echo "Source images : $SRC_IMAGES"
+echo "Source tokens : $SRC_TOKENS"
+echo "Source raw    : $SRC_RAW"
+echo ""
+echo "Destination   : $SCRIPT_DIR/data/"
+echo ""
+echo "Press ENTER to continue, Ctrl+C to cancel..."
+read -r
+# Step 1: Move images
+if [ -d "$SRC_IMAGES" ]; then
+    echo "[1/3] Moving images (7.8GB) ..."
+    mkdir -p "$DST_IMAGES"
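+    # Batched move via find: a single `mv src/*` can exceed the argument-length
+    # limit (ARG_MAX) with tens of thousands of files. `! -name '.*'` skips
+    # hidden entries, as the original glob did.
+    find "$SRC_IMAGES" -mindepth 1 -maxdepth 1 ! -name '.*' -exec sh -c 'mv "$@" "$0"/' "$DST_IMAGES" {} +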
+    echo "  Done: $(ls "$DST_IMAGES" | wc -l) files"
+else
+    echo "[1/3] SKIP: $SRC_IMAGES not found"
+fi
+# Step 2: Move motion tokens (v4, Qwen vocab space)
+if [ -d "$SRC_TOKENS" ]; then
+    echo "[2/3] Moving motion tokens (338MB) ..."
+    mkdir -p "$DST_TOKENS"
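+    # Batched move via find avoids "Argument list too long" for ~42K files
+    find "$SRC_TOKENS" -mindepth 1 -maxdepth 1 ! -name '.*' -exec sh -c 'mv "$@" "$0"/' "$DST_TOKENS" {} +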
+    echo "  Done: $(ls "$DST_TOKENS" | wc -l) files"
+else
+    echo "[2/3] SKIP: $SRC_TOKENS not found"
+fi
+# Step 3: Move raw 276-dim motions
+if [ -d "$SRC_RAW" ]; then
+    echo "[3/3] Moving raw 276-dim motions (4.8GB) ..."
+    mkdir -p "$DST_RAW"
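+    # Batched move via find avoids "Argument list too long" for ~42K files
+    find "$SRC_RAW" -mindepth 1 -maxdepth 1 ! -name '.*' -exec sh -c 'mv "$@" "$0"/' "$DST_RAW" {} +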
+    echo "  Done: $(ls "$DST_RAW" | wc -l) files"
+else
+    echo "[3/3] SKIP: $SRC_RAW not found"
+fi
+echo ""
+echo "=================================================="
+echo " Data organization complete!"
+echo " Total size:"
+du -sh "$SCRIPT_DIR/data/" 2>/dev/null || true
+echo "=================================================="
+echo ""
+echo "Next steps:"
+echo "  1. Upload to HuggingFace:"
+echo "     huggingface-cli upload <your-hf-username>/MotionVLA-Dataset . --repo-type dataset"
+echo "  2. Upload model checkpoints:"
+echo "     huggingface-cli upload <your-hf-username>/MotionVLA checkpoints/ --repo-type model"
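Because `organize_data.sh` moves rather than copies, a name clash in a destination directory can silently overwrite an existing file of the same name, or make `mv` fail on a clashing directory. A dry-run check before pressing ENTER is cheap. A minimal Python sketch, assuming it is run from the script's directory; `move_collisions` is a hypothetical helper, and the source paths are the ones hard-coded in the script.

```python
from pathlib import Path


def move_collisions(src, dst):
    """Top-level names in src that already exist in dst.

    Mirrors the shell glob `src/*`, so hidden entries are skipped.
    Returns [] if either directory does not exist yet.
    """
    src_p, dst_p = Path(src), Path(dst)
    if not src_p.is_dir() or not dst_p.is_dir():
        return []
    return sorted(
        p.name for p in src_p.iterdir()
        if not p.name.startswith(".") and (dst_p / p.name).exists()
    )


if __name__ == "__main__":
    # Source paths copied from organize_data.sh; destinations are relative
    # to the script directory, as in the script.
    root = "/Users/bytedance/Downloads/MotionVLA/motionvla/data/vimogen_full"
    pairs = {
        f"{root}/images": "data/images",
        f"{root}/motions_dsfast_v4": "data/motions_tokens",
        f"{root}/in_the_wild_video": "data/motions_raw",
    }
    for src, dst in pairs.items():
        clashes = move_collisions(src, dst)
        print(f"{src}: {len(clashes)} name collision(s)")
```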

swift/eval.jsonl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9395f694d5d182bfbd8a53c59b979766a2b04027c89a143f23903bee0948c242
+size 32963062

swift/motion_tokens.txt ADDED

The diff for this file is too large to render.

swift/train.jsonl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:83f3323915cd7830491f8adb0d5ff6e3da97fb9454ae8f295d0adc41b984c36f
+size 295630926

tokenizer/base/fast_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "scale": 10.0,
+  "min_token": -512,
+  "K": 5,
+  "action_dim": 201
+}

tokenizer/base/tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer/base/tokenizer_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "TokenizersBackend"
+}
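`model_max_length` in both `tokenizer_config.json` files is `int(1e30)` written out in full, the placeholder Hugging Face transformers saves when no maximum length was configured. It means "no limit", not a real context length. A tiny helper (hypothetical name) that reads it that way:

```python
import json

# transformers' "no maximum length" default: the float 1e30 converted to int
VERY_LARGE_INTEGER = int(1e30)


def effective_max_length(tokenizer_config):
    """model_max_length as an int, or None if absent or equal to the 'unbounded' sentinel."""
    value = tokenizer_config.get("model_max_length")
    if value is None or value >= VERY_LARGE_INTEGER:
        return None
    return int(value)


if __name__ == "__main__":
    cfg = json.loads('{"model_max_length": 1000000000000000019884624838656}')
    print(effective_max_length(cfg))  # None: no real limit is configured
```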

tokenizer/phys/fast_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "scale": 10.0,
+  "min_token": -20,
+  "K": 15,
+  "action_dim": 75
+}
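The two `fast_config.json` files hold the per-stream parameters behind the README's Tokenizer Design section: `K` retained DCT coefficients, a quantization `scale`, an integer offset `min_token`, and the stream width `action_dim`. The `scale` and `min_token` names match the public FAST action tokenizer, which applies an orthonormal DCT along time, multiplies by `scale`, rounds, subtracts `min_token`, and maps each integer to a character before BPE. The numpy-only sketch below reproduces that pre-BPE step per stream. It is an assumption, not the DS-FAST code: it takes base to be columns 0 to 200 of the 276-dim vector, reads "top K" as the K lowest frequencies, flattens frequency-major, and clips negative values to 0 as FAST does. All function names are hypothetical.

```python
import numpy as np

# Values copied from tokenizer/base/fast_config.json and tokenizer/phys/fast_config.json
BASE_CFG = {"scale": 10.0, "min_token": -512, "K": 5, "action_dim": 201}
PHYS_CFG = {"scale": 10.0, "min_token": -20, "K": 15, "action_dim": 75}


def dct_matrix(T):
    """Orthonormal DCT-II basis of shape (T, T); dct_matrix(T) @ x transforms x along axis 0."""
    n = np.arange(T)
    D = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * T))
    D[0] *= np.sqrt(1.0 / T)
    D[1:] *= np.sqrt(2.0 / T)
    return D


def split_streams(motion):
    """(T, 276) -> base (T, 201), phys (T, 75); base-first column order is assumed."""
    d = BASE_CFG["action_dim"]
    if motion.ndim != 2 or motion.shape[1] != d + PHYS_CFG["action_dim"]:
        raise ValueError(f"expected shape (T, 276), got {motion.shape}")
    return motion[:, :d], motion[:, d:]


def encode_stream(x, cfg):
    """DCT over time, keep the K lowest frequencies, scale, round, shift by -min_token."""
    T, K = x.shape[0], cfg["K"]
    if K > T:
        raise ValueError(f"need at least K={K} frames, got {T}")
    coeffs = (dct_matrix(T) @ x)[:K]                       # (K, D)
    q = np.round(coeffs * cfg["scale"]).astype(np.int64)
    return np.maximum(q - cfg["min_token"], 0).ravel()     # frequency-major, all >= 0


def decode_stream(ints, T, cfg):
    """Inverse of encode_stream; dropped high frequencies come back as zeros."""
    K, D = cfg["K"], cfg["action_dim"]
    if K > T:
        raise ValueError(f"need at least K={K} frames, got {T}")
    coeffs = np.zeros((T, D))
    coeffs[:K] = (np.asarray(ints).reshape(K, D) + cfg["min_token"]) / cfg["scale"]
    return dct_matrix(T).T @ coeffs                        # orthonormal: inverse = transpose


def to_bpe_string(ints):
    """FAST-style: one character per integer, the string a BPE model would consume."""
    return "".join(map(chr, np.asarray(ints).tolist()))
```

With the phys settings (`scale` 10, `min_token` -20), any retained coefficient below roughly -2.0 clips to 0 in this sketch, so large negative coefficients reconstruct lossily; the base stream's -512 leaves far more headroom. Truncating to `K` frequencies is the main compression step, which is why a full-length `K` round-trips to within the rounding error.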

tokenizer/phys/tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer/phys/tokenizer_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "TokenizersBackend"
+}