---
license: mit
library_name: pytorch
tags:
- pytorch
- lip-reading
- computer-vision
- video-classification
- reproduction
- 3dcvt
---
# 3DCvT on LRW-1000
This repository provides the released checkpoint and evaluation artifacts for an unofficial PyTorch reproduction of:
**A Lip Reading Method Based on 3D Convolutional Vision Transformer**
Code repository:
- https://github.com/DPInnovationWorks/3DCvT_LipReading
## Model Summary
- Task: Chinese word-level lip reading
- Dataset: LRW-1000
- Number of classes: 1184 in this processed split
- Framework: PyTorch
- Architecture: 3D CNN + CvT + BiGRU
## Released Files
- `best_model.pth`: released checkpoint
- `sha256.txt`: checksum for the checkpoint
- `logs/train.log`: selected training log
- `results/per_class_acc_lrw1000_val.csv`: per-class validation summary
- `plots/learning_curve.png`: learning curve exported from training
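The checkpoint can be checked against `sha256.txt` before use. The following is a minimal sketch. It assumes the digest is the first whitespace-separated token in `sha256.txt`, either alone or in `sha256sum` style (`<digest>  best_model.pth`); the file's exact layout is not documented here. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large checkpoint is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(checksum_file):
    """Read the expected digest (ASSUMED to be the first token of sha256.txt)."""
    text = Path(checksum_file).read_text().strip()
    return text.split()[0].lower()


def verify_checkpoint(checkpoint="best_model.pth", checksum_file="sha256.txt"):
    """Return True when the local checkpoint matches the published checksum."""
    return sha256_of(checkpoint) == expected_sha256(checksum_file)
```

After downloading both files into the working directory, `verify_checkpoint()` should return `True`; a `False` result usually means an incomplete or corrupted download.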
## Training Setup
Training settings from the released run:
- GPUs: 1
- Per-step batch size: 128
- Gradient accumulation: 2
- Effective batch size: 256 (128 per step × 2 accumulation steps × 1 GPU)
- Epochs: 120
- Optimizer: Adam
- Weight decay: 1e-4
- Learning rate: 6e-4
- Warmup epochs: 5
- Mixed precision: AMP enabled
- `torch.compile`: disabled
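The arithmetic behind these settings can be sketched as follows. The effective batch size is the per-step batch times the accumulation steps times the number of GPUs. With accumulation, each micro-batch loss is typically scaled by 1/2 and the optimizer steps once per pair of micro-batches. The learning-rate function below is only an illustration: the 5-epoch linear warmup to 6e-4 comes from the settings above, but the cosine decay afterwards is an ASSUMPTION, since the post-warmup schedule is not stated here. Check the training code for the actual schedule.

```python
import math


def effective_batch_size(per_step_batch, grad_accum_steps, num_gpus):
    # Number of samples that contribute gradients to each optimizer step.
    return per_step_batch * grad_accum_steps * num_gpus


def lr_at_epoch(epoch, base_lr=6e-4, warmup_epochs=5, total_epochs=120):
    """Per-epoch LR: linear warmup, then cosine decay (decay shape is HYPOTHETICAL).

    `epoch` is 0-indexed.
    """
    if epoch < warmup_epochs:
        # Ramp linearly so the last warmup epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, `effective_batch_size(128, 2, 1)` gives 256, and under this sketch the first epoch runs at 1.2e-4 and reaches 6e-4 by epoch 4.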
## Evaluation Result
| Dataset | Split | Metric | Value |
| --- | --- | --- | --- |
| LRW-1000 | Validation | Top-1 Accuracy | 55.29% |
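Top-1 accuracy counts a clip as correct when its highest-scoring class equals the ground-truth label. LRW-1000 is class-imbalanced, so this sample-weighted figure can differ from the mean of per-class accuracies, such as those summarized in `results/per_class_acc_lrw1000_val.csv`. The sketch below shows both computations. It assumes that 55.29% is the usual sample-weighted top-1; the function names are illustrative and do not come from the code repository.

```python
from collections import defaultdict


def argmax(scores):
    # Index of the highest score (the top-1 prediction for one clip).
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(predictions, labels):
    """Fraction of all samples predicted correctly (sample-weighted)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def per_class_accuracy(predictions, labels):
    """Accuracy computed separately for each ground-truth class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p, y in zip(predictions, labels):
        totals[y] += 1
        hits[y] += int(p == y)
    return {c: hits[c] / totals[c] for c in totals}
```

With three clips of class 0 and one clip of class 1, all predicted as class 0, top-1 accuracy is 0.75. The per-class accuracies are 1.0 and 0.0, which average to only 0.5.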
## Intended Use
This checkpoint is intended for:
- research reproduction
- benchmark comparison
- qualitative inference demos
It is not intended as a production-ready commercial lip-reading system.
## Limitations
- The reported accuracy assumes inputs are preprocessed with the same pipeline as the code repository; a different preprocessing pipeline can lower accuracy
- This release does not include the raw LRW-1000 dataset
- Users must obtain the dataset according to its own terms
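- This processed split uses 1184 classes in the generated vocabulary rather than the nominal 1,000 classes of LRW-1000, so results may not be directly comparable with numbers reported on the standard class setup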
## Usage
Example inference command:
```bash
python inference.py \
--dataset lrw1000 \
--pkl_path /path/to/sample.pkl \
--checkpoint /path/to/best_model.pth \
--gpu 0
```
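To load the weights in your own script instead of through `inference.py`, a sketch like the following may help. The internal layout of `best_model.pth` is not documented here. The helper ASSUMES the file is either a raw `state_dict` or a dict that wraps one under a key such as `model` or `state_dict`, and it strips a DataParallel-style `module.` prefix if one is present. Both helper names are hypothetical.

```python
def extract_state_dict(checkpoint):
    """Return a plain parameter dict from a loaded checkpoint (layout ASSUMED)."""
    if isinstance(checkpoint, dict):
        for key in ("model", "state_dict", "model_state_dict"):
            if isinstance(checkpoint.get(key), dict):
                checkpoint = checkpoint[key]
                break
    # Remove a "module." prefix left by torch.nn.DataParallel, if any.
    return {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in checkpoint.items()
    }


def load_checkpoint(path, device="cpu"):
    # Imported lazily so extract_state_dict can be used without PyTorch installed.
    import torch

    # On PyTorch >= 2.6, torch.load defaults to weights_only=True; pass
    # weights_only=False only for files from a source you trust.
    ckpt = torch.load(path, map_location=device)
    return extract_state_dict(ckpt)
```

Typical use would be `model.load_state_dict(load_checkpoint("best_model.pth"))`, where `model` is built with the 3D CNN + CvT + BiGRU definition from the matching version of the code repository.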
## Notes
- The checkpoint is released for reproducibility
- Use the code version that matches this release when possible
- The released files were renamed from their local source names, `best_model_for_lrw1000.pth` (checkpoint) and `train_lrw1000.log` (training log)
## Citation
If you use this release, please cite the original paper:
```bibtex
@article{wu2022lip,
title={A Lip Reading Method Based on 3D Convolutional Vision Transformer},
author={Wu, Jiafeng and others},
journal={IEEE Access},
year={2022}
}
```