ClimbMix-Ja Initial64 350M Artifacts
This repository is a public backup for the initial 64 ClimbMix-Ja candidate runs.
- Candidate count: 64
- Base model: nvidia/nemotron-climb-proxy-models 350M, converted to a Megatron-LM TE-compatible checkpoint
- Training corpus: KantaHayashiAI/ClimbLab-Ja, clustered into cluster_01 ... cluster_20
- Sequence length: 1024
- Train iterations per candidate: 6500
- Global batch size: 304
- Tokens per candidate: 2,023,424,000
- Total trained tokens across candidates: 129,499,136,000
- Precision/backend: BF16, Transformer Engine, FlashAttention
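The token counts in the list follow from the other settings. A minimal check, assuming every training sample is packed to the full sequence length so that tokens per candidate equal iterations × global batch size × sequence length:

```python
# Values copied from the list above.
SEQ_LEN = 1024
TRAIN_ITERS = 6500
GLOBAL_BATCH_SIZE = 304
NUM_CANDIDATES = 64

# Assumption: each sample is exactly SEQ_LEN tokens (fully packed sequences).
tokens_per_candidate = TRAIN_ITERS * GLOBAL_BATCH_SIZE * SEQ_LEN
total_tokens = tokens_per_candidate * NUM_CANDIDATES

print(f"{tokens_per_candidate:,}")  # 2,023,424,000
print(f"{total_tokens:,}")          # 129,499,136,000
```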
The candidate mapping files are:
- candidate_mapping.jsonl
- candidate_mapping.csv
Each candidate_id (n1 ... n64) maps a checkpoint to the exact mixture script and train-data path used for that run.
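As a rough sketch of how the JSONL mapping could be indexed by candidate, the snippet below parses one JSON object per line and keys it on candidate_id. The exact schema is not documented here: the candidate_id key name is assumed from the description above, and the checkpoint field in the sample records is purely hypothetical.

```python
import json


def load_candidate_mapping(lines):
    """Index JSONL records by their candidate_id field (assumed key name)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        mapping[record["candidate_id"]] = record
    return mapping


# Made-up sample records; every field other than candidate_id is hypothetical.
sample = [
    '{"candidate_id": "n1", "checkpoint": "n1/work/checkpoint"}',
    '{"candidate_id": "n2", "checkpoint": "n2/work/checkpoint"}',
]
by_id = load_candidate_mapping(sample)
print(by_id["n2"]["checkpoint"])
```

With the real file, the same function can be given an open file handle, for example `with open("candidate_mapping.jsonl") as f: by_id = load_candidate_mapping(f)`.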
Contents
This model repository stores the 64 post-training Megatron distributed checkpoints.
For candidate nX, the checkpoint is located at:
nX/work/checkpoint
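The path pattern can be expanded for all 64 candidates with a small helper. This is only an illustration of the layout described above; checkpoint_path is a hypothetical helper name, not part of the repository.

```python
NUM_CANDIDATES = 64


def checkpoint_path(candidate_number: int) -> str:
    """Return the repository-relative checkpoint directory for candidate nX."""
    if not 1 <= candidate_number <= NUM_CANDIDATES:
        raise ValueError(f"candidate number must be in 1..{NUM_CANDIDATES}")
    return f"n{candidate_number}/work/checkpoint"


all_paths = [checkpoint_path(i) for i in range(1, NUM_CANDIDATES + 1)]
print(all_paths[0])
print(all_paths[-1])
```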
The matching training data and mixture definitions are stored in:
https://huggingface.co/datasets/KantaHayashiAI/ClimbMix-Ja-Initial64-Training-Data
The checkpoints are not converted to Hugging Face Transformers format; they are Megatron-LM torch_dist checkpoints.