ClimbMix-Ja Initial64 350M Artifacts

This repository is a public backup for the initial 64 ClimbMix-Ja candidate runs.

Candidate count: 64
Base model: the 350M model from nvidia/nemotron-climb-proxy-models, converted to a Transformer Engine (TE)-compatible Megatron-LM checkpoint
Training corpus: KantaHayashiAI/ClimbLab-Ja, clustered into 20 clusters (cluster_01 ... cluster_20)
Sequence length: 1024
Train iterations per candidate: 6500
Global batch size: 304
Tokens per candidate: 2,023,424,000
Total trained tokens across candidates: 129,499,136,000
Precision/backend: BF16, Transformer Engine, FlashAttention
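The token counts above follow from sequence length × global batch size × training iterations. This assumes every global batch is packed with full 1024-token sequences and contains no padding. The short check below reproduces both figures from the listed settings:

```python
# Token budget implied by the run configuration listed above.
# Assumes every global batch is packed with full-length sequences (no padding).
SEQ_LENGTH = 1024
GLOBAL_BATCH_SIZE = 304
TRAIN_ITERS = 6500
NUM_CANDIDATES = 64

tokens_per_iter = SEQ_LENGTH * GLOBAL_BATCH_SIZE       # 311,296 tokens per optimizer step
tokens_per_candidate = tokens_per_iter * TRAIN_ITERS   # 2,023,424,000 tokens per run
total_tokens = tokens_per_candidate * NUM_CANDIDATES   # 129,499,136,000 tokens in total

print(f"tokens per candidate: {tokens_per_candidate:,}")
print(f"total tokens:         {total_tokens:,}")
```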

The candidate mapping files are:

Each candidate_id (n1 ... n64) maps a checkpoint to the exact mixture script and train-data path used for that run.

This model repository stores the 64 Megatron distributed checkpoints saved at the end of each candidate's training run.

For candidate nX, the checkpoint is located at:

nX/work/checkpoint
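The sketch below turns a candidate id into its in-repo checkpoint path and, optionally, downloads only that candidate's files. It rests on a few assumptions. The repo id is taken from this page's header. The helper names `checkpoint_dir` and `download_candidate` are hypothetical. The download step relies on `huggingface_hub.snapshot_download` and its `allow_patterns` filter. That package is imported lazily, so the path helper works without it installed.

```python
from pathlib import PurePosixPath

NUM_CANDIDATES = 64
# Repo id as shown in this page's header (assumption: the repo has not been renamed).
REPO_ID = "KantaHayashiAI/ClimbMix-Ja-350M-Initial64-Checkpoints"


def checkpoint_dir(candidate_id: str) -> PurePosixPath:
    """Return the in-repo checkpoint directory for an id such as "n7" (hypothetical helper)."""
    digits = candidate_id[1:]
    valid = (
        candidate_id.startswith("n")
        and digits.isascii()
        and digits.isdigit()
        and str(int(digits)) == digits          # reject zero-padded forms like "n07"
        and 1 <= int(digits) <= NUM_CANDIDATES
    )
    if not valid:
        raise ValueError(f"expected an id in n1..n{NUM_CANDIDATES}, got {candidate_id!r}")
    return PurePosixPath(candidate_id) / "work" / "checkpoint"


def download_candidate(candidate_id: str, local_dir: str) -> str:
    """Download only one candidate's checkpoint files (hypothetical helper)."""
    # Lazy import: huggingface_hub is only needed when actually downloading.
    from huggingface_hub import snapshot_download

    # fnmatch-style "*" also matches "/", so nested iteration files are included.
    pattern = f"{checkpoint_dir(candidate_id)}/*"
    return snapshot_download(repo_id=REPO_ID, allow_patterns=[pattern], local_dir=local_dir)
```

For example, `download_candidate("n1", "./climbmix-ja")` would place the files under `./climbmix-ja/n1/work/checkpoint`. Filtering with `allow_patterns` avoids pulling all 64 checkpoints when only one is needed.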

The matching training data and mixture definitions are stored in:

https://huggingface.co/datasets/KantaHayashiAI/ClimbMix-Ja-Initial64-Training-Data

The checkpoints are not converted to Hugging Face Transformers format; they are Megatron-LM torch_dist checkpoints.
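Because these are torch_dist checkpoints rather than Transformers weights, they can be inspected with PyTorch's Distributed Checkpoint (DCP) reader instead of `from_pretrained`. The sketch below makes two assumptions. First, each `nX/work/checkpoint` directory is assumed to follow Megatron-LM's usual layout: a `latest_checkpointed_iteration.txt` tracker file next to `iter_XXXXXXX/` subdirectories. Second, the iteration directory is assumed to hold the DCP `.metadata` file. The helper names are hypothetical. Only the shape listing needs `torch`, and it reads metadata without loading any tensor data.

```python
from pathlib import Path


def latest_iteration_dir(checkpoint_root: str) -> Path:
    """Resolve the iteration directory named by Megatron-LM's tracker file.

    Assumes the layout <root>/latest_checkpointed_iteration.txt plus
    <root>/iter_XXXXXXX/ (7-digit, zero-padded) or <root>/release/.
    """
    root = Path(checkpoint_root)
    value = (root / "latest_checkpointed_iteration.txt").read_text().strip()
    if value == "release":
        return root / "release"
    return root / f"iter_{int(value):07d}"


def tensor_shapes(iteration_dir: Path) -> dict:
    """Map tensor names to global shapes recorded in a torch_dist checkpoint.

    Reads only the DCP metadata; no tensor data is loaded into memory.
    """
    from torch.distributed.checkpoint import FileSystemReader
    from torch.distributed.checkpoint.metadata import TensorStorageMetadata

    metadata = FileSystemReader(str(iteration_dir)).read_metadata()
    return {
        key: tuple(entry.size)
        for key, entry in metadata.state_dict_metadata.items()
        if isinstance(entry, TensorStorageMetadata)  # skip non-tensor (bytes) entries
    }
```

For a downloaded candidate, `tensor_shapes(latest_iteration_dir("n1/work/checkpoint"))` would list the stored parameters. If the tracker file records the final step of the 6500-iteration run, the resolved directory would be `iter_0006500`. Converting to Hugging Face format would still require a separate Megatron-to-Transformers conversion step.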


Repository: KantaHayashiAI/ClimbMix-Ja-350M-Initial64-Checkpoints