ClimbMix-Ja Initial64 350M Artifacts

This repository is a public backup for the initial 64 ClimbMix-Ja candidate runs.

  • Candidate count: 64
  • Base model: nvidia/nemotron-climb-proxy-models 350M converted to a Megatron-LM TE-compatible checkpoint
  • Training corpus: KantaHayashiAI/ClimbLab-Ja clustered into cluster_01 ... cluster_20
  • Sequence length: 1024
  • Train iterations per candidate: 6500
  • Global batch size: 304
  • Tokens per candidate: 2,023,424,000
  • Total trained tokens across candidates: 129,499,136,000
  • Precision/backend: BF16, Transformer Engine, FlashAttention
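The token counts above follow directly from the listed hyperparameters. A minimal sketch of the arithmetic, assuming every iteration consumes one full global batch of full-length sequences (the variable names are illustrative only):

```python
# Hyperparameters as listed in this card.
SEQ_LEN = 1024          # tokens per sequence
TRAIN_ITERS = 6500      # optimizer steps per candidate
GLOBAL_BATCH = 304      # sequences per step
NUM_CANDIDATES = 64     # candidate runs n1 ... n64

# Tokens seen by one candidate run, then by all runs together.
tokens_per_candidate = TRAIN_ITERS * GLOBAL_BATCH * SEQ_LEN
total_tokens = tokens_per_candidate * NUM_CANDIDATES

print(f"{tokens_per_candidate:,}")  # 2,023,424,000
print(f"{total_tokens:,}")          # 129,499,136,000
```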

The candidate mapping files are:

  • candidate_mapping.jsonl
  • candidate_mapping.csv

Each entry is keyed by candidate_id (n1 ... n64) and links that candidate's checkpoint to the exact mixture script and train-data path used for the run.
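A minimal sketch of how the JSONL mapping could be indexed by candidate, assuming each line is one JSON object with a `candidate_id` key. The other field names are not documented in this card, so the sketch does not rely on them, and `load_candidate_mapping` is a hypothetical helper rather than part of the repository:

```python
import json
from pathlib import Path


def load_candidate_mapping(path):
    """Return {candidate_id: record} from a JSON Lines file.

    Assumes one JSON object per line, each with a "candidate_id" key.
    """
    mapping = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            mapping[record["candidate_id"]] = record
    return mapping
```

For example, `load_candidate_mapping("candidate_mapping.jsonl")["n1"]` would return the full record for candidate n1, whatever fields it carries.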

Contents

This model repository stores 64 Megatron distributed checkpoints, one per candidate, each saved after that candidate's training run finished.

For candidate nX, the checkpoint is located at:

nX/work/checkpoint
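The path convention above can be captured in a small helper. This is a sketch only: `checkpoint_path` is a hypothetical name, and the validation assumes candidate IDs are exactly n1 through n64 with no zero padding.

```python
import re

NUM_CANDIDATES = 64
_CANDIDATE_RE = re.compile(r"^n([1-9][0-9]?)$")  # n1 ... n99, no leading zeros


def checkpoint_path(candidate_id: str) -> str:
    """Return the repo-relative checkpoint directory for a candidate ID."""
    match = _CANDIDATE_RE.match(candidate_id)
    if match is None or int(match.group(1)) > NUM_CANDIDATES:
        raise ValueError(f"unknown candidate_id: {candidate_id!r}")
    return f"{candidate_id}/work/checkpoint"


print(checkpoint_path("n7"))  # n7/work/checkpoint
```

To fetch a single candidate instead of all 64, a glob such as `checkpoint_path("n7") + "/*"` could be passed as `allow_patterns` to `huggingface_hub.snapshot_download`, together with this repository's ID.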

The matching training data and mixture definitions are stored in:

https://huggingface.co/datasets/KantaHayashiAI/ClimbMix-Ja-Initial64-Training-Data

The checkpoints are Megatron-LM torch_dist checkpoints and have not been converted to Hugging Face Transformers format.
