0: W1123 14:35:36.265000 2356243 torch/distributed/run.py:792] 0: W1123 14:35:36.265000 2356243 torch/distributed/run.py:792] ***************************************** 0: W1123 14:35:36.265000 2356243 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 0: W1123 14:35:36.265000 2356243 torch/distributed/run.py:792] ***************************************** 3: W1123 14:35:36.265000 51501 torch/distributed/run.py:792] 3: W1123 14:35:36.265000 51501 torch/distributed/run.py:792] ***************************************** 3: W1123 14:35:36.265000 51501 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 3: W1123 14:35:36.265000 51501 torch/distributed/run.py:792] ***************************************** 2: W1123 14:35:36.265000 895971 torch/distributed/run.py:792] 2: W1123 14:35:36.265000 895971 torch/distributed/run.py:792] ***************************************** 2: W1123 14:35:36.265000 895971 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
2: W1123 14:35:36.265000 895971 torch/distributed/run.py:792] ***************************************** 1: W1123 14:35:36.265000 2965489 torch/distributed/run.py:792] 1: W1123 14:35:36.265000 2965489 torch/distributed/run.py:792] ***************************************** 1: W1123 14:35:36.265000 2965489 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 1: W1123 14:35:36.265000 2965489 torch/distributed/run.py:792] ***************************************** 2: [2025-11-23 14:40:32,705] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:896064] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 3: [2025-11-23 14:40:32,705] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:51582] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 2: [2025-11-23 14:40:32,705] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:896064] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 3: [2025-11-23 14:40:32,705] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:51582] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 0: [2025-11-23 14:40:32,706] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:2356327] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 1: [2025-11-23 14:40:32,705] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:119] [PID:2965582] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing` 0: [2025-11-23 14:40:32,706] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:2356327] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 1: 
[2025-11-23 14:40:32,706] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:218] [PID:2965582] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing 0: [2025-11-23 14:41:03,121] [WARNING] [axolotl.utils.config.normalize_config:139] [PID:2356327] [RANK:0] Invalid value for save_steps (1.6666666666666667) from saves_per_epoch and/or num_epochs. Saving at training end only. 0: [2025-11-23 14:41:03,271] [INFO] [axolotl.cli.config.load_cfg:245] [PID:2356327] [RANK:0] config: 0: { 0: "activation_offloading": false, 0: "auto_resume_from_checkpoints": true, 0: "axolotl_config_path": "/lustre/fswork/projects/rech/dgo/udv55np/train/tmp/1763904854732780523.yaml", 0: "base_model": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-4b", 0: "base_model_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-4b", 0: "batch_size": 16, 0: "bf16": true, 0: "capabilities": { 0: "bf16": true, 0: "compute_capability": "sm_90", 0: "fp8": false, 0: "n_gpu": 16, 0: "n_node": 1 0: }, 0: "chat_template": "gemma3", 0: "context_parallel_size": 1, 0: "curriculum_sampling": true, 0: "dataloader_num_workers": 2, 0: "dataset_prepared_path": "/lustre/fswork/projects/rech/dgo/udv55np/dataset_gemma/Nemotron-Super-49B-v1_5/split_0.75", 0: "dataset_processes": 32, 0: "datasets": [ 0: { 0: "chat_template": "tokenizer_default", 0: "data_files": [ 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking/0004.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking/0011.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking/0000.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking/0003.jsonl" 0: ], 0: "ds_type": "json", 0: "field_messages": "conversations", 0: "message_property_mappings": { 0: "content": "content", 0: "role": "role" 0: }, 0: "path": 
"/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking", 0: "trust_remote_code": false, 0: "type": "chat_template" 0: }, 0: { 0: "chat_template": "tokenizer_default", 0: "data_files": [ 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0007.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0009.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0005.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0006.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0014.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0010.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0012.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0008.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0001.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0002.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0013.jsonl", 0: "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking/0015.jsonl" 0: ], 0: "ds_type": "json", 0: "field_messages": "conversations", 0: "message_property_mappings": { 0: "content": "content", 0: "role": "role" 0: }, 0: "path": "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/thinking", 0: "trust_remote_code": false, 0: "type": "chat_template" 0: } 0: ], 0: "ddp": true, 0: "deepspeed": { 0: "bf16": { 0: "enabled": true 0: }, 0: "gradient_accumulation_steps": "auto", 0: "gradient_clipping": "auto", 0: "train_batch_size": "auto", 0: "train_micro_batch_size_per_gpu": "auto", 0: 
"wall_clock_breakdown": false, 0: "zero_optimization": { 0: "contiguous_gradients": true, 0: "overlap_comm": true, 0: "reduce_bucket_size": "auto", 0: "stage": 3, 0: "stage3_gather_16bit_weights_on_model_save": true, 0: "stage3_param_persistence_threshold": "auto", 0: "stage3_prefetch_bucket_size": "auto", 0: "sub_group_size": 0 0: } 0: }, 0: "device": "cuda:0", 0: "device_map": { 0: "": 0 0: }, 0: "dion_rank_fraction": 1.0, 0: "dion_rank_multiple_of": 1, 0: "env_capabilities": { 0: "torch_version": "2.6.0" 0: }, 0: "eot_tokens": [ 0: "" 0: ], 0: "eval_batch_size": 1, 0: "eval_causal_lm_metrics": [ 0: "sacrebleu", 0: "comet", 0: "ter", 0: "chrf" 0: ], 0: "eval_max_new_tokens": 128, 0: "eval_sample_packing": true, 0: "eval_table_size": 0, 0: "evals_per_epoch": 0, 0: "flash_attention": true, 0: "fp16": false, 0: "gradient_accumulation_steps": 1, 0: "gradient_checkpointing": true, 0: "gradient_checkpointing_kwargs": { 0: "use_reentrant": true 0: }, 0: "is_multimodal": true, 0: "learning_rate": 5e-06, 0: "lisa_layers_attribute": "model.layers", 0: "load_best_model_at_end": false, 0: "load_in_4bit": false, 0: "load_in_8bit": false, 0: "local_rank": 0, 0: "logging_steps": 10, 0: "lora_dropout": 0.0, 0: "loraplus_lr_embedding": 1e-06, 0: "lr_scheduler": "warmup_stable_decay", 0: "lr_scheduler_kwargs": { 0: "min_lr_ratio": 0.1, 0: "num_decay_steps": 200 0: }, 0: "max_prompt_len": 512, 0: "mean_resizing_embeddings": false, 0: "micro_batch_size": 1, 0: "model_config_type": "gemma3", 0: "num_epochs": 0.6, 0: "optimizer": "adamw_torch_fused", 0: "output_dir": "/lustre/fswork/projects/rech/dgo/udv55np/ift/Nemotron-Super-49B-v1_5/gemma-3-4b/0.75", 0: "pad_to_sequence_len": true, 0: "pretrain_multipack_attn": true, 0: "pretrain_multipack_buffer_size": 10000, 0: "processor_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-4b", 0: "profiler_steps_start": 0, 0: "qlora_sharded_model_loading": false, 0: "ray_num_workers": 1, 0: "resources_per_worker": { 0: "GPU": 1 
0: }, 0: "sample_packing": true, 0: "sample_packing_bin_size": 200, 0: "sample_packing_group_size": 100000, 0: "sample_packing_sequentially": true, 0: "save_only_model": true, 0: "save_safetensors": true, 0: "save_total_limit": 20, 0: "saves_per_epoch": 1, 0: "sequence_len": 16384, 0: "shuffle_before_merging_datasets": true, 0: "shuffle_merged_datasets": false, 0: "skip_prepare_dataset": false, 0: "strict": false, 0: "tensor_parallel_size": 1, 0: "tf32": false, 0: "tiled_mlp_use_original_mlp": true, 0: "tokenizer_config": "/lustre/fswork/projects/rech/qwv/udv55np/Gemma/base/gemma-3-27b", 0: "torch_dtype": "torch.bfloat16", 0: "train_on_inputs": false, 0: "trl": { 0: "log_completions": false, 0: "mask_truncated_completions": false, 0: "ref_model_mixup_alpha": 0.9, 0: "ref_model_sync_steps": 64, 0: "scale_rewards": true, 0: "sync_ref_model": false, 0: "use_vllm": false, 0: "vllm_server_host": "0.0.0.0", 0: "vllm_server_port": 8000 0: }, 0: "use_ray": false, 0: "use_tensorboard": true, 0: "val_set_size": 0.0, 0: "vllm": { 0: "device": "auto", 0: "dtype": "auto", 0: "gpu_memory_utilization": 0.9, 0: "host": "0.0.0.0", 0: "port": 8000 0: }, 0: "warmup_steps": 100, 0: "weight_decay": 0.0, 0: "world_size": 16 0: } 0: [2025-11-23 14:41:03,273] [INFO] [axolotl.cli.checks.check_user_token:35] [PID:2356327] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used. 1: [2025-11-23 14:41:05,205] [INFO] [axolotl.utils.data.sft._load_raw_datasets:314] [PID:2965584] [RANK:2] Loading raw datasets... 1: Generating train split: 0 examples [00:00, ? 
examples/s] Generating train split: 139320 examples [00:01, 75180.64 examples/s]
1: [2025-11-23 14:41:09,428] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:88] [PID:2965584] [RANK:2] Loading dataset: /lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Nemotron-Super-49B-v1_5/no_thinking with base_type: chat_template and prompt_style: None
1: Tokenizing Prompts (num_proc=32): 0%| | 0/139320 [00:00
1: Dropping Long Sequences (>16384) (num_proc=32): 100%|██████████| 557278/557278 [00:16<00:00, 33385.54 examples/s]
1: Drop Samples with Zero Trainable Tokens (num_proc=32): 0%| | 0/551276 [00:00
0: jzxh014:2356327:2356327 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
0: jzxh014:2356327:2356327 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
0: jzxh014:2356327:2356327 [0] NCCL INFO NET/Plugin: Using internal network plugin.
0: jzxh014:2356327:2356327 [0] NCCL INFO cudaDriverVersion 12080
0: NCCL version 2.21.5+cuda12.4
0: jzxh014:2356327:2356327 [0] NCCL INFO Comm config Blocking set to 1
0: jzxh014:2356329:2356329 [2] NCCL INFO cudaDriverVersion 12080
0: jzxh014:2356329:2356329 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.53<0>
0: jzxh014:2356329:2356329 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
0: jzxh014:2356329:2356329 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
0: jzxh014:2356329:2356329 [2] NCCL INFO NET/Plugin: Using internal network plugin.
0: jzxh014:2356329:2356329 [2] NCCL INFO Comm config Blocking set to 1 0: jzxh014:2356328:2356328 [1] NCCL INFO cudaDriverVersion 12080 0: jzxh014:2356330:2356330 [3] NCCL INFO cudaDriverVersion 12080 1: jzxh015:2965582:2965582 [0] NCCL INFO cudaDriverVersion 12080 3: jzxh017:51583:51583 [1] NCCL INFO cudaDriverVersion 12080 0: jzxh014:2356328:2356328 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.53<0> 0: jzxh014:2356330:2356330 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.53<0> 0: jzxh014:2356330:2356330 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh014:2356328:2356328 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 0: jzxh014:2356330:2356330 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh014:2356330:2356330 [3] NCCL INFO NET/Plugin: Using internal network plugin. 0: jzxh014:2356328:2356328 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 0: jzxh014:2356328:2356328 [1] NCCL INFO NET/Plugin: Using internal network plugin. 
0: jzxh014:2356330:2356330 [3] NCCL INFO Comm config Blocking set to 1 0: jzxh014:2356328:2356328 [1] NCCL INFO Comm config Blocking set to 1 1: jzxh015:2965582:2965582 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.57<0> 3: jzxh017:51583:51583 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.65<0> 1: jzxh015:2965583:2965583 [1] NCCL INFO cudaDriverVersion 12080 3: jzxh017:51582:51582 [0] NCCL INFO cudaDriverVersion 12080 3: jzxh017:51582:51582 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.65<0> 2: jzxh016:896066:896066 [2] NCCL INFO cudaDriverVersion 12080 2: jzxh016:896066:896066 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.61<0> 1: jzxh015:2965584:2965584 [2] NCCL INFO cudaDriverVersion 12080 1: jzxh015:2965583:2965583 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.57<0> 3: jzxh017:51582:51582 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh017:51583:51583 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh015:2965584:2965584 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.57<0> 3: jzxh017:51583:51583 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh017:51582:51582 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh017:51583:51583 [1] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965583:2965583 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh015:2965582:2965582 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh017:51582:51582 [0] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965583:2965583 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh015:2965583:2965583 [1] NCCL INFO NET/Plugin: Using internal network plugin. 
1: jzxh015:2965582:2965582 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh015:2965582:2965582 [0] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965584:2965584 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 1: jzxh015:2965584:2965584 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh015:2965584:2965584 [2] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965585:2965585 [3] NCCL INFO cudaDriverVersion 12080 1: jzxh015:2965585:2965585 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.57<0> 1: jzxh015:2965585:2965585 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh016:896064:896064 [0] NCCL INFO cudaDriverVersion 12080 1: jzxh015:2965585:2965585 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 1: jzxh015:2965585:2965585 [3] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965582:2965582 [0] NCCL INFO Comm config Blocking set to 1 1: jzxh015:2965583:2965583 [1] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896066:896066 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh016:896066:896066 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh016:896066:896066 [2] NCCL INFO NET/Plugin: Using internal network plugin. 
1: jzxh015:2965584:2965584 [2] NCCL INFO Comm config Blocking set to 1 3: jzxh017:51584:51584 [2] NCCL INFO cudaDriverVersion 12080 3: jzxh017:51583:51583 [1] NCCL INFO Comm config Blocking set to 1 3: jzxh017:51582:51582 [0] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896064:896064 [0] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.61<0> 3: jzxh017:51585:51585 [3] NCCL INFO cudaDriverVersion 12080 2: jzxh016:896064:896064 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh016:896064:896064 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh016:896064:896064 [0] NCCL INFO NET/Plugin: Using internal network plugin. 1: jzxh015:2965585:2965585 [3] NCCL INFO Comm config Blocking set to 1 3: jzxh017:51584:51584 [2] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.65<0> 3: jzxh017:51585:51585 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.65<0> 3: jzxh017:51585:51585 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh017:51585:51585 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh017:51585:51585 [3] NCCL INFO NET/Plugin: Using internal network plugin. 3: jzxh017:51584:51584 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh017:51584:51584 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 3: jzxh017:51584:51584 [2] NCCL INFO NET/Plugin: Using internal network plugin. 
2: jzxh016:896067:896067 [3] NCCL INFO cudaDriverVersion 12080 2: jzxh016:896066:896066 [2] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896064:896064 [0] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896067:896067 [3] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.61<0> 2: jzxh016:896067:896067 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 3: jzxh017:51585:51585 [3] NCCL INFO Comm config Blocking set to 1 3: jzxh017:51584:51584 [2] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896067:896067 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh016:896067:896067 [3] NCCL INFO NET/Plugin: Using internal network plugin. 2: jzxh016:896065:896065 [1] NCCL INFO cudaDriverVersion 12080 2: jzxh016:896065:896065 [1] NCCL INFO Bootstrap : Using ibp24s0:10.100.4.61<0> 2: jzxh016:896067:896067 [3] NCCL INFO Comm config Blocking set to 1 2: jzxh016:896065:896065 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) 2: jzxh016:896065:896065 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so 2: jzxh016:896065:896065 [1] NCCL INFO NET/Plugin: Using internal network plugin. 
2: jzxh016:896065:896065 [1] NCCL INFO Comm config Blocking set to 1 0: jzxh014:2356329:2357228 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.53<0> 0: jzxh014:2356328:2357230 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.53<0> 0: jzxh014:2356329:2357228 [2] NCCL INFO Using non-device net plugin version 0 0: jzxh014:2356329:2357228 [2] NCCL INFO Using network IB 0: jzxh014:2356327:2357227 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.53<0> 0: jzxh014:2356328:2357230 [1] NCCL INFO Using non-device net plugin version 0 0: jzxh014:2356328:2357230 [1] NCCL INFO Using network IB 0: jzxh014:2356330:2357229 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.53<0> 0: jzxh014:2356327:2357227 [0] NCCL INFO Using non-device net plugin version 0 0: jzxh014:2356327:2357227 [0] NCCL INFO Using network IB 0: jzxh014:2356330:2357229 [3] NCCL INFO Using non-device net plugin version 0 0: jzxh014:2356330:2357229 [3] NCCL INFO Using network IB 1: jzxh015:2965583:2967788 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.57<0> 1: jzxh015:2965585:2967786 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.57<0> 1: jzxh015:2965584:2967789 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.57<0> 1: jzxh015:2965583:2967788 [1] NCCL INFO Using non-device net plugin version 0 1: jzxh015:2965583:2967788 [1] NCCL INFO Using network IB 1: jzxh015:2965582:2967787 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.57<0> 1: jzxh015:2965585:2967786 [3] NCCL INFO Using 
non-device net plugin version 0 1: jzxh015:2965585:2967786 [3] NCCL INFO Using network IB 1: jzxh015:2965584:2967789 [2] NCCL INFO Using non-device net plugin version 0 1: jzxh015:2965584:2967789 [2] NCCL INFO Using network IB 1: jzxh015:2965582:2967787 [0] NCCL INFO Using non-device net plugin version 0 1: jzxh015:2965582:2967787 [0] NCCL INFO Using network IB 2: jzxh016:896065:896964 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.61<0> 2: jzxh016:896067:896963 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.61<0> 2: jzxh016:896064:896962 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.61<0> 2: jzxh016:896065:896964 [1] NCCL INFO Using non-device net plugin version 0 2: jzxh016:896065:896964 [1] NCCL INFO Using network IB 2: jzxh016:896066:896961 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.61<0> 2: jzxh016:896067:896963 [3] NCCL INFO Using non-device net plugin version 0 2: jzxh016:896067:896963 [3] NCCL INFO Using network IB 2: jzxh016:896064:896962 [0] NCCL INFO Using non-device net plugin version 0 2: jzxh016:896064:896962 [0] NCCL INFO Using network IB 2: jzxh016:896066:896961 [2] NCCL INFO Using non-device net plugin version 0 2: jzxh016:896066:896961 [2] NCCL INFO Using network IB 3: jzxh017:51582:52477 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.65<0> 3: jzxh017:51585:52478 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.65<0> 3: jzxh017:51584:52479 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.65<0> 3: jzxh017:51582:52477 [0] NCCL INFO Using non-device net plugin version 0 3: 
jzxh017:51582:52477 [0] NCCL INFO Using network IB 3: jzxh017:51583:52476 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [RO]; OOB ibp24s0:10.100.4.65<0> 3: jzxh017:51585:52478 [3] NCCL INFO Using non-device net plugin version 0 3: jzxh017:51585:52478 [3] NCCL INFO Using network IB 3: jzxh017:51584:52479 [2] NCCL INFO Using non-device net plugin version 0 3: jzxh017:51584:52479 [2] NCCL INFO Using network IB 3: jzxh017:51583:52476 [1] NCCL INFO Using non-device net plugin version 0 3: jzxh017:51583:52476 [1] NCCL INFO Using network IB 0: jzxh014:2356330:2357229 [3] NCCL INFO ncclCommInitRank comm 0x55abd54283f0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init START 0: jzxh014:2356329:2357228 [2] NCCL INFO ncclCommInitRank comm 0x55e468be3960 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init START 1: jzxh015:2965582:2967787 [0] NCCL INFO ncclCommInitRank comm 0x5581ebd8acb0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init START 1: jzxh015:2965583:2967788 [1] NCCL INFO ncclCommInitRank comm 0x55fffbc332b0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init START 0: jzxh014:2356328:2357230 [1] NCCL INFO ncclCommInitRank comm 0x559d12187a60 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init START 0: jzxh014:2356327:2357227 [0] NCCL INFO ncclCommInitRank comm 0x555ba8ca5800 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init START 2: jzxh016:896064:896962 [0] NCCL INFO ncclCommInitRank comm 0x55a50cad5640 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init START 1: jzxh015:2965585:2967786 [3] NCCL INFO ncclCommInitRank comm 0x55ee7f078d80 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init START 1: jzxh015:2965584:2967789 [2] NCCL INFO ncclCommInitRank comm 0x564e0a919f70 rank 6 
nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init START 3: jzxh017:51582:52477 [0] NCCL INFO ncclCommInitRank comm 0x559e3dba4770 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init START 3: jzxh017:51583:52476 [1] NCCL INFO ncclCommInitRank comm 0x558abda268d0 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init START 3: jzxh017:51584:52479 [2] NCCL INFO ncclCommInitRank comm 0x564cbfab3310 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init START 3: jzxh017:51585:52478 [3] NCCL INFO ncclCommInitRank comm 0x56086fae1720 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init START 2: jzxh016:896065:896964 [1] NCCL INFO ncclCommInitRank comm 0x55a46b9ae900 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init START 2: jzxh016:896067:896963 [3] NCCL INFO ncclCommInitRank comm 0x5576b8f9bbe0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init START 2: jzxh016:896066:896961 [2] NCCL INFO ncclCommInitRank comm 0x556ee6361e10 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init START 2: jzxh016:896067:896963 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 2: jzxh016:896067:896963 [3] NCCL INFO NVLS multicast support is not available on dev 3 2: jzxh016:896066:896961 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 2: jzxh016:896066:896961 [2] NCCL INFO NVLS multicast support is not available on dev 2 2: jzxh016:896064:896962 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 2: jzxh016:896064:896962 [0] NCCL INFO NVLS multicast support is not available on dev 0 2: jzxh016:896065:896964 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 2: jzxh016:896065:896964 [1] NCCL INFO NVLS multicast 
support is not available on dev 1 3: jzxh017:51582:52477 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 3: jzxh017:51582:52477 [0] NCCL INFO NVLS multicast support is not available on dev 0 3: jzxh017:51583:52476 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 3: jzxh017:51583:52476 [1] NCCL INFO NVLS multicast support is not available on dev 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 0: jzxh014:2356327:2357227 [0] NCCL INFO NVLS multicast support is not available on dev 0 3: jzxh017:51584:52479 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 3: jzxh017:51584:52479 [2] NCCL INFO NVLS multicast support is not available on dev 2 0: jzxh014:2356330:2357229 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 0: jzxh014:2356330:2357229 [3] NCCL INFO NVLS multicast support is not available on dev 3 3: jzxh017:51585:52478 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 3: jzxh017:51585:52478 [3] NCCL INFO NVLS multicast support is not available on dev 3 1: jzxh015:2965584:2967789 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 1: jzxh015:2965584:2967789 [2] NCCL INFO NVLS multicast support is not available on dev 2 0: jzxh014:2356329:2357228 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00000000,000000ff,ffff0000,00000000 0: jzxh014:2356329:2357228 [2] NCCL INFO NVLS multicast support is not available on dev 2 0: jzxh014:2356328:2357230 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 0: jzxh014:2356328:2357230 [1] NCCL INFO NVLS multicast support is not available on dev 1 1: jzxh015:2965583:2967788 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ff000000,00000000,0000ffff,ff000000 1: jzxh015:2965583:2967788 [1] 
NCCL INFO NVLS multicast support is not available on dev 1 1: jzxh015:2965582:2967787 [0] NCCL INFO Setting affinity for GPU 0 to ffffff,00000000,00000000,00ffffff 1: jzxh015:2965582:2967787 [0] NCCL INFO NVLS multicast support is not available on dev 0 1: jzxh015:2965585:2967786 [3] NCCL INFO Setting affinity for GPU 3 to ffffff00,00000000,00000000,ffffff00,00000000,00000000 1: jzxh015:2965585:2967786 [3] NCCL INFO NVLS multicast support is not available on dev 3 0: jzxh014:2356327:2357227 [0] NCCL INFO comm 0x555ba8ca5800 rank 0 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 00/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 01/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 02/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 03/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 05/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 06/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 07/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 09/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 10/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 11/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 13/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 14/16 : 0 3 6 5 4 7 10 9 8 11 
14 13 12 15 2 1 0: jzxh014:2356328:2357230 [1] NCCL INFO comm 0x559d12187a60 rank 1 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 15/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356330:2357229 [3] NCCL INFO comm 0x55abd54283f0 rank 3 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 0: jzxh014:2356327:2357227 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->3 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 3/8/-1->0->-1 [5] 2/-1/-1->0->3 [6] 3/-1/-1->0->2 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->4 [9] -1/-1/-1->0->3 [10] 1/-1/-1->0->2 [11] 2/-1/-1->0->1 [12] 3/-1/-1->0->4 [13] 2/-1/-1->0->3 [14] 3/-1/-1->0->2 [15] -1/-1/-1->0->1 0: jzxh014:2356327:2357227 [0] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51585:52478 [3] NCCL INFO comm 0x56086fae1720 rank 15 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 3: jzxh017:51585:52478 [3] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 12/-1/-1->15->14 [2] -1/-1/-1->15->13 [3] 13/-1/-1->15->11 [4] 14/-1/-1->15->12 [5] 12/-1/-1->15->13 [6] 13/-1/-1->15->12 [7] 14/-1/-1->15->11 [8] -1/-1/-1->15->14 [9] 12/-1/-1->15->14 [10] -1/-1/-1->15->13 [11] 13/7/-1->15->-1 [12] 14/-1/-1->15->12 [13] 12/-1/-1->15->13 [14] 13/-1/-1->15->12 [15] 14/7/-1->15->-1 3: jzxh017:51585:52478 [3] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51584:52479 [2] NCCL INFO comm 0x564cbfab3310 rank 14 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 0: jzxh014:2356328:2357230 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/9/-1->1->-1 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] -1/-1/-1->1->2 [5] 3/9/-1->1->-1 [6] -1/-1/-1->1->3 [7] 0/-1/-1->1->2 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] 0/-1/-1->1->3 [12] -1/-1/-1->1->2 [13] 3/-1/-1->1->5 [14] -1/-1/-1->1->3 [15] 0/-1/-1->1->2 0: jzxh014:2356329:2357228 [2] NCCL INFO comm 0x55e468be3960 rank 2 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 0: jzxh014:2356328:2357230 [1] NCCL INFO P2P Chunksize set to 131072 0: 
jzxh014:2356330:2357229 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] 0/-1/-1->3->2 [2] -1/-1/-1->3->1 [3] 1/11/-1->3->-1 [4] 2/-1/-1->3->0 [5] 0/-1/-1->3->1 [6] 1/-1/-1->3->0 [7] 2/11/-1->3->-1 [8] -1/-1/-1->3->2 [9] 0/-1/-1->3->2 [10] -1/-1/-1->3->1 [11] 1/-1/-1->3->7 [12] 2/-1/-1->3->0 [13] 0/-1/-1->3->1 [14] 1/-1/-1->3->0 [15] 2/-1/-1->3->7 0: jzxh014:2356330:2357229 [3] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965582:2967787 [0] NCCL INFO comm 0x5581ebd8acb0 rank 4 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 1: jzxh015:2965583:2967788 [1] NCCL INFO comm 0x55fffbc332b0 rank 5 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 1: jzxh015:2965585:2967786 [3] NCCL INFO comm 0x55ee7f078d80 rank 7 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 1: jzxh015:2965584:2967789 [2] NCCL INFO comm 0x564e0a919f70 rank 6 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 0: jzxh014:2356329:2357228 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 0/10/-1->2->-1 [3] -1/-1/-1->2->0 [4] 1/-1/-1->2->3 [5] -1/-1/-1->2->0 [6] 0/10/-1->2->-1 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 0/-1/-1->2->6 [11] -1/-1/-1->2->0 [12] 1/-1/-1->2->3 [13] -1/-1/-1->2->0 [14] 0/-1/-1->2->6 [15] 1/-1/-1->2->3 0: jzxh014:2356329:2357228 [2] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51584:52479 [2] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 12/-1/-1->14->10 [3] -1/-1/-1->14->12 [4] 13/-1/-1->14->15 [5] -1/-1/-1->14->12 [6] 12/-1/-1->14->10 [7] 13/-1/-1->14->15 [8] 15/-1/-1->14->13 [9] 15/-1/-1->14->13 [10] 12/6/-1->14->-1 [11] -1/-1/-1->14->12 [12] 13/-1/-1->14->15 [13] -1/-1/-1->14->12 [14] 12/6/-1->14->-1 [15] 13/-1/-1->14->15 3: jzxh017:51584:52479 [2] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965582:2967787 [0] NCCL INFO Trees [0] 5/-1/-1->4->8 [1] -1/-1/-1->4->7 [2] 5/-1/-1->4->6 [3] 6/-1/-1->4->5 [4] 7/-1/-1->4->8 [5] 6/-1/-1->4->7 [6] 7/-1/-1->4->6 [7] -1/-1/-1->4->5 [8] 5/8/0->4->12 [9] -1/-1/-1->4->7 [10] 
5/-1/-1->4->6 [11] 6/-1/-1->4->5 [12] 7/8/0->4->12 [13] 6/-1/-1->4->7 [14] 7/-1/-1->4->6 [15] -1/-1/-1->4->5 1: jzxh015:2965583:2967788 [1] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->9 [2] 7/-1/-1->5->4 [3] 4/-1/-1->5->7 [4] -1/-1/-1->5->6 [5] 7/-1/-1->5->9 [6] -1/-1/-1->5->7 [7] 4/-1/-1->5->6 [8] 6/-1/-1->5->4 [9] 6/9/1->5->13 [10] 7/-1/-1->5->4 [11] 4/-1/-1->5->7 [12] -1/-1/-1->5->6 [13] 7/9/1->5->13 [14] -1/-1/-1->5->7 [15] 4/-1/-1->5->6 1: jzxh015:2965582:2967787 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965583:2967788 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965584:2967789 [2] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 4/-1/-1->6->10 [3] -1/-1/-1->6->4 [4] 5/-1/-1->6->7 [5] -1/-1/-1->6->4 [6] 4/-1/-1->6->10 [7] 5/-1/-1->6->7 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 4/10/2->6->14 [11] -1/-1/-1->6->4 [12] 5/-1/-1->6->7 [13] -1/-1/-1->6->4 [14] 4/10/2->6->14 [15] 5/-1/-1->6->7 1: jzxh015:2965584:2967789 [2] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896064:896962 [0] NCCL INFO comm 0x55a50cad5640 rank 8 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 2: jzxh016:896065:896964 [1] NCCL INFO comm 0x55a46b9ae900 rank 9 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 2: jzxh016:896066:896961 [2] NCCL INFO comm 0x556ee6361e10 rank 10 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 2: jzxh016:896067:896963 [3] NCCL INFO comm 0x5576b8f9bbe0 rank 11 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 2: jzxh016:896064:896962 [0] NCCL INFO Trees [0] 9/4/12->8->0 [1] -1/-1/-1->8->11 [2] 9/-1/-1->8->10 [3] 10/-1/-1->8->9 [4] 11/4/12->8->0 [5] 10/-1/-1->8->11 [6] 11/-1/-1->8->10 [7] -1/-1/-1->8->9 [8] 9/-1/-1->8->4 [9] -1/-1/-1->8->11 [10] 9/-1/-1->8->10 [11] 10/-1/-1->8->9 [12] 11/-1/-1->8->4 [13] 10/-1/-1->8->11 [14] 11/-1/-1->8->10 [15] -1/-1/-1->8->9 2: jzxh016:896064:896962 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965585:2967786 [3] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] 
-1/-1/-1->7->5 [3] 5/-1/-1->7->11 [4] 6/-1/-1->7->4 [5] 4/-1/-1->7->5 [6] 5/-1/-1->7->4 [7] 6/-1/-1->7->11 [8] -1/-1/-1->7->6 [9] 4/-1/-1->7->6 [10] -1/-1/-1->7->5 [11] 5/11/3->7->15 [12] 6/-1/-1->7->4 [13] 4/-1/-1->7->5 [14] 5/-1/-1->7->4 [15] 6/11/3->7->15 1: jzxh015:2965585:2967786 [3] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51583:52476 [1] NCCL INFO comm 0x558abda268d0 rank 13 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 3: jzxh017:51583:52476 [1] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->9 [2] 15/-1/-1->13->12 [3] 12/-1/-1->13->15 [4] -1/-1/-1->13->14 [5] 15/-1/-1->13->9 [6] -1/-1/-1->13->15 [7] 12/-1/-1->13->14 [8] 14/-1/-1->13->12 [9] 14/5/-1->13->-1 [10] 15/-1/-1->13->12 [11] 12/-1/-1->13->15 [12] -1/-1/-1->13->14 [13] 15/5/-1->13->-1 [14] -1/-1/-1->13->15 [15] 12/-1/-1->13->14 3: jzxh017:51583:52476 [1] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51582:52477 [0] NCCL INFO comm 0x559e3dba4770 rank 12 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 3: jzxh017:51582:52477 [0] NCCL INFO Trees [0] 13/-1/-1->12->8 [1] -1/-1/-1->12->15 [2] 13/-1/-1->12->14 [3] 14/-1/-1->12->13 [4] 15/-1/-1->12->8 [5] 14/-1/-1->12->15 [6] 15/-1/-1->12->14 [7] -1/-1/-1->12->13 [8] 13/4/-1->12->-1 [9] -1/-1/-1->12->15 [10] 13/-1/-1->12->14 [11] 14/-1/-1->12->13 [12] 15/4/-1->12->-1 [13] 14/-1/-1->12->15 [14] 15/-1/-1->12->14 [15] -1/-1/-1->12->13 3: jzxh017:51582:52477 [0] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896065:896964 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/5/13->9->1 [2] 11/-1/-1->9->8 [3] 8/-1/-1->9->11 [4] -1/-1/-1->9->10 [5] 11/5/13->9->1 [6] -1/-1/-1->9->11 [7] 8/-1/-1->9->10 [8] 10/-1/-1->9->8 [9] 10/-1/-1->9->5 [10] 11/-1/-1->9->8 [11] 8/-1/-1->9->11 [12] -1/-1/-1->9->10 [13] 11/-1/-1->9->5 [14] -1/-1/-1->9->11 [15] 8/-1/-1->9->10 2: jzxh016:896065:896964 [1] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896066:896961 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->9 [2] 8/6/14->10->2 [3] 
-1/-1/-1->10->8 [4] 9/-1/-1->10->11 [5] -1/-1/-1->10->8 [6] 8/6/14->10->2 [7] 9/-1/-1->10->11 [8] 11/-1/-1->10->9 [9] 11/-1/-1->10->9 [10] 8/-1/-1->10->6 [11] -1/-1/-1->10->8 [12] 9/-1/-1->10->11 [13] -1/-1/-1->10->8 [14] 8/-1/-1->10->6 [15] 9/-1/-1->10->11 2: jzxh016:896066:896961 [2] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896067:896963 [3] NCCL INFO Trees [0] -1/-1/-1->11->10 [1] 8/-1/-1->11->10 [2] -1/-1/-1->11->9 [3] 9/7/15->11->3 [4] 10/-1/-1->11->8 [5] 8/-1/-1->11->9 [6] 9/-1/-1->11->8 [7] 10/7/15->11->3 [8] -1/-1/-1->11->10 [9] 8/-1/-1->11->10 [10] -1/-1/-1->11->9 [11] 9/-1/-1->11->7 [12] 10/-1/-1->11->8 [13] 8/-1/-1->11->9 [14] 9/-1/-1->11->8 [15] 10/-1/-1->11->7 2: jzxh016:896067:896963 [3] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896066:896961 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 00/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 00/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 00/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 03/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 08/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 00/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 04/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 04/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 07/0 : 13[1] -> 14[2] via P2P/CUMEM 1: 
jzxh015:2965583:2967788 [1] NCCL INFO Channel 03/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 12/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 08/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 08/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 04/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 04/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 12/0 : 6[2] -> 7[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 07/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 08/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 11/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 12/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 11/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 08/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh017:51584:52479 
[2] NCCL INFO Channel 12/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 12/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 15/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 15/0 : 5[1] -> 6[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 11/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 12/0 : 9[1] -> 10[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 15/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 00/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: 
jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 03/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 04/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 07/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 2: 
jzxh016:896067:896963 [3] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 00/0 : 12[0] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 03/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 11/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 04/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 07/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 15/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 08/0 : 8[0] -> 9[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 11/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 08/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 12/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 11/0 : 
12[0] -> 13[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 12/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 15/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 15/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 01/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 02/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 01/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 01/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 05/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 02/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 02/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 06/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 09/0 : 4[0] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 05/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 05/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 06/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 10/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 06/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 09/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 13/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via 
P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 10/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 09/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 14/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 10/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 13/0 : 12[0] -> 15[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 14/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 09/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 13/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 10/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 14/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 13/0 : 0[0] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 14/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO 
Channel 14/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 14/0 : 15[3] -> 
2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [send] via 
NET/IB/1(5)/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 01/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 1: 
jzxh015:2965583:2967788 [1] NCCL INFO Channel 02/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 01/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 09/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 05/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 06/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 02/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 10/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 05/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 13/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 14/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 06/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 10/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 09/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 10/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 14/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] 
NCCL INFO Channel 13/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 14/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 03/0 : 7[3] -> 4[0] via 
P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 07/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 10/0 : 2[2] -> 
1[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 01/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 03/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 05/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 02/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 06/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 09/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 07/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 10/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 11/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 11/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 15/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 14/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 15/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 13/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 10[2] 
-> 9[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 01/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 05/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 10/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 02/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 09/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 14/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 06/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 13/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 09/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 13/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Connected all rings 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 02/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 10/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Connected all rings 0: jzxh014:2356330:2357229 [3] NCCL INFO Connected all rings 0: jzxh014:2356327:2357227 [0] NCCL INFO Connected all rings 0: jzxh014:2356328:2357230 [1] NCCL INFO Connected all rings 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Connected all rings 2: jzxh016:896066:896961 [2] NCCL INFO Connected all rings 2: jzxh016:896067:896963 [3] NCCL INFO Connected all rings 2: 
jzxh016:896064:896962 [0] NCCL INFO Connected all rings 1: jzxh015:2965585:2967786 [3] NCCL INFO Connected all rings 1: jzxh015:2965584:2967789 [2] NCCL INFO Connected all rings 3: jzxh017:51583:52476 [1] NCCL INFO Connected all rings 3: jzxh017:51582:52477 [0] NCCL INFO Connected all rings 3: jzxh017:51584:52479 [2] NCCL INFO Connected all rings 3: jzxh017:51585:52478 [3] NCCL INFO Connected all rings 3: jzxh017:51582:52477 [0] NCCL INFO Channel 02/0 : 12[0] -> 13[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Connected all rings 2: jzxh016:896064:896962 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 10/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 10/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 01/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 01/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 01/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 01/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 07/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 02/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 07/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 09/0 : 
13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 09/0 : 6[2] -> 7[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 03/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 09/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 09/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 15/0 : 6[2] -> 7[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 09/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 02/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 02/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 15/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 15/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 05/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 03/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 03/0 : 4[0] -> 6[2] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 06/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 05/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 02/0 : 5[1] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 10/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 02/0 : 13[1] -> 15[3] via 
P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 05/0 : 4[0] -> 6[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 03/0 : 5[1] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 11/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 06/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 02/0 : 8[0] -> 10[2] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 02/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 05/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 03/0 : 13[1] -> 15[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 13/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 06/0 : 4[0] -> 6[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 03/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 14/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 10/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 06/0 : 5[1] -> 7[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 10/0 : 1[1] -> 3[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 9[1] -> 11[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 05/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 03/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 11/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 10/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 11/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 13/0 : 1[1] -> 3[3] via P2P/CUMEM 3: 
jzxh017:51583:52476 [1] NCCL INFO Channel 06/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 06/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 10/0 : 5[1] -> 7[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 11/0 : 4[0] -> 6[2] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 05/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 14/0 : 1[1] -> 3[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 10/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 11/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 10/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 06/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 13/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 11/0 : 13[1] -> 15[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 13/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 10/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 11/0 : 9[1] -> 11[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 13/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 14/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 14/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 13/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 14/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 14/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 11/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 04/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL 
INFO Channel 14/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/0 : 0[0] -> 3[3] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 04/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 13/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 12/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 14/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: 
jzxh014:2356330:2357229 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] 
[receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 12/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO 
Channel 00/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 1: 
jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 08/0 : 4[0] 
-> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51583:52476 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 
[2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896065:896964 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [send] via 
NET/IB/3/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2357227 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 02/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 6[2] 
-> 2[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 02/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 03/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 02/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 03/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 02/0 : 10[2] -> 8[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:896962 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:896963 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 05/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896064:896962 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 03/0 : 6[2] -> 4[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 06/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 03/0 : 10[2] -> 8[0] via 
P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 05/0 : 10[2] -> 8[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 06/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 10/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 06/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 05/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 10/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 11/0 : 14[2] -> 12[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 10/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 06/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 10/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 11/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:52479 [2] NCCL INFO Channel 13/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 11/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 11/0 : 6[2] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896066:896961 [2] NCCL INFO Channel 13/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 13/0 : 6[2] -> 4[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 14/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 13/0 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 14/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 
14/0 : 10[2] -> 8[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 14/0 : 6[2] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 01/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 04/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 05/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 01/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 06/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 01/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 09/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 04/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 04/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 12/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL 
INFO Channel 06/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 13/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 05/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 05/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 06/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 09/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 06/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 12/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 09/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 13/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 12/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 14/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 14/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 13/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 14/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh017:51582:52477 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [send] via 
NET/IB/0/GDRDMA 3: jzxh017:51582:52477 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2967787 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51585:52478 [3] NCCL INFO Channel 02/0 : 15[3] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 03/0 : 15[3] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 05/0 : 15[3] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 06/0 : 15[3] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 10/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 02/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 02/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 11/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 13/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 03/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 03/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 14/0 : 15[3] -> 13[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 05/0 : 11[3] -> 9[1] via 
P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 05/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 00/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 06/0 : 7[3] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 04/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 10/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 06/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 11/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 10/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 07/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 11/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 13/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 08/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 13/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 14/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 14/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 00/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 12/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 2: 
jzxh016:896067:896963 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 04/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh017:51585:52478 [3] NCCL INFO Channel 15/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 07/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 00/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 08/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 01/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 08/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 00/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 12/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 04/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 12/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 03/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 07/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 07/0 : 13[1] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2357229 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:896963 [3] NCCL INFO Channel 15/0 : 11[3] -> 10[2] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 08/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51583:52476 [1] 
NCCL INFO Channel 08/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 11/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 09/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh015:2965585:2967786 [3] NCCL INFO Channel 15/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:52476 [1] NCCL INFO Channel 15/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 12/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51584:52479 [2] NCCL INFO Channel 15/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 00/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 01/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 04/0 : 6[2] -> 5[1] via P2P/CUMEM 0: jzxh014:2356329:2357228 [2] NCCL INFO Channel 15/0 
: 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 08/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 07/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 08/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 11/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 09/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 00/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 08/0 : 6[2] -> 5[1] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:896964 [1] NCCL INFO Channel 15/0 : 9[1] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 03/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2357230 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 09/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 12/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO Channel 15/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 07/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 12/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 08/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2967789 [2] NCCL INFO Channel 15/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 11/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2967788 [1] NCCL INFO Channel 15/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356327:2357227 [0] NCCL INFO Connected all trees 0: 
jzxh014:2356328:2357230 [1] NCCL INFO Connected all trees 0: jzxh014:2356330:2357229 [3] NCCL INFO Connected all trees 0: jzxh014:2356327:2357227 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356329:2357228 [2] NCCL INFO Connected all trees 0: jzxh014:2356327:2357227 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356328:2357230 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356330:2357229 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356328:2357230 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356330:2357229 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356329:2357228 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356329:2357228 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965582:2967787 [0] NCCL INFO Connected all trees 1: jzxh015:2965585:2967786 [3] NCCL INFO Connected all trees 1: jzxh015:2965583:2967788 [1] NCCL INFO Connected all trees 1: jzxh015:2965582:2967787 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965582:2967787 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965584:2967789 [2] NCCL INFO Connected all trees 1: jzxh015:2965585:2967786 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896064:896962 [0] NCCL INFO Connected all trees 1: jzxh015:2965585:2967786 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965584:2967789 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965584:2967789 [2] NCCL INFO 16 coll channels, 16 collnet 
channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965583:2967788 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965583:2967788 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51582:52477 [0] NCCL INFO Connected all trees 3: jzxh017:51585:52478 [3] NCCL INFO Connected all trees 3: jzxh017:51582:52477 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896067:896963 [3] NCCL INFO Connected all trees 3: jzxh017:51582:52477 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51585:52478 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51585:52478 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896064:896962 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896064:896962 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896065:896964 [1] NCCL INFO Connected all trees 3: jzxh017:51584:52479 [2] NCCL INFO Connected all trees 3: jzxh017:51583:52476 [1] NCCL INFO Connected all trees 3: jzxh017:51584:52479 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896066:896961 [2] NCCL INFO Connected all trees 2: jzxh016:896067:896963 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51583:52476 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51584:52479 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51583:52476 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896067:896963 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p 
channels per peer 2: jzxh016:896065:896964 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896066:896961 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896065:896964 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896066:896961 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965583:2967788 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh015:2965583:2967788 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 1: jzxh015:2965582:2967787 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh015:2965583:2967788 [1] NCCL INFO ncclCommInitRank comm 0x55fffbc332b0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 1: jzxh015:2965582:2967787 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 1: jzxh015:2965582:2967787 [0] NCCL INFO ncclCommInitRank comm 0x5581ebd8acb0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 1: jzxh015:2965584:2967789 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh015:2965585:2967786 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 1: jzxh015:2965584:2967789 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 1: jzxh015:2965585:2967786 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
1: jzxh015:2965584:2967789 [2] NCCL INFO ncclCommInitRank comm 0x564e0a919f70 rank 6 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 1: jzxh015:2965585:2967786 [3] NCCL INFO ncclCommInitRank comm 0x55ee7f078d80 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 0: jzxh014:2356330:2357229 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh014:2356330:2357229 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh014:2356330:2357229 [3] NCCL INFO ncclCommInitRank comm 0x55abd54283f0 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 0: jzxh014:2356329:2357228 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh014:2356329:2357228 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh014:2356327:2357227 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh014:2356328:2357230 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 0: jzxh014:2356328:2357230 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 0: jzxh014:2356327:2357227 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
0: jzxh014:2356329:2357228 [2] NCCL INFO ncclCommInitRank comm 0x55e468be3960 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 0: jzxh014:2356328:2357230 [1] NCCL INFO ncclCommInitRank comm 0x559d12187a60 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 0: jzxh014:2356327:2357227 [0] NCCL INFO ncclCommInitRank comm 0x555ba8ca5800 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51584:52479 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh017:51584:52479 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh017:51584:52479 [2] NCCL INFO ncclCommInitRank comm 0x564cbfab3310 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51585:52478 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh017:51585:52478 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh017:51585:52478 [3] NCCL INFO ncclCommInitRank comm 0x56086fae1720 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51583:52476 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh017:51582:52477 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 3: jzxh017:51583:52476 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 3: jzxh017:51582:52477 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
3: jzxh017:51583:52476 [1] NCCL INFO ncclCommInitRank comm 0x558abda268d0 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51582:52477 [0] NCCL INFO ncclCommInitRank comm 0x559e3dba4770 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 1: jzxh015:2965584:2967818 [2] NCCL INFO Channel 12/1 : 6[2] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh015:2965584:2967818 [2] NCCL INFO Channel 13/1 : 6[2] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh015:2965582:2967819 [0] NCCL INFO Channel 12/1 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 1: jzxh015:2965582:2967819 [0] NCCL INFO Channel 13/1 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 1: jzxh015:2965585:2967820 [3] NCCL INFO Channel 12/1 : 7[3] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh015:2965583:2967821 [1] NCCL INFO Channel 12/1 : 5[1] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 1: jzxh015:2965583:2967821 [1] NCCL INFO Channel 13/1 : 5[1] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 0: jzxh014:2356329:2357261 [2] NCCL INFO Channel 00/1 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh015:2965585:2967820 [3] NCCL INFO Channel 13/1 : 7[3] -> 0[0] [send] via NET/IB/0(4)/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 08/1 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356330:2357260 [3] NCCL INFO Channel 00/1 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 09/1 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356328:2357259 [1] NCCL INFO Channel 00/1 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2357261 [2] NCCL INFO Channel 01/1 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh014:2356330:2357260 [3] NCCL INFO Channel 01/1 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51583:52508 [1] NCCL INFO Channel 08/1 : 13[1] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 2: jzxh016:896065:896964 [1] NCCL INFO TUNER/Plugin: Plugin load returned 2 : 
libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh016:896065:896964 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh016:896065:896964 [1] NCCL INFO ncclCommInitRank comm 0x55a46b9ae900 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51583:52508 [1] NCCL INFO Channel 09/1 : 13[1] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh014:2356328:2357259 [1] NCCL INFO Channel 01/1 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896066:896961 [2] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh016:896067:896963 [3] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh016:896066:896961 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh016:896064:896962 [0] NCCL INFO TUNER/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so 2: jzxh016:896067:896963 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2: jzxh016:896066:896961 [2] NCCL INFO ncclCommInitRank comm 0x556ee6361e10 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 2: jzxh016:896064:896962 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
2: jzxh016:896067:896963 [3] NCCL INFO ncclCommInitRank comm 0x5576b8f9bbe0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 2: jzxh016:896064:896962 [0] NCCL INFO ncclCommInitRank comm 0x55a50cad5640 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x4261a6b6ec5d7236 - Init COMPLETE 3: jzxh017:51585:52510 [3] NCCL INFO Channel 08/1 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 3: jzxh017:51582:52509 [0] NCCL INFO Channel 08/1 : 12[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 08/1 : 14[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 09/1 : 14[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 3: jzxh017:51585:52510 [3] NCCL INFO Channel 09/1 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 3: jzxh017:51582:52509 [0] NCCL INFO Channel 09/1 : 12[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 3: jzxh017:51584:52511 [2] NCCL INFO Channel 08/1 : 14[2] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 3: jzxh017:51584:52511 [2] NCCL INFO Channel 09/1 : 14[2] -> 0[0] [send] via NET/IB/0(12)/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 08/1 : 13[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 09/1 : 13[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 08/1 : 12[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 09/1 : 12[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 04/1 : 11[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 05/1 : 11[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 2: jzxh016:896065:896993 [1] NCCL INFO Channel 04/1 : 9[1] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh016:896067:896994 [3] NCCL INFO Channel 04/1 : 11[3] -> 0[0] [send] via 
NET/IB/0(8)/GDRDMA/Shared 2: jzxh016:896065:896993 [1] NCCL INFO Channel 05/1 : 9[1] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh016:896067:896994 [3] NCCL INFO Channel 05/1 : 11[3] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 04/1 : 10[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 05/1 : 10[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 2: jzxh016:896066:896995 [2] NCCL INFO Channel 04/1 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh016:896066:896995 [2] NCCL INFO Channel 05/1 : 10[2] -> 0[0] [send] via NET/IB/0(8)/GDRDMA/Shared 2: jzxh016:896064:896996 [0] NCCL INFO Channel 04/1 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 04/1 : 9[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 05/1 : 9[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 04/1 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 05/1 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 2: jzxh016:896064:896996 [0] NCCL INFO Channel 05/1 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 12/1 : 7[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 13/1 : 7[3] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 12/1 : 6[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 13/1 : 6[2] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 12/1 : 5[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 13/1 : 5[1] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 12/1 : 4[0] -> 0[0] [receive] 
via NET/IB/0/GDRDMA/Shared 0: jzxh014:2356327:2357262 [0] NCCL INFO Channel 13/1 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA/Shared 0: [2025-11-23 14:47:06,273] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:436] [PID:2356327] [RANK:0] gather_len_batches: [79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586, 79586] 0: [2025-11-23 14:47:06,314] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:495] [PID:2356327] [RANK:0] sample_packing_eff_est across ranks: [0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325, 0.8591607213020325] 0: [2025-11-23 14:47:06,331] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:127] [PID:2356327] [RANK:0] Maximum number of steps set at 2984 1: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 3: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 3: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 
1: Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 0: [2025-11-23 14:47:13,632] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:110] [PID:2356327] [RANK:0] Patched Trainer.evaluation_loop with nanmean loss calculation 0: [2025-11-23 14:47:13,634] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:164] [PID:2356327] [RANK:0] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation 2: Loading checkpoint shards: 0%| | 0/2 [00:0012->8 [1] -1/-1/-1->12->15 [2] 13/-1/-1->12->14 [3] 14/-1/-1->12->13 [4] 15/-1/-1->12->8 [5] 14/-1/-1->12->15 [6] 15/-1/-1->12->14 [7] -1/-1/-1->12->13 [8] 13/4/-1->12->-1 [9] -1/-1/-1->12->15 [10] 13/-1/-1->12->14 [11] 14/-1/-1->12->13 [12] 15/4/-1->12->-1 [13] 14/-1/-1->12->15 [14] 15/-1/-1->12->14 [15] -1/-1/-1->12->13 3: jzxh017:51582:53370 [0] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356330:2358127 [3] NCCL INFO comm 0x14c18412ee00 rank 3 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 0: jzxh014:2356329:2358125 [2] NCCL INFO comm 0x153460122200 rank 2 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 0: jzxh014:2356328:2358126 [1] NCCL INFO comm 0x15475412d200 rank 1 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 0: jzxh014:2356330:2358127 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] 0/-1/-1->3->2 [2] -1/-1/-1->3->1 [3] 1/11/-1->3->-1 [4] 2/-1/-1->3->0 [5] 0/-1/-1->3->1 [6] 1/-1/-1->3->0 [7] 2/11/-1->3->-1 [8] -1/-1/-1->3->2 [9] 0/-1/-1->3->2 [10] -1/-1/-1->3->1 [11] 1/-1/-1->3->7 [12] 2/-1/-1->3->0 [13] 0/-1/-1->3->1 [14] 1/-1/-1->3->0 [15] 2/-1/-1->3->7 0: jzxh014:2356329:2358125 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 0/10/-1->2->-1 [3] 
-1/-1/-1->2->0 [4] 1/-1/-1->2->3 [5] -1/-1/-1->2->0 [6] 0/10/-1->2->-1 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 0/-1/-1->2->6 [11] -1/-1/-1->2->0 [12] 1/-1/-1->2->3 [13] -1/-1/-1->2->0 [14] 0/-1/-1->2->6 [15] 1/-1/-1->2->3 0: jzxh014:2356330:2358127 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356329:2358125 [2] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356327:2358124 [0] NCCL INFO comm 0x1472b8130d20 rank 0 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 0: jzxh014:2356328:2358126 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/9/-1->1->-1 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] -1/-1/-1->1->2 [5] 3/9/-1->1->-1 [6] -1/-1/-1->1->3 [7] 0/-1/-1->1->2 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] 0/-1/-1->1->3 [12] -1/-1/-1->1->2 [13] 3/-1/-1->1->5 [14] -1/-1/-1->1->3 [15] 0/-1/-1->1->2 0: jzxh014:2356328:2358126 [1] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 00/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 01/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 02/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 03/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 05/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 06/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 07/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 08/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 09/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 10/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2358124 [0] NCCL INFO 
Channel 11/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/16 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 13/16 : 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 14/16 : 0 3 6 5 4 7 10 9 8 11 14 13 12 15 2 1 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 15/16 : 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 3 3: jzxh017:51585:53367 [3] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 12/-1/-1->15->14 [2] -1/-1/-1->15->13 [3] 13/-1/-1->15->11 [4] 14/-1/-1->15->12 [5] 12/-1/-1->15->13 [6] 13/-1/-1->15->12 [7] 14/-1/-1->15->11 [8] -1/-1/-1->15->14 [9] 12/-1/-1->15->14 [10] -1/-1/-1->15->13 [11] 13/7/-1->15->-1 [12] 14/-1/-1->15->12 [13] 12/-1/-1->15->13 [14] 13/-1/-1->15->12 [15] 14/7/-1->15->-1 3: jzxh017:51583:53369 [1] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->9 [2] 15/-1/-1->13->12 [3] 12/-1/-1->13->15 [4] -1/-1/-1->13->14 [5] 15/-1/-1->13->9 [6] -1/-1/-1->13->15 [7] 12/-1/-1->13->14 [8] 14/-1/-1->13->12 [9] 14/5/-1->13->-1 [10] 15/-1/-1->13->12 [11] 12/-1/-1->13->15 [12] -1/-1/-1->13->14 [13] 15/5/-1->13->-1 [14] -1/-1/-1->13->15 [15] 12/-1/-1->13->14 3: jzxh017:51585:53367 [3] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896067:897842 [3] NCCL INFO comm 0x14a628114540 rank 11 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 2: jzxh016:896066:897840 [2] NCCL INFO comm 0x1498d4139700 rank 10 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 2: jzxh016:896067:897842 [3] NCCL INFO Trees [0] -1/-1/-1->11->10 [1] 8/-1/-1->11->10 [2] -1/-1/-1->11->9 [3] 9/7/15->11->3 [4] 10/-1/-1->11->8 [5] 8/-1/-1->11->9 [6] 9/-1/-1->11->8 [7] 10/7/15->11->3 [8] -1/-1/-1->11->10 [9] 8/-1/-1->11->10 [10] -1/-1/-1->11->9 [11] 9/-1/-1->11->7 [12] 10/-1/-1->11->8 [13] 8/-1/-1->11->9 [14] 9/-1/-1->11->8 [15] 10/-1/-1->11->7 2: jzxh016:896066:897840 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->9 [2] 8/6/14->10->2 [3] -1/-1/-1->10->8 [4] 
9/-1/-1->10->11 [5] -1/-1/-1->10->8 [6] 8/6/14->10->2 [7] 9/-1/-1->10->11 [8] 11/-1/-1->10->9 [9] 11/-1/-1->10->9 [10] 8/-1/-1->10->6 [11] -1/-1/-1->10->8 [12] 9/-1/-1->10->11 [13] -1/-1/-1->10->8 [14] 8/-1/-1->10->6 [15] 9/-1/-1->10->11 1: jzxh015:2965585:2968664 [3] NCCL INFO comm 0x145b10135ed0 rank 7 nRanks 16 nNodes 4 localRanks 4 localRank 3 MNNVL 0 1: jzxh015:2965582:2968667 [0] NCCL INFO comm 0x151e3812df00 rank 4 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 1: jzxh015:2965583:2968666 [1] NCCL INFO comm 0x14e36c136200 rank 5 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 1: jzxh015:2965584:2968665 [2] NCCL INFO comm 0x149484114480 rank 6 nRanks 16 nNodes 4 localRanks 4 localRank 2 MNNVL 0 1: jzxh015:2965585:2968664 [3] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] -1/-1/-1->7->5 [3] 5/-1/-1->7->11 [4] 6/-1/-1->7->4 [5] 4/-1/-1->7->5 [6] 5/-1/-1->7->4 [7] 6/-1/-1->7->11 [8] -1/-1/-1->7->6 [9] 4/-1/-1->7->6 [10] -1/-1/-1->7->5 [11] 5/11/3->7->15 [12] 6/-1/-1->7->4 [13] 4/-1/-1->7->5 [14] 5/-1/-1->7->4 [15] 6/11/3->7->15 1: jzxh015:2965585:2968664 [3] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356327:2358124 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->3 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 3/8/-1->0->-1 [5] 2/-1/-1->0->3 [6] 3/-1/-1->0->2 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->4 [9] -1/-1/-1->0->3 [10] 1/-1/-1->0->2 [11] 2/-1/-1->0->1 [12] 3/-1/-1->0->4 [13] 2/-1/-1->0->3 [14] 3/-1/-1->0->2 [15] -1/-1/-1->0->1 0: jzxh014:2356327:2358124 [0] NCCL INFO P2P Chunksize set to 131072 3: jzxh017:51584:53368 [2] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 12/-1/-1->14->10 [3] -1/-1/-1->14->12 [4] 13/-1/-1->14->15 [5] -1/-1/-1->14->12 [6] 12/-1/-1->14->10 [7] 13/-1/-1->14->15 [8] 15/-1/-1->14->13 [9] 15/-1/-1->14->13 [10] 12/6/-1->14->-1 [11] -1/-1/-1->14->12 [12] 13/-1/-1->14->15 [13] -1/-1/-1->14->12 [14] 12/6/-1->14->-1 [15] 13/-1/-1->14->15 3: jzxh017:51583:53369 [1] NCCL INFO P2P Chunksize set to 131072 3: 
jzxh017:51584:53368 [2] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965582:2968667 [0] NCCL INFO Trees [0] 5/-1/-1->4->8 [1] -1/-1/-1->4->7 [2] 5/-1/-1->4->6 [3] 6/-1/-1->4->5 [4] 7/-1/-1->4->8 [5] 6/-1/-1->4->7 [6] 7/-1/-1->4->6 [7] -1/-1/-1->4->5 [8] 5/8/0->4->12 [9] -1/-1/-1->4->7 [10] 5/-1/-1->4->6 [11] 6/-1/-1->4->5 [12] 7/8/0->4->12 [13] 6/-1/-1->4->7 [14] 7/-1/-1->4->6 [15] -1/-1/-1->4->5 1: jzxh015:2965584:2968665 [2] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 4/-1/-1->6->10 [3] -1/-1/-1->6->4 [4] 5/-1/-1->6->7 [5] -1/-1/-1->6->4 [6] 4/-1/-1->6->10 [7] 5/-1/-1->6->7 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 4/10/2->6->14 [11] -1/-1/-1->6->4 [12] 5/-1/-1->6->7 [13] -1/-1/-1->6->4 [14] 4/10/2->6->14 [15] 5/-1/-1->6->7 2: jzxh016:896065:897839 [1] NCCL INFO comm 0x14f2fc1412c0 rank 9 nRanks 16 nNodes 4 localRanks 4 localRank 1 MNNVL 0 2: jzxh016:896067:897842 [3] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896066:897840 [2] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896064:897841 [0] NCCL INFO comm 0x15476812cf80 rank 8 nRanks 16 nNodes 4 localRanks 4 localRank 0 MNNVL 0 2: jzxh016:896065:897839 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/5/13->9->1 [2] 11/-1/-1->9->8 [3] 8/-1/-1->9->11 [4] -1/-1/-1->9->10 [5] 11/5/13->9->1 [6] -1/-1/-1->9->11 [7] 8/-1/-1->9->10 [8] 10/-1/-1->9->8 [9] 10/-1/-1->9->5 [10] 11/-1/-1->9->8 [11] 8/-1/-1->9->11 [12] -1/-1/-1->9->10 [13] 11/-1/-1->9->5 [14] -1/-1/-1->9->11 [15] 8/-1/-1->9->10 2: jzxh016:896065:897839 [1] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965583:2968666 [1] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->9 [2] 7/-1/-1->5->4 [3] 4/-1/-1->5->7 [4] -1/-1/-1->5->6 [5] 7/-1/-1->5->9 [6] -1/-1/-1->5->7 [7] 4/-1/-1->5->6 [8] 6/-1/-1->5->4 [9] 6/9/1->5->13 [10] 7/-1/-1->5->4 [11] 4/-1/-1->5->7 [12] -1/-1/-1->5->6 [13] 7/9/1->5->13 [14] -1/-1/-1->5->7 [15] 4/-1/-1->5->6 1: jzxh015:2965582:2968667 [0] NCCL INFO P2P Chunksize set to 131072 1: jzxh015:2965584:2968665 [2] NCCL INFO 
P2P Chunksize set to 131072 1: jzxh015:2965583:2968666 [1] NCCL INFO P2P Chunksize set to 131072 2: jzxh016:896064:897841 [0] NCCL INFO Trees [0] 9/4/12->8->0 [1] -1/-1/-1->8->11 [2] 9/-1/-1->8->10 [3] 10/-1/-1->8->9 [4] 11/4/12->8->0 [5] 10/-1/-1->8->11 [6] 11/-1/-1->8->10 [7] -1/-1/-1->8->9 [8] 9/-1/-1->8->4 [9] -1/-1/-1->8->11 [10] 9/-1/-1->8->10 [11] 10/-1/-1->8->9 [12] 11/-1/-1->8->4 [13] 10/-1/-1->8->11 [14] 11/-1/-1->8->10 [15] -1/-1/-1->8->9 2: jzxh016:896064:897841 [0] NCCL INFO P2P Chunksize set to 131072 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 00/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 00/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 03/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 04/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 04/0 : 14[2] -> 15[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 07/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/CUMEM 
0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 08/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 08/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 08/0 : 10[2] -> 11[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 00/0 : 5[1] -> 6[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 00/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 12/0 : 10[2] -> 11[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 11/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 11/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 12/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 12/0 : 9[1] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 04/0 : 6[2] -> 7[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 03/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 12/0 : 13[1] -> 14[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 15/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 04/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 15/0 : 13[1] -> 14[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 1: 
jzxh015:2965584:2968665 [2] NCCL INFO Channel 08/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 07/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 12/0 : 6[2] -> 7[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 08/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 00/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 11/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 00/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [send] via NET/IB/0(12)/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 00/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 12/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 03/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 04/0 : 7[3] -> 
8[0] [send] via NET/IB/0(4)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 08/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 04/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 15/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/0 : 15[3] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [send] via NET/IB/0(0)/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 3[3] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 00/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [send] via NET/IB/0(4)/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 07/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 03/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 08/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 04/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 11/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 12/0 : 12[0] -> 13[1] via P2P/CUMEM 1: 
jzxh015:2965582:2968667 [0] NCCL INFO Channel 07/0 : 4[0] -> 5[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 15/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 11/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 08/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 4[0] -> 5[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 08/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 15/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 12/0 : 7[3] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 01/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 12/0 : 11[3] -> 12[0] [send] via NET/IB/0(8)/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 02/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 01/0 : 4[0] -> 
7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 05/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 02/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 06/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 05/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 08/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 06/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 09/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 11/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 09/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 10/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 12/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 10/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 09/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 13/0 : 12[0] -> 15[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 13/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 15/0 : 8[0] -> 9[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 14/0 : 4[0] -> 7[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 10/0 : 0[0] -> 3[3] via P2P/CUMEM 3: 
jzxh017:51582:53370 [0] NCCL INFO Channel 14/0 : 12[0] -> 15[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 13/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 01/0 : 8[0] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 14/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 02/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 05/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 06/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 09/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 10/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 13/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 14/0 : 11[3] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 14/0 : 8[0] -> 11[3] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [send] via NET/IB/2(14)/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 
[2] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [send] via NET/IB/2(6)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 02/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 02/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 06/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 06/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 10/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 10/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 14/0 : 15[3] -> 2[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 14/0 : 3[3] -> 6[2] [send] via NET/IB/2(2)/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 02/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 06/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 10/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 10/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 14/0 : 7[3] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO 
Channel 14/0 : 11[3] -> 14[2] [send] via NET/IB/2(10)/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [send] via NET/IB/1(13)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 01/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 01/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 05/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 05/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 09/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 09/0 : 2[2] -> 5[1] 
[send] via NET/IB/1(1)/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [send] via NET/IB/1(5)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 13/0 : 14[2] -> 1[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 13/0 : 2[2] -> 5[1] [send] via NET/IB/1(1)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 01/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 02/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 01/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 05/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 09/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 09/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 13/0 : 6[2] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 13/0 : 10[2] -> 13[1] [send] via NET/IB/1(9)/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO 
Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 01/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 02/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 05/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 06/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 09/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 09/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 10/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 10/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 13/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 13/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 14/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 14/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 15/0 : 6[2] -> 11[3] 
[receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [send] via NET/IB/3(11)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [send] via NET/IB/3(3)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 03/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 03/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 07/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 07/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 11/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 11/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 15/0 : 10[2] -> 15[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 15/0 : 14[2] -> 3[3] [send] via NET/IB/3(15)/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 05/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 
03/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 07/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 11/0 : 15[3] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 03/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 15/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 03/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 07/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 07/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 11/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 10/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 15/0 : 6[2] -> 11[3] [send] via NET/IB/3(7)/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 02/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 2[2] -> 7[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 01/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 14/0 : 10[2] -> 9[1] via P2P/CUMEM 1: 
jzxh015:2965585:2968664 [3] NCCL INFO Channel 03/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 06/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 05/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 06/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 11/0 : 11[3] -> 8[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 10/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 09/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 15/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 14/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 13/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 07/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 09/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 13/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 10/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 1: 
jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 02/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 14/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 01/0 : 7[3] -> 6[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 06/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 05/0 : 7[3] -> 6[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 09/0 : 7[3] -> 6[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 13/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Connected all rings 3: jzxh017:51584:53368 [2] NCCL INFO Connected all rings 2: jzxh016:896066:897840 [2] NCCL INFO Connected all rings 2: jzxh016:896064:897841 [0] NCCL INFO Connected all rings 2: jzxh016:896065:897839 [1] NCCL INFO Connected all rings 2: jzxh016:896067:897842 [3] NCCL INFO Connected all rings 1: jzxh015:2965584:2968665 [2] NCCL INFO Connected all rings 1: jzxh015:2965582:2968667 [0] NCCL INFO Connected all rings 1: jzxh015:2965585:2968664 [3] NCCL INFO Connected all rings 0: jzxh014:2356329:2358125 [2] NCCL INFO Connected all rings 1: jzxh015:2965583:2968666 [1] NCCL INFO Connected all rings 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 02/0 : 4[0] -> 5[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Connected all rings 2: jzxh016:896064:897841 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Connected all rings 3: jzxh017:51583:53369 [1] NCCL INFO Connected all rings 3: 
jzxh017:51582:53370 [0] NCCL INFO Channel 02/0 : 12[0] -> 13[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 10/0 : 12[0] -> 13[1] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 10/0 : 4[0] -> 5[1] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 10/0 : 8[0] -> 9[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 01/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 9[1] -> 10[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 01/0 : 14[2] -> 15[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 01/0 : 13[1] -> 14[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 01/0 : 5[1] -> 6[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Connected all rings 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Connected all rings 2: jzxh016:896066:897840 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 09/0 : 13[1] -> 14[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 07/0 : 6[2] -> 7[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 09/0 : 9[1] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 02/0 : 0[0] -> 2[2] 
via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 03/0 : 0[0] -> 2[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 07/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 09/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 5[1] -> 6[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 09/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 02/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 09/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 02/0 : 12[0] -> 14[2] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 15/0 : 14[2] -> 15[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 15/0 : 10[2] -> 11[3] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 15/0 : 6[2] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 03/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 03/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 02/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 02/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 05/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 02/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 05/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 02/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 03/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 06/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 03/0 : 9[1] -> 11[3] via P2P/CUMEM 1: 
jzxh015:2965582:2968667 [0] NCCL INFO Channel 05/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 03/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 06/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 03/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 10/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 05/0 : 5[1] -> 7[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 06/0 : 4[0] -> 6[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 05/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 06/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 10/0 : 12[0] -> 14[2] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 10/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 05/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 06/0 : 9[1] -> 11[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 11/0 : 8[0] -> 10[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 11/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 06/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 10/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 06/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 10/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 11/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 13/0 : 12[0] -> 14[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 
[1] NCCL INFO Channel 10/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 10/0 : 0[0] -> 2[2] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 13/0 : 8[0] -> 10[2] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 13/0 : 4[0] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 10/0 : 13[1] -> 15[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 11/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 14/0 : 12[0] -> 14[2] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 11/0 : 9[1] -> 11[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 11/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 14/0 : 8[0] -> 10[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 11/0 : 1[1] -> 3[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 11/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 14/0 : 4[0] -> 6[2] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 13/0 : 0[0] -> 2[2] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 13/0 : 1[1] -> 3[3] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 13/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 13/0 : 9[1] -> 11[3] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 14/0 : 0[0] -> 2[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 14/0 : 5[1] -> 7[3] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 14/0 : 13[1] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 14/0 : 9[1] -> 11[3] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 04/0 : 4[0] -> 7[3] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] 
NCCL INFO Channel 14/0 : 2[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/0 : 0[0] -> 3[3] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 8[0] -> 11[3] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 4[0] -> 7[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 10/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 14/0 : 2[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/0 : 0[0] -> 3[3] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 04/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 10/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 12/0 : 8[0] -> 11[3] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 14/0 : 6[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 02/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] 
[receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 10[2] -> 14[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 12/0 : 12[0] -> 15[3] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [send] via NET/IB/2/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 10/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 14/0 : 6[2] -> 14[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 9[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 2[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO 
Channel 02/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 14/0 : 1[1] -> 3[3] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 10[2] -> 2[2] [send] via NET/IB/2/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 10/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 14/0 : 14[2] -> 6[2] [send] via NET/IB/2/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 01/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: 
jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 05/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 06/0 : 14[2] -> 10[2] [send] via NET/IB/2/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 09/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 13/0 : 1[1] -> 5[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 02/0 : 14[2] -> 12[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 5[1] -> 9[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [receive] via NET/IB/2/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 6[2] -> 2[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 02/0 : 2[2] -> 0[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 6[2] -> 2[2] [send] via 
NET/IB/2/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 10/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 14/0 : 10[2] -> 6[2] [send] via NET/IB/2/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 03/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 02/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 00/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 07/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 02/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 04/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51583:53369 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 03/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 03/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 3: 
jzxh017:51583:53369 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 11/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 15/0 : 3[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 1[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 7[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 03/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 9[1] -> 1[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 03/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 05/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 4[0] -> 8[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 13[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 05/0 : 14[2] -> 12[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO 
Channel 06/0 : 2[2] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 06/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 06/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 05/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 5[1] -> 13[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 06/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 10/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 10/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 10/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 10/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 05/0 : 13[1] -> 9[1] [send] via NET/IB/1/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 03/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51582:53370 [0] NCCL 
INFO Channel 00/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 11/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 07/0 : 11[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 04/0 : 8[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 11/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 01/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 11/0 : 10[2] -> 8[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [receive] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 05/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 11/0 : 6[2] -> 4[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 09/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896065:897839 [1] NCCL INFO Channel 09/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 13/0 : 5[1] -> 1[1] [send] via NET/IB/1/GDRDMA 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 13/0 : 2[2] -> 0[0] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 13/0 : 6[2] -> 4[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 13/0 : 9[1] -> 5[1] [send] via NET/IB/1/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 13/0 : 10[2] -> 8[0] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 3: 
jzxh017:51584:53368 [2] NCCL INFO Channel 13/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 11/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 15/0 : 7[3] -> 15[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51584:53368 [2] NCCL INFO Channel 14/0 : 14[2] -> 12[0] via P2P/CUMEM 3: jzxh017:51582:53370 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 11/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 14/0 : 6[2] -> 4[0] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 14/0 : 2[2] -> 0[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 15/0 : 15[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896066:897840 [2] NCCL INFO Channel 14/0 : 10[2] -> 8[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 3[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 08/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [receive] via NET/IB/3/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 8[0] -> 
0[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 11[3] -> 3[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/0 : 0[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 15[3] -> 11[3] [receive] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [receive] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 7[3] -> 3[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 01/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 11/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 15/0 : 11[3] -> 7[3] [send] via NET/IB/3/GDRDMA 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 04/0 
: 0[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 01/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51585:53367 [3] NCCL INFO Channel 04/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 12[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 01/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 4[0] -> 12[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896067:897842 [3] NCCL INFO Channel 04/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/CUMEM 0: jzxh014:2356327:2358124 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 04/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [receive] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 00/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 3: jzxh017:51582:53370 [0] NCCL INFO Channel 04/0 : 12[0] -> 8[0] [send] via NET/IB/0/GDRDMA 3: 
jzxh017:51585:53367 [3] NCCL INFO Channel 05/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 05/0 : 11[3] -> 8[0] via P2P/CUMEM 2: jzxh016:896064:897841 [0] NCCL INFO Channel 04/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 08/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 2: jzxh016:896064:897841 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 8[0] -> 4[0] [receive] via NET/IB/0/GDRDMA 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 05/0 : 7[3] -> 4[0] via P2P/CUMEM 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 08/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 1: jzxh015:2965582:2968667 [0] NCCL INFO Channel 12/0 : 4[0] -> 0[0] [send] via NET/IB/0/GDRDMA 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 06/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 06/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 09/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 09/0 : 7[3] -> 4[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 12/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 12/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 13/0 : 11[3] -> 8[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 13/0 : 7[3] -> 4[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 14/0 : 11[3] -> 8[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] 
NCCL INFO Channel 06/0 : 15[3] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 14/0 : 7[3] -> 4[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 09/0 : 15[3] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 12/0 : 15[3] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 13/0 : 15[3] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 14/0 : 15[3] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 02/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 02/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 03/0 : 11[3] -> 9[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 03/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 05/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 05/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 02/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 06/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 06/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 10/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 03/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 10/0 : 15[3] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 11/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 05/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 11/0 : 15[3] -> 
13[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 06/0 : 7[3] -> 5[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 13/0 : 11[3] -> 9[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 13/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 10/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 14/0 : 15[3] -> 13[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 11/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 00/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 13/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 14/0 : 7[3] -> 5[1] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 04/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 00/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 07/0 : 15[3] -> 14[2] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 04/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 07/0 : 7[3] -> 6[2] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 00/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 08/0 : 7[3] -> 6[2] via P2P/CUMEM 
0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 08/0 : 15[3] -> 14[2] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 12/0 : 7[3] -> 6[2] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 03/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 12/0 : 15[3] -> 14[2] via P2P/CUMEM 3: jzxh017:51585:53367 [3] NCCL INFO Channel 15/0 : 15[3] -> 14[2] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 14/0 : 11[3] -> 9[1] via P2P/CUMEM 1: jzxh015:2965585:2968664 [3] NCCL INFO Channel 15/0 : 7[3] -> 6[2] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 00/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 00/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 01/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 01/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 01/0 : 6[2] -> 5[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 07/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 04/0 : 6[2] -> 5[1] via P2P/CUMEM 0: 
jzxh014:2356329:2358125 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 04/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 08/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 00/0 : 5[1] -> 4[0] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 08/0 : 13[1] -> 12[0] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 08/0 : 10[2] -> 9[1] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 12/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 07/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 03/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 08/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 07/0 : 14[2] -> 13[1] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 11/0 : 13[1] -> 12[0] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 07/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 3: jzxh017:51583:53369 [1] NCCL INFO Channel 15/0 : 13[1] -> 12[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 08/0 : 14[2] -> 13[1] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 09/0 : 6[2] -> 5[1] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 08/0 : 5[1] -> 4[0] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 09/0 : 10[2] -> 9[1] via P2P/CUMEM 3: jzxh017:51584:53368 
[2] NCCL INFO Channel 09/0 : 14[2] -> 13[1] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896067:897842 [3] NCCL INFO Channel 15/0 : 11[3] -> 10[2] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 11/0 : 5[1] -> 4[0] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 12/0 : 6[2] -> 5[1] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 12/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 12/0 : 10[2] -> 9[1] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 0: jzxh014:2356329:2358125 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM 1: jzxh015:2965584:2968665 [2] NCCL INFO Channel 15/0 : 6[2] -> 5[1] via P2P/CUMEM 2: jzxh016:896066:897840 [2] NCCL INFO Channel 15/0 : 10[2] -> 9[1] via P2P/CUMEM 1: jzxh015:2965583:2968666 [1] NCCL INFO Channel 15/0 : 5[1] -> 4[0] via P2P/CUMEM 0: jzxh014:2356328:2358126 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/CUMEM 3: jzxh017:51584:53368 [2] NCCL INFO Channel 15/0 : 14[2] -> 13[1] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 08/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 11/0 : 9[1] -> 8[0] via P2P/CUMEM 2: jzxh016:896065:897839 [1] NCCL INFO Channel 15/0 : 9[1] -> 8[0] via P2P/CUMEM 0: jzxh014:2356330:2358127 [3] NCCL INFO Connected all trees 0: jzxh014:2356327:2358124 [0] NCCL INFO Connected all trees 3: jzxh017:51582:53370 [0] NCCL INFO Connected all trees 2: jzxh016:896064:897841 [0] NCCL INFO Connected all trees 2: jzxh016:896067:897842 [3] NCCL INFO Connected all trees 1: 
jzxh015:2965582:2968667 [0] NCCL INFO Connected all trees 0: jzxh014:2356330:2358127 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356330:2358127 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51582:53370 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896067:897842 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896066:897840 [2] NCCL INFO Connected all trees 2: jzxh016:896064:897841 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896067:897842 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896064:897841 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 2: jzxh016:896065:897839 [1] NCCL INFO Connected all trees 1: jzxh015:2965582:2968667 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965582:2968667 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356329:2358125 [2] NCCL INFO Connected all trees 0: jzxh014:2356327:2358124 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356327:2358124 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356328:2358126 [1] NCCL INFO Connected all trees 0: jzxh014:2356329:2358125 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51582:53370 [0] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51585:53367 [3] NCCL INFO Connected all trees 2: jzxh016:896066:897840 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 2: jzxh016:896066:897840 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per 
peer 2: jzxh016:896065:897839 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965585:2968664 [3] NCCL INFO Connected all trees 0: jzxh014:2356329:2358125 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 0: jzxh014:2356328:2358126 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 0: jzxh014:2356328:2358126 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51585:53367 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51583:53369 [1] NCCL INFO Connected all trees 3: jzxh017:51585:53367 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51584:53368 [2] NCCL INFO Connected all trees 2: jzxh016:896065:897839 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965583:2968666 [1] NCCL INFO Connected all trees 3: jzxh017:51583:53369 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 3: jzxh017:51583:53369 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51584:53368 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965585:2968664 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965585:2968664 [3] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 3: jzxh017:51584:53368 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965583:2968666 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 1: jzxh015:2965583:2968666 [1] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer 1: jzxh015:2965584:2968665 [2] NCCL INFO Connected all trees 1: 
jzxh015:2965584:2968665 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
1: jzxh015:2965584:2968665 [2] NCCL INFO 16 coll channels, 16 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
3: jzxh017:51582:53370 [0] NCCL INFO ncclCommInitRank comm 0x14552c11e540 rank 12 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x619ce177681b4aa5 - Init COMPLETE
3: jzxh017:51584:53368 [2] NCCL INFO ncclCommInitRank comm 0x14ca90136dc0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x619ce177681b4aa5 - Init COMPLETE
2: jzxh016:896067:897842 [3] NCCL INFO ncclCommInitRank comm 0x14a628114540 rank 11 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x619ce177681b4aa5 - Init COMPLETE
2: jzxh016:896065:897839 [1] NCCL INFO ncclCommInitRank comm 0x14f2fc1412c0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x619ce177681b4aa5 - Init COMPLETE
2: jzxh016:896066:897840 [2] NCCL INFO ncclCommInitRank comm 0x1498d4139700 rank 10 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x619ce177681b4aa5 - Init COMPLETE
2: jzxh016:896064:897841 [0] NCCL INFO ncclCommInitRank comm 0x15476812cf80 rank 8 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x619ce177681b4aa5 - Init COMPLETE
3: jzxh017:51585:53367 [3] NCCL INFO ncclCommInitRank comm 0x1459cc121240 rank 15 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x619ce177681b4aa5 - Init COMPLETE
3: jzxh017:51583:53369 [1] NCCL INFO ncclCommInitRank comm 0x14b65c1450c0 rank 13 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x619ce177681b4aa5 - Init COMPLETE
1: jzxh015:2965585:2968664 [3] NCCL INFO ncclCommInitRank comm 0x145b10135ed0 rank 7 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x619ce177681b4aa5 - Init COMPLETE
1: jzxh015:2965583:2968666 [1] NCCL INFO ncclCommInitRank comm 0x14e36c136200 rank 5 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x619ce177681b4aa5 - Init COMPLETE
1: jzxh015:2965582:2968667 [0] NCCL INFO ncclCommInitRank comm 0x151e3812df00 rank 4 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x619ce177681b4aa5 - Init COMPLETE
1: jzxh015:2965584:2968665 [2] NCCL INFO ncclCommInitRank comm 0x149484114480 rank 6 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x619ce177681b4aa5 - Init COMPLETE
0: jzxh014:2356329:2358125 [2] NCCL INFO ncclCommInitRank comm 0x153460122200 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 9d000 commId 0x619ce177681b4aa5 - Init COMPLETE
0: jzxh014:2356327:2358124 [0] NCCL INFO ncclCommInitRank comm 0x1472b8130d20 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 1b000 commId 0x619ce177681b4aa5 - Init COMPLETE
0: jzxh014:2356330:2358127 [3] NCCL INFO ncclCommInitRank comm 0x14c18412ee00 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId ad000 commId 0x619ce177681b4aa5 - Init COMPLETE
0: jzxh014:2356328:2358126 [1] NCCL INFO ncclCommInitRank comm 0x15475412d200 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 2c000 commId 0x619ce177681b4aa5 - Init COMPLETE
0: {'loss': 0.744, 'grad_norm': 2.3311184266583216, 'learning_rate': 9.05e-07, 'memory/max_mem_active(gib)': 57.09, 'memory/max_mem_allocated(gib)': 57.09, 'memory/device_mem_reserved(gib)': 66.93, 'epoch': 0.0}
0: 0%| | 0/2984 [00:00