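The training log that follows prints the module tree of the loaded base model, and the layer shapes in that printout are enough to work out its logical parameter count. The sketch below does that arithmetic. It rests on assumptions the log does not state: that `lm_head` is not tied to `embed_tokens` (the printout lists them as separate modules), that `LlamaRotaryEmbedding` holds no trainable weights, and that the `Linear4bit` shapes describe logical weights rather than packed 4-bit storage. All constant and variable names are illustrative.

```python
# Shapes copied from the module printout in the training log.
HIDDEN = 4096        # in_features of q_proj / width of the RMSNorm layers
KV_DIM = 1024        # out_features of k_proj and v_proj
FFN_DIM = 14336      # out_features of gate_proj and up_proj
VOCAB = 128256       # rows of embed_tokens, out_features of lm_head
NUM_LAYERS = 32      # "(0-31): 32 x LlamaDecoderLayer"

# Attention: q_proj and o_proj are HIDDEN x HIDDEN, k_proj and v_proj are HIDDEN x KV_DIM.
attn_params = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM

# MLP: gate_proj, up_proj and down_proj each hold HIDDEN * FFN_DIM weights.
mlp_params = 3 * HIDDEN * FFN_DIM

# Two RMSNorm weight vectors per decoder layer.
norm_params = 2 * HIDDEN

per_layer_params = attn_params + mlp_params + norm_params

# Embedding + decoder stack + final norm + output head (assumed untied).
total_params = (
    VOCAB * HIDDEN
    + NUM_LAYERS * per_layer_params
    + HIDDEN
    + VOCAB * HIDDEN
)

# Ratio of query width to key/value width; a value above 1 indicates
# that several query heads share each key/value head.
query_to_kv_ratio = HIDDEN // KV_DIM

print(f"per layer: {per_layer_params:,}")
print(f"total:     {total_params:,}")
print(f"q/kv width ratio: {query_to_kv_ratio}")
```

Under these assumptions the total comes to roughly 8.03 billion logical weights, consistent with the "8B" in the base model name. The memory actually occupied is smaller, since the linear layers are loaded in 4-bit form.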
| 2025-12-11 16:44:01,560 __main__ <module> [INFO] OUT_DIR set to outputs/2025-12-11/16-44-01 | |
| 2025-12-11 16:44:01,561 __main__ <module> [INFO] Args: app/src/S3_6_sft.py --sft.per_device_train_batch_size 32 --sft.per_device_eval_batch_size 32 --sft.gradient_accumulation_steps 32 --sft.push_to_hub --mode pre_str --model.model_name_or_path tokyotech-llm/Llama-3.1-Swallow-8B-v0.5 | |
| 2025-12-11 16:44:02,259 accelerate.utils.modeling get_balanced_memory [INFO] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). | |
| 2025-12-11 16:44:14,571 __main__ load_model [INFO] Model loaded from tokyotech-llm/Llama-3.1-Swallow-8B-v0.5 | |
| LlamaForCausalLM( | |
| (model): LlamaModel( | |
| (embed_tokens): Embedding(128256, 4096) | |
| (layers): ModuleList( | |
| (0-31): 32 x LlamaDecoderLayer( | |
| (self_attn): LlamaAttention( | |
| (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) | |
| (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False) | |
| (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False) | |
| (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False) | |
| ) | |
| (mlp): LlamaMLP( | |
| (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False) | |
| (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False) | |
| (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False) | |
| (act_fn): SiLUActivation() | |
| ) | |
| (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) | |
| (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) | |
| ) | |
| ) | |
| (norm): LlamaRMSNorm((4096,), eps=1e-05) | |
| (rotary_emb): LlamaRotaryEmbedding() | |
| ) | |
| (lm_head): Linear(in_features=4096, out_features=128256, bias=False) | |
| ) | |
| 2025-12-11 16:44:18,359 __main__ load_jwtd [INFO] Dataset shuffled, seed=42 | |
| 2025-12-11 16:44:18,359 __main__ load_jwtd [INFO] Dataset loaded, N=57062 | |
| 2025-12-11 16:44:19,509 __main__ load_jwtd [INFO] Dataset loaded, N=12228 | |
| 2025-12-11 16:44:19,509 __main__ main [INFO] Filtered dataset: train 57062 rows, eval 12228 rows | |
| 2025-12-11 16:44:19,516 __main__ add_train_str_with_ratio [INFO] pre_str:posts_str = 57062:0 = 1.000:0.000 | |
| 2025-12-11 16:44:19,639 __main__ add_train_str_with_ratio [INFO] pre_str:posts_str = 12228:0 = 1.000:0.000 | |
| 2025-12-11 16:44:34,492 __main__ main [INFO] wandb initialized | |
| 2025-12-11 16:44:37,207 __main__ main [INFO] Starting SFT training with SFTTrainer | |
| 2025-12-11 16:49:29,186 root evaluate_probability_ratio [INFO] Results epoch 0: outputs/2025-12-11/16-44-01/probability_ratio_epoch_0.json | |
| 2025-12-11 17:06:18,971 root evaluate_probability_ratio [INFO] Results epoch 1: outputs/2025-12-11/16-44-01/probability_ratio_epoch_1.json | |
| 2025-12-11 17:23:07,666 root evaluate_probability_ratio [INFO] Results epoch 2: outputs/2025-12-11/16-44-01/probability_ratio_epoch_2.json | |
| 2025-12-11 17:39:53,894 root evaluate_probability_ratio [INFO] Results epoch 3: outputs/2025-12-11/16-44-01/probability_ratio_epoch_3.json | |
| 2025-12-11 17:56:53,639 root evaluate_probability_ratio [INFO] Results epoch 4: outputs/2025-12-11/16-44-01/probability_ratio_epoch_4.json | |
| 2025-12-11 18:13:44,741 root evaluate_probability_ratio [INFO] Results epoch 5: outputs/2025-12-11/16-44-01/probability_ratio_epoch_5.json | |
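The log entries above contain enough to estimate the effective batch size and the pace of the run. The following is a minimal sketch of that arithmetic, not the training script's own logic. It assumes a single device (the log only mentions device 0), and it treats the gap between consecutive `evaluate_probability_ratio` entries as one epoch of training plus evaluation. `NUM_DEVICES` and the function names are hypothetical; the ceiling-based step count is an approximation and may differ by one from what the trainer actually schedules.

```python
import math
from datetime import datetime

# Values taken from the "Args" and "Dataset loaded" log lines.
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 32
TRAIN_ROWS = 57062
NUM_DEVICES = 1  # assumption: not stated in the log

# Timestamps of the "Results epoch N" log lines, epochs 0 through 5.
EPOCH_REPORT_TIMES = [
    "2025-12-11 16:49:29,186",
    "2025-12-11 17:06:18,971",
    "2025-12-11 17:23:07,666",
    "2025-12-11 17:39:53,894",
    "2025-12-11 17:56:53,639",
    "2025-12-11 18:13:44,741",
]


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Samples consumed per optimizer update."""
    return per_device * accum * devices


def report_intervals(stamps: list[str]) -> list[float]:
    """Seconds between consecutive log timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
updates_per_epoch = math.ceil(TRAIN_ROWS / batch)
intervals = report_intervals(EPOCH_REPORT_TIMES)
mean_interval = sum(intervals) / len(intervals)

print(f"effective batch size: {batch}")
print(f"approx. optimizer updates per epoch: {updates_per_epoch}")
print(f"mean seconds between epoch reports: {mean_interval:.1f}")
```

With these inputs the reports arrive a little under 17 minutes apart, and each epoch corresponds to only a few dozen optimizer updates because of the large accumulated batch.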