---
library_name: peft
license: other
base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
tags:
- axolotl
- base_model:adapter:nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
- lora
- transformers
datasets:
- ConicCat/GLiMA_Thinking
- ConicCat/Gutenberg-SFT
- ConicCat/Condor-SFT-Filtered
- ConicCat/Ao3_Soft_Refusal
- ConicCat/VSF
pipeline_tag: text-generation
model-index:
- name: Writer-Stage-1
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.16.0.dev0`

```yaml
base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
load_in_8bit: true
load_in_4bit: false

sequence_len: 5120
max_sample_length: 5120
sample_packing: true
gradient_checkpointing: true
bf16: true
tf32: true
flash_attention: true
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false

datasets:
  - path: ConicCat/GLiMA_Thinking
    type: chat_template
    roles_to_train: []
    train_on_eos: turn
    message_field_training: train
  - path: ConicCat/Gutenberg-SFT
    type: chat_template
  - path: ConicCat/Condor-SFT-Filtered
    split: train[:250]
    type: chat_template
  - path: ConicCat/Ao3_Soft_Refusal
    type: chat_template
  - path: ConicCat/VSF
    type: chat_template

chat_template_jinja: "{% set bos = \"<|begin_of_text|>\" %}{%- set enable_thinking = false -%}{% set system_start_header = \"<|start_header_id|>\" %}{% set system_end_header = \"<|end_header_id|>\n\n\" %}{% set start_header = \"<|start_header_id|>\" %}{% set end_header = \"<|end_header_id|>\n\n\" %}{% set eot = \"<|eot_id|>\" %}{% set system_token = \"system\" %}{% set user_token = \"user\" %}{% set assistant_token = \"assistant\" %}{% set tool_token = \"tool\" %}{{- bos ~ system_start_header ~ system_token ~ system_end_header -}}{%- if messages[0].role == 'system' and messages[0].content != '' -%}{%- set system_content = messages[0].content -%}{%- if '/no_think' in system_content -%}{%- set system_content = system_content.replace('/no_think', '')|trim -%}{%- set enable_thinking = false -%}{%- elif '/think' in system_content -%}{%- set system_content = system_content.replace('/think', '')|trim -%}{%- set enable_thinking = true -%}{%- endif -%}{{- system_content + '\n\n' -}}{%- endif -%}{%- if tools -%}{{- 'You can use the following tools to assist the user if required:\n[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) | tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']\n\nIf you decide to call any tool(s), use the following format:\n[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, {{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]\n\nResponse from tool(s) will be returned in this format:\n[{{\"response\": \"tool_response1\"}}, {{\"response\": \"tool_response2\"}}]\n\nBased on the results returned by the tool(s), you can call additional tools if needed, correct tool calls if any errors are found, or just respond with the answer to the user.' -}}{%- endif -%}{{- eot -}}{%- for message in messages -%}{%- if message.role == user_token -%}{{- start_header ~ user_token ~ end_header -}}{{ message.content -}}{{ eot -}}{%- elif message.role == assistant_token -%}{%- if '' in message.content -%}{%- set content = message.content.split('')[-1].lstrip() -%}{%- else -%}{%- set content = message.content -%}{%- endif -%}{{- start_header ~ assistant_token ~ end_header -}}{{ content -}}{%- if message.tool_calls -%}{{- '[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments | tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']' -}}{%- endif -%}{{- eot -}}{%- elif message.role == tool_token -%}{%- if loop.first or (messages[loop.index0 - 1].role != tool_token) -%}{{- start_header ~ tool_token ~ end_header -}}{{ '[' -}}{%- endif -%}{{- message.content -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == tool_token) else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != tool_token) -%}{{- ']' -}}{{ eot -}}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- start_header ~ assistant_token ~ end_header -}}{%- if not enable_thinking -%}{{- '\n\n\n\n' -}}{%- endif -%}{%- endif -%}"

trust_remote_code: true

adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
lora_bias: None
lora_target_linear: true

use_tensorboard: true
optimizer: paged_adamw_8bit
learning_rate: 1.25e-5 # 1e-4 / 4
loraplus_lr_ratio: 16

# Training arguments
output_dir: ./Writer-Stage-1
num_epochs: 3
micro_batch_size: 1
gradient_accumulation_steps: 16
save_strategy: 'no'
warmup_ratio: 0.05
lr_scheduler: 'constant_with_warmup'
max_grad_norm: 1
logging_steps: 1
seed: 42
```
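The `chat_template_jinja` value is hard to read as a single line, so the following is a minimal Python sketch of the prompt layout it produces for plain text conversations. It is a simplified re-implementation for illustration, not the template itself: tool definitions, tool calls, tool messages and the step that trims assistant content are all omitted, and `render_prompt` is a hypothetical helper name. The suffix appended when thinking is disabled is copied as it appears in the config text above.

```python
# Special tokens, as defined at the top of the template.
BOS = "<|begin_of_text|>"
START = "<|start_header_id|>"
END = "<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def render_prompt(messages, add_generation_prompt=True):
    """Lay out system/user/assistant turns roughly as the template does.

    Hypothetical helper: text-only, no tool handling.
    """
    enable_thinking = False  # the template's default
    out = BOS + START + "system" + END  # a system header is always emitted

    if messages and messages[0]["role"] == "system" and messages[0]["content"] != "":
        system = messages[0]["content"]
        # '/no_think' is tested first because '/think' is a substring of it.
        if "/no_think" in system:
            system = system.replace("/no_think", "").strip()
        elif "/think" in system:
            system = system.replace("/think", "").strip()
            enable_thinking = True
        out += system + "\n\n"
    out += EOT

    for message in messages:
        # The system message was handled above and is skipped here.
        if message["role"] in ("user", "assistant"):
            out += START + message["role"] + END + message["content"] + EOT

    if add_generation_prompt:
        out += START + "assistant" + END
        if not enable_thinking:
            # Suffix reproduced from the config text as shown above.
            out += "\n\n\n\n"
    return out


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "Be brief. /think"},
        {"role": "user", "content": "Hi"},
    ]
    print(repr(render_prompt(demo)))
```

Note that thinking stays off unless the system message contains `/think`, and that the flag itself is stripped from the system text before it is rendered.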

# Writer-Stage-1

This model is a LoRA adapter for [nvidia/Llama-3_3-Nemotron-Super-49B-v1_5](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5), fine-tuned on the ConicCat/GLiMA_Thinking, ConicCat/Gutenberg-SFT, ConicCat/Condor-SFT-Filtered, ConicCat/Ao3_Soft_Refusal and ConicCat/VSF datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.25e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 2
- training_steps: 54

### Training results

### Framework versions

- PEFT 0.18.1
- Transformers 5.3.0
- Pytorch 2.9.1+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2
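
Several of the reported training hyperparameters follow from the axolotl config by simple arithmetic. The sketch below works through them under stated assumptions: a single training device (consistent with the reported total batch size of 16), the standard LoRA scaling `lora_alpha / lora_r`, `loraplus_lr_ratio` read as the ratio of the B-matrix learning rate to the base rate, and warmup steps obtained by truncating `warmup_ratio * training_steps`. The truncation rule is inferred from the reported value of 2, and `lr_at` is a hypothetical helper that mirrors a linear warmup followed by a constant rate.

```python
import math

# Values taken from the config and the hyperparameter list.
micro_batch_size = 1
gradient_accumulation_steps = 16
learning_rate = 1.25e-5
lora_r, lora_alpha = 32, 64
loraplus_lr_ratio = 16
warmup_ratio = 0.05
training_steps = 54

# Assumption: one device, so no extra multiplier for data parallelism.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps

# Assumption: standard LoRA scaling (alpha / r), not rank-stabilized LoRA.
lora_scale = lora_alpha / lora_r

# Assumption: LoRA+ applies the ratio to the B matrices' learning rate.
lora_b_learning_rate = learning_rate * loraplus_lr_ratio

# Assumption: warmup steps are truncated, which matches the reported 2.
warmup_steps = int(warmup_ratio * training_steps)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then a constant learning rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("lora_scale:", lora_scale)
    print("lora_b_learning_rate:", lora_b_learning_rate)
    print("warmup_steps:", warmup_steps)
    print("lr over first steps:", [lr_at(s) for s in range(4)])
```

Under these assumptions the base rate of 1.25e-05 is reached at step 2 and held for the remaining steps.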