Instructions to use ConicCat/Nemo-super-wip-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use ConicCat/Nemo-super-wip-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("nvidia/Llama-3_3-Nemotron-Super-49B-v1_5")
model = PeftModel.from_pretrained(base_model, "ConicCat/Nemo-super-wip-lora")
- Transformers
How to use ConicCat/Nemo-super-wip-lora with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ConicCat/Nemo-super-wip-lora", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ConicCat/Nemo-super-wip-lora", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use ConicCat/Nemo-super-wip-lora with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ConicCat/Nemo-super-wip-lora"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ConicCat/Nemo-super-wip-lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/ConicCat/Nemo-super-wip-lora
- SGLang
How to use ConicCat/Nemo-super-wip-lora with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ConicCat/Nemo-super-wip-lora" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ConicCat/Nemo-super-wip-lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ConicCat/Nemo-super-wip-lora" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ConicCat/Nemo-super-wip-lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use ConicCat/Nemo-super-wip-lora with Docker Model Runner:
docker model run hf.co/ConicCat/Nemo-super-wip-lora
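Whichever route is used, the prompt the adapter was trained on follows the `chat_template_jinja` entry in the axolotl config of this card. The sketch below is a rough Python re-implementation of that template for plain user/assistant turns only. Tool calls and tool responses are deliberately left out, and the function name `render_prompt` is hypothetical. It is meant to illustrate the `/think` and `/no_think` system-prompt switches and is not a replacement for the tokenizer's own `apply_chat_template`.

```python
def render_prompt(messages, add_generation_prompt=True):
    """Approximate the card's chat template for user/assistant turns (no tools)."""

    def header(role):
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

    enable_thinking = False  # the training template defaults to thinking off
    out = "<|begin_of_text|>" + header("system")

    # An optional leading system message may carry a /think or /no_think switch.
    if messages and messages[0]["role"] == "system" and messages[0]["content"] != "":
        system = messages[0]["content"]
        if "/no_think" in system:
            system = system.replace("/no_think", "").strip()
        elif "/think" in system:
            system = system.replace("/think", "").strip()
            enable_thinking = True
        out += system + "\n\n"
    out += "<|eot_id|>"

    for message in messages:
        if message["role"] == "user":
            out += header("user") + message["content"] + "<|eot_id|>"
        elif message["role"] == "assistant":
            # Earlier reasoning is dropped: only text after the last </think> is kept.
            content = message["content"]
            if "</think>" in content:
                content = content.split("</think>")[-1].lstrip()
            out += header("assistant") + content + "<|eot_id|>"

    if add_generation_prompt:
        out += header("assistant")
        if not enable_thinking:
            # With thinking off, an empty reasoning block is pre-filled.
            out += "<think>\n\n</think>\n\n"
    return out


if __name__ == "__main__":
    print(render_prompt([{"role": "user", "content": "Who are you?"}]))
```

Under this reading of the template, thinking stays off unless the system message contains `/think`, and a system header is always emitted even when no system message is supplied.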
metadata
library_name: peft
license: other
base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
tags:
- axolotl
- base_model:adapter:nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
- lora
- transformers
datasets:
- ConicCat/GLiMA_Thinking
- ConicCat/Gutenberg-SFT
- ConicCat/Condor-SFT-Filtered
- ConicCat/Ao3_Soft_Refusal
- ConicCat/VSF
pipeline_tag: text-generation
model-index:
- name: Writer-Stage-1
  results: []
See axolotl config
axolotl version: 0.16.0.dev0
base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
load_in_8bit: true
load_in_4bit: false
sequence_len: 5120
max_sample_length: 5120
sample_packing: true
gradient_checkpointing: true
bf16: true
tf32: true
flash_attention: true
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
datasets:
  - path: ConicCat/GLiMA_Thinking
    type: chat_template
    roles_to_train: []
    train_on_eos: turn
    message_field_training: train
  - path: ConicCat/Gutenberg-SFT
    type: chat_template
  - path: ConicCat/Condor-SFT-Filtered
    split: train[:250]
    type: chat_template
  - path: ConicCat/Ao3_Soft_Refusal
    type: chat_template
  - path: ConicCat/VSF
    type: chat_template
chat_template_jinja: "{% set bos = \"<|begin_of_text|>\" %}{%- set enable_thinking = false -%}{% set system_start_header = \"<|start_header_id|>\" %}{% set system_end_header = \"<|end_header_id|>\n\n\" %}{% set start_header = \"<|start_header_id|>\" %}{% set end_header = \"<|end_header_id|>\n\n\" %}{% set eot = \"<|eot_id|>\" %}{% set system_token = \"system\" %}{% set user_token = \"user\" %}{% set assistant_token = \"assistant\" %}{% set tool_token = \"tool\" %}{{- bos ~ system_start_header ~ system_token ~ system_end_header -}}{%- if messages[0].role == 'system' and messages[0].content != '' -%}{%- set system_content = messages[0].content -%}{%- if '/no_think' in system_content -%}{%- set system_content = system_content.replace('/no_think', '')|trim -%}{%- set enable_thinking = false -%}{%- elif '/think' in system_content -%}{%- set system_content = system_content.replace('/think', '')|trim -%}{%- set enable_thinking = true -%}{%- endif -%}{{- system_content + '\n\n' -}}{%- endif -%}{%- if tools -%}{{- 'You can use the following tools to assist the user if required:\n<AVAILABLE_TOOLS>[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) | tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']</AVAILABLE_TOOLS>\n\nIf you decide to call any tool(s), use the following format:\n<TOOLCALL>[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, {{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]</TOOLCALL>\n\nResponse from tool(s) will be returned in this format:\n<TOOL_RESPONSE>[{{\"response\": \"tool_response1\"}}, {{\"response\": \"tool_response2\"}}]</TOOL_RESPONSE>\n\nBased on the results returned by the tool(s), you can call additional tools if needed, correct tool calls if any errors are found, or just respond with the answer to the user.' 
-}}{%- endif -%}{{- eot -}}{%- for message in messages -%}{%- if message.role == user_token -%}{{- start_header ~ user_token ~ end_header -}}{{ message.content -}}{{ eot -}}{%- elif message.role == assistant_token -%}{%- if '</think>' in message.content -%}{%- set content = message.content.split('</think>')[-1].lstrip() -%}{%- else -%}{%- set content = message.content -%}{%- endif -%}{{- start_header ~ assistant_token ~ end_header -}}{{ content -}}{%- if message.tool_calls -%}{{- '<TOOLCALL>[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments | tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']</TOOLCALL>' -}}{%- endif -%}{{- eot -}}{%- elif message.role == tool_token -%}{%- if loop.first or (messages[loop.index0 - 1].role != tool_token) -%}{{- start_header ~ tool_token ~ end_header -}}{{ '<TOOL_RESPONSE>[' -}}{%- endif -%}{{- message.content -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == tool_token) else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != tool_token) -%}{{- ']</TOOL_RESPONSE>' -}}{{ eot -}}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- start_header ~ assistant_token ~ end_header -}}{%- if not enable_thinking -%}{{- '<think>\n\n</think>\n\n' -}}{%- endif -%}{%- endif -%}"
trust_remote_code: true
adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
lora_bias: None
lora_target_linear: true
use_tensorboard: true
optimizer: paged_adamw_8bit
learning_rate: 1.25e-5 # 1e-4 / 4
loraplus_lr_ratio: 16
# Training arguments
output_dir: ./Writer-Stage-1
num_epochs: 3
micro_batch_size: 1
gradient_accumulation_steps: 16
save_strategy: 'no'
warmup_ratio: 0.05
lr_scheduler: 'constant_with_warmup'
max_grad_norm: 1
logging_steps: 1
seed: 42
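As a quick reading aid for the LoRA settings in the config above, the following sketch works out two derived quantities. It assumes standard LoRA scaling (`lora_alpha / lora_r`, with rank-stabilized scaling not enabled, since the config does not set it) and the usual LoRA+ convention that the B matrices are trained at `loraplus_lr_ratio` times the base learning rate. The variable names `lora_scale`, `lr_a`, and `lr_b` are illustrative only.

```python
# Values copied from the axolotl config on this card.
lora_r = 32
lora_alpha = 64
learning_rate = 1.25e-5
loraplus_lr_ratio = 16

# Standard LoRA multiplies the low-rank update (B @ A) by alpha / r.
lora_scale = lora_alpha / lora_r

# LoRA+ keeps the base learning rate for the A matrices and
# uses a larger learning rate for the B matrices.
lr_a = learning_rate
lr_b = learning_rate * loraplus_lr_ratio

print(f"LoRA scaling factor: {lora_scale}")
print(f"Learning rate for A matrices: {lr_a:.2e}")
print(f"Learning rate for B matrices: {lr_b:.2e}")
```

Under these assumptions, the adapter update is scaled by 2 and the B matrices train at 2e-4.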
Writer-Stage-1
This model is a LoRA adapter fine-tuned from nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 on the ConicCat/GLiMA_Thinking, ConicCat/Gutenberg-SFT, ConicCat/Condor-SFT-Filtered, ConicCat/Ao3_Soft_Refusal, and ConicCat/VSF datasets.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1.25e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged_adamw_8bit (OptimizerNames.PAGED_ADAMW_8BIT) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 2
- training_steps: 54
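The reported numbers above can be cross-checked against the axolotl config with some simple arithmetic. The sketch below assumes a single training device (consistent with a total batch size of 16) and that warmup steps are obtained by truncating `warmup_ratio * training_steps`. Both are assumptions rather than facts stated on this card. The token figure is only an upper bound, since packed sequences are not necessarily full.

```python
# Values from the config and the reported hyperparameters.
micro_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 1          # assumption: single-GPU run
num_epochs = 3
training_steps = 54
warmup_ratio = 0.05
sequence_len = 5120

# Effective batch size per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Optimizer steps per epoch and the packed sequences they consume.
steps_per_epoch = training_steps // num_epochs
packed_sequences_per_epoch = steps_per_epoch * total_train_batch_size

# Upper bound on tokens seen per epoch if every packed sequence were full.
max_tokens_per_epoch = packed_sequences_per_epoch * sequence_len

# Warmup length, assuming the ratio is truncated to an integer step count.
warmup_steps = int(warmup_ratio * training_steps)

print(total_train_batch_size, steps_per_epoch, packed_sequences_per_epoch)
print(max_tokens_per_epoch, warmup_steps)
```

With these assumptions the figures line up with the list above: a total batch size of 16, 18 optimizer steps per epoch, and 2 warmup steps.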
Training results
Framework versions
- PEFT 0.18.1
- Transformers 5.3.0
- Pytorch 2.9.1+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2