Instructions for using ConicCat/Nemo-super-wip-lora, a LoRA adapter for nvidia/Llama-3_3-Nemotron-Super-49B-v1_5, with libraries, inference servers, and local apps.

Libraries

How to use ConicCat/Nemo-super-wip-lora with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

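base_model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
    trust_remote_code=True,  # the base model ships custom modeling code
    dtype="auto",
    device_map="auto",  # shard the 49B weights across available GPUs; requires accelerate
)
model = PeftModel.from_pretrained(base_model, "ConicCat/Nemo-super-wip-lora")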
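Loading the base model is the main memory cost. The following is a rough, weights-only estimate, assuming the nominal 49 × 10^9 parameters implied by the model name and ignoring activations, the KV cache, framework overhead and the comparatively small adapter; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Weights-only memory estimate in GiB (ignores activations, KV cache, overhead)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 49e9  # nominal parameter count taken from the model name; an approximation

for label, nbytes in [("bf16", 2), ("int8 (load_in_8bit)", 1), ("4-bit", 0.5)]:
    print(f"{label:>20}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.0f} GiB")
```

By this estimate the bf16 weights alone need roughly 91 GiB, so sharding with `device_map="auto"` or quantized loading is usually necessary. The adapter was trained against an 8-bit base (`load_in_8bit: true` in the axolotl config further down); with Transformers the comparable option is passing `quantization_config=BitsAndBytesConfig(load_in_8bit=True)` to `from_pretrained`, which requires the bitsandbytes package.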

Transformers

How to use ConicCat/Nemo-super-wip-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ConicCat/Nemo-super-wip-lora", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ConicCat/Nemo-super-wip-lora", trust_remote_code=True, dtype="auto")


vLLM

How to use ConicCat/Nemo-super-wip-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
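# Start the vLLM server with the base model and register the adapter as "nemo-super-wip"
# (an arbitrary alias; --max-lora-rank 32 matches the adapter's lora_r, above vLLM's default of 16):
vllm serve "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5" \
	--trust-remote-code \
	--enable-lora \
	--max-lora-rank 32 \
	--lora-modules nemo-super-wip=ConicCat/Nemo-super-wip-lora
# Call the server using curl (OpenAI-compatible API); "model" selects the adapter alias:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nemo-super-wip",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'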
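The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library (`urllib`, `json`), assuming the server started above is listening on localhost; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, content: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(base_url, model, content)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", model, "What is the capital of France?")`, where `model` is the same value as the `"model"` field in the curl request. Because SGLang also exposes an OpenAI-compatible endpoint, the same helpers should work against its server on port 30000.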

Use Docker

docker model run hf.co/ConicCat/Nemo-super-wip-lora

SGLang

How to use ConicCat/Nemo-super-wip-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ConicCat/Nemo-super-wip-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ConicCat/Nemo-super-wip-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ConicCat/Nemo-super-wip-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ConicCat/Nemo-super-wip-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


Nemo-super-wip-lora / README.md, uploaded by ConicCat (commit 0776dca, "Upload folder using huggingface_hub", verified, 2 months ago; 6.54 kB)

	---
	library_name: peft
	license: other
	base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
	tags:
	- axolotl
	- base_model:adapter:nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
	- lora
	- transformers
	datasets:
	- ConicCat/GLiMA_Thinking
	- ConicCat/Gutenberg-SFT
	- ConicCat/Condor-SFT-Filtered
	- ConicCat/Ao3_Soft_Refusal
	- ConicCat/VSF
	pipeline_tag: text-generation
	model-index:
	- name: Writer-Stage-1
	  results: []
	---


	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.16.0.dev0`
	```yaml
	base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5


	load_in_8bit: true
	load_in_4bit: false

	sequence_len: 5120
	max_sample_length: 5120

	sample_packing: true
	gradient_checkpointing: true

	bf16: true
	tf32: true

	flash_attention: true
	lora_mlp_kernel: false
	lora_qkv_kernel: false
	lora_o_kernel: false


	datasets:
	- path: ConicCat/GLiMA_Thinking
	  type: chat_template
	  roles_to_train: []
	  train_on_eos: turn
	  message_field_training: train

	- path: ConicCat/Gutenberg-SFT
	  type: chat_template

	- path: ConicCat/Condor-SFT-Filtered
	  split: train[:250]
	  type: chat_template

	- path: ConicCat/Ao3_Soft_Refusal
	  type: chat_template

	- path: ConicCat/VSF
	  type: chat_template

	chat_template_jinja: "{% set bos = \"<|begin_of_text|>\" %}{%- set enable_thinking = false -%}{% set system_start_header = \"<|start_header_id|>\" %}{% set system_end_header = \"<|end_header_id|>\n\n\" %}{% set start_header = \"<|start_header_id|>\" %}{% set end_header = \"<|end_header_id|>\n\n\" %}{% set eot = \"<|eot_id|>\" %}{% set system_token = \"system\" %}{% set user_token = \"user\" %}{% set assistant_token = \"assistant\" %}{% set tool_token = \"tool\" %}{{- bos ~ system_start_header ~ system_token ~ system_end_header -}}{%- if messages[0].role == 'system' and messages[0].content != '' -%}{%- set system_content = messages[0].content -%}{%- if '/no_think' in system_content -%}{%- set system_content = system_content.replace('/no_think', '')|trim -%}{%- set enable_thinking = false -%}{%- elif '/think' in system_content -%}{%- set system_content = system_content.replace('/think', '')|trim -%}{%- set enable_thinking = true -%}{%- endif -%}{{- system_content + '\n\n' -}}{%- endif -%}{%- if tools -%}{{- 'You can use the following tools to assist the user if required:\n<AVAILABLE_TOOLS>[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) | tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']</AVAILABLE_TOOLS>\n\nIf you decide to call any tool(s), use the following format:\n<TOOLCALL>[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, {{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]</TOOLCALL>\n\nResponse from tool(s) will be returned in this format:\n<TOOL_RESPONSE>[{{\"response\": \"tool_response1\"}}, {{\"response\": \"tool_response2\"}}]</TOOL_RESPONSE>\n\nBased on the results returned by the tool(s), you can call additional tools if needed, correct tool calls if any errors are found, or just respond with the answer to the user.' -}}{%- endif -%}{{- eot -}}{%- for message in messages -%}{%- if message.role == user_token -%}{{- start_header ~ user_token ~ end_header -}}{{ message.content -}}{{ eot -}}{%- elif message.role == assistant_token -%}{%- if '</think>' in message.content -%}{%- set content = message.content.split('</think>')[-1].lstrip() -%}{%- else -%}{%- set content = message.content -%}{%- endif -%}{{- start_header ~ assistant_token ~ end_header -}}{{ content -}}{%- if message.tool_calls -%}{{- '<TOOLCALL>[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments | tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']</TOOLCALL>' -}}{%- endif -%}{{- eot -}}{%- elif message.role == tool_token -%}{%- if loop.first or (messages[loop.index0 - 1].role != tool_token) -%}{{- start_header ~ tool_token ~ end_header -}}{{ '<TOOL_RESPONSE>[' -}}{%- endif -%}{{- message.content -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == tool_token) else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != tool_token) -%}{{- ']</TOOL_RESPONSE>' -}}{{ eot -}}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- start_header ~ assistant_token ~ end_header -}}{%- if not enable_thinking -%}{{- '<think>\n\n</think>\n\n' -}}{%- endif -%}{%- endif -%}"
	trust_remote_code: true

	adapter: lora
	lora_r: 32
	lora_alpha: 64
	lora_dropout: 0.0
	lora_bias: None
	lora_target_linear: true
	use_tensorboard: true

	optimizer: paged_adamw_8bit
	learning_rate: 1.25e-5 # 1e-4 / 8
	loraplus_lr_ratio: 16

	# Training arguments
	output_dir: ./Writer-Stage-1
	num_epochs: 3
	micro_batch_size: 1
	gradient_accumulation_steps: 16
	save_strategy: 'no'
	warmup_ratio: 0.05
	lr_scheduler: 'constant_with_warmup'
	max_grad_norm: 1
	logging_steps: 1
	seed: 42
	```
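The `chat_template_jinja` string in the config above is hard to read inline. The following is a minimal pure-Python sketch that mirrors its non-tool branches, assuming the template is applied exactly as written; tool definitions, tool calls and tool responses are deliberately omitted, and `render_prompt` is a hypothetical helper name. It shows two behaviours: a system block is always emitted (empty if no system message is given), and unless the system message contains `/think`, the generation prompt is pre-filled with an empty `<think>` block, so reasoning is off by default.

```python
BOS = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>\n\n"
EOT = "<|eot_id|>"
EMPTY_THINK = "<think>\n\n</think>\n\n"


def render_prompt(messages, add_generation_prompt=True):
    """Mirror the non-tool branches of the training chat template (sketch only)."""
    enable_thinking = False  # reasoning stays off unless the system prompt contains /think
    out = BOS + START_HEADER + "system" + END_HEADER
    if messages and messages[0]["role"] == "system" and messages[0]["content"] != "":
        system = messages[0]["content"]
        if "/no_think" in system:
            system = system.replace("/no_think", "").strip()
        elif "/think" in system:
            system = system.replace("/think", "").strip()
            enable_thinking = True
        out += system + "\n\n"
    out += EOT  # the system block is always closed, even when empty
    for message in messages:
        if message["role"] == "user":
            out += START_HEADER + "user" + END_HEADER + message["content"] + EOT
        elif message["role"] == "assistant":
            content = message["content"]
            if "</think>" in content:
                # earlier reasoning is dropped from assistant history
                content = content.split("</think>")[-1].lstrip()
            out += START_HEADER + "assistant" + END_HEADER + content + EOT
        # system messages appear only in the header above; tool turns are omitted here
    if add_generation_prompt:
        out += START_HEADER + "assistant" + END_HEADER
        if not enable_thinking:
            out += EMPTY_THINK  # pre-filled empty reasoning block
    return out


print(render_prompt([{"role": "user", "content": "Who are you?"}]))
```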

	</details><br>

	# Writer-Stage-1

	This model is a LoRA adapter for [nvidia/Llama-3_3-Nemotron-Super-49B-v1_5](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5), fine-tuned on five datasets: ConicCat/GLiMA_Thinking, ConicCat/Gutenberg-SFT, ConicCat/Condor-SFT-Filtered, ConicCat/Ao3_Soft_Refusal, and ConicCat/VSF.
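Because this repository holds a LoRA adapter rather than full weights, each targeted linear layer (`lora_target_linear: true` in the config above) keeps its frozen base weight W and adds a low-rank update. Under PEFT's default (non-rsLoRA) scaling the effective weight is W + (lora_alpha / r) · B A, so `lora_r: 32` and `lora_alpha: 64` give a scale of 2. The sketch below shows that merge on a toy rank-1 example with the same alpha/r ratio, using plain Python lists; `matmul` and `merge_lora` are hypothetical helpers, not the PEFT implementation.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A).

    W: out x in frozen base weight, A: r x in, B: out x r (the adapter's matrices).
    """
    scale = lora_alpha / r
    delta = matmul(B, A)  # (out x r) @ (r x in) -> out x in
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy rank-1 example with the same alpha/r ratio (2) as lora_alpha: 64, lora_r: 32.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in
B = [[0.5], [0.0]]  # out x r
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

In PEFT, `PeftModel.merge_and_unload()` folds the adapter into the base weights in this way and returns a standalone model.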

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

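	The adapter was trained on the five datasets listed in the axolotl config above, all formatted with axolotl's `chat_template` type and the custom `chat_template_jinja`. Only the first 250 rows of the `train` split of ConicCat/Condor-SFT-Filtered were used (`split: train[:250]`). The config does not define a validation split, and no evaluation results are reported.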
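For ConicCat/GLiMA_Thinking the config sets `roles_to_train: []`, `message_field_training: train` and `train_on_eos: turn`: no role is trained by default, a message contributes to the loss only when its own `train` field is true, and the EOS token closing each trained turn is learned as well. The following is a simplified sketch of the resulting loss mask, not axolotl's implementation: it ignores the role headers added by the chat template, assumes the per-message flag takes precedence over `roles_to_train`, and uses a hypothetical whitespace tokenizer (`toy_tokenize`) and a hypothetical EOS id.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def build_labels(messages, tokenize, eos_id, roles_to_train=(), field="train"):
    """Return (input_ids, labels) with untrained tokens masked by IGNORE_INDEX."""
    input_ids, labels = [], []
    for message in messages:
        ids = tokenize(message["content"]) + [eos_id]  # every turn ends with EOS
        # assumed precedence: the per-message flag wins, else fall back to the role list
        trainable = message.get(field, message["role"] in roles_to_train)
        input_ids += ids
        labels += ids if trainable else [IGNORE_INDEX] * len(ids)
    return input_ids, labels


vocab = {}


def toy_tokenize(text):
    """Hypothetical tokenizer: gives each new whitespace-separated word the next free id."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


EOS_ID = 999  # hypothetical EOS token id
messages = [
    {"role": "user", "content": "write a poem"},
    {"role": "assistant", "content": "draft one", "train": False},
    {"role": "user", "content": "try again"},
    {"role": "assistant", "content": "final poem", "train": True},
]
input_ids, labels = build_labels(messages, toy_tokenize, EOS_ID)
print(labels)  # only the last assistant turn and its EOS keep real labels
```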

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1.25e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 16
	- optimizer: paged 8-bit AdamW (`OptimizerNames.PAGED_ADAMW_8BIT`) with betas=(0.9,0.999), epsilon=1e-08 and no additional optimizer arguments
	- lr_scheduler_type: constant_with_warmup
	- lr_scheduler_warmup_steps: 2
	- training_steps: 54
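The listed values can be cross-checked against the axolotl config with a few lines of arithmetic. This sketch assumes a single device (no device count is reported), that the warmup step count is `warmup_ratio × training_steps` truncated to an integer (which matches the reported value of 2), and that LoRA+ (`loraplus_lr_ratio: 16`) multiplies the learning rate of the LoRA B matrices only, as in the LoRA+ method; the variable names are illustrative.

```python
micro_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 1  # assumption: the card reports no multi-device setting
num_epochs = 3
training_steps = 54
warmup_ratio = 0.05
sequence_len = 5120
learning_rate = 1.25e-5
loraplus_lr_ratio = 16

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = training_steps // num_epochs
warmup_steps = int(warmup_ratio * training_steps)  # 0.05 * 54 = 2.7, truncated to 2
# Upper bound on tokens per optimizer step: sample packing fills each micro-batch up to sequence_len.
max_tokens_per_step = total_train_batch_size * sequence_len
lr_lora_A = learning_rate                      # A matrices use the base learning rate
lr_lora_B = learning_rate * loraplus_lr_ratio  # B matrices use the LoRA+ scaled rate

print(total_train_batch_size, steps_per_epoch, warmup_steps, max_tokens_per_step)
print(lr_lora_A, lr_lora_B)
```

Under these assumptions each epoch is 18 optimizer steps of at most 81,920 packed tokens, and the B matrices train at 2e-4.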

	### Training results



	### Framework versions

	- PEFT 0.18.1
	- Transformers 5.3.0
	- Pytorch 2.9.1+cu128
	- Datasets 4.5.0
	- Tokenizers 0.22.2