Qwen3.5-397B-A17B LoRA SFT v3

LoRA adapter for Qwen/Qwen3.5-397B-A17B fine-tuned on AMD GPU kernel engineering trajectories using LLaMA-Factory.

What This Adapter Does

This adapter specializes Qwen3.5-397B-A17B for AMD GPU kernel optimization tasks: writing Triton kernels, debugging ROCm issues, and optimizing performance on AMD Instinct GPUs. It was trained on 104 multi-turn agent trajectories from the amdpilot dataset.

Version History

Version	Train Loss	Eval Loss	Key Change	HuggingFace
v1	0.163	n/a	Baseline pipeline	v1
v2	0.085	n/a	3-view data extraction (-48% loss)	v2
v3	0.059	0.044	Recipe fix: 10x steps, 2x rank, eval (-31% loss)	this repo
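The percentages in the Key Change column are relative changes in train loss between consecutive versions. A minimal sketch of that arithmetic, using only the values from the table (the helper name `relative_change` is illustrative and not part of any training code):

```python
# Train losses copied from the Version History table.
train_loss = {"v1": 0.163, "v2": 0.085, "v3": 0.059}


def relative_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


v1_to_v2 = relative_change(train_loss["v1"], train_loss["v2"])
v2_to_v3 = relative_change(train_loss["v2"], train_loss["v3"])

print(f"v1 -> v2: {v1_to_v2:.1f}%")
print(f"v2 -> v3: {v2_to_v3:.1f}%")
```

Rounded to whole percent, these match the -48% and -31% figures quoted in the table.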

Training Details

Parameter	Value
Base model	Qwen/Qwen3.5-397B-A17B (MoE, 17B active)
Hardware	8x AMD Instinct MI355X (ROCm 7.2)
LoRA rank / alpha	32 / 64
Target modules	all (13 types)
Trainable params	128.5M / 396.9B (0.032%)
Dataset	296 examples (3-view from 104 trajectories)
Cutoff length	32,768 tokens
Epochs / Steps	10 / 130
Batch size	8 (1 per device x 8 GPUs)
Learning rate	2e-5 (cosine schedule)
Weight decay	0.01
Training time	5h 10min
Framework	LLaMA-Factory + DeepSpeed ZeRO-3 + PEFT 0.18.1
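Several figures in the table follow from one another. The sketch below recomputes them from the listed values only; it assumes the standard LoRA scaling factor alpha / rank (not rsLoRA), which this card does not state explicitly, and the variable names are illustrative.

```python
# Values copied from the Training Details table.
trainable_params = 128.5e6
total_params = 396.9e9
lora_rank, lora_alpha = 32, 64
epochs, total_steps = 10, 130
per_device_batch, num_gpus = 1, 8

# Share of parameters updated by the adapter, in percent.
trainable_pct = trainable_params / total_params * 100

# Assumption: standard LoRA scaling (alpha / rank), not rsLoRA.
lora_scale = lora_alpha / lora_rank

# Optimizer steps per epoch and the global batch size per forward pass.
steps_per_epoch = total_steps / epochs
global_batch = per_device_batch * num_gpus

print(f"trainable: {trainable_pct:.3f}%")
print(f"LoRA scale: {lora_scale}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"global batch: {global_batch}")
```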

Eval Loss Trajectory

Step	Epoch	Eval Loss
20	1.5	0.0618
40	3.1	0.0539
60	4.6	0.0491
80	6.2	0.0461
100	7.7	0.0446
120	9.2	0.0443
130	10.0	0.0442

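Eval loss decreases monotonically across all seven checkpoints, from 0.0618 at step 20 to 0.0442 at step 130, with no sign of overfitting. The full curves are logged in a wandb run.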
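The monotonic trend can be checked directly from the table. This is a small sketch over the logged values only; it does not read from wandb, and the variable names are illustrative.

```python
# (step, epoch, eval_loss) rows copied from the Eval Loss Trajectory table.
eval_log = [
    (20, 1.5, 0.0618),
    (40, 3.1, 0.0539),
    (60, 4.6, 0.0491),
    (80, 6.2, 0.0461),
    (100, 7.7, 0.0446),
    (120, 9.2, 0.0443),
    (130, 10.0, 0.0442),
]

losses = [loss for _, _, loss in eval_log]

# Strictly decreasing: every checkpoint improves on the previous one.
is_monotonic = all(later < earlier for earlier, later in zip(losses, losses[1:]))

# Improvement between consecutive checkpoints; shrinking values indicate a plateau.
deltas = [round(earlier - later, 4) for earlier, later in zip(losses, losses[1:])]

print("monotonic decrease:", is_monotonic)
print("per-checkpoint improvement:", deltas)
```

The shrinking improvements toward the end suggest the run is close to a plateau by step 130 rather than diverging.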

Usage

Load with PEFT

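from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained("JinnP/Qwen3.5-397B-A17B-LoRA-SFT-v3")

# Load the base model in bfloat16, sharded across all visible GPUs.
# Transformers 5.x takes `dtype`; older releases call this argument `torch_dtype`.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-397B-A17B", device_map="auto", dtype="bfloat16"
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, "JinnP/Qwen3.5-397B-A17B-LoRA-SFT-v3")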
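Once the adapter is attached, generation works as with any chat model in Transformers. The following is a minimal sketch, assuming the `model` and `tokenizer` objects from the snippet above and that the tokenizer ships a chat template; `build_messages`, `generate_reply`, and the `max_new_tokens` default are illustrative choices, not settings taken from this card.

```python
def build_messages(prompt: str) -> list[dict]:
    """Wrap a single user prompt in the chat-message format."""
    return [{"role": "user", "content": prompt}]


def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 512) -> str:
    """Generate one assistant reply for a single-turn prompt."""
    # Render the chat template and tokenize in one call.
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call would look like `generate_reply(model, tokenizer, "Optimize this Triton matmul kernel for MI355X: ...")`.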

Serve with vLLM (LoRA hot-loading)

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3.5-397B-A17B \
  --enable-lora \
  --lora-modules amdpilot=JinnP/Qwen3.5-397B-A17B-LoRA-SFT-v3 \
  --tensor-parallel-size 8
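With the server running, requests select the adapter by passing the LoRA module name (`amdpilot` in the command above) as the `model` field. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default port 8000 on localhost; `build_payload`, `chat`, and the `max_tokens` default are illustrative.

```python
import json
import urllib.request

# Assumption: default vLLM host and port; adjust if --host/--port were set.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str, lora_name: str = "amdpilot", max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat request that targets the LoRA adapter by name."""
    return {
        "model": lora_name,  # name registered via --lora-modules
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, url: str = VLLM_URL) -> str:
    """Send one prompt to the server and return the assistant message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Passing `Qwen/Qwen3.5-397B-A17B` as `lora_name` instead would route the request to the base model without the adapter.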

Merge with LLaMA-Factory

llamafactory-cli export \
  --model_name_or_path Qwen/Qwen3.5-397B-A17B \
  --adapter_name_or_path JinnP/Qwen3.5-397B-A17B-LoRA-SFT-v3 \
  --template qwen3_5_nothink \
  --finetuning_type lora \
  --export_dir saves/qwen35-397b-merged

Dataset

JinnP/amdpilot-lora-sft-dataset contains 104 multi-turn agent trajectories:

94 KernelBench Triton kernel optimization tasks
4 SGLang/vLLM bugfix and feature tasks
4 frontier bugfix trajectories

The trajectories are processed into 296 training examples using 3-view extraction (bookend + full + solution chunks).

Framework Versions

PEFT 0.18.1
Transformers 5.2.0
PyTorch 2.9.1+rocm7.2.0
Datasets 4.0.0
Tokenizers 0.22.2
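To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using `importlib.metadata` from the standard library; the pip distribution names in `EXPECTED` are assumed to match the listed frameworks, and `compare_versions` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Framework Versions list; keys are assumed pip names.
EXPECTED = {
    "peft": "0.18.1",
    "transformers": "5.2.0",
    "torch": "2.9.1+rocm7.2.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed, matches); installed is None if absent."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(EXPECTED).items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch is not necessarily a problem; the list records the training environment, not a strict requirement.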
