Instructions for using JinnP/Qwen3.5-lora-sft-v5-1-64k with libraries and local apps.

Libraries

PEFT

How to use JinnP/Qwen3.5-lora-sft-v5-1-64k with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-397B-A17B")
model = PeftModel.from_pretrained(base_model, "JinnP/Qwen3.5-lora-sft-v5-1-64k")

Transformers

How to use JinnP/Qwen3.5-lora-sft-v5-1-64k with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="JinnP/Qwen3.5-lora-sft-v5-1-64k")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

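# Load model directly. This repo holds only adapter weights, so this relies on
# the Transformers PEFT integration (peft must be installed) to fetch the base model.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("JinnP/Qwen3.5-lora-sft-v5-1-64k", device_map="auto")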


vLLM

How to use JinnP/Qwen3.5-lora-sft-v5-1-64k with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "JinnP/Qwen3.5-lora-sft-v5-1-64k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JinnP/Qwen3.5-lora-sft-v5-1-64k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
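
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a server is already listening on localhost and returns the usual OpenAI-style response body. The helper names `build_chat_request` and `send_chat` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

MODEL_ID = "JinnP/Qwen3.5-lora-sft-v5-1-64k"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text.

    Assumption: the response follows the OpenAI chat-completions shape,
    i.e. a `choices` list whose first item carries `message.content`.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send_chat("What is the capital of France?")` sends the request; passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.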

SGLang

How to use JinnP/Qwen3.5-lora-sft-v5-1-64k with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "JinnP/Qwen3.5-lora-sft-v5-1-64k" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JinnP/Qwen3.5-lora-sft-v5-1-64k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "JinnP/Qwen3.5-lora-sft-v5-1-64k" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JinnP/Qwen3.5-lora-sft-v5-1-64k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use JinnP/Qwen3.5-lora-sft-v5-1-64k with Docker Model Runner:

```
docker model run hf.co/JinnP/Qwen3.5-lora-sft-v5-1-64k
```

Qwen3.5-lora-sft-v5-1-64k

This repository contains a LoRA adapter for Qwen/Qwen3.5-397B-A17B, trained with LLaMA-Factory on the amdpilot_v5_1 SFT dataset.

This is an adapter-only release. You need the base model Qwen/Qwen3.5-397B-A17B to use it.

Key training settings

Fine-tuning method: LoRA
LoRA rank / alpha: 32 / 64
Context window: 65536 tokens
Packing: true
Neat packing: false
Precision: bf16
Distributed setup: 8x AMD MI355X
Epochs: 10
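
To make the rank / alpha setting concrete, here is a small sketch of what those two numbers imply. It assumes the standard LoRA formulation, where the low-rank update is scaled by alpha / rank; the card does not say whether a variant such as rsLoRA was used. The 4096 x 4096 layer shape is a hypothetical example, not a dimension of the base model.

```python
LORA_RANK = 32
LORA_ALPHA = 64


def lora_scaling(rank, alpha):
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scaling = lora_scaling(LORA_RANK, LORA_ALPHA)

# Hypothetical layer shape, chosen only to illustrate the arithmetic.
d_in, d_out = 4096, 4096
added = lora_param_count(d_in, d_out, LORA_RANK)
full = d_in * d_out
print(f"scaling = {scaling}")
print(f"adapter params = {added} ({100 * added / full:.4f}% of the full matrix)")
```

For this hypothetical shape the adapter adds 262144 parameters, about 1.56% of the 16777216 in the full matrix, with the update scaled by 2.0.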

Final metrics

Final train loss: 0.0630452295144399
Final eval loss: 0.133148193359375
Train runtime: 47396.7738s (13.17h)
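
The figures above can be sanity-checked with a few lines of arithmetic. This sketch converts the runtime to hours and, assuming the reported losses are mean token-level cross-entropy in nats (the usual convention, but not stated in the card), converts them to perplexity.

```python
import math

final_train_loss = 0.0630452295144399
final_eval_loss = 0.133148193359375
train_runtime_s = 47396.7738
epochs = 10

runtime_hours = train_runtime_s / 3600
hours_per_epoch = runtime_hours / epochs

# Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
train_perplexity = math.exp(final_train_loss)
eval_perplexity = math.exp(final_eval_loss)

print(f"runtime: {runtime_hours:.2f} h ({hours_per_epoch:.2f} h per epoch)")
print(f"train perplexity: {train_perplexity:.4f}")
print(f"eval perplexity: {eval_perplexity:.4f}")
```

Under that assumption the eval perplexity is roughly 1.142 and the train perplexity roughly 1.065.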

Eval trajectory

Step	Epoch	Eval loss
10	1.7273	0.1846
20	3.3636	0.1579
30	5.0	0.1417
40	6.7273	0.1357
50	8.3636	0.1336
60	10.0	0.1331
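
One way to read the trajectory is to look at how much the eval loss drops between consecutive checkpoints. The sketch below uses only the values from the table; the helper name `loss_improvements` is hypothetical.

```python
# (step, epoch, eval_loss) rows copied from the table above.
eval_trajectory = [
    (10, 1.7273, 0.1846),
    (20, 3.3636, 0.1579),
    (30, 5.0, 0.1417),
    (40, 6.7273, 0.1357),
    (50, 8.3636, 0.1336),
    (60, 10.0, 0.1331),
]


def loss_improvements(trajectory):
    """Drop in eval loss between each pair of consecutive checkpoints."""
    return [
        round(prev[2] - curr[2], 4)
        for prev, curr in zip(trajectory, trajectory[1:])
    ]


drops = loss_improvements(eval_trajectory)
for (step, _, _), drop in zip(eval_trajectory[1:], drops):
    print(f"step {step}: eval loss improved by {drop:.4f}")
```

The drops are 0.0267, 0.0162, 0.0060, 0.0021, and 0.0005, so the eval loss is still decreasing at step 60, but by a much smaller amount than early in training.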

Dataset coverage note

On the current amdpilot_v5_1 training split, a context window of 65536 tokens covers 82 of 89 samples (92.13%). This is substantially better coverage than the earlier 32768-token setting.
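
A coverage figure like this can be computed from per-sample token counts. The sketch below assumes "covered" means a sample's token count is at most the context window; the `coverage` helper and the toy lengths are hypothetical, since the card does not publish the per-sample lengths.

```python
def coverage(token_lengths, max_tokens):
    """Return (covered, total, percent) for samples with at most max_tokens tokens."""
    covered = sum(1 for n in token_lengths if n <= max_tokens)
    total = len(token_lengths)
    return covered, total, 100 * covered / total


# Reported figure from the card: 82 of 89 samples at 65536 tokens.
reported_percent = round(100 * 82 / 89, 2)
print(f"reported coverage: {reported_percent}%")

# Hypothetical token counts, for illustration only.
toy_lengths = [12000, 30000, 40000, 65536, 70000]
print(coverage(toy_lengths, 65536))
print(coverage(toy_lengths, 32768))
```

With the toy lengths, four of five samples fit at 65536 tokens and two of five at 32768, which shows the direction of the effect described above without reproducing the real split.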

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3.5-397B-A17B"
adapter_id = "JinnP/Qwen3.5-lora-sft-v5-1-64k"

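# The adapter repo ships the tokenizer assets and chat template (see Files).
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
# Load the base model first, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, adapter_id)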

Files

adapter_model.safetensors: LoRA adapter weights
adapter_config.json: PEFT adapter config
tokenizer.json / tokenizer_config.json / chat_template.jinja: tokenizer assets
all_results.json / eval_results.json / train_results.json: training metrics
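
After downloading the files, adapter_config.json can be checked against the settings stated in this card. This is a sketch only: `r`, `lora_alpha`, and `base_model_name_or_path` are standard PEFT LoRA config fields, but the expected values below are taken from the card text, and the actual file may differ (for example, the base model may be recorded as a local path). The helper name `check_adapter_config` is hypothetical.

```python
import json
from pathlib import Path

# Expected values taken from this model card, not read from the repo.
EXPECTED = {
    "base_model_name_or_path": "Qwen/Qwen3.5-397B-A17B",
    "r": 32,
    "lora_alpha": 64,
}


def check_adapter_config(path):
    """Return {key: (found, expected)} for every field that does not match."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        key: (config.get(key), expected)
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }
```

An empty result from `check_adapter_config("adapter_config.json")` means the three fields agree with the card; any mismatch is returned as a pair of found and expected values.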
