Instructions to use ytgui/Qwen3.5-Sonnet-9B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ytgui/Qwen3.5-Sonnet-9B with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ytgui/Qwen3.5-Sonnet-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("ytgui/Qwen3.5-Sonnet-9B")
model = AutoModelForMultimodalLM.from_pretrained("ytgui/Qwen3.5-Sonnet-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use ytgui/Qwen3.5-Sonnet-9B with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ytgui/Qwen3.5-Sonnet-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ytgui/Qwen3.5-Sonnet-9B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker

```
docker model run hf.co/ytgui/Qwen3.5-Sonnet-9B
```

SGLang

How to use ytgui/Qwen3.5-Sonnet-9B with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ytgui/Qwen3.5-Sonnet-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ytgui/Qwen3.5-Sonnet-9B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```
# Start the SGLang server in a container:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ytgui/Qwen3.5-Sonnet-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ytgui/Qwen3.5-Sonnet-9B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Docker Model Runner
How to use ytgui/Qwen3.5-Sonnet-9B with Docker Model Runner:
```
docker model run hf.co/ytgui/Qwen3.5-Sonnet-9B
```

Original ninja.template provides better results

by Neiko2002 - opened 27 days ago

Discussion

Neiko2002

27 days ago

edited 27 days ago

No idea why everyone is using a custom ninja file. I benchmarked your model using https://benchlocal.com/, and it is currently the best fine-tuned Qwen3.5 9B I have come across, but only when using the original ninja file.

With the ninja file in this repo, its performance is worse.

All benchmarks were performed on 2x 3090 with a 250 W power limit, using stock vLLM (v0.21.0) with thinking disabled and no MTP. Your model is fast thanks to the quant, but also because it uses fewer tokens.
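For anyone trying to reproduce a setup like this, the sketch below builds a chat-completions request for a vLLM server with thinking turned off. It assumes the model's chat template honors an `enable_thinking` flag passed through `chat_template_kwargs`, as Qwen3-family templates commonly do. The post does not say how thinking was disabled, so the flag, the `build_request` and `send` helpers, and the port are assumptions rather than the benchmark's actual configuration.

```python
import json
import urllib.request

MODEL = "ytgui/Qwen3.5-Sonnet-9B"


def build_request(prompt, enable_thinking=False, max_tokens=256):
    """Build an OpenAI-compatible chat payload (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Assumption: the chat template reads an `enable_thinking` variable.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def send(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST the payload to a running server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("Describe your chat template in one sentence.")
print(json.dumps(payload, indent=2))  # inspect first, then call send(payload)
```

Printing the payload before sending makes it easy to confirm that the flag is actually present in the request.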

ytgui

Owner 26 days ago

Hi,

Thanks so much for taking the time to bench the model and share your findings. It's great to hear it's performing well.

To clarify the ninja template situation: the only change I made was adding a default system prompt, "You are a helpful AI assistant.", to the template. No other modifications. I felt the default system prompt was worth keeping for non-technical users who may not think to set one themselves. As for why this causes a performance difference, models at this scale can sometimes be overfitted to context.
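To make the template change concrete, here is a minimal sketch of what a default system prompt does to the message list before it is rendered. It is plain Python rather than the actual template: `apply_default_system` is a hypothetical helper, and only the prompt text comes from this thread.

```python
DEFAULT_SYSTEM = "You are a helpful AI assistant."


def apply_default_system(messages, default=DEFAULT_SYSTEM):
    """Prepend a default system message when the caller did not supply one."""
    if messages and messages[0].get("role") == "system":
        return list(messages)  # the caller's own system prompt is kept as-is
    return [{"role": "system", "content": default}] + list(messages)


user_only = [{"role": "user", "content": "Hello"}]
print(apply_default_system(user_only))
```

Under this sketch, a request without a system message is rendered with one extra turn compared to a template that has no default, which is one plausible place for a benchmark difference to come from.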

As for the score difference, a range of 74.0–75.2 honestly looks good to me either way 😀
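For scale, the gap between the two quoted scores can be worked out as below. The snippet uses only the two numbers from this thread and does not assume which template produced which score.

```python
low, high = 74.0, 75.2

absolute_gap = round(high - low, 1)                 # in benchmark points
relative_gap = round(100 * (high - low) / low, 2)   # percent of the lower score

print(f"gap: {absolute_gap} points ({relative_gap}% relative)")
```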

That said, this is a genuinely useful discussion and I'd love to keep it open. I will look into whether there's a clean workaround.

Thanks.

ytgui

Owner 26 days ago

btw bro, to the best of my knowledge the community still lacks a solid agentic coding benchmark. Would that be something you'd be interested in designing?

My rough idea: pack a real git repo (e.g., sqlite, redis) into a container, strip the git history, and define realistic coding tasks like what you'd throw at claude code or opencode. would love to hear your thoughts!
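A rough sketch of that idea, using only the Python standard library: a task record plus a helper that removes the `.git` directory so an agent cannot read past commits. The `CodingTask` schema, the `strip_git_history` helper, and the sample task text are all placeholders for illustration, and the container packaging step is left out.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CodingTask:
    """One benchmark task (hypothetical schema)."""
    task_id: str
    repo: str          # e.g. "sqlite" or "redis"
    instruction: str   # what the agent is asked to do
    check_command: list = field(default_factory=list)  # command that decides pass/fail


def strip_git_history(repo_dir):
    """Delete the .git directory; return True if one was removed."""
    git_dir = Path(repo_dir) / ".git"
    if git_dir.is_dir():
        shutil.rmtree(git_dir)
        return True
    return False


# Demo on a throwaway directory standing in for a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".git").mkdir()
    (Path(tmp) / "main.c").write_text("int main(void) { return 0; }\n")
    removed = strip_git_history(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

# Placeholder task; a real benchmark would define many of these per repo.
task = CodingTask("demo-001", "sqlite", "Fix the failing test.", ["make", "test"])
print(removed, remaining, task.task_id)
```

Keeping the pass/fail check as a plain command would let each repo reuse its own test suite, whatever language it is written in.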

Neiko2002

26 days ago

Yeah, both numbers, 74.0 and 75.2, are great for a finetune, as it is very difficult to improve in one area without becoming worse in another. While benchlocal has these 7 nice bench packs, you are totally right that it's missing an agentic coding benchmark. Designing a coding benchmark is pretty difficult, as there are too many programming languages, and including several of them would make the benchmark too big. In any case, Docker and containers in general are not my expertise; I avoid them when possible.
