vLLM

How to use Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
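
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload`, `post_chat`, and `extract_reply` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI chat completions schema that the server advertises compatibility with.

```python
import json
import urllib.request

MODEL_ID = "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build the same JSON body as the curl example (text plus optional image)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(payload, base_url="http://localhost:8000", timeout=300):
    """POST the payload to /v1/chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_reply(response):
    """Return the assistant text (assumes an OpenAI-style response layout)."""
    return response["choices"][0]["message"]["content"]
```

With the server running, `extract_reply(post_chat(build_chat_payload("Describe this image in one sentence.", image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")))` should return the model's answer as a string. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.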

Use Docker

docker model run hf.co/Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM

SGLang

How to use Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
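
A freshly started container may need some time to download and load the weights before it answers requests. The sketch below polls the server until it responds; `http_probe` and `wait_until_ready` are hypothetical helpers, and the use of `/v1/models` as a readiness check is an assumption based on the server being OpenAI-compatible, not something stated in the instructions above.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5):
    """Return True if a GET on `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe, timeout_s=600, interval_s=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `probe()` until it returns True or `timeout_s` seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:30000/v1/models"))` returns `True` once the endpoint answers, or `False` after ten minutes. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.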

Unsloth Studio

How to use Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM to start chatting

Load model with FastModel

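# Install Unsloth from pip (run in a shell):
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM",
    max_seq_length=2048,  # maximum sequence length in tokens
)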

Docker Model Runner

How to use Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM with Docker Model Runner:

```
docker model run hf.co/Xingyu-Zheng/Qwopus3.6-27B-v1-preview-INT4-FOEM
```

Improved quality by changing the chat_template.jinja

by Neiko2002 - opened May 15

Discussion

Neiko2002

May 15

I replaced chat_template.jinja with the official one, and the average score on the https://benchlocal.com/ bench packs rose from 60.4 to 82 points (Qwen/Qwen3.6-27B-FP8 scores 82.5). With that change it also has the highest Hermesagent scores of all the 27B models tested. It might be worth changing the template here in the repo.

Xingyu-Zheng

Owner May 15

That’s surprising! Let me confirm: do you mean that we should use https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/chat_template.jinja instead?

Neiko2002

May 15

Yes. You can install benchlocal on your machine and run your model as it is, then replace the jinja file and try again. You will see a huge difference. The current version gets 60.4 points across all benchmarks, which is less than Qwen 3.5 9B (76.9 points).
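
The before/after comparison described above comes down to swapping one file in a local copy of the model. A minimal sketch of that step follows, assuming the model has been downloaded to a local directory and the official template has been saved separately; the function names and all paths are hypothetical, and the benchmark runs themselves are not shown.

```python
import difflib
import shutil
from pathlib import Path


def diff_templates(old_text, new_text):
    """Return a unified diff between two chat template strings."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="chat_template.jinja (current)",
            tofile="chat_template.jinja (official)",
        )
    )


def swap_chat_template(model_dir, new_template_path):
    """Back up model_dir/chat_template.jinja, then overwrite it with the new template."""
    target = Path(model_dir) / "chat_template.jinja"
    backup = target.with_name(target.name + ".bak")
    shutil.copyfile(target, backup)             # keep the original for the "before" run
    shutil.copyfile(new_template_path, target)  # install the replacement for the "after" run
    return backup
```

Printing `diff_templates(...)` on the two files before swapping shows exactly what differs between the templates. The serving process most likely has to be restarted after the swap so that it loads the new file.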

Xingyu-Zheng

Owner May 15

I sincerely appreciate your testing and feedback! I may not have enough time to reproduce the issue myself in the near future, so I directly updated the chat_template.jinja according to your suggestion and added a corresponding note to the Model Card.

Neiko2002

May 16 (edited May 16)

I have to thank you for the quant. It's on par with the one from Lorbus, but thanks to the Qwopus finetune it uses fewer tokens and is therefore faster.

And here you can also see the high Hermes score, which is amazing.
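
The point about fewer tokens leading to faster answers can be checked on any single request. The sketch below is an illustration only: it assumes the server's response carries an OpenAI-style `usage.completion_tokens` field, that the caller has measured the wall-clock time of the request, and the helper names are hypothetical.

```python
def summarize_usage(response, elapsed_s):
    """Return completion tokens and tokens per second for one chat completion."""
    completion_tokens = response["usage"]["completion_tokens"]
    return {
        "completion_tokens": completion_tokens,
        "elapsed_s": elapsed_s,
        "tokens_per_s": completion_tokens / elapsed_s if elapsed_s > 0 else 0.0,
    }


def compare_runs(baseline, candidate):
    """Return the candidate's token count and latency as fractions of the baseline's."""
    return {
        "token_ratio": candidate["completion_tokens"] / baseline["completion_tokens"],
        "time_ratio": candidate["elapsed_s"] / baseline["elapsed_s"],
    }
```

If two models decode at a similar number of tokens per second, a `token_ratio` below 1 translates into a similar `time_ratio`, which is the effect described in the comment above.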

Neiko2002 changed discussion status to closed May 16
