Instructions for using nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 with libraries and local serving apps.

Libraries

How to use nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages include an image, so use the multimodal pipeline task
pipe = pipeline("image-text-to-text", model="nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8")
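# device_map="auto" places the weights on available accelerators rather than the CPU
model = AutoModelForImageTextToText.from_pretrained("nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8", device_map="auto")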
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
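The same request can also be built with Python's standard library if curl is not available. This is a minimal sketch that assumes the vLLM server above is listening on localhost:8000. The helper `build_chat_request` is hypothetical and is not part of vLLM. Pointing `base_url` at port 30000 would target the SGLang server described next.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8000",
    "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8",
    "What is the capital of France?",
)

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.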

Use Docker

docker model run hf.co/nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8

SGLang

How to use nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


Qwen3.5-35B-A3B-EQ-v5

An FP8-quantized DPO fine-tune of Qwen3.5-35B-A3B-heretic-v2.

The fine-tune targets two goals:

bringing warmth, emotional intelligence, and better general chat to the Qwen 3.5 series
countering some negative tendencies of Heretic models (over-willingness to agree, sycophancy, etc.)

This is still intended as a general-use model (agentic, coding, general chat). Tuning was deliberately light and targeted. More general benchmarks to follow.

What this model does

This model is trained to be a better conversational partner in emotionally complex situations, while maintaining base model capabilities. It:

Validates without sycophancy: empathizes with frustration without rubber-stamping bad behavior
Sets boundaries warmly: names uncomfortable truths without lecturing
Sounds human: conversational tone rather than therapist-speak, with fewer stock phrases than vanilla Qwen 3.5 (e.g. less "It sounds like...")

Key specs

Base	Qwen/Qwen3.5-35B-A3B
Parent	llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored via MPOA+SOMA)
Architecture	MoE: 35B total, 3B active (256 experts; 8 routed + 1 shared active per token)
Fine-tune	DPO with LoRA (r=32, alpha=64)
Training data	DPO preference pairs: generated dialogue simulating real situations, with diverse system prompts
Precision	FP8 (quantized from bf16)
Size	~35 GB (vs ~66 GB in bf16)
Context	262,144 tokens native; fine-tuned at 4,096 tokens
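You can roughly check the size and active-parameter figures above with a back-of-envelope calculation: weight memory is about parameter count × bytes per parameter. This sketch assumes exactly 35e9 parameters. It ignores metadata and any tensors an FP8 checkpoint may keep in higher precision (for example embeddings or norms), which is one plausible reason a real FP8 file can come out somewhat larger than the naive estimate. The results are estimates, not measurements.

```python
GIB = 1024 ** 3


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB (ignores metadata and unquantized tensors)."""
    return num_params * bytes_per_param / GIB


TOTAL_PARAMS = 35e9   # assumption: nominal "35B total" from the spec table
ACTIVE_PARAMS = 3e9   # assumption: nominal "3B active"

fp8 = weight_gib(TOTAL_PARAMS, 1)   # FP8 stores 1 byte per weight
bf16 = weight_gib(TOTAL_PARAMS, 2)  # bf16 stores 2 bytes per weight

routed_fraction = 8 / 256                       # 8 of 256 experts selected by the router per token
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of all weights used per token

print(f"FP8 ~{fp8:.1f} GiB, bf16 ~{bf16:.1f} GiB")
print(f"routed experts per token: {routed_fraction:.1%}, active params: {active_fraction:.1%}")
```

The active-parameter share (about 8.6%) is larger than the routed-expert share (about 3.1%). This is because attention layers, embeddings, and any always-active expert run for every token regardless of routing.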

EQ-Bench 3 results

See the Qwen3.5-35B-A3B-EQ-v5 model card for full benchmark results. Scores below are from the bf16 model; FP8 quantization is expected to have minimal impact.

Leaderboard ranking (raw rubric score, claude-3.7-sonnet judge)

Rank	Model	Raw Score
7	gemini-2.5-pro	193.7
8	EQ-v5 (3B active)	193.6
9	grok-4	192.8
10	claude-opus-4	192.6

Rankings are sourced from the EQ-Bench 3 canonical leaderboard data (2026-03-19 snapshot). These are raw rubric scores, not the official Elo ranking: a higher raw score does not necessarily mean a better model (see eqbench.com for normalized Elo). Newer models (gpt-5.4, claude-sonnet-4-6, claude-opus-4-6) are judged by Claude Opus on the live leaderboard and do not yet have Sonnet-judged scores in the official repo data.

Qwen family comparison (all claude-3.7-sonnet judge)

Model	Active params	Raw Score
EQ-v5 (this model)	3B	193.6
Qwen3-235B-A22B	22B	191.1
Qwen3.5-35B-A3B vanilla	3B	185.5
Qwen3-30B-A3B	3B	166.3

HumanEval+ (coding)

Benchmark	pass@1
HumanEval (base)	95.1%
HumanEval+ (extended tests)	88.4%

Evaluated with thinking enabled, temperature=0.6, top_p=0.95.
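pass@1 is the probability that a single sampled completion passes all of a problem's tests. The sketch below implements the standard unbiased pass@k estimator introduced with HumanEval (n samples per problem, c of them correct), averaged over problems. The card does not state how many samples per problem were drawn, and the per-problem outcomes in the example are hypothetical, not this model's data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem outcomes (n samples, c correct), not the card's data.
example = [(4, 4), (4, 3), (4, 0), (4, 1)]
score = benchmark_pass_at_k(example, k=1)
print(f"pass@1 = {score:.1%}")
```

With k=1 the estimator reduces to c/n per problem, so pass@1 is simply the mean per-problem success rate. The combinatorial form only matters for k > 1.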

Serving

vllm serve nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 \
  --served-model-name Qwen3.5-35B-A3B-EQ-v5 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --max-num-seqs 32 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000
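The --enable-auto-tool-choice and --tool-call-parser qwen3_coder flags let the server return OpenAI-style tool_calls when a request includes a tools list. Below is a minimal sketch of such a request body. It assumes the server started above (port 30000, served name Qwen3.5-35B-A3B-EQ-v5). The get_weather tool is purely hypothetical.

```python
import json

# Hypothetical tool, described with the OpenAI function-calling JSON schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

payload = {
    # Must match --served-model-name from the serve command.
    "model": "Qwen3.5-35B-A3B-EQ-v5",
    "messages": [{"role": "user", "content": "Do I need an umbrella in Paris today?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

body = json.dumps(payload)
# POST `body` to http://localhost:30000/v1/chat/completions with
# Content-Type: application/json. If the model calls the tool, the reply's
# choices[0].message.tool_calls lists the function name and JSON-encoded arguments.
print(body)
```

Your client is then responsible for running the tool and sending its result back in a follow-up message; the server only parses the model's tool-call output.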

Sampling recommendations

With thinking: temperature=0.7, top_p=0.9, max_tokens=4096
Without thinking: temperature=0.7, top_p=0.8, max_tokens=2048

To disable thinking mode when calling the server through the OpenAI Python client, pass:

extra_body={"chat_template_kwargs": {"enable_thinking": False}}
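The extra_body argument belongs to the OpenAI Python client, which merges its contents into the top-level JSON request body. Below is a stdlib-only sketch of the equivalent raw body, combining the sampling recommendations with the thinking toggle. The helper chat_payload is hypothetical. The sketch assumes a server that forwards chat_template_kwargs to a chat template reading enable_thinking, as vLLM does with Qwen3-family templates.

```python
import json

# Values from the sampling recommendations in this card, keyed by "thinking enabled?".
RECOMMENDED = {
    True: {"temperature": 0.7, "top_p": 0.9, "max_tokens": 4096},
    False: {"temperature": 0.7, "top_p": 0.8, "max_tokens": 2048},
}


def chat_payload(model: str, prompt: str, thinking: bool = True) -> dict:
    """Hypothetical helper: build a /v1/chat/completions body with recommended sampling."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED[thinking],
    }
    if not thinking:
        # Same effect as extra_body={"chat_template_kwargs": {"enable_thinking": False}}
        # in the OpenAI client: the key sits at the top level of the JSON body.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


fast = chat_payload("Qwen3.5-35B-A3B-EQ-v5", "Summarize this in one line.", thinking=False)
print(json.dumps(fast, indent=2))
```

Keeping the sampling presets in one table means the thinking and non-thinking settings cannot drift apart between call sites.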

Lineage

Qwen/Qwen3.5-35B-A3B
  → llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored)
    → nivvis/Qwen3.5-35B-A3B-EQ-v5 (DPO for EQ, bf16)
      → nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 (this model)

Limitations

Assertiveness is below frontier-model level: the model can be too agreeable in scenarios requiring pushback
The best insights sometimes stay in the thinking tokens and don't fully surface in the final response
Trained on English conversational data only
Not a therapist: do not use it for mental health advice