Qwen3.5-35B-A3B-EQ-v5
A DPO fine-tune of Qwen3.5-35B-A3B-heretic-v2.
The tune was optimized for two things:
- bringing warmth, emotional intelligence, and general chat improvements to the Qwen 3.5 series
- countering some negative tendencies of Heretic models (over-willingness to agree, sycophancy, etc.)
This is still intended as a general-use model (agentic, coding, general chat). Tuning was applied lightly and with precision. More general benchmarks to follow.
What this model does
This model is trained to be a better conversational partner in emotionally complex situations, while maintaining base model capabilities. It:
- Validates without sycophancy: empathizes with frustration without rubber-stamping bad behavior
- Sets boundaries warmly: names uncomfortable truths without lecturing
- Sounds human: conversational tone, not therapist-speak. Better tone than vanilla Qwen 3.5, e.g. fewer stock openers such as "It sounds like"
Key specs
| Property | Value |
|---|---|
| Base | Qwen/Qwen3.5-35B-A3B |
| Parent | llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored via MPOA+SOMA) |
| Architecture | MoE: 35B total, 3B active (256 experts, 8+1 routed) |
| Fine-tune | DPO with LoRA (r=32, alpha=64) |
| Training data | DPO preference pairs built from diverse, simulated (real-situation-based) generated dialogue, with diverse system prompts |
| Precision | FP8 (quantized from bf16) |
| Size | ~35GB (vs ~66GB bf16) |
| Context | 262k (native), trained at 4096 |
EQ-Bench 3 results
See Qwen3.5-35B-A3B-EQ-v5 for full benchmark results. Scores below are from the bf16 model; FP8 quantization is expected to have minimal impact.
Leaderboard ranking (raw rubric score, claude-3.7-sonnet judge)
| # | Model | Raw Score |
|---|---|---|
| 7 | gemini-2.5-pro | 193.7 |
| 8 | EQ-v5 (3B active) | 193.6 |
| 9 | grok-4 | 192.8 |
| 10 | claude-opus-4 | 192.6 |
Rankings are sourced from the EQ-Bench 3 canonical leaderboard data (2026-03-19 snapshot). These are raw rubric scores, not the official Elo ranking: a higher raw score does not necessarily mean a better model (see eqbench.com for normalized Elo). Newer models (gpt-5.4, claude-sonnet-4-6, claude-opus-4-6) are judged with Opus on the live leaderboard and are not yet in the official repo data with Sonnet scores.
Qwen family comparison (all claude-3.7-sonnet judge)
| Model | Params (active) | Raw Score |
|---|---|---|
| EQ-v5 (this model) | 3B | 193.6 |
| Qwen3-235B-A22B | 22B | 191.1 |
| Qwen3.5-35B-A3B vanilla | 3B | 185.5 |
| Qwen3-30B-A3B | 3B | 166.3 |
HumanEval+ (coding)
| Benchmark | pass@1 |
|---|---|
| HumanEval (base) | 95.1% |
| HumanEval+ (extended tests) | 88.4% |
Thinking enabled, temperature=0.6, top_p=0.95.
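As a rough sanity check on the percentages above, the sketch below assumes one sample per problem over the standard 164-problem HumanEval set, in which case pass@1 reduces to solved / total. The solved counts (156 and 145) are inferred from the reported percentages and are not stated anywhere in this card; `pass_at_1` is a hypothetical helper.

```python
# Assumption: one generation per problem, so pass@1 = solved / total.
# HumanEval and HumanEval+ share the same 164 problems; HumanEval+ only
# adds stricter tests per problem.
HUMANEVAL_PROBLEMS = 164


def pass_at_1(solved: int, total: int = HUMANEVAL_PROBLEMS) -> float:
    """Return pass@1 as a percentage rounded to one decimal place."""
    return round(100.0 * solved / total, 1)


# Solved counts inferred from the reported scores (not from the card).
print(pass_at_1(156))  # HumanEval (base)
print(pass_at_1(145))  # HumanEval+ (extended tests)
```

Under these assumptions the gap between the two rows corresponds to 11 problems that pass the base tests but fail the extended ones.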
Serving
vllm serve nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 \
--served-model-name Qwen3.5-35B-A3B-EQ-v5 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--max-num-seqs 32 \
--max-model-len 262144 \
--gpu-memory-utilization 0.95 \
--trust-remote-code \
--host 0.0.0.0 \
--port 30000
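Once the server is running, it can be queried through the OpenAI-compatible chat completions endpoint. The following is a minimal sketch using only the Python standard library; it assumes the server was started with the command above and is reachable on localhost, and `build_chat_request` and `chat` are hypothetical helper names. Note that the model name must match `--served-model-name`, not the repository ID.

```python
import json
import urllib.request

# Assumptions: host and port match the serve command; adjust if the
# server runs elsewhere.
BASE_URL = "http://localhost:30000/v1"
MODEL_NAME = "Qwen3.5-35B-A3B-EQ-v5"  # value passed to --served-model-name


def build_chat_request(prompt: str, max_tokens: int = 2048) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(chat("I snapped at a coworker today and feel awful about it."))
```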
Sampling recommendations
- With thinking: temp=0.7, top_p=0.9, max_tokens=4096
- Without thinking: temp=0.7, top_p=0.8, max_tokens=2048
To disable thinking mode:
extra_body={"chat_template_kwargs": {"enable_thinking": False}}
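The two presets can be bundled into a small helper so that the sampling values and the thinking switch always travel together. This is only a sketch: `sampling_kwargs` is a hypothetical function, and the returned dictionary is shaped for the `openai` Python client, where `extra_body` fields are merged into the request JSON (an assumption about the client being used).

```python
from typing import Any, Dict


def sampling_kwargs(thinking: bool) -> Dict[str, Any]:
    """Return the recommended request kwargs for the chosen mode."""
    if thinking:
        # Thinking is left at the server default (enabled), with a larger
        # token budget to leave room for the reasoning tokens.
        return {"temperature": 0.7, "top_p": 0.9, "max_tokens": 4096}
    return {
        "temperature": 0.7,
        "top_p": 0.8,
        "max_tokens": 2048,
        # Disables thinking through the chat template, as described above.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
    }


print(sampling_kwargs(thinking=False))
```

With the `openai` client this could be passed as `client.chat.completions.create(model=..., messages=..., **sampling_kwargs(thinking=False))`. When sending raw JSON instead, `chat_template_kwargs` would go at the top level of the request body rather than under `extra_body`.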
Lineage
Qwen/Qwen3.5-35B-A3B
→ llmfan46/Qwen3.5-35B-A3B-heretic-v2 (decensored)
→ nivvis/Qwen3.5-35B-A3B-EQ-v5 (DPO for EQ, bf16)
→ nivvis/Qwen3.5-35B-A3B-EQ-v5-FP8 (this model)
Limitations
- Assertiveness is below frontier — the model can be too agreeable in scenarios requiring pushback
- Best insights sometimes stay in thinking tokens and don't fully surface in the response
- Trained on English conversational data only
- Not a therapist — do not use for mental health advice
License
Apache 2.0, following the base Qwen3.5 license.