How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
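The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable on localhost port 30000, and the helper names `build_chat_payload` and `chat` are illustrative, not part of SGLang.

```python
import json
import urllib.request

# Assumed values: they mirror the curl example and the --port flag used above.
BASE_URL = "http://localhost:30000/v1/chat/completions"
MODEL = "codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4"


def build_chat_payload(text, image_url=None, model=MODEL):
    """Build an OpenAI-style chat request body with an optional image part."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(payload, url=BASE_URL, timeout=120):
    """POST the payload and return the text of the first choice."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat(build_chat_payload("Describe this image in one sentence.", image_url="<image URL>")))` should print the model's reply; omit `image_url` for a text-only request.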
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4" \
        --host 0.0.0.0 \
        --port 30000

Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4

This is a GPTQ INT4 quantized version of Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.

Please refer to the original model card for details on the model architecture, training data, and capabilities.

Note: While the original fine-tuning focused on text-only reasoning tasks, this model inherits multimodal capabilities from the base Qwen3.5-35B-A3B. The vision encoder is preserved and functional for image understanding tasks.

Model Architecture

This is a Mixture-of-Experts (MoE) model with:

  • Total Parameters: 35B
  • Active Parameters: ~3B per token
  • Experts: 256 total, 8 active per token
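To put the figures above in proportion, here is a small arithmetic sketch. The constants are copied from the list; since "~3B" is approximate, the parameter ratio is approximate too.

```python
# Figures taken from the list above; "~3B" is approximate, so the ratio is too.
TOTAL_PARAMS = 35e9
ACTIVE_PARAMS = 3e9
TOTAL_EXPERTS = 256
ACTIVE_EXPERTS = 8

expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS
param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"experts used per token: {expert_fraction:.3%}")       # 3.125%
print(f"parameters used per token: {param_fraction:.1%}")     # 8.6%
```

The parameter fraction is larger than the expert fraction, which is what one would expect if some components (for example attention layers and embeddings) run for every token regardless of routing. That explanation is a general property of MoE models and is not stated in the original card.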

Quantization Details

  • Method: GPTQ (INT4 weights, 16-bit activations; W4A16)
  • Group Size: 128
  • Calibration: 1024 samples from the C4 dataset (~2048 tokens per sample on average)
  • Vision Encoder: Preserved (not quantized)
  • MTP Module: Preserved (not quantized)
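To illustrate what "4-bit with group size 128" means for storage, the sketch below quantizes a row of weights in groups of 128, each with its own scale and offset. It uses plain round-to-nearest, which is a simplification: GPTQ additionally uses the calibration data to choose roundings that reduce layer output error. The function names and the asymmetric min/max scheme are assumptions for illustration, not the actual format of this checkpoint.

```python
import random

GROUP_SIZE = 128   # from the list above
LEVELS = 15        # 4 bits give integer codes 0..15


def quantize_group(weights):
    """Asymmetric round-to-nearest quantization of one group to 4-bit codes."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # avoid a zero scale for constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize a weight row group by group; each group has its own scale."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


# Synthetic weights, purely for demonstration.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]
groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.6f}")
```

With round-to-nearest, the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups (finer scales) trade extra metadata for accuracy.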

Usage with vLLM

Text-only

from vllm import LLM, SamplingParams

# Load the quantized checkpoint; max_model_len caps prompt plus output tokens.
llm = LLM(
    model="codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.9,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)
prompt = "Explain the difference between TCP and UDP protocols."
# generate() sends the raw string; use llm.chat() to apply the chat template.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)

With Image (Multimodal)

from vllm import LLM, SamplingParams

# Same engine settings as the text-only example.
llm = LLM(
    model="codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.9,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
# OpenAI-style message with an image part and a text part.
# The URL below is a placeholder; replace it with a reachable image.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "text": "What is in this image?"}
        ]
    }
]
# chat() applies the model's chat template before generation.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
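For images stored locally rather than at a public URL, one common approach with OpenAI-style `image_url` fields is to embed the image as a base64 data URL. The sketch below shows that encoding; the helper names are hypothetical, and whether your vLLM version accepts data URLs in this field is an assumption worth verifying.

```python
import base64


def image_bytes_to_data_url(data, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(data_url, question):
    """Build a single-turn message list in the same shape as the example above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": question},
            ],
        }
    ]
```

A local file could then be passed as `image_message(image_bytes_to_data_url(open("photo.jpg", "rb").read()), "What is in this image?")`, where `photo.jpg` is a placeholder path.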

Hardware Requirements

Precision    VRAM (Approx.)
INT4 GPTQ    ~22 GB
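As a rough sanity check on the figure above, the weight storage alone can be estimated from the parameter count and bit width. This is a back-of-the-envelope sketch: the per-group metadata size (a 16-bit scale plus a 4-bit zero point per 128 weights) is an assumption, and it treats all 35B parameters as quantized even though the vision encoder and MTP module are kept unquantized.

```python
def weight_gb(num_params, bits_per_weight):
    """Storage for the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9
GROUP_SIZE = 128
# Assumed per-group metadata: 16-bit scale + 4-bit zero point.
METADATA_BITS_PER_WEIGHT = (16 + 4) / GROUP_SIZE

raw = weight_gb(TOTAL_PARAMS, 4)
with_metadata = weight_gb(TOTAL_PARAMS, 4 + METADATA_BITS_PER_WEIGHT)

print(f"4-bit weights only:      {raw:.1f} GB")            # 17.5 GB
print(f"with per-group metadata: {with_metadata:.1f} GB")  # 18.2 GB
```

The gap between this estimate and ~22 GB would be taken up by the unquantized components, the KV cache, and runtime buffers, so actual usage depends on context length and `gpu_memory_utilization`.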

Acknowledgements

License

Apache 2.0 (inherited from original model)
