Qwen 3.5 4B GGUF

Description

This repository contains GGUF weights for the Qwen/Qwen3.5-4B model. The files were converted and quantized using llama.cpp.

Provided Files

  • Q6_K: Highest quality of the three and the largest file. Recommended if you have enough VRAM (or system RAM) to hold it.
  • Q5_K_M: A middle ground between output quality and file size.
  • Q4_K_M: Smallest and fastest of the three. A sensible default for most users.
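
To judge whether a file is likely to fit in memory, a rough size estimate is parameters × bits per weight ÷ 8. The sketch below is illustrative only: estimate_gib is a hypothetical helper, and the bits-per-weight figures (about 4.8 for Q4_K_M, 5.7 for Q5_K_M, 6.6 for Q6_K) are assumed typical values for these quantization types, not measurements of the files in this repository. Actual file sizes will differ, and the context cache needs additional memory on top.

```shell
# Rough GGUF size estimate in GiB: params * bits_per_weight / 8 bytes.
# $1 = parameter count in billions, $2 = assumed average bits per weight.
estimate_gib() {
  awk -v p="$1" -v b="$2" \
    'BEGIN { printf "%.1f\n", p * 1000000000 * b / 8 / 1073741824 }'
}

# Assumed bits-per-weight values (approximate, not measured from these files).
for entry in Q4_K_M:4.8 Q5_K_M:5.7 Q6_K:6.6; do
  name=${entry%%:*}   # text before the colon
  bpw=${entry##*:}    # text after the colon
  echo "$name: ~$(estimate_gib 4 "$bpw") GiB"
done
```

Treat the printed numbers as a lower bound when deciding how much VRAM or RAM you need.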

Usage

You can run these models with llama.cpp or any GGUF-compatible software such as LM Studio, Ollama, or KoboldCpp.

Example command for llama-cli (-ngl 32 offloads up to 32 model layers to the GPU):

./llama-cli -m qwen3.5-4b-Q4_K_M.gguf -ngl 32

Example PowerShell command for llama-cli on Windows, with Flash Attention, memory mapping, and reasoning output all disabled:

.\llama-cli.exe -m qwen3.5-4b-Q4_K_M.gguf -ngl 32 -fa 0 --no-mmap --reasoning off 

Parameter Quick Reference (CLI Flags)

When running this model via llama-cli, you can use the following flags to tune performance and output behavior:

Flash Attention (-fa)

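A memory-efficient implementation of the attention computation. Recent llama.cpp builds also accept on, off, and auto in place of 1 and 0.

  • -fa 1: Enable. Speeds up processing and lowers memory use at long contexts (requires model and backend support).
  • -fa 0: Disable. Uses the standard attention path, which is more broadly compatible but slower at large contexts.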
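
Whether Flash Attention helps depends on the backend and hardware, so it can be worth measuring on your own machine. A minimal sketch, assuming the llama-bench tool that is built alongside llama-cli is on your PATH and accepts comma-separated values for -fa; compare_flash_attention is a hypothetical helper name.

```shell
# Benchmark the same model with Flash Attention off and on in a single run.
# $1 = path to the GGUF file, $2 = number of layers to offload to the GPU.
# Assumes llama-bench is on PATH and accepts a comma-separated list for -fa.
compare_flash_attention() {
  llama-bench -m "$1" -ngl "$2" -fa 0,1
}

# Example call (not executed here):
#   compare_flash_attention qwen3.5-4b-Q4_K_M.gguf 32
```

If the throughput numbers for the two settings are close, leaving -fa off costs little and avoids backend-specific issues.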

Memory Mapping (--no-mmap)

Controls how the model file is loaded into memory.

  • Without this flag: llama.cpp memory-maps the model file (mmap) by default. Loading is faster because pages are read on demand and can stay in the OS page cache, but mmap may occasionally conflict with specific systems or GPU drivers.
  • With --no-mmap: The model file is read fully into system RAM up front. This is more reliable for troubleshooting but results in slower startup and higher RAM consumption.
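
Because --no-mmap reads the whole file into RAM, it can help to confirm that the file fits before using it. A minimal sketch for Linux, assuming GNU stat and /proc/meminfo are available; fits_in_ram, no_mmap_is_safe, and the 1 GiB headroom are illustrative choices, not part of llama.cpp.

```shell
# Succeeds when a file of $1 bytes plus $3 bytes of headroom fits in $2 bytes.
fits_in_ram() {
  [ $(( $1 + $3 )) -le "$2" ]
}

# Linux-only check: compares the GGUF file size with MemAvailable.
# $1 = path to the GGUF file. The 1 GiB headroom is an arbitrary assumption.
no_mmap_is_safe() {
  file_bytes=$(stat -c %s "$1") || return 2
  avail_bytes=$(awk '/^MemAvailable:/ { printf "%.0f", $2 * 1024 }' /proc/meminfo)
  fits_in_ram "$file_bytes" "$avail_bytes" 1073741824
}

# Example call (not executed here):
#   no_mmap_is_safe qwen3.5-4b-Q4_K_M.gguf && echo "--no-mmap should fit"
```

The check is conservative: it ignores any layers that end up in VRAM through -ngl and any memory needed for the context.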

Reasoning Process (--reasoning)

Controls the output of the model's internal "thinking" (for models trained with reasoning capabilities like Qwen 3.5).

  • --reasoning on: Allows the model to display its internal thought process (usually enclosed within <think> tags).
  • --reasoning off: Disables the thought process output, forcing the model to provide a direct answer immediately.
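
If you keep reasoning on but only want the final answer in a saved transcript, the thought block can be filtered out afterwards. A minimal sketch, assuming the opening and closing tags each appear on their own line; strip_reasoning is a hypothetical helper, and the tag name is a parameter because it depends on the model's chat template.

```shell
# Remove every line from an opening tag line through the matching closing tag line.
# $1 = tag name without angle brackets (defaults to "think").
# Limitation: any answer text on the same line as a tag is dropped as well.
strip_reasoning() {
  tag=${1:-think}
  awk -v otag="<$tag>" -v ctag="</$tag>" '
    index($0, otag) { skipping = 1 }   # opening tag seen: start skipping
    !skipping       { print }          # pass through lines outside the block
    index($0, ctag) { skipping = 0 }   # closing tag seen: resume printing
  '
}

# Example call (not executed here):
#   strip_reasoning think < transcript.txt
```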

Model Details

  • Format: GGUF
  • Model size: 4B params
  • Architecture: qwen35

Model tree: TirGun/Qwen3.5-4B-GGUF (this model) is a quantized version of Qwen/Qwen3.5-4B.