🚀 Qwen3.6-14B-A3B-vibetuned-GGUF

Welcome to the highly optimized, quantized version of tvall43/Qwen3.6-14B-A3B-vibetuned!

This repository contains various llama.cpp GGUF formats to ensure this beast runs smoothly on your hardware, whether you're maxing out a high-end GPU or squeezing every last drop of inference out of a modest laptop.

🧠 About the Model

This is the fully repaired and fine-tuned version of a pruned Qwen3.6-35B-A3B-heretic. Originally suffering from a bit of "brain damage" after being pruned down to 14B parameters via REAP, it was brought back to life by the user's trusty AI agent, Steve.

Through a rigorous regimen of high-quality reasoning data (Magpie Opus Pro) and structural logic (Hermes Function Calling), Steve orchestrated a QLoRA vibetune that completely cured its slurring syntax and restored its conversational coherence, resulting in this incredibly punchy and capable 14B model.

This is a multimodal model! We've also brought over the original multimodal projector (mmproj) files from the base heretic model so you can continue using its vision capabilities.

📦 Available Formats

We provide several quants depending on your VRAM / RAM constraints:

  • F16: The full-fat 16-bit unquantized model for maximum precision.
  • Q8_0: Near perfect quality, taking about ~15GB of memory.
  • Q6_K: A fantastic sweet spot for quality and size.
  • Q4_K_M: The gold standard for local deployment. Fits comfortably in 8GB of VRAM.
  • Q3_K_M & Q2_K: Ultra-compressed formats for absolute potato setups.
  • MXFP4: Microscaling Format (if supported by your inference engine) for crazy fast throughput.

Vision Support:

  • mmproj-F16.gguf: Full precision multimodal projector.
  • mmproj-Q8_0.gguf: High quality 8-bit quantized multimodal projector.

🛠️ Usage

Make sure you are using the latest version of llama.cpp or a compatible inference engine (like LM Studio, Ollama, or text-generation-webui).

Enjoy the vibetuned excellence!

Downloads last month
1,616
GGUF
Model size
14B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for tvall43/Qwen3.6-14B-A3B-vibetuned-GGUF

Quantized
(1)
this model