How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
# Run inference directly in the terminal:
llama cli -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
# Run inference directly in the terminal:
llama cli -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
# Run inference directly in the terminal:
./llama-cli -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
Use Docker
docker model run hf.co/Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B:IQ4_NL
Quick Links

This is a converted and quantized version of Qwen 3.6 35B using Quanta and HugstonOne.


model size = 132219.74 MiB (32.00 BPW)


quant size = 21192.47 MiB (5.13 BPW)

Original weights here: https://huggingface.co/lordx64/Qwable-v1


Credit to Qwen team for the model creation

Credit to https://huggingface.co/lordx64/Qwable-v1 for the finetuning work

Credit to LLama.cpp team for the great contribution

Credit to Hugston Team for Converting, Quantizing, Testing, Benching and other...

Credit to Huggingface for the amazing hosting platform


The quantization in GGUF was made in f32 for better quality quants.
Here we show Quanta our convertor and Quantizer tool.

image

Downloads last month
814
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B

Dataset used to train Trilogix1/Anthropics-Fable-finetuned-in-Qwen3.6-35B