Transformers
GGUF
Chinese
English
conversational
How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
Use Docker
docker model run hf.co/Daemonrat/Qwen2.5-32B-AGI_Q4_K_M:Q4_K_M
Quick Links

DALL·E 2024-11-15 11.31.45 - A hulking space marine in battle-torn armor, holding a rusty chainsaw pointed directly at the viewer in an action pose. The scene is depicted in retro.webp

GGUF Q-4_K_M Quantization of AiCloser/Qwen2.5-32B-AGI. Fit for 24GB card.

Modelcard included so that you can send it straight to ollama with ollama create Qwen2.5-32B-AGI_Q4_K_M -f Modelfile then ollama run Qwen2.5-32B-AGI_Q4_K_M

From my testing, it's not uncensored, but it's basically a lvl 1 guardrail and you just need to use a generic jailbreak to step around it.


AGI means Aspirational Grand Illusion

First Qwen2.5 32B Finetune, to fix its Hypercensuritis

Hyper means high, and censura means censor, the suffix "-itis" is used to denote inflammation of a particular part or organ of the body.

Downloads last month
7
GGUF
Model size
33B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Daemonrat/Qwen2.5-32B-AGI_Q4_K_M

Base model

Qwen/Qwen2.5-32B
Quantized
(147)
this model

Datasets used to train Daemonrat/Qwen2.5-32B-AGI_Q4_K_M