Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF

Instructions to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF",
	filename="MTP-F16.gguf",
)

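llm.create_chat_completion(
	# messages must be a list of role/content dicts, not a bare string.
	# Placeholder prompt; replace it with your own.
	messages = [
		{"role": "user", "content": "Hello! Briefly introduce yourself."}
	]
)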

Local Apps

llama.cpp

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
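
Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8080/v1`, the same base URL used in the Pi and Hermes configurations in this card; the helper names `build_chat_request` and `chat`, the prompt, and the `max_tokens` value are illustrative and not part of the original instructions.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at its default local address.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,  # illustrative limit, adjust as needed
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url=BASE_URL):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server started as shown above, calling `chat("Say hello in one sentence.")` should return the model's reply as a string. If the server was started on a different host or port, pass the matching `base_url`.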

Use Docker

docker model run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Ollama
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Ollama:
```
ollama run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
```

Unsloth Studio

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting

Pi

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Run Hermes

hermes

Docker Model Runner
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Docker Model Runner:
```
docker model run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
```

Lemonade

How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16

Run and chat with the model

lemonade run user.Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF-F16

List all available models

lemonade list

Model source: https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 [a1e595e]

IMatrix source: https://huggingface.co/ReadyArt/Dark-Nexus-27B-v3.0-GGUF/blob/main/imatrix.gguf [7296c23]

Jinja template source: https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/blob/main/chat_template.jinja [c31fd39]

Quantization:

llama-quantize.exe XXX-F16.gguf Q8_0

llama-quantize.exe --imatrix imatrix.gguf XXX-F16.gguf Q4_K_M

Perplexity (PPL) tests:

ggml-org/ci/wikitext-2-raw-v1.zip/wiki.test.raw

# LF, UTF-8, 1.23MB

llama-perplexity.exe -m xxx.gguf -f wiki.test.raw

calculating perplexity over 580 chunks, n_ctx=512, batch_size=2048, n_seq=4

- F16:
PPL = 7.3717 +/- 0.05012

- Q8_0:
PPL = 7.3615 +/- 0.05000

- Q4_K_M:
PPL = 7.5632 +/- 0.05183

- IMatrix-Q4_K_M:
PPL = 7.4078 +/- 0.05028
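
For readers unfamiliar with the metric: perplexity is the exponential of the mean negative log-likelihood the model assigns to each token of the test text, so lower is better. The sketch below shows that definition on made-up numbers. The function name `perplexity` and the example log-probabilities are hypothetical; this is not how `llama-perplexity` is implemented internally, only the formula behind the number it reports.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities for four tokens (not taken from the runs above).
example = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(example))  # approximately 2.83
```

A model that assigned every token a probability of 1/4 would score a perplexity of exactly 4, which is why the value is often read as an effective number of equally likely choices per token.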

QY789/chinese-novel-dataset/dataset.json > chinese-novel-dataset.raw

# LF, UTF-8, 1.56MB

llama-perplexity.exe --chunks -1 --ctx-size 2048 --model xxx.gguf --file chinese-novel-dataset.raw

calculating perplexity over 180 chunks, n_ctx=2048, batch_size=2048, n_seq=1

- Q8_0:
Size: 27.0 GB
PPL = 27.7540 +/- 0.21244

- IMatrix-Q4_K_M:
Size: 15.6 GB
PPL = 27.8364 +/- 0.21260

---
# For reference only

- ArliAI/Qwen3.5-27B-Derestricted > F16 > Q8_0
Size: 27.0 GB
PPL = 24.5703 +/- 0.17746

- ArliAI/Qwen3.5-27B-Derestricted > F16 > IMatrix-Q4_K_M
Size: 15.6 GB
PPL = 24.8051 +/- 0.17916

- morikomorizz/GRM-2.6-Plus-Primal > F16
Size: 50.9 GB
PPL = 27.9228 +/- 0.21007

- morikomorizz/GRM-2.6-Plus-Primal > F16 > IMatrix-Q4_K_M
Size: 15.6 GB
PPL = 27.9997 +/- 0.21023

- mradermacher/GRM-2.6-Plus-Primal.i1-Q4_K_M
Size: 15.4 GB
PPL = 28.1043 +/- 0.21144

- ReadyArt/Dark-Nexus-27B-v3.0.i1-Q4_K_M_attn8_ssm8_hb16
Size: 21.5 GB
PPL = 23.7144 +/- 0.17150

- llmfan46/Qwen3.6-27B-Uncensored-Heretic-V2-Native-MTP-Preserved-Q4_K_M
Size: 15.6 GB
PPL = 28.8168 +/- 0.22046

- unsloth/Qwen3.6-27B-UD-Q8_K_XL
Size: 32.8 GB
PPL = 26.3259 +/- 0.19895

- unsloth/Qwen3.6-27B-Q8_0
Size: 26.6 GB
PPL = 26.2658 +/- 0.19832

QY789/chinese-novel-dataset/dataset.json > chinese-novel-dataset.raw

# CRLF, UTF-8, 1.57MB

# Sampling parameters have no effect on llama-perplexity
llama-perplexity.exe --chunks -1 --ctx-size 2048 --temp 1.00 --min-p 0.00 --top-k 20 --top-p 0.95 --repeat-penalty 1.00 --presence-penalty 0.00 --model xxx.gguf --file chinese-novel-dataset.raw

calculating perplexity over 180 chunks, n_ctx=2048, batch_size=2048, n_seq=1

- Q8_0:
Size: 27.0 GB
PPL = 27.7540 +/- 0.21244

- Q4_K_M:
Size: 15.6 GB
PPL = 29.2257 +/- 0.22777

- IMatrix-Q4_K_M:
Size: 15.6 GB
PPL = 27.8364 +/- 0.21260

- IMatrix-IQ4_NL:
Size: 14.9 GB
PPL = 28.2146 +/- 0.21672

- IMatrix-IQ4_XS:
Size: 14.2 GB
PPL = 28.1907 +/- 0.21668
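
To make the comparison above easier to read, the sketch below computes each quantization's PPL change relative to Q8_0 and checks whether the reported `+/-` ranges overlap. The figures are copied from the list above; the names `RESULTS`, `relative_change`, and `ranges_overlap` are illustrative. Treat the overlap check as a rough heuristic only: all runs score the same text, so their errors are correlated, and this is not a formal significance test.

```python
# (PPL, reported +/-, size in GB), copied from the results listed above.
RESULTS = {
    "Q8_0": (27.7540, 0.21244, 27.0),
    "Q4_K_M": (29.2257, 0.22777, 15.6),
    "IMatrix-Q4_K_M": (27.8364, 0.21260, 15.6),
    "IMatrix-IQ4_NL": (28.2146, 0.21672, 14.9),
    "IMatrix-IQ4_XS": (28.1907, 0.21668, 14.2),
}


def relative_change(ppl, baseline):
    """Percentage change of ppl relative to baseline (positive means worse)."""
    return (ppl - baseline) / baseline * 100.0


def ranges_overlap(a, b):
    """True if the two PPL +/- ranges intersect (a crude comparison)."""
    return abs(a[0] - b[0]) <= a[1] + b[1]


base = RESULTS["Q8_0"]
for name, (ppl, err, size_gb) in RESULTS.items():
    if name == "Q8_0":
        continue
    delta = relative_change(ppl, base[0])
    overlap = ranges_overlap((ppl, err), base[:2])
    print(f"{name}: {delta:+.2f}% vs Q8_0, {size_gb} GB, overlaps Q8_0 range: {overlap}")
```

On these figures, only IMatrix-Q4_K_M stays within the reported range of Q8_0, while plain Q4_K_M at the same file size is roughly 5% worse.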

import json

# Load the dataset (a JSON array of records).
with open("dataset.json", "r", encoding="utf-8") as f:
    data = json.load(f)

count = 0
with open("chinese-novel-dataset.raw", "w", encoding="utf-8") as f:
    for item in data:
        text = item.get("input", "").strip()

        if len(text) > 100:  # keep only passages with real content
            f.write(text + "\n\n\n")  # separate passages with blank lines
            count += 1

            if count >= 8000:  # cap the output size
                break

print(f"Done: wrote {count} novel passages")

Roman1111111/claude-opus-4.6-10000x/opus46_final.jsonl > claude-opus-4.6-10000x_chatml.raw

# LF, UTF-8, 11.4MB

llama-perplexity.exe --chunks -1 --ctx-size 2048 --model xxx.gguf --file xxx.raw

calculating perplexity over 1805 chunks, n_ctx=2048, batch_size=2048, n_seq=1

- AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 > F16 > IMatrix-Q4_K_M
Size: 15.6 GB
PPL = 1.9474 +/- 0.00257

- llmfan46/Qwen3.6-27B-Uncensored-Heretic-V2-Native-MTP-Preserved-Q4_K_M
Size: 15.6 GB
PPL = 1.9192 +/- 0.00245

import argparse
import json


def format_chatml(messages):
    """ChatML template (Qwen, DeepSeek, Orca, etc.)"""
    formatted_text = ""
    for msg in messages:
        role = msg["role"]
        content = msg.get("content", "")
        reasoning = msg.get("reasoning", "")

        # If reasoning content is present, prepend it to the assistant message
        if role == "assistant" and reasoning:
            content = f"<thought>\n{reasoning}\n</thought>\n{content}"

        formatted_text += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    return formatted_text


def format_llama3(messages):
    """Llama 3 / 3.1 template"""
    formatted_text = "<|begin_of_text|>"
    for msg in messages:
        role = msg["role"]
        content = msg.get("content", "")
        reasoning = msg.get("reasoning", "")

        if role == "assistant" and reasoning:
            # R1-distilled Llama 3 variants are handled the same way, with <thought> tags
            content = f"<thought>\n{reasoning}\n</thought>\n{content}"

        formatted_text += (
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    return formatted_text


def main():
    parser = argparse.ArgumentParser(
        description="Convert JSONL dataset to .raw file for llama-perplexity"
    )
    parser.add_argument(
        "-i",
        "--input",
        required=True,
        help="Input JSONL file path (e.g., test.jsonl)",
    )
    parser.add_argument(
        "-o",
        "--output",
        required=True,
        help="Output .raw file path (e.g., dataset.raw)",
    )
    parser.add_argument(
        "-t",
        "--template",
        choices=["chatml", "llama3"],
        default="chatml",
        help="Chat template to use",
    )

    args = parser.parse_args()

    count = 0
    with open(args.input, "r", encoding="utf-8") as infile, open(
        args.output, "w", encoding="utf-8"
    ) as outfile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            try:
                data = json.loads(line)
                messages = data.get("messages", [])

                if args.template == "chatml":
                    formatted_chat = format_chatml(messages)
                elif args.template == "llama3":
                    formatted_chat = format_llama3(messages)

                # Write the sample, followed by a newline separator
                outfile.write(formatted_chat + "\n")
                count += 1
            except Exception as e:
                print(f"Error parsing line: {e}")

    print(f"Converted {count} records; saved to: {args.output}")


if __name__ == "__main__":
    main()
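
The converter above reads one JSON object per line, each with a `messages` list whose entries carry `role`, `content`, and optionally `reasoning`. The sketch below shows a hypothetical input record and the ChatML text it would turn into. The sample conversation is invented for illustration, and `to_chatml` is a compact copy of the `format_chatml` logic, repeated here only so the snippet runs on its own.

```python
import json

# Hypothetical sample record; field names follow what the script above reads.
record = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "reasoning": "Simple addition.", "content": "4"},
    ]
}


def to_chatml(messages):
    """Render messages as ChatML, mirroring format_chatml from the script above."""
    parts = []
    for msg in messages:
        content = msg.get("content", "")
        reasoning = msg.get("reasoning", "")
        # Reasoning, when present, is wrapped and placed before the answer.
        if msg["role"] == "assistant" and reasoning:
            content = f"<thought>\n{reasoning}\n</thought>\n{content}"
        parts.append(f"<|im_start|>{msg['role']}\n{content}<|im_end|>\n")
    return "".join(parts)


# One line of the input JSONL file, then the text written to the .raw file.
jsonl_line = json.dumps(record, ensure_ascii=False)
rendered = to_chatml(json.loads(jsonl_line)["messages"])
print(rendered)
```

Each rendered conversation is followed by an extra newline in the output file, so consecutive samples are separated by a blank line.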

GGUF

Model size: 3B params
Architecture: qwen35
Hardware compatibility: 4-bit, 16-bit

Model tree for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF

Base model: Qwen/Qwen3.6-27B
Finetuned: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
Quantized (26): this model