Instructions to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF",
    filename="MTP-F16.gguf",
)
llm.create_chat_completion(
    # messages must be a list of role/content dicts, not a bare string
    messages=[{"role": "user", "content": "Hello! Please introduce yourself."}]
)
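In non-streaming mode the call above returns an OpenAI-style completion dictionary. A minimal sketch of pulling the assistant text out of it follows. `extract_reply` is a hypothetical helper, not part of llama-cpp-python, and `sample_response` is a hand-written stand-in rather than real model output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from a non-streaming chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a message dict with "role" and "content" keys.
    return choices[0]["message"]["content"] or ""


# Hand-written stand-in for what llm.create_chat_completion(...) returns.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample_response))
```

With a loaded model, the same helper could be applied directly to the return value of `llm.create_chat_completion(...)`.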
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Use Docker
docker model run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
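Once `llama-server` is running through any of the routes above, it can be queried over its OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server listens on `http://localhost:8080/v1`, the address used in the Pi configuration further down. `build_chat_request` and `ask` are hypothetical helper names chosen for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text; needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with the server started beforehand:
#   print(ask("Hello!"))
```

Building the request separately from sending it keeps the payload inspectable without a live server.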
- LM Studio
- Jan
- Ollama
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Ollama:
ollama run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
- Unsloth Studio
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF to start chatting
- Pi
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
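The `models.json` entry above can also be generated rather than typed by hand. This is a rough sketch, assuming the file holds only the `llama-cpp` provider shown above. `pi_models_config` and `write_pi_config` are hypothetical helpers, and writing the file replaces whatever is already there.

```python
import json
from pathlib import Path


def pi_models_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Build the provider block shown above for a local llama.cpp server."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


def write_pi_config(config: dict, path: Path) -> None:
    """Write the config as JSON, overwriting any existing file at path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


config = pi_models_config("wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16")
print(json.dumps(config, indent=2))
# To install it (overwrites the existing file):
#   write_pi_config(config, Path.home() / ".pi" / "agent" / "models.json")
```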
- Hermes Agent
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Run Hermes
hermes
- Docker Model Runner
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Docker Model Runner:
docker model run hf.co/wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
- Lemonade
How to use wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF:F16
Run and chat with the model
lemonade run user.Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF-F16
List all available models
lemonade list
Model source: https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 [a1e595e]
IMatrix source: https://huggingface.co/ReadyArt/Dark-Nexus-27B-v3.0-GGUF/blob/main/imatrix.gguf [7296c23]
Jinja template source: https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/blob/main/chat_template.jinja [c31fd39]
Quantization:
llama-quantize.exe XXX-F16.gguf Q8_0
llama-quantize.exe --imatrix imatrix.gguf XXX-F16.gguf Q4_K_M
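The two quantization invocations above differ only in the target type and the optional `--imatrix` argument. A small sketch of assembling them programmatically follows. `quantize_command` is a hypothetical helper, the binary name `llama-quantize` is an assumption (on Windows it would be `llama-quantize.exe`), and only arguments that appear in the commands above are used.

```python
import shlex
from typing import Optional


def quantize_command(
    model: str,
    quant_type: str,
    imatrix: Optional[str] = None,
    binary: str = "llama-quantize",
) -> list[str]:
    """Assemble the argument list for one quantization run."""
    cmd = [binary]
    if imatrix is not None:
        # The importance matrix is passed before the input model.
        cmd += ["--imatrix", imatrix]
    cmd += [model, quant_type]
    return cmd


# Reproduce the two runs listed above: plain Q8_0 and imatrix-guided Q4_K_M.
for qtype, imatrix in [("Q8_0", None), ("Q4_K_M", "imatrix.gguf")]:
    print(shlex.join(quantize_command("XXX-F16.gguf", qtype, imatrix)))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` to actually run the tool.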
PPL test:
ggml-org/ci/wikitext-2-raw-v1.zip/wiki.test.raw
# LF, UTF-8, 1.23MB
llama-perplexity.exe -m xxx.gguf -f wiki.test.raw
calculating perplexity over 580 chunks, n_ctx=512, batch_size=2048, n_seq=4
- F16: PPL = 7.3717 +/- 0.05012
- Q8_0: PPL = 7.3615 +/- 0.05000
- Q4_K_M: PPL = 7.5632 +/- 0.05183
- IMatrix-Q4_K_M: PPL = 7.4078 +/- 0.05028
QY789/chinese-novel-dataset/dataset.json > chinese-novel-dataset.raw
# LF, UTF-8, 1.56MB
llama-perplexity.exe --chunks -1 --ctx-size 2048 --model xxx.gguf --file chinese-novel-dataset.raw
calculating perplexity over 180 chunks, n_ctx=2048, batch_size=2048, n_seq=1
- Q8_0: Size: 27.0 GB PPL = 27.7540 +/- 0.21244
- IMatrix-Q4_K_M: Size: 15.6 GB PPL = 27.8364 +/- 0.21260
---
# For reference only...
- ArliAI/Qwen3.5-27B-Derestricted > F16 > Q8_0 Size: 27.0 GB PPL = 24.5703 +/- 0.17746
- ArliAI/Qwen3.5-27B-Derestricted > F16 > IMatrix-Q4_K_M Size: 15.6 GB PPL = 24.8051 +/- 0.17916
- morikomorizz/GRM-2.6-Plus-Primal > F16 Size: 50.9 GB PPL = 27.9228 +/- 0.21007
- morikomorizz/GRM-2.6-Plus-Primal > F16 > IMatrix-Q4_K_M Size: 15.6 GB PPL = 27.9997 +/- 0.21023
- mradermacher/GRM-2.6-Plus-Primal.i1-Q4_K_M Size: 15.4 GB PPL = 28.1043 +/- 0.21144
- ReadyArt/Dark-Nexus-27B-v3.0.i1-Q4_K_M_attn8_ssm8_hb16 Size: 21.5 GB PPL = 23.7144 +/- 0.17150
- llmfan46/Qwen3.6-27B-Uncensored-Heretic-V2-Native-MTP-Preserved-Q4_K_M Size: 15.6 GB PPL = 28.8168 +/- 0.22046
- unsloth/Qwen3.6-27B-UD-Q8_K_XL Size: 32.8 GB PPL = 26.3259 +/- 0.19895
- unsloth/Qwen3.6-27B-Q8_0 Size: 26.6 GB PPL = 26.2658 +/- 0.19832
QY789/chinese-novel-dataset/dataset.json > chinese-novel-dataset.raw
# CRLF, UTF-8, 1.57MB
# Sampling parameters have no effect on the perplexity result
llama-perplexity.exe --chunks -1 --ctx-size 2048 --temp 1.00 --min-p 0.00 --top-k 20 --top-p 0.95 --repeat-penalty 1.00 --presence-penalty 0.00 --model xxx.gguf --file chinese-novel-dataset.raw
calculating perplexity over 180 chunks, n_ctx=2048, batch_size=2048, n_seq=1
- Q8_0: Size: 27.0 GB PPL = 27.7540 +/- 0.21244
- Q4_K_M: Size: 15.6 GB PPL = 29.2257 +/- 0.22777
- IMatrix-Q4_K_M: Size: 15.6 GB PPL = 27.8364 +/- 0.21260
- IMatrix-IQ4_NL: Size: 14.9 GB PPL = 28.2146 +/- 0.21672
- IMatrix-IQ4_XS: Size: 14.2 GB PPL = 28.1907 +/- 0.21668
import json

with open("dataset.json", "r", encoding="utf-8") as f:
    data = json.load(f)

count = 0
with open("chinese-novel-dataset.raw", "w", encoding="utf-8") as f:
    for item in data:
        text = item.get("input", "").strip()
        if len(text) > 100:  # keep only novel passages with real content
            f.write(text + "\n\n\n")  # leave blank lines between passages; this is the most standard approach
            count += 1
        if count >= 8000:  # cap the output size
            break

print(f"Conversion complete! {count} novel passages in total")
Roman1111111/claude-opus-4.6-10000x/opus46_final.jsonl > claude-opus-4.6-10000x_chatml.raw
# LF, UTF-8, 11.4MB
llama-perplexity.exe --chunks -1 --ctx-size 2048 --model xxx.gguf --file xxx.raw
calculating perplexity over 1805 chunks, n_ctx=2048, batch_size=2048, n_seq=1
- AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 > F16 > IMatrix-Q4_K_M Size: 15.6 GB PPL = 1.9474 +/- 0.00257
- llmfan46/Qwen3.6-27B-Uncensored-Heretic-V2-Native-MTP-Preserved-Q4_K_M Size: 15.6 GB PPL = 1.9192 +/- 0.00245
import argparse
import json
import os


def format_chatml(messages):
    """ChatML template (Qwen, DeepSeek, Orca, etc.)"""
    formatted_text = ""
    for msg in messages:
        role = msg["role"]
        content = msg.get("content", "")
        reasoning = msg.get("reasoning", "")
        # If there is reasoning content, splice it into the assistant message
        if role == "assistant" and reasoning:
            content = f"<thought>\n{reasoning}\n</thought>\n{content}"
        formatted_text += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    return formatted_text


def format_llama3(messages):
    """Llama 3 / 3.1 template"""
    formatted_text = "<|begin_of_text|>"
    for msg in messages:
        role = msg["role"]
        content = msg.get("content", "")
        reasoning = msg.get("reasoning", "")
        if role == "assistant" and reasoning:
            # For R1-distilled Llama-3, the <thought> tag is usually used as well
            content = f"<thought>\n{reasoning}\n</thought>\n{content}"
        formatted_text += (
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    return formatted_text


def main():
    parser = argparse.ArgumentParser(
        description="Convert JSONL dataset to .raw file for llama-perplexity"
    )
    parser.add_argument(
        "-i",
        "--input",
        required=True,
        help="Input JSONL file path (e.g., test.jsonl)",
    )
    parser.add_argument(
        "-o",
        "--output",
        required=True,
        help="Output .raw file path (e.g., dataset.raw)",
    )
    parser.add_argument(
        "-t",
        "--template",
        choices=["chatml", "llama3"],
        default="chatml",
        help="Chat template to use",
    )
    args = parser.parse_args()

    count = 0
    with open(args.input, "r", encoding="utf-8") as infile, open(
        args.output, "w", encoding="utf-8"
    ) as outfile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            try:
                data = json.loads(line)
                messages = data.get("messages", [])
                if args.template == "chatml":
                    formatted_chat = format_chatml(messages)
                elif args.template == "llama3":
                    formatted_chat = format_llama3(messages)
                # Write to the file, with a newline between samples
                outfile.write(formatted_chat + "\n")
                count += 1
            except Exception as e:
                print(f"Error parsing line: {e}")

    print(f"Successfully converted {count} records, saved to: {args.output}")


if __name__ == "__main__":
    main()
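To read the wikitext-2 numbers above at a glance, it can help to express each quantization's perplexity as a percentage change against F16. The sketch below does that and also flags whether the gap is smaller than the two reported errors added in quadrature. The quadrature check is an assumption made here for illustration: runs over the same test text are correlated, so it is only a rough guide. `compare_to_baseline` is a hypothetical helper name.

```python
import math

# (PPL, reported +/- error) copied from the wikitext-2 results above.
WIKITEXT_PPL = {
    "F16": (7.3717, 0.05012),
    "Q8_0": (7.3615, 0.05000),
    "Q4_K_M": (7.5632, 0.05183),
    "IMatrix-Q4_K_M": (7.4078, 0.05028),
}


def compare_to_baseline(results, baseline="F16"):
    """Map each non-baseline entry to (percent change, gap within combined error)."""
    base_ppl, base_err = results[baseline]
    rows = {}
    for name, (ppl, err) in results.items():
        if name == baseline:
            continue
        delta = ppl - base_ppl
        rel_pct = 100.0 * delta / base_ppl
        # Rough check only: treats the two errors as independent.
        combined = math.hypot(err, base_err)
        rows[name] = (rel_pct, abs(delta) <= combined)
    return rows


for name, (rel_pct, within) in compare_to_baseline(WIKITEXT_PPL).items():
    note = "within" if within else "outside"
    print(f"{name}: {rel_pct:+.2f}% vs F16 ({note} combined error)")
```

On these figures, plain Q4_K_M is about 2.6% worse than F16, while the IMatrix-Q4_K_M and Q8_0 results sit inside the combined error.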
4-bit
16-bit
Model tree for wazimondo/Qwen3.6-27B-AEON-Ultimate-Uncensored_1-GGUF
Base model
Qwen/Qwen3.6-27B