---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
base_model:
- Sao10K/32B-Qwen2.5-Kunou-v1
base_model_relation: quantized
tags:
- generated_from_trainer
model-index:
- name: 32B-Qwen2.5-Kunou-v1
  results: []
---
## Quantized using the default exllamav3 (0.0.2) quantization process.

- Original model: https://huggingface.co/Sao10K/32B-Qwen2.5-Kunou-v1
- exllamav3: https://github.com/turboderp-org/exllamav3

---
|  | |
| **Sister Versions for Lightweight and Heavyweight Use!** | |
| [72B-Kunou-v1](https://huggingface.co/Sao10K/72B-Qwen2.5-Kunou-v1) | |
| [14B-Kunou-v1](https://huggingface.co/Sao10K/14B-Qwen2.5-Kunou-v1) | |
| # 32B-Qwen2.5-Kunou-v1 | |
| *training delays and all...* | |
| I do not really have anything planned for this model other than it being a generalist, and Roleplay Model? It was just something made and planned in minutes. | |
| <br>Same with the 14B and 72B version. | |
| <br>Kunou's the name of an OC I worked on for a couple of years, for a... fanfic. mmm... | |
| A kind-of successor to L3-70B-Euryale-v2.2 in all but name? I'm keeping Stheno/Euryale lineage to Llama series for now. | |
| <br>I had a version made on top of Nemotron, a supposed Euryale 2.4 but that flopped hard, it was not my cup of tea. | |
| <br>This version is basically a better, more cleaned up Dataset used on Euryale and Stheno. | |
| Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.* | |
```
Prompt Format: ChatML
Temperature: 1.1
min_p: 0.1
```
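
For anyone wiring these settings up by hand, here is a minimal Python sketch of what they mean in practice. `chatml_prompt` and `min_p_filter` are hypothetical helper names, not part of any library. The prompt layout uses the usual ChatML `<|im_start|>` / `<|im_end|>` markers, and applying temperature before the min_p cutoff is an assumption, since sampler order differs between backends.

```python
import math


def chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def min_p_filter(logits, temperature=1.1, min_p=0.1):
    """Return sampling probabilities after temperature scaling and min_p filtering.

    Assumed order: temperature first, then min_p. A token survives only if its
    probability is at least min_p times the probability of the top token.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


if __name__ == "__main__":
    prompt = chatml_prompt(
        [
            {"role": "system", "content": "You are a helpful narrator."},
            {"role": "user", "content": "Describe the harbor at dawn."},
        ]
    )
    print(prompt)
    # Toy logits for a three-token vocabulary; the unlikely third token is cut.
    print(min_p_filter([2.0, 1.0, -3.0]))
```

Most frontends expose these as plain `temperature` and `min_p` fields, so the helpers above are only useful for understanding what the numbers do or for a custom sampling loop.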

Future-ish plans:
~~<br>\- Complete this model series.~~
<br>\- Further refine the Datasets used for quality, more secondary chats, more creative-related domains. (Inspired by Drummer)
<br>\- Work on my other incomplete projects. About half a dozen on the backburner for a while now.

Special thanks to my wallet for funding this, my juniors who share a single braincell between them, and my current national service.
<br>Stay safe. There have been more emergency calls, more incidents this holiday season.

Also sorry for the inactivity. Life was in the way. It still is, just less so, for now. Burnout is a thing, huh?

https://sao10k.carrd.co/ for contact.

---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.5.2`
```yaml
base_model: Qwen/Qwen2.5-32B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 16384
bf16: auto
fp16:
tf32: false
flash_attention: true

adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true

# Data
dataset_prepared_path: last_run_prepared
datasets:
  - path: datasets/amoral-full-sys-prompt.json # Unalignment Data - Cleaned Up from Original, Split to its own file
    type: customchatml
  - path: datasets/mimi-superfix-RP-filtered-fixed.json # RP / Creative-Instruct Data
    type: customchatml
  - path: datasets/hespera-smartshuffle.json # Hesperus-v2-Instruct Data
    type: customchatml

warmup_steps: 15

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs: 1

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 1
gradient_checkpointing: "unsloth"

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.000004
weight_decay: 0.1
max_grad_norm: 25.0

# Iterations
num_epochs: 1

# Misc
deepspeed: ./deepspeed_configs/zero3_bf16.json
```

</details><br>
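
A few numbers in that config are easier to read once worked out. The sketch below is only an illustration built from the values above, not anything taken from the training run itself. `num_gpus` and `total_steps` are hypothetical inputs, since the card gives neither. The schedule assumes the common linear-warmup-then-cosine shape, and the rsLoRA scale assumes the usual `alpha / sqrt(r)` rule in place of plain LoRA's `alpha / r`.

```python
import math

# Values copied from the axolotl config above.
LORA_R = 32
LORA_ALPHA = 64
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
SEQUENCE_LEN = 16384
LEARNING_RATE = 4e-6
WARMUP_STEPS = 15


def lora_scale(alpha, r, rslora):
    """Multiplier applied to the LoRA update: alpha/sqrt(r) for rsLoRA, else alpha/r."""
    return alpha / math.sqrt(r) if rslora else alpha / r


def sequences_per_step(num_gpus):
    """Sequences consumed per optimizer step (num_gpus is a hypothetical input)."""
    return MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_gpus


def lr_at(step, total_steps):
    """Learning rate at a step, assuming linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("plain LoRA scale:", lora_scale(LORA_ALPHA, LORA_R, rslora=False))
    print("rsLoRA scale:", round(lora_scale(LORA_ALPHA, LORA_R, rslora=True), 3))

    gpus = 8  # hypothetical, the card does not state the hardware used
    seqs = sequences_per_step(gpus)
    print("sequences per step:", seqs)
    print("tokens per step (upper bound):", seqs * SEQUENCE_LEN)

    steps = 500  # hypothetical run length
    for s in (0, 15, 250, 500):
        print(s, lr_at(s, steps))
```

With `peft_use_rslora: true` the adapter update is scaled noticeably harder than the same rank and alpha would give under plain LoRA, which may be part of why the learning rate here is so small.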