Instructions for using PotatoOff/MQ-Catsu-70b-4.8bpw with libraries, inference providers, notebooks, and local apps.

Libraries

How to use PotatoOff/MQ-Catsu-70b-4.8bpw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PotatoOff/MQ-Catsu-70b-4.8bpw")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PotatoOff/MQ-Catsu-70b-4.8bpw")
# Miqu Cat is a text-only causal language model, so the causal-LM auto class applies
model = AutoModelForCausalLM.from_pretrained("PotatoOff/MQ-Catsu-70b-4.8bpw")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))


vLLM

How to use PotatoOff/MQ-Catsu-70b-4.8bpw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PotatoOff/MQ-Catsu-70b-4.8bpw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PotatoOff/MQ-Catsu-70b-4.8bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
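For scripts, the same OpenAI-compatible request can be sent from Python using only the standard library. This is a minimal sketch, assuming a server on the vLLM default `http://localhost:8000` (pass `base_url="http://localhost:30000"` for the SGLang server shown below); the helper names `build_chat_request` and `send` are hypothetical, and the response parsing assumes the OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed vLLM default; SGLang example uses port 30000
MODEL = "PotatoOff/MQ-Catsu-70b-4.8bpw"


def build_chat_request(content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response: the reply lives in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(send(build_chat_request("What is the capital of France?")))` mirrors the curl call. Building the request separately from sending it keeps the payload easy to inspect or log.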

Use Docker

docker model run hf.co/PotatoOff/MQ-Catsu-70b-4.8bpw

SGLang

How to use PotatoOff/MQ-Catsu-70b-4.8bpw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PotatoOff/MQ-Catsu-70b-4.8bpw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PotatoOff/MQ-Catsu-70b-4.8bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PotatoOff/MQ-Catsu-70b-4.8bpw" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PotatoOff/MQ-Catsu-70b-4.8bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PotatoOff/MQ-Catsu-70b-4.8bpw with Docker Model Runner:
```
docker model run hf.co/PotatoOff/MQ-Catsu-70b-4.8bpw
```

Configuration Parsing Warning: In config.json, "quantization_config.bits" must be an integer.

Welcome to Miqu Cat: A 70B Miqu LoRA Fine-Tune

Introducing Miqu Cat, an advanced model fine-tuned by Dr. Kal'tsit and then quantized for ExLlamaV2, bringing the model down to an impressive 4.8 bits per weight (bpw). This quantization allows those with limited computational resources to explore the model's capabilities.

Competitive Edge - meow!

Miqu Cat stands out in the arena of Miqu fine-tunes, consistently performing admirably in tests and comparisons. It’s crafted to be less restrictive and more robust than its predecessors and variants, making it a versatile tool in AI-driven applications.
Hardware: 48 GB of VRAM is needed to load the model with an 8192-token context length (e.g. 2x RTX 3090, 1x A6000, or 1x A100 80GB).
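The 48 GB figure can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take parameters x bpw / 8 bytes, and the KV cache grows linearly with context length. This is a rough sketch under stated assumptions: a nominal 70e9 parameters all stored at 4.8 bpw, a Llama-2-70B-style layout (80 layers, 8 KV heads, head dimension 128), and an FP16 cache; the function names are hypothetical, and activations and framework overhead are ignored.

```python
GIB = 2**30


def weight_bytes(n_params: float, bpw: float) -> float:
    """Bytes for the quantized weights: parameters x bits-per-weight / 8."""
    return n_params * bpw / 8


def kv_cache_bytes(ctx_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes for keys and values (factor 2) across all layers at ctx_len tokens.

    Defaults assume a Llama-2-70B-style layout with an FP16 (2-byte) cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len


w = weight_bytes(70e9, 4.8) / GIB
kv = kv_cache_bytes(8192) / GIB
print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")
# prints: weights ~39.1 GiB, KV cache ~2.5 GiB, total ~41.6 GiB
```

Under these assumptions the weights and cache come to roughly 41.6 GiB, leaving some headroom within 48 GB for activations and runtime overhead, which is consistent with the stated requirement.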

How to Use Miqu Cat: The Nitty-Gritty

Miqu Cat uses the ChatML prompt format, which keeps interaction straightforward whether you are integrating it into existing systems or using it for new projects.
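For backends that take a raw prompt string, the ChatML layout can be assembled by hand. This is a minimal sketch of ChatML as it is commonly defined, where each turn is wrapped in `<|im_start|>role` and `<|im_end|>` and an open assistant turn is appended for generation. The helper `build_chatml_prompt` and the system message are hypothetical; if the repository's tokenizer ships a chat template, `tokenizer.apply_chat_template` is the safer choice, since exact whitespace may differ.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render chat messages as a ChatML prompt string."""
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Miqu Cat."},  # hypothetical system prompt
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```

When generating from such a prompt, `<|im_end|>` is the natural stop string, since it marks the end of the assistant's turn.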

Training Specs

Dataset: 1.5 GB
Compute: two nodes of 8x A100

Meet the Author

Dr. Kal'tsit has been at the forefront of this fine-tuning process, ensuring that Miqu Cat gives the user a unique feel.
