Welcome to Miqu Cat: A 70B Miqu LoRA Fine-Tune
Introducing Miqu Cat, an advanced model fine-tuned by Dr. Kal'tsit and then quantized for the ExLlamaV2 project, bringing the model down to 4.8 bits per weight (bpw). This quantization lets those with limited computational resources explore its capabilities.
Competitive Edge - meow!
Miqu Cat stands out in the arena of Miqu fine-tunes, consistently performing admirably in tests and comparisons. It’s crafted to be less restrictive and more robust than its predecessors and variants, making it a versatile tool in AI-driven applications.
Loading the model with an 8192-token context length requires 48 GB of VRAM (for example 2x3090, 1xA6000, or 1xA100 80GB).
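As a rough sanity check on the 48 GB figure, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate, not a measurement: the parameter count (a nominal 70 billion), the layer count, KV-head count, and head dimension (taken from the Llama-2-70B architecture that Miqu is believed to share), and the FP16 cache are all assumptions that the card itself does not state.

```python
# Back-of-the-envelope VRAM estimate. Every constant here is an ASSUMPTION
# (nominal 70B parameters, Llama-2-70B-style shapes, FP16 KV cache);
# none of them is stated in the model card.
PARAMS = 70e9            # nominal parameter count
BITS_PER_WEIGHT = 4.8    # from the card: 4.8 bpw
N_LAYERS = 80            # assumed transformer layers
N_KV_HEADS = 8           # assumed grouped-query KV heads
HEAD_DIM = 128           # assumed per-head dimension
KV_BYTES = 2             # assumed FP16 cache entries
CONTEXT = 8192           # from the card: 8192-token context
GIB = 2**30


def weights_gib(params=PARAMS, bpw=BITS_PER_WEIGHT):
    """Size of the quantized weights in GiB."""
    return params * bpw / 8 / GIB


def kv_cache_gib(context=CONTEXT):
    """Size of the key/value cache in GiB for a given context length."""
    # Factor 2 accounts for storing both keys and values.
    bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
    return bytes_per_token * context / GIB


total_gib = weights_gib() + kv_cache_gib()
print(f"weights:  {weights_gib():.1f} GiB")
print(f"KV cache: {kv_cache_gib():.1f} GiB")
print(f"total:    {total_gib:.1f} GiB")
```

Under these assumptions the weights come to roughly 39 GiB and the cache to 2.5 GiB, which leaves a few GiB of a 48 GB budget for activations and framework overhead.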
How to Use Miqu Cat: The Nitty-Gritty
Miqu Cat uses the ChatML prompt format, designed for straightforward and effective interaction. Whether you're integrating it into existing systems or using it for new projects, its flexible prompt structure makes it easy to use.
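For reference, a ChatML prompt wraps each turn in `<|im_start|>` and `<|im_end|>` markers. The sketch below builds such a prompt by hand; the helper name `build_chatml_prompt` and the sample messages are hypothetical, and a real front end or tokenizer chat template may already do this for you.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Miqu Cat."},
    {"role": "user", "content": "Who are you?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

When generating, `<|im_end|>` is normally set as a stop string so the model's reply ends at the close of the assistant turn.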
Training Specs
- Dataset: 1.5 GB
- Compute: Two 8xA100 nodes
Meet the Author
Dr. Kal'tsit has been at the forefront of this fine-tuning process, ensuring that Miqu Cat gives the user a unique feel.