Instructions to use Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw")
model = AutoModelForCausalLM.from_pretrained("Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

vLLM

How to use Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
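For callers who prefer Python to curl, here is a minimal standard-library sketch of the same request. It assumes the server is listening on `http://localhost:8000` and exposes the OpenAI-compatible `/v1/chat/completions` route used in the curl call; the helper names `build_chat_request` and `send_chat` are hypothetical. Passing `http://localhost:30000` as the base URL should target the SGLang server instead.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (needs a running server):
# req = build_chat_request("http://localhost:8000",
#                          "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw",
#                          "What is the capital of France?")
# print(send_chat(req))
```

Building the request separately from sending it keeps the payload easy to inspect before pointing it at a 103B-parameter server.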

Use Docker

docker model run hf.co/Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw

SGLang

How to use Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw with Docker Model Runner:
```
docker model run hf.co/Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw
```

Error

by Hardcore7651 - opened Mar 8, 2024

Discussion

Hardcore7651

Mar 8, 2024

Tried on an A100 and on 3 A5000s in text-generation-webui using RunPod; both times I got this:
ImportError: cannot import name 'ExLlamaV2Cache_Q4' from 'exllamav2' (/usr/local/lib/python3.10/dist-packages/exllamav2/__init__.py)
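The error means the installed `exllamav2` package does not export `ExLlamaV2Cache_Q4`, the class the web UI tries to import. A small stdlib-only probe like the sketch below can check an environment before and after upgrading. The helper names are hypothetical, and the probe only reports what is importable, not which version is required.

```python
import importlib
from importlib import metadata
from typing import Optional


def exports_name(module_name: str, name: str) -> bool:
    """Return True if `module_name` imports without error and defines `name`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Missing packages raise ImportError; a broken CUDA extension build
        # may raise other errors, which we also treat as "not available".
        return False
    return hasattr(module, name)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Usage, run inside the web UI's Python environment:
# print(installed_version("exllamav2"), exports_name("exllamav2", "ExLlamaV2Cache_Q4"))
```

If the second value prints `False`, the package in that environment lacks the class and is a candidate for the pip upgrade discussed in the reply.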

Dracones

Owner Mar 8, 2024

Hi. This appears to be an issue with the exllamav2 version you're running: https://github.com/TheBlokeAI/dockerLLM/issues/17

Though to be fair, I wasn't able to test anything above the 3.5 quant on my home dual 3090 setup. I'm running through them all now on a Runpod A100 80GB system to make sure they're all good. In the meantime, try updating your exllamav2 with pip as per the instructions in the GitHub issue.
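The testing limit above is largely a memory question. A rough estimate of weight storage is parameters × bits per weight ÷ 8 bytes. The sketch below assumes exactly 103 × 10^9 parameters (the nominal size in the model name), all stored at the quant's average bpw, and ignores the KV cache, activations, and CUDA overhead, so real usage is higher. Treat the results as lower bounds.

```python
PARAMS = 103e9  # assumption: nominal parameter count taken from the model name
QUANTS_BPW = [3.0, 3.5, 3.75, 4.0, 4.25, 4.5, 5.0]


def weight_gib(bpw: float, params: float = PARAMS) -> float:
    """Approximate weight storage in GiB: params * bits-per-weight / 8 bytes."""
    return params * bpw / 8 / 2**30


def fits(bpw: float, vram_gib: float) -> bool:
    """True if the weights alone fit in `vram_gib`, with nothing reserved for cache."""
    return weight_gib(bpw) <= vram_gib


for bpw in QUANTS_BPW:
    # 2 x RTX 3090 = 2 x 24 GiB = 48 GiB; A100 80GB = 80 GiB
    print(f"{bpw:>5} bpw: {weight_gib(bpw):5.1f} GiB  "
          f"dual-3090: {fits(bpw, 48)}  A100-80GB: {fits(bpw, 80)}")
```

By this estimate the weights alone take roughly 42 GiB at 3.5 bpw, 45 GiB at 3.75 bpw, and 51 GiB at 4.25 bpw. That leaves little or no headroom for the cache on 48 GiB, while every quant stays well under 80 GiB.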

Dracones

Owner Mar 8, 2024

edited Mar 8, 2024

I was able to run the 3.0 through 5.0 quants on a Runpod A100 80GB instance using the commands below, so they should all be working fine.

cd /workspace

python -m pip install --upgrade pip

pip uninstall torch torchaudio torchvision -y

# ExllamaV2
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt

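pip install hf_transfer "huggingface_hub[hf_transfer]"
# hf_transfer is only used for downloads when this variable is set
export HF_HUB_ENABLE_HF_TRANSFER=1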

# Test inference on Llama 2 7B
huggingface-cli download --local-dir-use-symlinks=False --revision 5.0bpw --local-dir turboderp_Llama2-7B-exl2_5.0bpw turboderp/Llama2-7B-exl2

python test_inference.py -m turboderp_Llama2-7B-exl2_5.0bpw -p "Once upon a time,"

rm -r turboderp_Llama2-7B-exl2_5.0bpw

# Download and inference on Midnight quants
## 3.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.0bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.0bpw

## 3.5
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.5bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.5bpw

## 3.75
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_3.75bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_3.75bpw

## 4.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.0bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.0bpw

## 4.25
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.25bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.25bpw

## 4.5
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_4.5bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw -p "Once upon a time,"

rm -r Dracones_Midnight-Miqu-103B-v1.0_exl2_4.5bpw

## 5.0
huggingface-cli download --local-dir-use-symlinks=False --local-dir Dracones_Midnight-Miqu-103B-v1.0_exl2_5.0bpw Dracones/Midnight-Miqu-103B-v1.0_exl2_5.0bpw

python test_inference.py -m Dracones_Midnight-Miqu-103B-v1.0_exl2_5.0bpw -p "Once upon a time,"
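The seven per-quant blocks above repeat the same three steps: download, test, delete. Below is a minimal Python sketch that generates those commands from the list of bpw values. It assumes the repository naming pattern and the `huggingface-cli` and `test_inference.py` invocations shown above. The function names are hypothetical, and by default it only prints the commands (dry run).

```python
import shlex
import subprocess

QUANTS = ["3.0", "3.5", "3.75", "4.0", "4.25", "4.5", "5.0"]
PROMPT = "Once upon a time,"


def repo_id(bpw: str) -> str:
    """Hugging Face repo id for one quant, following the naming used above."""
    return f"Dracones/Midnight-Miqu-103B-v1.0_exl2_{bpw}bpw"


def local_dir(bpw: str) -> str:
    """Local download directory: the repo id with '/' replaced by '_'."""
    return repo_id(bpw).replace("/", "_")


def commands_for(bpw: str) -> list[list[str]]:
    """Download, test, and delete one quant, mirroring the manual steps."""
    d = local_dir(bpw)
    return [
        ["huggingface-cli", "download", "--local-dir-use-symlinks=False",
         "--local-dir", d, repo_id(bpw)],
        ["python", "test_inference.py", "-m", d, "-p", PROMPT],
        ["rm", "-r", d],  # free disk space before the next, larger quant
    ]


def run_all(dry_run: bool = True) -> list[str]:
    """Print every command; execute them only when dry_run is False."""
    printed = []
    for bpw in QUANTS:
        for cmd in commands_for(bpw):
            line = shlex.join(cmd)
            printed.append(line)
            print(line)
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop at the first failure
    return printed
```

Call `run_all()` to review the commands. Then call `run_all(dry_run=False)` from inside the `exllamav2` checkout to execute them. Unlike the manual listing, this version also deletes the 5.0 download at the end.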

Dracones

Owner Mar 8, 2024

Dracones changed discussion status to closed Mar 8, 2024
