Instructions to use IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit")
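# device_map="auto" (requires the accelerate package) places the weights on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit",
    device_map="auto",
)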
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
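
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI chat completions format that the server advertises compatibility with.

```python
import json
import urllib.request

MODEL_ID = "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=120):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body; adjust the indexing if the
    server returns a different shape.
    """
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same request as the curl command. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.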

Use Docker

docker model run hf.co/IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit

SGLang

How to use IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit with Docker Model Runner:
```
docker model run hf.co/IlyaGusev/saiga_llama3_70b_sft_m1_d5_abliterated_awq_4bit
```

pr01

by radm - opened Jun 17, 2024

base: refs/heads/main ← from: refs/pr/1

Files changed (8)

model-00001-of-00009.safetensors +1 -1
model-00002-of-00009.safetensors +1 -1
model-00003-of-00009.safetensors +1 -1
model-00004-of-00009.safetensors +1 -1
model-00005-of-00009.safetensors +1 -1
model-00006-of-00009.safetensors +1 -1
model-00007-of-00009.safetensors +1 -1
model-00008-of-00009.safetensors +1 -1

model-00001-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:545605b08a358134e94e9b19fcfcc04886f4f0da5288969bc234006ff860eeef
+oid sha256:98c24579187ea18dd1c7eaaf8084bcd5623859579a281031a5221016a760a950
 size 4969219128

model-00002-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:46df0b2df291f86d41d9893b37dc26dfc839b09054cf3913a0840545b9cdb1c4
+oid sha256:4f39e85aca10edb08e67471f3a4274145b6fba2c9c1730866d5443401bb2e069
 size 4890226896

model-00003-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8e91b69c3f228d3b89ecc44fc1e884e35ce6ed24077b1228652c36e1a82e574c
+oid sha256:49c401cd46c507ba126a0b37c2ffafd108589449456543ec7e27b47950b2365b
 size 4890226992

model-00004-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ffe98f5b3aa8d889493615af0b42ea32325a5c4705cba697b2ffed03b32ea5b3
+oid sha256:a79b4e284281c9d8b43a89d39d1a8e1c07e0a192f971ab8a2c18995328f5d75e
 size 4890226992

model-00005-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dd953936f17d36387a00c25537e1191f7e05b7e12803816fb2b3d00a1004dfff
+oid sha256:b3fe6224e1ed11d350ef1b3bcf8715b108266bde6d4a9e80e34660e1f25bf0e7
 size 4890226992

model-00006-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b13c0bb089f45da580735df7cb756a5c140dbd1c1a78e428ad488dd15c468728
+oid sha256:9ba7f88dac9e4214efd3c345bcaf16ae90c04e7388f8a70324c73210bd4e02c6
 size 4890226992

model-00007-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8485977e376f09ca7274b2a28bb087f5b5dfb737dfaf3dbe54bb2f9134ba2cdf
+oid sha256:14566a753359d7c3af993c719252f7dcd1466337eaa064fbd1a286d5c2b20ebd
 size 4890226992

model-00008-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0aa6e853af537985cb181e242874a09c71d886bd15cf5a495163901fa21cbbdc
+oid sha256:f9d55a78d8abe842d293ba529f65cb0d04c699d8e312dd4f32b1bb444893d9fc
 size 3356068840
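
Each changed file in this PR is a Git LFS pointer whose `size` stays the same while its `oid` (a SHA-256 digest) changes. A downloaded shard can be checked against its pointer with a short script. This is a minimal sketch, not part of the PR: the function names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical, and it assumes the pointer uses the three-line `version` / `oid` / `size` layout shown in the diffs above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into (sha256_hex, size_in_bytes)."""
    # Each pointer line is "<key> <value>"; split only on the first space.
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def file_matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    hasher = hashlib.sha256()
    total = 0
    # Read in chunks so multi-gigabyte shards are never held in memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == expected_size and hasher.hexdigest() == expected_digest
```

For example, `file_matches_pointer("model-00001-of-00009.safetensors", pointer_text)` returns `True` only when the local shard matches the pointer text it is given, so passing the `+` side of a diff confirms the file is the updated version.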