Instructions for using PygmalionAI/pygmalion-6b with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use PygmalionAI/pygmalion-6b with Transformers:

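# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
# Pass a plain-text prompt. Chat-style message lists need a tokenizer chat
# template, which this model's tokenizer may not define.
prompt = "Who are you?"
pipe(prompt, max_new_tokens=50)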

# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
# pygmalion-6b is a GPT-J causal language model, so AutoModelForCausalLM is the matching auto class
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")

Local Apps

vLLM

How to use PygmalionAI/pygmalion-6b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PygmalionAI/pygmalion-6b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/PygmalionAI/pygmalion-6b

SGLang

How to use PygmalionAI/pygmalion-6b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PygmalionAI/pygmalion-6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PygmalionAI/pygmalion-6b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PygmalionAI/pygmalion-6b with Docker Model Runner:
```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

Error running on SageMaker

#13

by Uilo - opened Feb 28, 2023

Discussion

Uilo

Feb 28, 2023

I'm new to this, just trying to get started using the model on SageMaker, using the new Deploy to SageMaker function/script.

After copying the script across and starting an inference endpoint (using the supplied code), I get the following error when invoking it:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM\u0027\u003e, \u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.gptj.modeling_gptj.GPTJForCausalLM\u0027\u003e)."
}
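
The `message` field above is JSON-escaped, so `\u003c`, `\u003e`, and `\u0027` stand for `<`, `>`, and `'`. A minimal sketch of decoding such a fragment with the standard library follows. The helper name `decode_json_escapes` is made up for illustration, and it assumes the fragment contains no unescaped double quotes.

```python
import json


def decode_json_escapes(escaped: str) -> str:
    # Wrap the text in double quotes so it parses as a JSON string literal,
    # which turns the backslash-u escapes back into ordinary characters.
    return json.loads(f'"{escaped}"')


# One class entry copied from the error message above (raw string keeps the escapes).
fragment = r"\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e"

print(decode_json_escapes(fragment))
```

Decoded this way, the class list in the error reads the same as the one in the CloudWatch log line further down.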

This is the input, as supplied in the deploy script:

predictor.predict({
    "inputs": {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
})
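
The payload above has the shape used by the `conversational` task (`past_user_inputs`, `generated_responses`, `text`). If the endpoint is instead serving the model under a `text-generation` task, the request is usually a single string under `inputs`. The sketch below flattens the conversation into such a string. The function name and the `You`/`Bot` speaker labels are assumptions for illustration, not something the thread or the deploy script specifies.

```python
def conversation_to_prompt(payload: dict, user_label: str = "You", bot_label: str = "Bot") -> str:
    """Flatten a conversational-style payload into one plain-text prompt."""
    turns = payload["inputs"]
    lines = []
    # Interleave earlier user messages with the replies that followed them.
    for user_text, bot_text in zip(turns["past_user_inputs"], turns["generated_responses"]):
        lines.append(f"{user_label}: {user_text}")
        lines.append(f"{bot_label}: {bot_text}")
    # Add the new user message and leave the bot's turn open for the model to complete.
    lines.append(f"{user_label}: {turns['text']}")
    lines.append(f"{bot_label}:")
    return "\n".join(lines)


conversational_payload = {
    "inputs": {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?",
    }
}

# Payload in the single-string form used for text generation.
text_generation_payload = {"inputs": conversation_to_prompt(conversational_payload)}
print(text_generation_payload["inputs"])
```

Whether this changes the "Could not load model" error is a separate question: that error is raised while loading the model, before any input is processed.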

CloudWatch logs:

[INFO ] W-PygmalionAI__pygmalion-6b-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>). : 400

Uilo

Feb 28, 2023

Seems I have to supply the model via S3 in a .tar.gz format.

After doing that I get an InternalServiceException: gpt_neox

According to EleutherAI, this is related to an incompatible Transformers version. The SageMaker boilerplate code specifies 4.17. EleutherAI says to use 4.25, but SageMaker doesn't support anything past 4.17... oh dear.

So is there a workaround, such as shipping a requirements.txt file along with the model? Any hints on how to do that, and which versions do we need to be using?

11b

Pygmalion org Mar 1, 2023

Unfortunately I can't really help you with this - I don't use SageMaker so I have no idea how it works.

Uilo

Mar 2, 2023

I managed to get the 1.3B version going; now I'm trying to get the 6B going.

Contents of requirements.txt:

transformers==4.24.0
torch==1.13.1
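
For reference, the SageMaker Hugging Face inference toolkit documents a `code/` directory inside `model.tar.gz`, where a `requirements.txt` is installed when the container starts. A minimal sketch of building such an archive with the pins above is shown here. `build_model_archive` is a hypothetical helper, the `config.json` written in the demo is an empty placeholder rather than real model files, and uploading the archive to S3 is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def build_model_archive(model_dir: Path, archive_path: Path, requirements: list[str]) -> Path:
    """Write code/requirements.txt into model_dir and pack model_dir as a .tar.gz.

    archive_path should live outside model_dir so the archive does not include itself.
    """
    code_dir = model_dir / "code"
    code_dir.mkdir(parents=True, exist_ok=True)
    (code_dir / "requirements.txt").write_text("\n".join(requirements) + "\n")

    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(model_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to model_dir so files sit at the archive root.
                tar.add(path, arcname=path.relative_to(model_dir).as_posix())
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    (model_dir / "config.json").write_text("{}")  # placeholder, not the real model config

    archive = build_model_archive(
        model_dir,
        Path(tmp) / "model.tar.gz",
        ["transformers==4.24.0", "torch==1.13.1"],  # pins quoted in this post
    )
    with tarfile.open(archive) as tar:
        archive_members = sorted(tar.getnames())

print(archive_members)
```

In a real archive, the model weights and tokenizer files would sit at the root next to `config.json`.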

This is the error I'm getting now when trying to launch 6b:

[WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
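
The `Worker died` message does not say why the worker exited. One possibility, not confirmed anywhere in this thread, is that the instance ran out of memory while loading the weights, which would fit the 1.3B model loading and the 6B one failing. A rough back-of-the-envelope estimate of weight memory, assuming the parameter counts implied by the model names (about 1.3e9 and 6e9) and ignoring activations and framework overhead:

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Parameter counts are approximations taken from the model names, not measured values.
approx_params = {"1.3B": 1.3e9, "6B": 6e9}

for name, count in approx_params.items():
    fp32 = weight_memory_gib(count, 4)  # 4 bytes per parameter in float32
    fp16 = weight_memory_gib(count, 2)  # 2 bytes per parameter in float16
    print(f"{name}: ~{fp32:.1f} GiB in float32, ~{fp16:.1f} GiB in float16")
```

If memory is the cause, comparing these figures against the RAM and GPU memory of the chosen instance type would be a reasonable first check.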

11b

Pygmalion org Mar 17, 2023

Again, I can't really help with this since I've never used SageMaker. Your best bet is probably contacting whatever support channel SageMaker offers to their customers.

11b changed discussion status to closed Mar 17, 2023

Fire-Hound

Apr 29, 2023

@Uilo I have the same issue, did you solve it?

Uilo

May 1, 2023

Not really... kind of with the smaller models.

Have a look at this for some more info though:
https://reddit.com/r/PygmalionAI/comments/11dmqly/running_pyg_on_aws_sagemaker/
