Error running on SageMaker
I'm new to this and just trying to get started with the model on SageMaker, using the new "Deploy to SageMaker" function/script.
After copying the script across and starting an inference endpoint (using the supplied code), I get the following error when I invoke it:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM\u0027\u003e, \u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.gptj.modeling_gptj.GPTJForCausalLM\u0027\u003e)."
}
This is the input, as supplied in the deploy script:
predictor.predict({
    'inputs': {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
})
CloudWatch logs:
[INFO ] W-PygmalionAI__pygmalion-6b-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>). : 400
It seems I have to supply the model via S3 as a .tar.gz archive.
After doing that, I get an InternalServiceException: gpt_neox.
According to EleutherAI, this is caused by an incompatible Transformers version. The SageMaker boilerplate code specifies 4.17. Eleuther says to use 4.25, but SageMaker doesn't support anything past 4.17. Oh dear...
So is there a workaround, such as shipping a requirements.txt file along with the model? Any hints on how to do that, and which versions do we need to use?
Unfortunately I can't really help you with this - I don't use SageMaker so I have no idea how it works.
I managed to get the 1.3B version going, and I'm now trying to get the 6B going.
Contents of requirements.txt:
transformers==4.24.0
torch==1.13.1
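For anyone trying the same workaround, here is a minimal sketch of how a requirements file like the one above could be packaged with the model. It assumes the SageMaker Hugging Face inference container picks up a code/requirements.txt inside model.tar.gz, which is my understanding of the expected layout rather than something confirmed in this thread. The helper name and the directory names are hypothetical.

```python
import tarfile
from pathlib import Path

# Pins taken from the requirements text above.
REQUIREMENTS = "transformers==4.24.0\ntorch==1.13.1\n"


def build_model_archive(model_dir, archive_path="model.tar.gz"):
    """Bundle saved model files plus code/requirements.txt into one archive.

    Hypothetical helper: model_dir is assumed to already hold the files
    written by save_pretrained() (config.json, weights, tokenizer files).
    Keep archive_path outside model_dir so the archive does not include itself.
    """
    model_dir = Path(model_dir)
    code_dir = model_dir / "code"
    code_dir.mkdir(parents=True, exist_ok=True)
    (code_dir / "requirements.txt").write_text(REQUIREMENTS)

    # Add each entry at the archive root, not nested under a parent folder.
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(model_dir.iterdir()):
            tar.add(entry, arcname=entry.name)
    return Path(archive_path)
```

The resulting archive would then be uploaded to S3 and used as the model data for the endpoint. Whether the container can actually install and run these versions is something to verify against its documentation.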
This is the error I'm getting now when trying to launch the 6B:
[WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
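The cause can't be confirmed from this log line alone, but a worker dying during model load is often a sign that the instance ran out of memory. As a rough sanity check, and assuming a round figure of 6 billion parameters, the weights alone need roughly this much memory:

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed for the model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 6_000_000_000  # assumption: round figure for a "6B" model

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # 32-bit floats, 4 bytes each
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit floats, 2 bytes each

print(f"fp32: {fp32_gib:.1f} GiB, fp16: {fp16_gib:.1f} GiB")
```

These figures exclude activations and any temporary copies made while loading, so the instance needs headroom beyond them. That would also fit with the 1.3B version loading fine on the same setup.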
Again, I can't really help with this since I've never used SageMaker. Your best bet is probably contacting whatever support channel SageMaker offers its customers.
@Uilo I have the same issue, did you solve it?
Not really... it only kind of works with the smaller models.
Have a look at this for some more info though:
https://reddit.com/r/PygmalionAI/comments/11dmqly/running_pyg_on_aws_sagemaker/