Instructions to use notstoic/pygmalion-13b-4bit-128g with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use notstoic/pygmalion-13b-4bit-128g with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="notstoic/pygmalion-13b-4bit-128g")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("notstoic/pygmalion-13b-4bit-128g")
model = AutoModelForCausalLM.from_pretrained("notstoic/pygmalion-13b-4bit-128g")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use notstoic/pygmalion-13b-4bit-128g with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "notstoic/pygmalion-13b-4bit-128g"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "notstoic/pygmalion-13b-4bit-128g",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
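
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the server started above is listening on localhost:8000. The helper names `build_payload` and `complete` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].text` shape.

```python
import json
import urllib.request

MODEL_ID = "notstoic/pygmalion-13b-4bit-128g"


def build_payload(prompt, max_tokens=512, temperature=0.5, model=MODEL_ID):
    """Return the JSON body used in the curl example (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """POST to /v1/completions and return the first generated text (hypothetical helper)."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Assumes the OpenAI-style response shape: {"choices": [{"text": ...}, ...]}
    return result["choices"][0]["text"]


# Usage (requires a running server, so it is left commented out):
# print(complete("Once upon a time,"))
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body without a server running.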

Use Docker

docker model run hf.co/notstoic/pygmalion-13b-4bit-128g

SGLang

How to use notstoic/pygmalion-13b-4bit-128g with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "notstoic/pygmalion-13b-4bit-128g" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "notstoic/pygmalion-13b-4bit-128g",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "notstoic/pygmalion-13b-4bit-128g" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "notstoic/pygmalion-13b-4bit-128g",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use notstoic/pygmalion-13b-4bit-128g with Docker Model Runner:
```
docker model run hf.co/notstoic/pygmalion-13b-4bit-128g
```

ERROR:The model could not be loaded because its type could not be inferred from its name.

by anobu88 - opened May 19, 2023

Discussion

anobu88

May 19, 2023

I get this error when loading the model:
ERROR:The model could not be loaded because its type could not be inferred from its name.
ERROR:Please specify the type manually using the --model_type argument.

How do I load the model?
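
The wording of the error suggests that the loader tries to guess the architecture from the model folder name and gives up when no known keyword matches. A rough sketch of that kind of name-based inference follows. The keyword table and the function `infer_model_type` are illustrative assumptions, not the web UI's actual code.

```python
# Hypothetical keyword table: substrings that hint at each architecture.
KNOWN_TYPES = {
    "llama": ("llama", "alpaca", "vicuna"),
    "opt": ("opt-", "galactica"),
    "gptj": ("gpt-j",),
}


def infer_model_type(model_name, explicit_type=None):
    """Return a model type, preferring an explicit override over name matching."""
    if explicit_type:
        return explicit_type.lower()
    name = model_name.lower()
    for model_type, keywords in KNOWN_TYPES.items():
        if any(keyword in name for keyword in keywords):
            return model_type
    # No keyword matched: the caller has to supply the type manually.
    raise ValueError("type could not be inferred from the name; pass it explicitly")
```

Under this sketch, `notstoic_pygmalion-13b-4bit-128g` matches no keyword, because Pygmalion 13B is LLaMA-based but does not say so in its name. That would explain why the type has to be supplied manually.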

Entity

May 19, 2023

put llama as the type

anobu88

May 19, 2023

I did. I set it in the UI and it says the same thing. I also tried it with --model_type llama, and it just says 'done! Complete' and then shuts down after I press Enter.

mpasila

May 19, 2023

This was quantized using https://github.com/0cc4m/GPTQ-for-LLaMa, meaning that if you have some other version of GPTQ, it might not work.

AliOsman123

May 19, 2023

How can I get the GPTQ version needed?

Tom-Neverwinter

May 20, 2023

https://github.com/ggerganov/llama.cpp/commit/2d5db48371052087a83974abda3767d1aedec598 The llama version was bumped. The model will need to be changed.

notstoic

Owner May 20, 2023

> https://github.com/ggerganov/llama.cpp/commit/2d5db48371052087a83974abda3767d1aedec598 The llama version was bumped. The model will need to be changed.

This is the GPTQ repo; the GGML repo is here:
https://huggingface.co/notstoic/pygmalion-13b-ggml/

Although it's true that they haven't been bumped to the latest version of llama.cpp yet.

anobu88

May 20, 2023

> > https://github.com/ggerganov/llama.cpp/commit/2d5db48371052087a83974abda3767d1aedec598 The llama version was bumped. The model will need to be changed.
>
> This is the GPTQ repo; the GGML repo is here:
> https://huggingface.co/notstoic/pygmalion-13b-ggml/
>
> Although it's true that they haven't been bumped to the latest version of llama.cpp yet.

How do I run this? Where are the other files?

psychether

May 21, 2023

> I get this error when loading the model:
> ERROR:The model could not be loaded because its type could not be inferred from its name.
> ERROR:Please specify the type manually using the --model_type argument.
>
> How do I load the model?

The same thing is happening to me.

Humeee33

May 23, 2023

notstoic_PygmalionCoT-7b and notstoic_pygmalion-13b-4bit-128g are the only two models oobabooga will load. The other two just crap out and produce garbage text. Thank you for making them all, but can you make it so the other ones work in oobabooga?
Thanks

Kyodan

May 23, 2023 (edited May 23, 2023)

> I get this error when loading the model:
> ERROR:The model could not be loaded because its type could not be inferred from its name.
> ERROR:Please specify the type manually using the --model_type argument.
>
> How do I load the model?

For oobabooga:

Open 'webui.py' in a text editor. On line 15 (which should be the CMD_FLAGS line), remove --model_menu, then add the following inside the quotes:

--model notstoic_pygmalion-13b-4bit-128g --model_type Llama

So it should look like this (this is an example; yours may have other flags for extensions):

CMD_FLAGS = '--chat --groupsize 128 --wbits 4 --model notstoic_pygmalion-13b-4bit-128g --model_type Llama'
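
The edit described above amounts to assembling one flag string. Here is a small sketch of how that string fits together, assuming only the flag names quoted in this thread (`--chat`, `--groupsize`, `--wbits`, `--model`, `--model_type`). The helper `build_cmd_flags` is hypothetical and is not part of webui.py.

```python
import shlex


def build_cmd_flags(model, model_type, wbits=4, groupsize=128, extra=("--chat",)):
    """Join launcher flags into a single string suitable for CMD_FLAGS."""
    parts = list(extra) + [
        "--groupsize", str(groupsize),
        "--wbits", str(wbits),
        "--model", model,
        "--model_type", model_type,
    ]
    # shlex.join quotes any part that contains spaces or shell metacharacters.
    return shlex.join(parts)


CMD_FLAGS = build_cmd_flags("notstoic_pygmalion-13b-4bit-128g", "Llama")
print(CMD_FLAGS)
```

The default values 4 and 128 mirror the "4bit-128g" suffix in the model name, which is why `--wbits 4` and `--groupsize 128` appear alongside the model type.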

Elrenuir

Jun 4, 2023

Same error. However, my webui.py has no such line, and if I copy it in, it has no effect.

Elrenuir

Jun 6, 2023

> > I get this error when loading the model:
> > ERROR:The model could not be loaded because its type could not be inferred from its name.
> > ERROR:Please specify the type manually using the --model_type argument.
> >
> > How do I load the model?
>
> For oobabooga:
>
> Open 'webui.py' in a text editor. On line 15 (which should be the CMD_FLAGS line), remove --model_menu, then add the following inside the quotes:
>
> --model notstoic_pygmalion-13b-4bit-128g --model_type Llama
>
> So it should look like this (this is an example; yours may have other flags for extensions):
>
> CMD_FLAGS = '--chat --groupsize 128 --wbits 4 --model notstoic_pygmalion-13b-4bit-128g --model_type Llama'

Same error. However, my webui.py has no such line, and if I copy it in, it has no effect.
