Instructions to use notstoic/pygmalion-13b-4bit-128g with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use notstoic/pygmalion-13b-4bit-128g with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="notstoic/pygmalion-13b-4bit-128g")

# Load model directly (LLaMA-based text model, so the causal-LM auto class applies)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("notstoic/pygmalion-13b-4bit-128g")
model = AutoModelForCausalLM.from_pretrained("notstoic/pygmalion-13b-4bit-128g")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use notstoic/pygmalion-13b-4bit-128g with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "notstoic/pygmalion-13b-4bit-128g"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "notstoic/pygmalion-13b-4bit-128g",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/notstoic/pygmalion-13b-4bit-128g
- SGLang
How to use notstoic/pygmalion-13b-4bit-128g with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "notstoic/pygmalion-13b-4bit-128g" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "notstoic/pygmalion-13b-4bit-128g",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "notstoic/pygmalion-13b-4bit-128g" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "notstoic/pygmalion-13b-4bit-128g",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use notstoic/pygmalion-13b-4bit-128g with Docker Model Runner:
docker model run hf.co/notstoic/pygmalion-13b-4bit-128g
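The curl calls in the vLLM and SGLang sections send the same JSON body to an OpenAI-compatible `/v1/completions` endpoint. A minimal Python sketch of that request, using only the standard library, might look like the following. The helper names `build_completion_request` and `complete` are made up for illustration, and the code assumes a server is already running on the given port and returns the usual `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires a running server; port 8000 for vLLM, 30000 for SGLang):
# print(complete("http://localhost:8000",
#                "notstoic/pygmalion-13b-4bit-128g",
#                "Once upon a time,"))
```

Splitting request construction from sending makes it possible to inspect the payload without a live server.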
ERROR:The model could not be loaded because its type could not be inferred from its name.
I get this error when loading the model:
ERROR:The model could not be loaded because its type could not be inferred from its name.
ERROR:Please specify the type manually using the --model_type argument.
How do I load the model?
put llama as the type
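As a rough illustration of why the error appears: the message suggests the loader tries to guess the architecture from the model's folder name. The sketch below assumes a simple substring match; the keyword table and the function `infer_model_type` are hypothetical and not copied from the web UI's code. The folder name `notstoic_pygmalion-13b-4bit-128g` contains none of the keywords, so the guess fails unless the type is given explicitly.

```python
# Hypothetical keyword table for illustration only; NOT taken from any real loader.
TYPE_KEYWORDS = {
    "llama": ("llama", "alpaca", "vicuna"),
    "opt": ("opt-", "galactica"),
    "gptj": ("gpt-j", "gptj"),
}


def infer_model_type(model_name, explicit_type=None):
    """Return the explicit type if given, else guess from the name, else None."""
    if explicit_type:
        return explicit_type.lower()
    name = model_name.lower()
    for model_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return model_type
    return None  # the case that would trigger "type could not be inferred"


print(infer_model_type("notstoic_pygmalion-13b-4bit-128g"))           # no keyword matches
print(infer_model_type("notstoic_pygmalion-13b-4bit-128g", "Llama"))  # explicit type wins
```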
I did. I set it in the UI and it says the same thing. I also tried --model_type llama, and it just says 'done! Complete' and then shuts down after I press Enter.
This was quantized using https://github.com/0cc4m/GPTQ-for-LLaMa, meaning that if you have some other version of GPTQ, it might not work.
How can I get the GPTQ version that's needed?
https://github.com/ggerganov/llama.cpp/commit/2d5db48371052087a83974abda3767d1aedec598 The llama.cpp version was bumped. The model will need to be changed.
This is the GPTQ repo; the GGML repo is here:
https://huggingface.co/notstoic/pygmalion-13b-ggml/
Although it's true that they haven't been bumped to the latest version of llama.cpp yet.
How do I run this? Where are the other files?
same thing happening to me
notstoic_PygmalionCoT-7b and notstoic_pygmalion-13b-4bit-128g are the only two models oobabooga will load. The other two just crap out with vomit text. Thank you for making them all, but can you make it so the other ones work in oobabooga?
Thanks
For oobabooga:
Open 'webui.py' in a text editor, then on line 15 (it should be the CMD_FLAGS line), remove --model_menu and add the following inside the quotes:
--model notstoic_pygmalion-13b-4bit-128g --model_type Llama
so it should look like (this is an example, yours may have other lines for extensions):
CMD_FLAGS = '--chat --groupsize 128 --wbits 4 --model notstoic_pygmalion-13b-4bit-128g --model_type Llama'
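The manual edit described here can also be expressed as a small string transformation. This is only a sketch: `set_model_flags` is a hypothetical helper, not part of oobabooga, and it assumes the flags are space-separated in the `--flag value` form shown in this thread (it does not handle a `--model=name` spelling).

```python
import shlex


def set_model_flags(cmd_flags, model, model_type="Llama"):
    """Drop --model_menu and any old --model/--model_type, then append new ones."""
    cleaned = []
    skip_next = False
    for token in shlex.split(cmd_flags):
        if skip_next:
            # This token is the value of a --model or --model_type we are replacing.
            skip_next = False
            continue
        if token == "--model_menu":
            continue
        if token in ("--model", "--model_type"):
            skip_next = True
            continue
        cleaned.append(token)
    cleaned += ["--model", model, "--model_type", model_type]
    return " ".join(cleaned)


old_flags = "--chat --model_menu --groupsize 128 --wbits 4"
print("CMD_FLAGS = '" + set_model_flags(old_flags, "notstoic_pygmalion-13b-4bit-128g") + "'")
```

The printed line can then be pasted over the existing CMD_FLAGS line in webui.py, keeping any extension flags that were already there.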
Same error. However, my webui.py has no such line, and if I copy it there, it has no effect.