Text Generation
Transformers
Safetensors
PyTorch
English
llama
facebook
meta
llama-2
text-generation-inference
4-bit precision
gptq
Instructions to use TheBloke/Llama-2-13B-chat-GPTQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TheBloke/Llama-2-13B-chat-GPTQ with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TheBloke/Llama-2-13B-chat-GPTQ")

# Load model directly (Llama 2 is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TheBloke/Llama-2-13B-chat-GPTQ with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TheBloke/Llama-2-13B-chat-GPTQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TheBloke/Llama-2-13B-chat-GPTQ",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/TheBloke/Llama-2-13B-chat-GPTQ
- SGLang
How to use TheBloke/Llama-2-13B-chat-GPTQ with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "TheBloke/Llama-2-13B-chat-GPTQ" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TheBloke/Llama-2-13B-chat-GPTQ",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "TheBloke/Llama-2-13B-chat-GPTQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "TheBloke/Llama-2-13B-chat-GPTQ",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use TheBloke/Llama-2-13B-chat-GPTQ with Docker Model Runner:
docker model run hf.co/TheBloke/Llama-2-13B-chat-GPTQ
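The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the response handling assumes the server returns the usual OpenAI-style completions body with a `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint.

    Hypothetical helper: it mirrors the JSON payload of the curl examples.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, timeout=120, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires a running server, so it is left commented out):
# text = complete("http://localhost:8000", "TheBloke/Llama-2-13B-chat-GPTQ", "Once upon a time,")
```

Use port 8000 for the vLLM server and port 30000 for the SGLang server, matching the commands above.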
- #54 output embeddings (opened almost 2 years ago by pureve)
- #53 output content (opened almost 2 years ago by pureve)
- #52 How to convert 4bit model back to fp16 data format? (opened over 2 years ago by tremblingbrain; 3 comments)
- #51 add template (opened over 2 years ago by philschmid)
- #50 torch.cuda.OutOfMemoryError: CUDA out of memory. (opened over 2 years ago by neo-benjamin)
- #49 Can you please provide 'c4' version? (opened over 2 years ago by leeee1204)
- #48 How much does it take to inference one sample? (opened over 2 years ago by andreaKIM)
- #47 Issues with CUDA and exllama_kernels (opened over 2 years ago by ditchtech; 9 comments)
- #46 Calling LlamaTokenizerFast.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead. (opened over 2 years ago by kidrah-yxalag)
- #45 Hallucination issue in Llama-2-13B-chat-GPTQ (opened over 2 years ago by DivyanshTiwari7; 7 comments)
- #44 Increasing the model's predefined max length (opened over 2 years ago by MLconArtist)
- #43 [AUTOMATED] Model Memory Requirements (opened over 2 years ago by model-sizer-bot)
- #41 Deploying TheBloke/Llama-2-13B-chat-GPTQ as a batch end point in sagemaker (opened over 2 years ago by vinaykakara)
- #38 Deploying this on Text Generation Inference (TGI) server on AWS SageMaker (opened almost 3 years ago by ZaydJamadar; 1 comment)
- #37 Understanding materials (opened almost 3 years ago by rishabh-gurbani; 1 comment)
- #35 Temperature or top_p is not working (opened almost 3 years ago by chintan4560; 2 comments)
- #34 Train model with webui (opened almost 3 years ago by Samitoo; 1 comment)
- #33 HuggingFace's bitsandbytes vs AutoGPTQ? (opened almost 3 years ago by chongcy; 👍 1; 2 comments)
- #32 What library was used to quantize this model ? (opened almost 3 years ago by ImWolf7; 1 comment)
- #31 Dataset used for quantisation (opened almost 3 years ago by CarlosAndrea; 2 comments)
- #30 How to make it (Llama-2-13B-chat-GPTQ) work with Fastchat (opened almost 3 years ago by Vishvendra; 4 comments)
- #29 Error: Transformers import module musicgen (opened almost 3 years ago by galdezanni)
- #28 Finetuning the model using custom dataset. (opened almost 3 years ago by Varanasi5213)
- #27 Necessary material for llama2 (opened almost 3 years ago by Samitoo; 7 comments)
- #26 Converting hf format model to 128g.safetensors (opened almost 3 years ago by goodromka; 7 comments)
- #23 Llama-2-13B-chat-GPTQ problem (opened almost 3 years ago by nigsdf; 2 comments)
- #21 Getting an error: AttributeError: module 'accelerate.utils' has no attribute 'modeling'. Please tell me what should i do? (opened almost 3 years ago by Dhairye)
- #20 Getting error while loading model_basename = "gptq_model-8bit-128g" (opened almost 3 years ago by Pchaudhary; 7 comments)
- #19 fine tune on custom chat dataset using QLORA & PEFT (opened almost 3 years ago by yashk92; 3 comments)
- #17 General Update Question for LLMs (opened almost 3 years ago by Acrious; 2 comments)
- #14 File not found error while loading model (opened almost 3 years ago by Osamarafique998; 19 comments)
- #13 CPU Inference (opened almost 3 years ago by Ange09; 1 comment)
- #12 Slow Inference Speed (opened almost 3 years ago by asifahmed)
- #11 Error while loading model from path (opened almost 3 years ago by abhishekpandit; 3 comments)
- #10 Censorship is hilarious (opened almost 3 years ago by tea-lover-418; 6 comments)
- #9 why it says no quantize_config.json file but it has (opened almost 3 years ago by Mark000111888; 6 comments)
- #8 Error loading model from a different branch with revision (opened almost 3 years ago by amitj; 9 comments)
- #7 Llama v2 GPTQ context length (opened almost 3 years ago by andrewsameh; 6 comments)
- #6 Is this model based on `chat` or `chat-hf` model of llama2? (opened almost 3 years ago by pootow; 👍 1; 3 comments)
- #5 Prompt format (opened almost 3 years ago by mr96; 8 comments)
- #3 Bravo! That was fast : ) (opened almost 3 years ago by jacobgoldenart; ❤️ 5; 2 comments)
- #1 Doesn't contain the files (opened almost 3 years ago by aminedjeghri; 3 comments)