How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TeeZee/Qra-13B-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeeZee/Qra-13B-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
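The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000` and that the response follows the OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: same host, port, and model name as the curl example.
BASE_URL = "http://localhost:8000/v1"
MODEL = "TeeZee/Qra-13B-chat"


def build_chat_request(user_message, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the chat-completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send the request and return the assistant's reply text.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    request = build_chat_request(user_message)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply.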
Use Docker
docker model run hf.co/TeeZee/Qra-13B-chat

TeeZee/Qra-13B-chat

  • Developed by: TeeZee
  • License: llama2
  • Finetuned from model: TeeZee/Qra-13b-instruct
  • Finetuned on a Polish (PL) chat dataset
  • Uses the Alpaca chat format; tested and able to follow a conversation coherently
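As an illustration of the Alpaca format mentioned above, here is a small sketch that builds a prompt string by hand. It assumes the widely used Alpaca template (a fixed preamble followed by `### Instruction:` and `### Response:` sections); the exact template this model was trained with is not stated on this card, and the way earlier turns are concatenated here is likewise an assumption. `build_alpaca_prompt` is a hypothetical helper name.

```python
# Assumption: the standard Alpaca preamble (no-input variant).
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction, history=()):
    """Return an Alpaca-style prompt ending with an open response section.

    history is an iterable of (instruction, response) pairs from earlier
    turns; replaying them as repeated blocks is an assumed convention.
    """
    parts = [ALPACA_HEADER]
    for past_instruction, past_response in history:
        parts.append(
            f"### Instruction:\n{past_instruction}\n\n### Response:\n{past_response}"
        )
    # The final block leaves the response empty for the model to complete.
    parts.append(f"### Instruction:\n{instruction}\n\n### Response:\n")
    return "\n\n".join(parts)
```

A prompt built this way would be sent to a plain text-completion endpoint rather than the chat endpoint, since the chat endpoint applies whatever chat template the server is configured with.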

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

All comments are greatly appreciated. Download the model, test it, and if you appreciate my work, consider buying me my fuel: Buy Me A Coffee

Format: Safetensors
Model size: 13B params
Tensor type: BF16
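For a rough sense of what these figures imply for hardware, the sketch below estimates the memory needed for the weights alone. It treats "13B" as a nominal 13 billion parameters and uses 2 bytes per BF16 value; the KV cache, activations, and runtime overhead are not included, so actual requirements are higher. `weight_bytes` is an illustrative helper.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_bytes(num_params, dtype="bf16"):
    """Approximate size of the model weights in bytes (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype]


nominal_params = 13e9  # assumption: "13B params" taken as exactly 13 billion
size = weight_bytes(nominal_params, "bf16")
print(f"{size / 1e9:.0f} GB ({size / 2**30:.1f} GiB) for weights alone")
```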

Model tree for TeeZee/Qra-13B-chat

Base model: OPI-PG/Qra-13b
Finetuned (1): this model
Quantizations: 4 models
