How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mwitiderrick/open_llama_3b_instruct_v_0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mwitiderrick/open_llama_3b_instruct_v_0.1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
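The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "mwitiderrick/open_llama_3b_instruct_v_0.1"
BASE_URL = "http://localhost:30000"  # matches --port 30000 in the launch command


def build_completion_request(prompt, max_tokens=512, temperature=0.5, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the response body looks like {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,"))` should print the generated continuation. Building the request separately from sending it makes the payload easy to inspect without a live server.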
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mwitiderrick/open_llama_3b_instruct_v_0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip section above
# (the container publishes the same port, 30000).

OpenLLaMA Instruct: An Open Reproduction of LLaMA

This is an OpenLLaMA model fine-tuned for 2 epochs on the first 5,000 samples of the Open-Platypus dataset.

The modified version of the dataset can be found here
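As a rough illustration of the data selection described above, the sketch below takes the first 5,000 records from an iterable and renders each one in the `### Instruction:` / `### Response:` template that the usage example prompts with. The function names are hypothetical, the `instruction` and `output` field names are assumptions about the record layout, and the exact formatting of the modified dataset is not stated here, so this is not necessarily how the training data was prepared.

```python
from itertools import islice


def first_n_samples(records, n=5000):
    """Return the first n records of any iterable (hypothetical helper)."""
    return list(islice(records, n))


def format_training_text(record):
    """Render one record in the instruction/response template.

    Assumption: each record is a dict with "instruction" and "output" keys.
    """
    return f"### Instruction:\n{record['instruction']}### Response:\n{record['output']}"
```

Using `islice` keeps the selection lazy, so it also works on a streamed dataset without loading every record first.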

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the tokenizer and model weights from the Hugging Face Hub.
model_id = "mwitiderrick/open_llama_3b_instruct_v_0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

query = "How can I evaluate the performance and quality of the generated text from language models?"
# max_length counts the prompt tokens plus the generated tokens.
text_gen = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
# The prompt follows the "### Instruction:" / "### Response:" template.
output = text_gen(f"### Instruction:\n{query}### Response:\n")
print(output[0]["generated_text"])
"""
### Instruction:
How can I evaluate the performance and quality of the generated text from language models?### Response:
I want to evaluate the performance of the language model by comparing the generated text with the original text. I can use a similarity measure to compare the two texts. For example, I can use the Levenshtein distance, which measures the number of edits needed to transform one text into another. The Levenshtein distance between two texts is the minimum number of edits needed to transform one text into another. The Levenshtein distance between two texts is the minimum number of edits needed to transform one text into another. The Levenshtein distance between two texts is the minimum number of edits needed to transform one text into another. The Levenshtein distance between two texts is the minimum number of edits needed to transform one text into another. The Levenshtein distance between two texts is the minimum number
"""
Model size: 3B params. Tensor type: F16. Weights format: Safetensors.