Instructions for using akumaburn/Alpaca-Llama-3-8B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use akumaburn/Alpaca-Llama-3-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="akumaburn/Alpaca-Llama-3-8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("akumaburn/Alpaca-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("akumaburn/Alpaca-Llama-3-8B")
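
The model card further down lists Alpaca as the prompt format, so text passed to the pipeline or to the model presumably needs that layout. The sketch below assumes the standard Stanford Alpaca template wording; `build_alpaca_prompt` is a hypothetical helper name, and the exact template used during fine-tuning is not confirmed by the card.

```python
# Standard Alpaca templates (an assumption: the card only says "Alpaca").
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-style prompt (hypothetical helper, not a library function)."""
    if input_text.strip():
        return ALPACA_WITH_INPUT.format(
            instruction=instruction.strip(), input=input_text.strip()
        )
    return ALPACA_NO_INPUT.format(instruction=instruction.strip())


prompt = build_alpaca_prompt(
    "Summarize the text.", "Llamas are domesticated camelids."
)
print(prompt)

# With the pipeline created above (downloads the full model, so left commented out):
# result = pipe(prompt, max_new_tokens=128, return_full_text=False)
# print(result[0]["generated_text"])
```

The prompt ends at `### Response:` so that the model's continuation is the answer itself.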

Local Apps

vLLM

How to use akumaburn/Alpaca-Llama-3-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "akumaburn/Alpaca-Llama-3-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "akumaburn/Alpaca-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
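
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response layout (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises compatibility with.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60.0):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


request = build_completion_request(
    "http://localhost:8000",
    "akumaburn/Alpaca-Llama-3-8B",
    "Once upon a time,",
)

# Requires the server started above to be running:
# print(complete(request))
```

Pointing `base_url` at `http://localhost:30000` should work the same way for the SGLang server described below, since both expose the same route.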

Use Docker

docker model run hf.co/akumaburn/Alpaca-Llama-3-8B

SGLang

How to use akumaburn/Alpaca-Llama-3-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "akumaburn/Alpaca-Llama-3-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "akumaburn/Alpaca-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "akumaburn/Alpaca-Llama-3-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "akumaburn/Alpaca-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use akumaburn/Alpaca-Llama-3-8B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for akumaburn/Alpaca-Llama-3-8B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for akumaburn/Alpaca-Llama-3-8B to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for akumaburn/Alpaca-Llama-3-8B to start chatting

Load model with FastModel

# Install Unsloth first (shell command, not Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="akumaburn/Alpaca-Llama-3-8B",
    max_seq_length=2048,
)

Docker Model Runner
How to use akumaburn/Alpaca-Llama-3-8B with Docker Model Runner:
```
docker model run hf.co/akumaburn/Alpaca-Llama-3-8B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Alpaca-Llama-3-8B

Fine-tuned using dataset: https://huggingface.co/datasets/yahma/alpaca-cleaned
Epoch Count: 1
Step Count: 6,470/6,470
Batch Size: 2
Gradient Accumulation Steps: 4
Context Size: 8192
Num examples: 51,760
Trainable Parameters: 41,943,040
Learning Rate: 0.00001
Training Loss: 0.960000
Fine-tuned using: Google Colab Pro (NVIDIA T4 runtime)
Developed by: akumaburn
License: apache-2.0
Fine-tuned from model: unsloth/llama-3-8b-bnb-4bit
Prompt Format: Alpaca (https://libertai.io/apis/text-generation/prompting.html)
Chai ELO: 1146.84 (https://console.chaiverse.com/models/akumaburn-alpaca-llama-3-8b_v1)
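
The training figures above can be cross-checked with a little arithmetic. The step count follows directly from the listed values. The trainable-parameter check rests on an assumption that the card does not state: rank-16 LoRA adapters on all seven linear projections of each decoder layer. That is one configuration that reproduces the listed number, not a confirmed description of the run.

```python
import math

# Values copied from the list above.
num_examples = 51_760
batch_size = 2
grad_accum_steps = 4
reported_steps = 6_470
reported_trainable = 41_943_040

# One optimizer step consumes batch_size * grad_accum_steps examples
# (assuming a single device, such as one T4).
effective_batch = batch_size * grad_accum_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)

# ASSUMPTION (not in the card): LoRA rank 16 on q/k/v/o and gate/up/down.
# Llama 3 8B dimensions: hidden 4096, key/value width 1024, MLP width 14336, 32 layers.
hidden, kv_dim, mlp_dim, num_layers, rank = 4096, 1024, 14336, 32, 16
projections = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp_dim),
    "up_proj": (hidden, mlp_dim),
    "down_proj": (mlp_dim, hidden),
}
# A LoRA adapter on a (d_in, d_out) layer adds rank * (d_in + d_out) weights.
lora_params = num_layers * sum(
    rank * (d_in + d_out) for d_in, d_out in projections.values()
)

print("effective batch size:", effective_batch)
print("steps per epoch:", steps_per_epoch, "reported:", reported_steps)
print("LoRA parameters:", lora_params, "reported:", reported_trainable)
```

Both derived values agree with the reported ones, so the list is internally consistent with one epoch at an effective batch size of 8.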

Some GGUF quantizations can be found at https://huggingface.co/akumaburn/Alpaca-Llama-3-8B-GGUF

mistral-7b-openorca.Q8_0.gguf:

MMLU-Test: Final result: 41.5836 +/- 0.4174
Arc-Easy: Final result: 72.6316 +/- 1.8691
Truthful QA: Final result: 32.0685 +/- 1.6339
Arc-Challenge: Final result: 48.8294 +/- 2.8956

llama-3-8b-bnb-4bit.Q8_0.gguf:

MMLU-Test: Final result: 40.4074 +/- 0.4156
Arc-Easy: Final result: 73.8596 +/- 1.8421
Truthful QA: Final result: 26.6830 +/- 1.5484
Arc-Challenge: Final result: 46.8227 +/- 2.8906

Open_Orca_Llama-3-8B-unsloth.Q8_0.gguf:

MMLU-Test: Final result: 39.3818 +/- 0.4138
Arc-Easy: Final result: 67.3684 +/- 1.9656
Truthful QA: Final result: 29.0086 +/- 1.5886
Arc-Challenge: Final result: 42.1405 +/- 2.8604

Alpaca-Llama-3-8B-GGUF-unsloth.Q8_0.gguf:

MMLU-Test: Final result: 40.6441 +/- 0.4160
Arc-Easy: Final result: 77.5439 +/- 1.7494
Truthful QA: Final result: 29.7430 +/- 1.6003
Arc-Challenge: Final result: 50.5017 +/- 2.8963

Meta-Llama-3-8B.Q8_0.gguf:

MMLU-Test: Final result: 40.8664 +/- 0.4163
Arc-Easy: Final result: 74.3860 +/- 1.8299
Truthful QA: Final result: 28.6414 +/- 1.5826
Arc-Challenge: Final result: 47.1572 +/- 2.8917
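
To judge whether the gaps between the fine-tune and the base model are larger than the reported uncertainty, the scores can be compared directly. The sketch below treats each +/- value as one standard error and the two runs as independent; both are assumptions. Because the models answer the same questions, a paired test would be more appropriate, so this is only a rough guide.

```python
import math

# (score, +/- value) pairs copied from the results above.
alpaca = {  # Alpaca-Llama-3-8B-GGUF-unsloth.Q8_0.gguf
    "MMLU-Test": (40.6441, 0.4160),
    "Arc-Easy": (77.5439, 1.7494),
    "Truthful QA": (29.7430, 1.6003),
    "Arc-Challenge": (50.5017, 2.8963),
}
base = {  # Meta-Llama-3-8B.Q8_0.gguf
    "MMLU-Test": (40.8664, 0.4163),
    "Arc-Easy": (74.3860, 1.8299),
    "Truthful QA": (28.6414, 1.5826),
    "Arc-Challenge": (47.1572, 2.8917),
}


def z_score(a, b):
    """Difference of two scores divided by their combined standard error."""
    (score_a, err_a), (score_b, err_b) = a, b
    return (score_a - score_b) / math.sqrt(err_a ** 2 + err_b ** 2)


for task in alpaca:
    diff = alpaca[task][0] - base[task][0]
    z = z_score(alpaca[task], base[task])
    print(f"{task:14s} diff={diff:+.4f}  z={z:+.2f}")
```

Under these assumptions none of the four gaps reaches two combined standard errors, so the differences from the base model should be read as suggestive rather than conclusive.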

Llama.cpp Options For Testing: --samplers "tfs;typical;temp" --draft 32 --ctx-size 8192 --temp 0.82 --tfs 0.8 --typical 1.1 --repeat-last-n 512 --batch-size 8192 --repeat-penalty 1.0 --n-gpu-layers 100 --threads 12

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.