Instructions to use Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged")

# Load model directly (LLaMA is a causal language model, so use AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged")
model = AutoModelForCausalLM.from_pretrained("Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged")

Local Apps

vLLM

How to use Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
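
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The completions response carries the generated text in choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation as a string. For the SGLang server, pass `base_url="http://localhost:30000"`.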

Use Docker

docker model run hf.co/Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged

SGLang

How to use Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged with Docker Model Runner:
```
docker model run hf.co/Dans-Archive/Dans-PileOfSets-Mk1-llama-13b-merged
```

Description:

This is a LLaMA 13B model with the LoRA of the same name merged into it.

Objective for this project:

To create a model that upholds a logical thread, regardless of whether the output is verbose or concise. Training was performed on a version of the pile of sets reduced to 40% of its original size to speed up training iterations. I personally use this model as an aid for storytelling and writing. While it serves this purpose adequately, I still consider this version a prototype.

Prompt format:

Stanford Alpaca

The prompt should end with "### Response:" followed by a line break, so that the model's output starts on a new line.

For examples with a non-empty input field:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:

For examples with an empty input field:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
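
The two templates above can be filled in programmatically. The following is a minimal sketch; the function name `build_alpaca_prompt` and the constant names are illustrative, and the trailing newline after "### Response:" follows the note that output should start on a new line.

```python
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return the Alpaca-style prompt, picking the template by whether input is empty."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)
```

For example, `build_alpaca_prompt("Write a short story about a lighthouse.")` yields the no-input template ending in "### Response:" and a newline, ready to pass to the pipeline or a completions endpoint.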

Perplexity Benchmarks:

wikitext: 4.66796875
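
For context, perplexity is the exponential of the mean negative log-likelihood per token, so a value of about 4.67 corresponds to roughly 1.54 nats per token. The card does not state how the figure was measured (wikitext variant, context length, stride), so the sketch below only illustrates the definition; the function name `perplexity` and its input format are assumptions.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    token_log_probs: iterable of log p(token | context), one entry per token.
    """
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(log_probs) / len(log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.25 to every token has a perplexity of 4, which is a useful sanity check when wiring this to real model outputs.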

Training information:

2 epochs
LoRA rank / alpha: 64 / 32
Cutoff length: 1024
19 hours on an A6000
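
Reading "64 / 32" as LoRA rank 64 and alpha 32 (an interpretation, since the card only abbreviates them as R / A), the standard LoRA update is scaled by alpha / rank = 0.5 when it is merged into a base weight. The sketch below shows that arithmetic on plain nested lists. It does not reproduce the tooling actually used for this merge, and the function names are illustrative.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W + (alpha / rank) * (B @ A) for nested-list matrices.

    weight: out x in, lora_b: out x rank, lora_a: rank x in.
    """
    scale = lora_scaling(rank, alpha)
    merged = []
    for i, row in enumerate(weight):
        merged_row = []
        for j, w in enumerate(row):
            # Entry (i, j) of the low-rank product B @ A.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged_row.append(w + scale * delta)
        merged.append(merged_row)
    return merged
```

Under this reading, `lora_scaling(64, 32)` gives 0.5, so the learned update is added to the base weights at half strength.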

Data used in training:

All datasets were cleaned and scrubbed in various ways, then culled to varying degrees.

Camel biology, physics, chemistry, math, and AI society
Alpaca evol instruct
GPTeacher Instruct
Alpaca GPT4
Dolly Databricks

Plans for the future, a brief overview:

Pivot to a conversational format going forward
Train another 13b LoRA against the entirety of my pile of sets rather than just a portion of it for Mk2
Train 30b on the Mk2 pile of sets
Expand the story generation capabilities and likely more for Mk3

Model used for training and other information:

https://huggingface.co/PocketDoc/llama-13b-gptq-4bit-128g

Merge model: https://huggingface.co/huggyllama/llama-13b

Disclaimer:

This model has not been aligned, and no warranty is given for the quality or safety of its outputs.
