Description:
This is a LLaMA 13B model with the LoRA of the same name merged into it.
Objective for this project:
To create a model that upholds a logical thread, regardless of whether the output is verbose or concise. Training has been performed on a version of the pile of sets, reduced to 40% of its original size, to expedite training iterations. I personally utilize this model as an aid for storytelling and writing. While it serves this purpose adequately, I still perceive this version as a prototype.
Prompt format:
Stanford Alpaca
The prompt should end with "### Response:" followed by a newline, so that the model's output starts on a new line.
- For examples with a non-empty input field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
- For examples with an empty input field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
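The two templates above can be assembled programmatically. The following is a minimal sketch, not the author's own tooling: the `build_prompt` helper and the constant names are hypothetical, and the blank lines between sections are assumed from the standard Stanford Alpaca layout rather than stated in this card.

```python
# Hypothetical helper for building Alpaca-style prompts.
# Assumption: sections are separated by a blank line, as in the standard
# Stanford Alpaca templates; this card does not spell out the exact spacing.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Return the prompt, choosing the template by whether input_text is empty."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_prompt("Continue the story.", "The lighthouse had been dark for years."))
```

The returned string ends with a newline after "### Response:", so whatever the model generates begins on its own line.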
Perplexity Benchmarks:
- wikitext: 4.66796875
Training information:
- 2 Epochs
- LoRA rank / alpha: 64 / 32
- Cutoff length: 1024 tokens
- 19 hours on an A6000
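To illustrate what the 64 / 32 setting implies, here is a small sketch in plain Python. It assumes that "R / A" denotes LoRA rank and alpha, and that the standard LoRA formulation applies, in which the low-rank update is scaled by alpha / rank before being added to the base weight. The function names and the toy matrices are hypothetical and are not taken from the actual training or merge scripts.

```python
# Toy illustration of LoRA scaling and merging (standard formulation assumed):
#   W_merged = W + (alpha / rank) * (B @ A)
# The matrices below are tiny made-up examples, not real model weights.

def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update."""
    return alpha / rank


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_b, lora_a, scale):
    """Fold the scaled low-rank update B @ A into the base weight."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    scale = lora_scaling(rank=64, alpha=32)  # 0.5 for the values listed above
    base = [[1.0, 0.0], [0.0, 1.0]]          # toy 2x2 base weight
    b = [[1.0], [2.0]]                       # toy 2x1 "B" matrix (inner dim 1)
    a = [[3.0, 4.0]]                         # toy 1x2 "A" matrix
    print(scale, merge_lora(base, b, a, scale))
```

Under these assumptions, a rank of 64 with an alpha of 32 scales the learned update by 0.5 when it is folded into the base weights.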
Data used in training:
All datasets were cleaned and scrubbed in various ways, then culled to various degrees.
- Camel biology, physics, chemistry, math, and AI society
- Alpaca evol instruct
- GPTeacher Instruct
- Alpaca GPT4
- Dolly Databricks
Plans for the future, a brief overview:
- Pivot to a conversational format going forward
- Train another 13b LoRA against the entirety of my pile of sets rather than just a portion of it for Mk2
- Train 30b on the Mk2 pile of sets
- Expand the story generation capabilities and likely more for Mk3
Model used for training and other information:
https://huggingface.co/PocketDoc/llama-13b-gptq-4bit-128g
Merge model: https://huggingface.co/huggyllama/llama-13b
Disclaimer:
This model has not been aligned, and no warranty is given for the quality or safety of its outputs.