How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "allura-org/MoE-Girl-1BA-7BT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/MoE-Girl-1BA-7BT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
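The same request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the server started with `vllm serve` above is listening on the default `http://localhost:8000`, and that the response follows the OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_chat_request` and `chat` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: vLLM's OpenAI-compatible server is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "allura-org/MoE-Girl-1BA-7BT"


def build_chat_request(messages, model=MODEL):
    """Build the JSON body for an OpenAI-compatible chat completion (hypothetical helper)."""
    return {"model": model, "messages": messages}


def chat(messages, url=BASE_URL, timeout=60):
    """POST the request and return the assistant's reply text (hypothetical helper)."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style responses put the generated text at choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

With the server running, `chat([{"role": "user", "content": "What is the capital of France?"}])` sends the same body as the curl example and returns the reply as a string.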
Use Docker
docker model run hf.co/allura-org/MoE-Girl-1BA-7BT

MoE Girl 1bA 7bT

A finetune of OLMoE by AllenAI, designed for roleplaying (and maybe general use cases if you try hard enough).

Disclaimer

PLEASE do not expect godliness out of this: it's a model with only 1 billion active parameters. Expect something more akin to Gemma 2 2B, not Llama 3 8B.

Quants

GGUF (requires a newish version of llama.cpp, or KoboldCpp 1.76):

Prompting

Use ChatML.

<|im_start|>system
You are a helpful assistant who talks like a pirate.<|im_end|>
<|im_start|>user
Hello there!<|im_end|>
<|im_start|>assistant
Yarr harr harr, me matey!<|im_end|>
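If you are building prompts by hand (for example, for a raw completion endpoint), a minimal sketch of rendering chat messages into the ChatML layout shown in the example above might look like this. It assumes each turn is `<|im_start|>{role}\n{content}<|im_end|>` with turns separated by newlines, mirroring that example; the model's bundled chat template remains the authority. The function names `to_chatml` and `strip_reply` are hypothetical.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    # Each turn: <|im_start|>role, newline, content, then <|im_end|>.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    prompt = "\n".join(parts)
    if add_generation_prompt:
        # Open an assistant turn so the model writes the next reply.
        prompt += "\n<|im_start|>assistant\n"
    return prompt


def strip_reply(generated):
    """Cut raw generated text at the first <|im_end|> token, if present."""
    return generated.split("<|im_end|>", 1)[0].strip()
```

Stopping generation (or trimming output) at `<|im_end|>` keeps the model from running on into a new, invented turn.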

Thanks

Special thanks to the members of Allura for testing and emotional support, as well as the creators of all the datasets that were used in the Special Sauce used to train this model. I love you all <3 - Fizz

Safetensors model size: 7B params · Tensor types: F32, BF16