Instructions to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with llama-cpp-python:

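# !pip install llama-cpp-python huggingface-hub
# (from_pretrained needs huggingface-hub to fetch the file from the Hub)

from llama_cpp import Llama

# Download the chosen quant from the Hub (cached after the first run) and load it
llm = Llama.from_pretrained(
    repo_id="BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF",
    filename="DaringLotus-v2-10.7B-Q3_K.gguf",
)

# Generate up to 512 new tokens; echo=True includes the prompt in the returned text
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)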

Local Apps

llama.cpp

How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Use Docker

docker model run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Ollama
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Ollama:
```
ollama run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
```

Unsloth Studio

How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting

Docker Model Runner
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Docker Model Runner:
```
docker model run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
```

Lemonade

How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K

Run and chat with the model

lemonade run user.DaringLotus-SnowLotus-10.7b-IQ-GGUF-Q6_K

List all available models

lemonade list

Important Note

The most recent version of llama.cpp has broken compatibility with historical GGUFs, so I am uploading a few requants to keep these two models usable. They are labelled v3 in the file names even though they are the same models.

Summary

3-4x Importance Matrix GGUFs and 3-4x regular GGUFs for https://huggingface.co/BlueNipples/SnowLotus-v2-10.7B and https://huggingface.co/BlueNipples/DaringLotus-v2-10.7b.

I added a few more quants. I'm super happy with these merges; they turned out great. Basically, Daring is the slightly more creative, prose-oriented one, but also slightly less coherent, so it basically necessitates regens/swipes. Both have excellent prose for their size that is largely not very GPT-ish, and they can often take story context, lore entries and character card info into account. You can probably use these as your mainstay, which is especially helpful if your GPU struggles with 13b, and honestly I think these models are probably equal to or better than any 13b anyway. I might be wrong, but I do think they are very good compared to anything I've personally run. See the individual model cards for merge recipe details.

Thanks to lucyknada for helping me get the imatrix quants done quicker!

Importance Matrix Note

Imatrix quants currently do not run with Koboldcpp, although support is bound to arrive in the future since llama.cpp already supports them (and, I'm guessing, therefore ooba too). These quants should provide a perplexity improvement, especially for the smaller quants. The dat files are also there, so if you make an fp16 GGUF from the main model cards you might be able to save yourself some time producing your own imatrix quants.

Format Notes

Solar is designed for 4k context, but Nyx reports that his merge works up to 8k. Given that this has a slerp gradient back into that, I'm not sure which applies here. Use Alpaca instruct formatting.
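For anyone building prompts by hand, here is a minimal sketch of Alpaca-style formatting. It assumes the widely used Stanford Alpaca template wording; the helper name `build_alpaca_prompt` and its `context` parameter are hypothetical, and the individual model cards may prefer a slightly different preamble.

```python
def build_alpaca_prompt(instruction: str, context: str = "") -> str:
    """Format an instruction (and optional extra context) as an Alpaca-style prompt.

    Assumption: this is the common Stanford Alpaca template, not something
    confirmed by the model cards themselves.
    """
    if context:
        header = (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
        )
        body = f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n"
    else:
        header = (
            "Below is an instruction that describes a task. Write a response that "
            "appropriately completes the request.\n\n"
        )
        body = f"### Instruction:\n{instruction}\n\n"
    # The model is expected to continue the text after this marker.
    return header + body + "### Response:\n"


if __name__ == "__main__":
    print(build_alpaca_prompt("Continue the story in two paragraphs."))
```

The returned string can be passed as the prompt to whichever backend you use; keeping the trailing `### Response:` marker is what signals the model to start its reply.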

Ayumi Index

http://ayumi.m8geil.de/erp4_chatlogs/?S=rma_0#!/index

In the Ayumi ERPv4 Chat Log Index, SnowLotus scores 94.10 in Flesch, which means it produces more complex sentences than Daring (quite complex). DaringLotus scores higher in Var and Ad[jv], which means it makes heavier use of adjectives and adverbs (it is more descriptive). Notably, Daring is in the top 8 for adjectives in a sentence, the highest in its weight class if you discount the Chinese model, and in general both models did very well on this metric (SnowLotus ranks higher here than anything above it in IQ4), showcasing their descriptive ability.

SnowLotus beats DaringLotus on IQ4 with a score of 70.94, beaten in its weight class only by SOLAR Instruct and Fimbulvetr (although also, notably, by Kunoichi 7b by a slim margin). DaringLotus is a bit lower at 65.37, so not as smart.

Interestingly, the benchmarking here showed repetition for both models (which I haven't seen), but more with SnowLotus, so it's possible Daring repeats less than SnowLotus? These results roughly confirm my impressions of the differences, although they potentially reveal some new details too. I've had a great experience RPing with these models and have seen no repetition myself, but be sure to use MinP or DynaTemp rather than the older samplers, and be prepared to regen anything they get stuck on!
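For readers unfamiliar with MinP, here is a minimal sketch of the idea as it is commonly described: keep only the tokens whose probability is at least `min_p` times that of the most likely token, then renormalize what is left. The function name, the toy vocabulary, and the threshold value are illustrative assumptions; this is not the implementation used by any particular backend.

```python
def min_p_filter(probs: dict[str, float], min_p: float = 0.05) -> dict[str, float]:
    """Drop tokens whose probability is below min_p * (top token probability).

    Illustrative only: real samplers work on logits over the full vocabulary.
    """
    if not probs:
        raise ValueError("probs must not be empty")
    # The cutoff scales with the model's confidence in its top choice.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    # Hypothetical next-token distribution.
    toy = {"the": 0.5, "a": 0.3, "moon": 0.15, "zxq": 0.05}
    filtered = min_p_filter(toy, min_p=0.2)
    print(sorted(filtered))  # "zxq" falls below the 0.2 * 0.5 cutoff and is dropped
```

Because the cutoff is relative to the top token, the filter prunes aggressively when the model is confident and leaves more candidates when the distribution is flat, which is why it tends to pair well with higher temperatures.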

GGUF

Model size: 11B params
Architecture: llama
