Instructions to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF",
    filename="DaringLotus-v2-10.7B-Q3_K.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
Use Docker
docker model run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
- LM Studio
- Jan
- Ollama
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Ollama:
ollama run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
- Unsloth Studio
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Docker Model Runner:
docker model run hf.co/BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
- Lemonade
How to use BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull BlueNipples/DaringLotus-SnowLotus-10.7b-IQ-GGUF:Q6_K
Run and chat with the model
lemonade run user.DaringLotus-SnowLotus-10.7b-IQ-GGUF-Q6_K
List all available models
lemonade list
Important Note
The most recent version of llama.cpp has broken historical GGUFs, so I am uploading a few requants to preserve these two models' compatibility. These are called v3 in the file naming even though they are the same models.
Summary
3-4x Importance Matrix GGUFs and 3-4x regular GGUFs for https://huggingface.co/BlueNipples/SnowLotus-v2-10.7B and https://huggingface.co/BlueNipples/DaringLotus-v2-10.7b.
I added a few more quants. I'm super happy with these merges; they turned out great. Daring is the slightly more creative, prose-oriented one, but also slightly less coherent, and it basically necessitates regens/swipes. Both have excellent prose for their size that is largely not very GPT-ish, and they can often take story context, lore entries and character card info into account. You can probably use these as your mainstay, which is especially helpful if your GPU struggles with 13b, and honestly I think these models are probably equal to or better than any 13b anyway. I might be wrong, but I do think they are very good compared to anything I've personally run. See the individual model cards for merge recipe details.
Thanks to lucyknada for helping me get the imatrix quants done quicker!
Importance Matrix Note
Imatrix quants currently do not run with Koboldcpp, although support is bound to arrive in the future since llama.cpp already supports them (and, I'm guessing, therefore ooba too). These quants should provide a perplexity improvement, especially for the smaller quants. The dat files are also included, so if you make an fp16 GGUF from the main model cards you might be able to save yourself some time producing your own imatrix quants.
Format Notes
Solar is designed for 4k context, but Nyx reports that his merge works up to 8k. Given that this has a slerp gradient back into that, I'm not sure which applies here. Use Alpaca instruct formatting.
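For anyone assembling prompts by hand, here is a rough sketch of the commonly used Alpaca instruct template. The helper name `build_alpaca_prompt` and its parameters are made up for illustration, and the exact preamble wording these merges respond to best is an assumption, so treat it as a starting point and adjust to taste.

```python
# Hypothetical helper: builds a prompt in the widely used Alpaca instruct layout.
# The preamble text is the standard Alpaca wording; whether these merges prefer
# it verbatim is an assumption.

def build_alpaca_prompt(instruction: str, context: str = "") -> str:
    """Return an Alpaca-style prompt ending at the point where the model replies."""
    if context:
        preamble = (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request."
        )
        return (
            f"{preamble}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    preamble = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
    )
    return f"{preamble}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


if __name__ == "__main__":
    print(build_alpaca_prompt("Continue the story in the narrator's voice."))
```

The returned string can be passed as the prompt to whichever backend you use; frontends with an Alpaca preset usually do this step for you.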
Ayumi Index
http://ayumi.m8geil.de/erp4_chatlogs/?S=rma_0#!/index
In the Ayumi ERPv4 Chat Log Index, SnowLotus scores 94.10 in Flesch, which means it produces more complex sentences than Daring (quite complex). DaringLotus scores higher in Var and Ad[jv], which means it makes heavier use of adjectives and adverbs (it is more descriptive). Notably, Daring is in the top 8 for adjectives in a sentence, the highest in its weight class if you discount the Chinese model. In general both models did very well on this metric (SnowLotus ranks higher here than anything above it in IQ4), showcasing their descriptive ability.
SnowLotus beats DaringLotus on IQ4 with a score of 70.94, beaten only by SOLAR Instruct and Fimbulvetr in its weight class (although also, notably, by Kunoichi 7b by a slim margin). DaringLotus is a bit lower at 65.37, so not as smart.
Interestingly, the benchmarking here showed repetition for both models (which I haven't seen), but more with SnowLotus, so it's possible Daring repeats less than SnowLotus? These results roughly confirm my impressions of the differences, although they potentially reveal some new details too. I've had a great experience RPing with these models and have seen no repetition myself, but be sure to use MinP or DynaTemp rather than the older samplers, and be prepared to regen anything they get stuck on!
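To give a feel for what MinP does, here is a minimal sketch of the idea: drop every token whose probability falls below a fraction (`min_p`) of the most likely token's probability, then renormalize what is left. The function name `min_p_filter` and the toy distribution are hypothetical, and this is not the sampler code of any particular backend.

```python
# Illustrative only: a toy version of min-p filtering over a small
# token -> probability mapping. Real backends apply this to logits
# over the full vocabulary.

def min_p_filter(probs: dict[str, float], min_p: float = 0.05) -> dict[str, float]:
    """Keep tokens with p >= min_p * max(p), then renormalize to sum to 1."""
    if not probs:
        return {}
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    toy = {"the": 0.6, "a": 0.3, "zebra": 0.05, "qux": 0.05}
    # With min_p=0.1 the cutoff is 0.06, so only "the" and "a" survive.
    print(min_p_filter(toy, min_p=0.1))
```

Because the cutoff scales with the top token's probability, the filter is strict when the model is confident and permissive when it is not, which is the usual argument for preferring it over fixed top-k or top-p cutoffs.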
