Instructions to use Naphula-Archives/MN-Raven-12B-v0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Naphula-Archives/MN-Raven-12B-v0-GGUF", dtype="auto") - llama-cpp-python
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Naphula-Archives/MN-Raven-12B-v0-GGUF", filename="MN-Raven-12B-v0a-Q8_0.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0 # Run inference directly in the terminal: ./llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0 # Run inference directly in the terminal: ./build/bin/llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
Use Docker
docker model run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
- LM Studio
- Jan
- Ollama
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Ollama:
ollama run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
- Unsloth Studio
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting
- Docker Model Runner
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Docker Model Runner:
docker model run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
- Lemonade
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
Run and chat with the model
lemonade run user.MN-Raven-12B-v0-GGUF-Q8_0
List all available models
lemonade list
12B raven taking longer than expected, wont absorb enough 'style'
12B nemo is a real bitch to finetune compared to 8B llama. I have over 40 LoRAs saved (ranging from r 16 to 256), and tested each one, and while creative, I still can't get it to sound like 8B which says it is "the ghost of Edgar Allan Poe". It identifies usually as Mr. Hyde or Frankenstein's monster instead. The LLM assistant seems to think this is due to Model Architecture Bias, and that Nemo likely had less pretraining of Poe material compared to Llama 8B, so the dataset has "less neurons to latch onto" despite having 4B more parameters of space.
Fixing EOS bugs (as seen with v0a, this prototype) was the easy part. What went wrong here was using incorrect special tokens. This setup here is stable:
"is_mistral_derived_model": true,
"special_tokens": {
"eos_token": "</s>",
"pad_token": "<pad>"
},
The hard part is getting the model to "think it is Poe" without editing the dataset for additional reinforcement. Cranking up LR broke the model (1e-4 seems like the best value), additional epochs made no difference.
I have spent a bit on runpod serverless payloads (over $30) trying to find the "magic settings" for 12B since the 8B settings don't work as well for it. At this point I'm stepping back from trying to get Nemo finetunes to sound like the Llama ones and will probably move up to 24B instead.
So, Raven 12B might not be as good as 8B at sounding like Poe, but the latest LoRAs are still highly creative, gothic style writers, and I'm testing a few merge combinations to determine the highest quality version for release.
(All the loras sound different than any existing Nemo model, but I think the ceiling is lower for Nemo 12B than Llama 8B, the capacity for variety seems lower overall.) There's no point making any more LORAs since i tested all setting variations I could think of, so most likely it would require a Poe_v2 dataset upgrade to get it "perfect".
The model won't be released until it has reached a minimum quality threshold via multiple new prompt tests I've created for it. However, a few more 12Bs are planned after this, so once optimal settings are found, future 12B finetunes should be easier.
v0n and v0o are quite stable and creative, these are being tested in varous combinations now. GGUFs uploading too