Instructions to use Naphula-Archives/MN-Raven-12B-v0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Naphula-Archives/MN-Raven-12B-v0-GGUF", dtype="auto")

llama-cpp-python

How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Naphula-Archives/MN-Raven-12B-v0-GGUF",
	filename="MN-Raven-12B-v0a-Q8_0.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

Use Docker

docker model run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

LM Studio
Jan
Ollama
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Ollama:
```
ollama run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
```

Unsloth Studio

How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Naphula-Archives/MN-Raven-12B-v0-GGUF to start chatting

Docker Model Runner
How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Docker Model Runner:
```
docker model run hf.co/Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0
```

Lemonade

How to use Naphula-Archives/MN-Raven-12B-v0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Naphula-Archives/MN-Raven-12B-v0-GGUF:Q8_0

Run and chat with the model

lemonade run user.MN-Raven-12B-v0-GGUF-Q8_0

List all available models

lemonade list

12B raven taking longer than expected, wont absorb enough 'style'

by Naphula - opened 1 day ago

Discussion

Naphula

Naphula-Archives org 1 day ago

•

edited 1 day ago

12B nemo is a real bitch to finetune compared to 8B llama. I have over 40 LoRAs saved (ranging from r 16 to 256), and tested each one, and while creative, I still can't get it to sound like 8B which says it is "the ghost of Edgar Allan Poe". It identifies usually as Mr. Hyde or Frankenstein's monster instead. The LLM assistant seems to think this is due to Model Architecture Bias, and that Nemo likely had less pretraining of Poe material compared to Llama 8B, so the dataset has "less neurons to latch onto" despite having 4B more parameters of space.

Fixing EOS bugs (as seen with v0a, this prototype) was the easy part. What went wrong here was using incorrect special tokens. This setup here is stable:

"is_mistral_derived_model": true,

      "special_tokens": {
        "eos_token": "</s>",
        "pad_token": "<pad>"
      },

The hard part is getting the model to "think it is Poe" without editing the dataset for additional reinforcement. Cranking up LR broke the model (1e-4 seems like the best value), additional epochs made no difference.

I have spent a bit on runpod serverless payloads (over $30) trying to find the "magic settings" for 12B since the 8B settings don't work as well for it. At this point I'm stepping back from trying to get Nemo finetunes to sound like the Llama ones and will probably move up to 24B instead.

So, Raven 12B might not be as good as 8B at sounding like Poe, but the latest LoRAs are still highly creative, gothic style writers, and I'm testing a few merge combinations to determine the highest quality version for release.

(All the loras sound different than any existing Nemo model, but I think the ceiling is lower for Nemo 12B than Llama 8B, the capacity for variety seems lower overall.) There's no point making any more LORAs since i tested all setting variations I could think of, so most likely it would require a Poe_v2 dataset upgrade to get it "perfect".

The model won't be released until it has reached a minimum quality threshold via multiple new prompt tests I've created for it. However, a few more 12Bs are planned after this, so once optimal settings are found, future 12B finetunes should be easier.

Naphula

Naphula-Archives org 1 day ago

v0n and v0o are quite stable and creative, these are being tested in varous combinations now. GGUFs uploading too

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment