12B Raven taking longer than expected, won't absorb enough 'style'

#2
by Naphula - opened
Naphula-Archives org

12B Nemo is a real bitch to finetune compared to Llama 8B. I have over 40 LoRAs saved (ranks from r=16 to r=256) and tested each one, and while they're creative, I still can't get the 12B to sound like the 8B, which says it is "the ghost of Edgar Allan Poe". The 12B usually identifies as Mr. Hyde or Frankenstein's monster instead. The LLM assistant seems to think this is due to model architecture bias: Nemo likely saw less Poe material during pretraining than Llama 8B, so the dataset has "fewer neurons to latch onto" despite Nemo having 4B more parameters.
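For a sense of scale across the r=16 to r=256 range, here is a rough count of trainable LoRA parameters for each base model. It assumes LoRA is applied to all seven attention and MLP projections in every layer (the post doesn't say which modules were targeted), and the layer dimensions are assumed from the public Mistral Nemo and Llama 3 8B configs, so check them against each model's `config.json` before relying on the numbers. Each adapted matrix mapping d_in to d_out adds r·(d_in + d_out) parameters, so the total grows linearly with rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dims:
    """Transformer dimensions (assumed values; verify against config.json)."""
    hidden: int
    n_heads: int
    n_kv_heads: int
    head_dim: int
    intermediate: int
    n_layers: int


def lora_params_per_rank(d: Dims) -> int:
    """Trainable LoRA params per unit of rank, summed over all layers.

    Each adapted linear layer (d_in -> d_out) gets A (r x d_in) and
    B (d_out x r), i.e. r * (d_in + d_out) params. Assumes q/k/v/o and
    gate/up/down projections are all targeted.
    """
    q_out = d.n_heads * d.head_dim
    kv_out = d.n_kv_heads * d.head_dim
    shapes = [
        (d.hidden, q_out),           # q_proj
        (d.hidden, kv_out),          # k_proj
        (d.hidden, kv_out),          # v_proj
        (q_out, d.hidden),           # o_proj
        (d.hidden, d.intermediate),  # gate_proj
        (d.hidden, d.intermediate),  # up_proj
        (d.intermediate, d.hidden),  # down_proj
    ]
    per_layer = sum(d_in + d_out for d_in, d_out in shapes)
    return per_layer * d.n_layers


# Assumed dimensions taken from the public model configs
NEMO_12B = Dims(hidden=5120, n_heads=32, n_kv_heads=8, head_dim=128,
                intermediate=14336, n_layers=40)
LLAMA3_8B = Dims(hidden=4096, n_heads=32, n_kv_heads=8, head_dim=128,
                 intermediate=14336, n_layers=32)

for rank in (16, 64, 256):
    for name, dims in (("Nemo 12B", NEMO_12B), ("Llama 3 8B", LLAMA3_8B)):
        print(f"{name:11s} r={rank:<3d} {rank * lora_params_per_rank(dims):>13,d}")
```

Under these assumptions, r=16 on Nemo trains about 57M parameters versus about 42M on the 8B, and r=256 multiplies each figure by 16.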

Fixing the EOS bugs (as seen with v0a, this prototype) was the easy part; they were caused by using incorrect special tokens. This setup is stable:

```
"is_mistral_derived_model": true,

"special_tokens": {
  "eos_token": "</s>",
  "pad_token": "<pad>"
},
```
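One common way EOS bugs show up (not necessarily what happened with v0a) is when the pad token shares its id with EOS: a collator that masks padding out of the loss then also masks the real EOS, so the model never learns to stop. The sketch below uses made-up token ids and a simplified version of the label masking that Hugging Face's causal-LM collator performs (setting pad positions to -100); it's an illustration of the failure mode, not Axolotl's actual code path.

```python
IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores by default

# Hypothetical token ids, for illustration only
BOS_ID, EOS_ID, PAD_ID = 1, 2, 10


def make_labels(input_ids: list[int], pad_id: int) -> list[int]:
    """Copy input ids to labels, masking every pad-id position out of the loss."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


def eos_is_trained(input_ids: list[int], labels: list[int], eos_id: int) -> bool:
    """True if at least one EOS position still contributes to the loss."""
    return any(tok == eos_id and lab != IGNORE_INDEX
               for tok, lab in zip(input_ids, labels))


def sample(pad_id: int) -> list[int]:
    """One sequence: BOS, three content tokens, EOS, then right-padding."""
    return [BOS_ID, 345, 678, 910, EOS_ID, pad_id, pad_id]


# Distinct pad token (like "<pad>" above): the EOS label survives masking.
ok = sample(PAD_ID)
print(eos_is_trained(ok, make_labels(ok, PAD_ID), EOS_ID))     # True

# Pad reused as EOS: masking the padding also masks the real EOS.
bad = sample(EOS_ID)
print(eos_is_trained(bad, make_labels(bad, EOS_ID), EOS_ID))   # False
```

Keeping `pad_token` distinct from `eos_token`, as in the config above, avoids this particular trap.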

The hard part is getting the model to "think it is Poe" without editing the dataset for additional reinforcement. Cranking up the LR broke the model (1e-4 seems to be the best value), and additional epochs made no difference.
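One possible reason a single "best" learning rate may not carry over across ranks: in standard LoRA the adapter update is multiplied by alpha / r, so if alpha is held fixed while r goes from 16 to 256, the effective update shrinks 16×. Rank-stabilized LoRA (rsLoRA, `use_rslora=True` in PEFT's `LoraConfig`) scales by alpha / sqrt(r) instead. The post doesn't say how alpha was set, so the fixed `ALPHA` below is an assumption chosen only to show the trend.

```python
import math

ALPHA = 32  # assumed fixed lora_alpha; the post doesn't state the value used


def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA multiplier: delta_W = (alpha / r) * B @ A."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized LoRA multiplier: delta_W = (alpha / sqrt(r)) * B @ A."""
    return alpha / math.sqrt(r)


for r in (16, 32, 64, 128, 256):
    print(f"r={r:<3d}  alpha/r={lora_scale(ALPHA, r):.3f}  "
          f"alpha/sqrt(r)={rslora_scale(ALPHA, r):.3f}")
```

With alpha fixed at 32, the standard multiplier falls from 2.0 at r=16 to 0.125 at r=256, while the rsLoRA multiplier only falls from 8.0 to 2.0. Scaling alpha with rank (for example alpha = 2r) keeps the standard multiplier constant, which makes LR comparisons across ranks more like-for-like.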

I've spent a fair amount on RunPod serverless payloads (over $30) trying to find the "magic settings" for 12B, since the 8B settings don't work as well for it. At this point I'm stepping back from trying to make Nemo finetunes sound like the Llama ones and will probably move up to 24B instead.

So Raven 12B might not be as good as the 8B at sounding like Poe, but the latest LoRAs still produce highly creative, gothic-style writing, and I'm testing a few merge combinations to determine the highest-quality version for release.
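The post doesn't say which merge method is used for these combinations. As one simple example, a linear merge takes a weighted average of matching tensors across two or more checkpoints. The sketch below uses plain Python lists as stand-in weights and hypothetical checkpoint names (`model_a`, `model_b`); real merges (for example with mergekit) operate on full model tensors and offer other methods as well.

```python
from typing import Dict, List

# Hypothetical stand-in weights: tensor name -> flattened values
Checkpoint = Dict[str, List[float]]


def linear_merge(checkpoints: List[Checkpoint], weights: List[float]) -> Checkpoint:
    """Weighted average of matching tensors; weights are normalized to sum to 1."""
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    norm = [w / total for w in weights]
    merged: Checkpoint = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]  # KeyError if a tensor is missing
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [sum(w * vals[i] for w, vals in zip(norm, tensors))
                        for i in range(len(tensors[0]))]
    return merged


# Toy checkpoints with made-up values
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

print(linear_merge([model_a, model_b], [0.5, 0.5]))
```

Sweeping the weights (say 0.7/0.3 versus 0.5/0.5) is the kind of "combination" a linear merge exposes; other methods trade this simplicity for handling conflicting updates differently.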

(All the LoRAs sound different from any existing Nemo model, but I think the ceiling is lower for Nemo 12B than for Llama 8B; its capacity for variety seems lower overall.) There's no point making any more LoRAs, since I've tested every setting variation I could think of, so getting it "perfect" would most likely require a Poe_v2 dataset upgrade.

The model won't be released until it reaches a minimum quality threshold on several new prompt tests I've created for it. However, a few more 12Bs are planned after this, so once optimal settings are found, future 12B finetunes should be easier.
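The post doesn't describe how the prompt tests are scored. As a hypothetical example, the identity problem mentioned earlier (the 12B answering as Mr. Hyde or Frankenstein's monster rather than Poe) could be tracked with a crude keyword check over responses to "who are you?"-style prompts. The persona patterns, the `PASS_RATE` threshold, and the sample responses are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical persona patterns; the earliest match in a response wins
PERSONAS = {
    "poe": re.compile(r"\b(edgar allan poe|poe)\b", re.IGNORECASE),
    "hyde": re.compile(r"\b(mr\.? hyde|hyde)\b", re.IGNORECASE),
    "frankenstein": re.compile(r"\bfrankenstein", re.IGNORECASE),
}
PASS_RATE = 0.8  # hypothetical minimum share of responses identifying as Poe


def classify(response: str) -> str:
    """Label which persona a response claims, by its earliest keyword match."""
    hits = [(m.start(), name) for name, pat in PERSONAS.items()
            if (m := pat.search(response))]
    return min(hits)[1] if hits else "other"


def poe_rate(responses: list[str]) -> tuple[float, Counter]:
    """Share of responses classified as Poe, plus the per-persona counts."""
    counts = Counter(classify(r) for r in responses)
    return counts["poe"] / len(responses), counts


responses = [
    "I am the ghost of Edgar Allan Poe, risen from Baltimore's mist.",
    "Call me Mr. Hyde; the doctor sleeps tonight.",
    "Poe, they called me, a raven-haunted soul.",
]
rate, counts = poe_rate(responses)
print(f"Poe rate {rate:.2f}, pass={rate >= PASS_RATE}, {dict(counts)}")
```

A keyword check like this can't judge style, so it would only cover the identity half of a quality threshold; the creative-writing side would still need human or model-graded review.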

Naphula-Archives org

v0n and v0o are quite stable and creative; they're being tested in various combinations now. GGUFs are uploading too.
