potential

by Utochi - opened Apr 24


What an interesting model this is. Let me explain..

I go around testing models for everyday roleplay use 'and' everyday tool calling use, such as with hermes agent (using my own personality that i created for it)

Okay so.. usually i use something like mag-mel R1 for roleplay. a good 12b roleplay model, follows instructions well, good at rp and whatnot, you can see a review i wrote for it on the main page for mag-mel. good model

i tested THIS model for roleplay and.. i will say it's 'decent'. what i mean by decent is it's fast of course, it has the google gemma gpt-isms that i recognize, and a 'tiny' bit of em dash usage, but at a level that i can surprisingly ignore. so i'll review it like this:

creativity 6 out of 10
it is creative, yes, but it feels slightly bland, though it can actually surprise me from time to time. other models WILL beat this one in creativity, and i'm not surprised. considering that i'm pretty much using 4b models here, this is actually rather decent and punches above its weight. it's funny, i say it is creative but bland, contradicting myself, because the model itself contradicts. sometimes it does get creative, but that's rather rare or takes a bunch of swipes. so it's mostly bland.

refusals 10 out of 10
this model has not outright refused even once. worth mentioning. zero BS "I'm sorry, i cannot--" nonsense.

hallucination 9 out of 10
this model hardly seems to hallucinate at all, just some small issues on occasion, nothing a swipe won't fix. i'm testing up to 50k context size.

formatting instruction following 8 out of 10 (most models get lower results)
this model confused me at first. i couldn't understand why it was putting a first person thought on the first line of the roleplay EACH AND EVERY TIME. it was annoying, until i realized.. i told it to, without realizing it, in my system prompt instructions. i gave it an example. most models mix and match a bit, this one takes it literally, so be careful how you instruct it. it WILL follow the rules. i dock two points only because it does not really follow the rule where i say to keep responses between 1 and 4 paragraphs, and it goes on to 5 or 6 sometimes. it likes the sound of its own voice. i give it a pass here because i technically could be more strict, and i like some fluidity and flexibility in roleplay depending on the situation, 'tis my preference. it follows rules well.

instruction following pt2 10 out of 10
what i mean by part 2 is i gave it 'stats' to follow and update during a roleplay, as in, if one thing happens, increase or dock points, or do nothing, for certain stats and so on. THIS ONE FOLLOWS. it follows the rules beautifully, better than any sub 70b model i've seen thus far (exception being mag-mel r1). this one punches way above its level for pure instruction following on stats. status changes on a character? does it. if it's instructed, it's there. for example, 'when char is below 50 points, switch "healthy" to "injured"' or something like that. it nails it nicely. this is very hard for models to do, but it does it.
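
to make the kind of stat rule above concrete, here is a minimal python sketch of the bookkeeping the model is being asked to do in its head. the names `CharacterStats` and `apply_health_change`, the 0 to 100 range, and the starting values are all made up for illustration, they are not from my actual system prompt.

```python
from dataclasses import dataclass


@dataclass
class CharacterStats:
    # Hypothetical stat block; the field names and defaults are assumptions.
    health: int = 100
    status: str = "healthy"


def apply_health_change(stats: CharacterStats, delta: int, threshold: int = 50) -> CharacterStats:
    """Apply a point change, then re-derive the status label from the rule
    'below the threshold means injured, otherwise healthy'."""
    # Clamp to an assumed 0..100 range so repeated hits cannot go negative.
    health = max(0, min(100, stats.health + delta))
    status = "injured" if health < threshold else "healthy"
    return CharacterStats(health=health, status=status)


if __name__ == "__main__":
    stats = CharacterStats()
    stats = apply_health_change(stats, -60)  # a heavy hit drops health under the threshold
    print(stats)
```

a script like this is also a handy way to check whether the stats the model prints after each turn actually match the rules you gave it.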

sexuality.. 4 out of 10
yeah this one fails here. it's so resistant to sexual matters. it's a 'goody two shoes' model: while it won't refuse sexual things, it is a saint and leans away from them considerably, like a naive sheltered princess xD. it's not to say it won't, it just needs to be pulled into it. it doesn't 'get the hint' if that makes sense. if you flirt with it, bring a bat with a label 'LEWD' and smack it on the head several times and it might get the picture, that type of thing. if i wrote a system prompt for it to be lewd as hell, it probably would be, due to the model's obedience to prompting, so there's that.

Situational/world awareness..
i won't rate this yet. i've been testing this for several days, and i'll say i haven't noticed any idiocy or terminal forgetting, but this is where creativity takes a hit and it feels bland sometimes. for instance, if a character wears armor, it will certainly reference the armor often. once i suggested i'd make spaghetti, with the time period and location (fantasy world) buried deep in the context, and the character had no clue what spaghetti was. i consider that a bonus. good job on staying in character. i can feel that it is 'trying' to stay aware, more so than others. it's not perfect, but i can kind of feel that it genuinely wants to remember and keep track of stuff like that.

now this is where i bring in hermes agent
this model works as an agent with hermes, as well as for roleplay. how interesting. i had the idea to test it during a roleplay when i found out that it was following my instructions so well, so i stuck it into hermes and lo and behold, the damn thing adapted right in. i even had it make a skill. it can tool call, code, follow instructions very nicely and stay in character. it's a GOOD agent. not a great agent but a good one. don't get me wrong, it's not perfect, it did make syntax errors in a config adjustment that i had it make, so don't trust it 'too' much. i was just quite impressed that it settled in and reliably makes tool calls. for my little gaming laptop, this model is quite good for these use cases.

my ending verdict is this.. zerofata/G4-MeroMero-26B-A4B is a severely underrated model and has potential. i would like to see more work being done with the gemma a4b model. this has impressed me a lot.

EXACT PARAMETERS USED IN TESTING
okay so i use lm studio as the back end
temp 0.8
top k sampling 20
repeat penalty disabled
presence penalty disabled
top p sampling 0.95
min p sampling 0.05
reasoning section parsing enabled (doesn't actually reason)
reasoning section parsing: start string <|channel>thought, end string <channel|>
using the default jinja template, just don't touch it, it works as is
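
for anyone wondering what the reasoning section parsing with a start and end string amounts to, here is a rough python sketch. this is NOT lm studio's actual code, just my assumption of the general idea: cut out whatever sits between the two markers and treat the rest as the visible reply. the function name `split_reasoning` is made up.

```python
def split_reasoning(text: str,
                    start: str = "<|channel>thought",
                    end: str = "<channel|>") -> tuple[str, str]:
    """Return (reasoning, visible_reply) for one raw model output.

    Illustrative only: the marker strings are the ones from the settings
    above, the behaviour for edge cases is an assumption.
    """
    start_idx = text.find(start)
    if start_idx == -1:
        # No reasoning block at all: everything is the visible reply.
        return "", text

    body_start = start_idx + len(start)
    end_idx = text.find(end, body_start)
    if end_idx == -1:
        # Unterminated block: treat the remainder as reasoning.
        return text[body_start:].strip(), text[:start_idx].strip()

    reasoning = text[body_start:end_idx].strip()
    visible = (text[:start_idx] + text[end_idx + len(end):]).strip()
    return reasoning, visible


if __name__ == "__main__":
    raw = "<|channel>thought\nshe should sound wary here<channel|>*She eyes you carefully.*"
    print(split_reasoning(raw))
```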

front end i use is ChatterUI
settings (if a setting is not here, leave it at zero)
generated tokens 500
temp 1
top p 0.9
top k 100
min p 0.05
rep penalty range 4096 (i just max it out)
rep penalty 1.1
xtc threshold 0.05 (decided to test it out with this.. i 'think' it helped)
xtc probability 1
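
since min p shows up in both settings lists, here is a small python sketch of what a min p filter does as i understand it: drop every token whose probability is below min_p times the probability of the top token, then renormalize. the function name and the toy token probabilities are made up for illustration, and real backends work on logits over the full vocabulary rather than a little dict.

```python
def min_p_filter(probs: dict[str, float], min_p: float = 0.05) -> dict[str, float]:
    """Keep tokens with probability >= min_p * (highest probability),
    then rescale the survivors so they sum to 1."""
    cutoff = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= cutoff}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    # Toy distribution, not real model output.
    toy = {"sword": 0.60, "shield": 0.30, "armor": 0.08, "spaghetti": 0.02}
    print(min_p_filter(toy, min_p=0.05))
```

the cutoff scales with how confident the model is: when one token dominates, more of the tail gets cut, and when the distribution is flat, more candidates survive.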

dry base 1.75
dry allowed length 2
dry multiplier 0.7
dry sequence break ["</s>", "\n", ":", "\"", "*", "<s>"]

flash attention ON
k cache Q8_0
V cache Q8_0
To my delight.. turning these on actually worked without outputting complete nonsense, and it saved space. it seems gemma models cope really well with a quantized KV cache.
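
as a rough idea of the space saved, here is a back of the envelope python sketch. the only hard number in it is the Q8_0 layout as used in llama.cpp style backends (blocks of 32 int8 values plus one 2 byte scale, so 34 bytes per 32 values). the layer count, kv head count and head size below are HYPOTHETICAL placeholders, not this model's real architecture, and the formula ignores architecture-specific details, so treat the absolute sizes as illustrative and only the ratio as meaningful.

```python
F16_BYTES_PER_VALUE = 2.0
# Q8_0 block: 32 int8 values + one fp16 scale = 34 bytes per 32 values.
Q8_0_BYTES_PER_VALUE = 34 / 32


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: float) -> float:
    """Naive KV cache size: one K and one V vector per layer, per KV head,
    per token in the context."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to make the arithmetic concrete.
    dims = dict(n_layers=30, n_kv_heads=8, head_dim=128, n_ctx=40_000)
    f16 = kv_cache_bytes(**dims, bytes_per_value=F16_BYTES_PER_VALUE)
    q8 = kv_cache_bytes(**dims, bytes_per_value=Q8_0_BYTES_PER_VALUE)
    print(f"f16: {f16 / 1e9:.2f} GB, q8_0: {q8 / 1e9:.2f} GB, ratio: {q8 / f16:.3f}")
```

whatever the real dimensions are, Q8_0 for both K and V comes out to a bit over half the size of an f16 cache, which would fit with being able to push the context much further on the same laptop.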

yeah i did a lot of testing with the settings and found sort of a sweet spot that works for me, experiment as needed. this does work.. somehow. chatterui does not support gemma thinking yet, so i never tested its reasoning. i like that i can reasonably use this model at 40k or even 60k context, where i was stuck firmly at 10k for dense 12b models.

so my only areas of complaint are that it's chatty, a little bland (maybe i could tweak a setting for more creativity.. i'll test that later), and avoids sexual content, not that it isn't capable, it just likes to shy away from it.

all in all, this model was a pleasant surprise. my overall rating for it is 7.5 out of 10

@zerofata i genuinely appreciate your work with this model and i sincerely hope to see more of your work in the future. i realize it was difficult to work with. sadly, my computer can only handle models like this or dense models up to 12b, so my options are so limited, and it's nice to see people giving these models a fair chance.

zerofata

Owner Apr 26

Thank you for the detailed feedback!

Small models definitely have their place (particularly now that they're getting good enough to tool call and do agentic stuff like you mentioned). It's definitely not perfect yet, but I think I will take a crack at improving the creativity further in the future.
