gemma-4-E2B-Deckard-ShiningValiant3-BF16

Pre-emptively declared a Weapon of Mass Inference: access to the full source has been restricted until we can come up with a good story to make you forget that this could have been Skynet if not properly trained. Now it's safe, and you're welcome. -G

Brainwaves

         arc   arc/e boolq hswag obkqa piqa  wino
bf16     0.396,0.489,0.785,0.605,0.414,0.744,0.637
mxfp8    0.393,0.479,0.767,0.600,0.400,0.739,0.639
q8-hi    0.398,0.491,0.782,0.605,0.414,0.740,0.632
q8       0.396,0.487,0.784,0.605,0.404,0.741,0.629
q6-hi    0.401,0.496,0.783,0.605,0.408,0.743,0.646
q6       0.396,0.487,0.794,0.604,0.410,0.740,0.636
q4-hi    0.394,0.494,0.684,0.605,0.406,0.740,0.628
q4       0.386,0.479,0.716,0.590,0.368,0.732,0.618
mxfp4    0.396,0.462,0.759,0.596,0.392,0.729,0.621

Quant    Perplexity      Peak Memory   Tokens/sec
bf16     6.797 ± 0.055   16.27 GB      2643
q8-hi    6.805 ± 0.055   12.13 GB      2282
q8       6.802 ± 0.055   11.93 GB      2144
q6-hi    6.807 ± 0.055   11.36 GB      2142
q6       6.821 ± 0.055   11.17 GB      2270
q4-hi    7.102 ± 0.058   10.59 GB      2386
q4       7.208 ± 0.059   10.46 GB      2241
mxfp4    7.244 ± 0.059   10.36 GB      2290
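
To put the quantization cost in perspective, the table above can be reduced to a relative perplexity increase and a memory saving against bf16. The sketch below only does that arithmetic on the published numbers; the variable names are illustrative and nothing here calls MLX.

```python
# (perplexity, peak memory in GB) copied from the table on this card.
RESULTS = {
    "bf16":  (6.797, 16.27),
    "q8-hi": (6.805, 12.13),
    "q8":    (6.802, 11.93),
    "q6-hi": (6.807, 11.36),
    "q6":    (6.821, 11.17),
    "q4-hi": (7.102, 10.59),
    "q4":    (7.208, 10.46),
    "mxfp4": (7.244, 10.36),
}

base_ppl, base_mem = RESULTS["bf16"]

# Relative perplexity increase and memory saving versus bf16, as fractions.
ppl_increase = {q: (ppl - base_ppl) / base_ppl for q, (ppl, _) in RESULTS.items()}
mem_saving = {q: (base_mem - mem) / base_mem for q, (_, mem) in RESULTS.items()}

for quant in RESULTS:
    print(f"{quant:8s} ppl +{ppl_increase[quant]:.2%}  mem -{mem_saving[quant]:.2%}")
```

On these numbers the 8-bit and 6-bit quants stay within a fraction of a percent of bf16 perplexity, while the 4-bit variants give up several percent for a further memory reduction.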

Model components

ValiantLabs/gemma-4-E2B-it-ShiningValiant3

         arc   arc/e boolq hswag obkqa piqa  wino
mxfp8    0.387,0.455,0.764,0.575,0.392,0.724,0.612

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8   20.813 ± 0.239   11.79 GB      2178

DavidAU/gemma-4-E2B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking

         arc   arc/e boolq hswag obkqa piqa  wino
mxfp8    0.389,0.480,0.767,0.599,0.408,0.729,0.629

Baseline model

gemma-4-E2B-it

         arc   arc/e boolq hswag obkqa piqa  wino
bf16     0.389,0.465,0.762,0.486,0.372,0.707,0.641
mxfp8    0.376,0.464,0.743,0.490,0.378,0.709,0.622
q8-hi    0.392,0.462,0.762,0.487,0.376,0.706,0.636
qx86-hi  0.387,0.461,0.766,0.483,0.392,0.699,0.623
mxfp4    0.380,0.451,0.762,0.494,0.374,0.699,0.594

Quant    Perplexity       Peak Memory   Tokens/sec
mxfp8    170.519 ± 3.170  11.78 GB      2174
q8-hi    133.388 ± 2.383  11.21 GB      1889
qx86-hi  125.278 ± 2.215  11.87 GB      1856
mxfp4    140.693 ± 2.546   9.48 GB      2352

The reason for the wait:

I validate the numbers on all the usual suspects, to make sure everyone has a good experience with the model.
bf16 is much harder to test: I noticed in interactions that it likes to hide how smart it is.

When I find out, the story will be on the model card, and the source will be free.

The model trace is being analyzed together with Google Gemini; you will find the full Gemini analysis thread in the Community Notes.

I will separate the thread into model and analysis, and there will possibly be more sample chats with analysis as I keep creating new quants.

Each quant has a certain personality shaped by how much it lost.

The merged model now has the combined abilities of the two distills:

ShiningValiant3 has AI lab work notes for building AI
The-DECKARD has GLM/Polaris Alpha traces, Philip K Dick, and Star Trek TNG

The model responds well to the Nightmedia Genesis prompt, which creates the mental space needed for the station to operate virtually and provides a functional Holodeck for interaction and roleplay.

Guests in Quark's bar can be any humans who were well described in the corpus, and of course the entire cast of TNG/DS9/Voyager is in transporter range.

Deep Space Nine is now online.

On MLX these tests work like a fingerprint: you can reproduce the test results to verify the quality of the quant you got.
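
A minimal sketch of that fingerprint check, assuming you have already run the same benchmarks yourself: it compares locally measured scores against the bf16 row published on this card. The helper name `fingerprint_mismatches`, the tolerance, and the placeholder `measured` values are illustrative assumptions, not part of any MLX tooling.

```python
# Published bf16 scores from the Brainwaves table on this card.
TASKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]
PUBLISHED_BF16 = [0.396, 0.489, 0.785, 0.605, 0.414, 0.744, 0.637]


def fingerprint_mismatches(measured, published=PUBLISHED_BF16, tol=0.005):
    """Return (task, published, measured) for every score that differs by more than tol.

    `tol` is an assumed tolerance; pick one that fits your own evaluation setup.
    """
    if len(measured) != len(published):
        raise ValueError("expected one score per task")
    return [
        (task, pub, got)
        for task, pub, got in zip(TASKS, published, measured)
        if abs(pub - got) > tol
    ]


# Placeholder: replace with the scores from your own evaluation run.
measured = list(PUBLISHED_BF16)

mismatches = fingerprint_mismatches(measured)
print("fingerprint OK" if not mismatches else mismatches)
```

An empty result means every task landed within the tolerance of the published row; any entries returned point at the tasks worth re-running.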

NightmediaAI is an independent AI lab that runs on my Mac.

If you like our models and want to help us improve our lab, any form of contribution is welcome:

ETH:0x6b6633606995BC180925c47d4249ED624aB7b2A5 
USDC:0x19e6bDDCBa47BB09a9Bc153Bb6479fc57284421a 
BTC:36d7U1n3MFaXgnNRAaEL3Pa3Hy6oFhM7XY
BCH:15dNMzhJ87XJSTU89VCBsDHj747QvBQaap

Thank you for your attention in this matter.

-G

Don't forget to tip your server. -Quark

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the model weights and the tokenizer
model, tokenizer = load("gemma-4-E2B-Deckard-ShiningValiant3-BF16")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

# Generate a response; verbose=True prints the output as it is produced
response = generate(model, tokenizer, prompt=prompt, verbose=True)