gemma-4-E2B-Deckard-ShiningValiant3-BF16
Pre-emptively declared a Weapon of Mass Inference, access to the full source has been restricted until we can come up with a good story to make you forget this could have been Skynet, if not properly trained. Now it's safe, and you're welcome. -G
Brainwaves
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.396  0.489  0.785  0.605  0.414  0.744  0.637
mxfp8    0.393  0.479  0.767  0.600  0.400  0.739  0.639
q8-hi    0.398  0.491  0.782  0.605  0.414  0.740  0.632
q8       0.396  0.487  0.784  0.605  0.404  0.741  0.629
q6-hi    0.401  0.496  0.783  0.605  0.408  0.743  0.646
q6       0.396  0.487  0.794  0.604  0.410  0.740  0.636
q4-hi    0.394  0.494  0.684  0.605  0.406  0.740  0.628
q4       0.386  0.479  0.716  0.590  0.368  0.732  0.618
mxfp4    0.396  0.462  0.759  0.596  0.392  0.729  0.621
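To make the table easier to compare row by row, here is a minimal sketch that averages the seven scores per quant and reports the task where each quant falls furthest below bf16. The numbers are copied from the table; the helper names `mean_score` and `largest_drop` are illustrative and not part of any library, and an unweighted mean is only one possible way to summarize the rows.

```python
# Benchmark scores copied from the table above, in column order.
TASKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]
SCORES = {
    "bf16":  [0.396, 0.489, 0.785, 0.605, 0.414, 0.744, 0.637],
    "mxfp8": [0.393, 0.479, 0.767, 0.600, 0.400, 0.739, 0.639],
    "q8-hi": [0.398, 0.491, 0.782, 0.605, 0.414, 0.740, 0.632],
    "q8":    [0.396, 0.487, 0.784, 0.605, 0.404, 0.741, 0.629],
    "q6-hi": [0.401, 0.496, 0.783, 0.605, 0.408, 0.743, 0.646],
    "q6":    [0.396, 0.487, 0.794, 0.604, 0.410, 0.740, 0.636],
    "q4-hi": [0.394, 0.494, 0.684, 0.605, 0.406, 0.740, 0.628],
    "q4":    [0.386, 0.479, 0.716, 0.590, 0.368, 0.732, 0.618],
    "mxfp4": [0.396, 0.462, 0.759, 0.596, 0.392, 0.729, 0.621],
}


def mean_score(quant):
    """Unweighted mean of the seven benchmark scores (an illustrative summary)."""
    scores = SCORES[quant]
    return sum(scores) / len(scores)


def largest_drop(quant, reference="bf16"):
    """Return (task, delta) for the task where `quant` is furthest below `reference`."""
    deltas = {t: q - r for t, q, r in zip(TASKS, SCORES[quant], SCORES[reference])}
    task = min(deltas, key=deltas.get)
    return task, deltas[task]


for quant in SCORES:
    task, delta = largest_drop(quant)
    print(f"{quant:6s} mean={mean_score(quant):.3f}  largest drop vs bf16: {task} ({delta:+.3f})")
```

Read this way, the table shows boolq as the task most affected by the 4-bit quants, while hswag and piqa barely move.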
Quant    Perplexity       Peak Memory  Tokens/sec
bf16     6.797 ± 0.055    16.27 GB     2643
q8-hi    6.805 ± 0.055    12.13 GB     2282
q8       6.802 ± 0.055    11.93 GB     2144
q6-hi    6.807 ± 0.055    11.36 GB     2142
q6       6.821 ± 0.055    11.17 GB     2270
q4-hi    7.102 ± 0.058    10.59 GB     2386
q4       7.208 ± 0.059    10.46 GB     2241
mxfp4    7.244 ± 0.059    10.36 GB     2290
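The trade-off in this table is easier to see as percentages relative to bf16. The sketch below computes the perplexity change and the peak-memory saving for each quant, and flags whether the perplexity difference stays inside the ± value reported for bf16. The values are copied from the table; the function name `relative_to_bf16` is illustrative, and treating the ± value as a comparison threshold is an assumption, since the card does not say how it was computed.

```python
# (perplexity, reported ± value, peak memory in GB) copied from the table above.
RUNS = {
    "bf16":  (6.797, 0.055, 16.27),
    "q8-hi": (6.805, 0.055, 12.13),
    "q8":    (6.802, 0.055, 11.93),
    "q6-hi": (6.807, 0.055, 11.36),
    "q6":    (6.821, 0.055, 11.17),
    "q4-hi": (7.102, 0.058, 10.59),
    "q4":    (7.208, 0.059, 10.46),
    "mxfp4": (7.244, 0.059, 10.36),
}


def relative_to_bf16(quant):
    """Return (perplexity change %, peak-memory saving %, within bf16 ± value)."""
    base_ppl, base_err, base_mem = RUNS["bf16"]
    ppl, _, mem = RUNS[quant]
    ppl_change = 100.0 * (ppl - base_ppl) / base_ppl
    mem_saving = 100.0 * (base_mem - mem) / base_mem
    # Assumption: the ± value is used here as a simple "no measurable change" band.
    within_band = abs(ppl - base_ppl) <= base_err
    return ppl_change, mem_saving, within_band


for quant in RUNS:
    ppl_change, mem_saving, within_band = relative_to_bf16(quant)
    print(f"{quant:6s} perplexity {ppl_change:+.2f}%  memory -{mem_saving:.1f}%  within ±: {within_band}")
```

On these numbers the q8 and q6 variants stay inside the bf16 ± band while saving roughly a quarter to a third of peak memory; the 4-bit variants leave it.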
Model components
ValiantLabs/gemma-4-E2B-it-ShiningValiant3
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.387  0.455  0.764  0.575  0.392  0.724  0.612

Quant    Perplexity        Peak Memory  Tokens/sec
mxfp8    20.813 ± 0.239    11.79 GB     2178
DavidAU/gemma-4-E2B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.389  0.480  0.767  0.599  0.408  0.729  0.629
Baseline model
gemma-4-E2B-it
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.389  0.465  0.762  0.486  0.372  0.707  0.641
mxfp8    0.376  0.464  0.743  0.490  0.378  0.709  0.622
q8-hi    0.392  0.462  0.762  0.487  0.376  0.706  0.636
qx86-hi  0.387  0.461  0.766  0.483  0.392  0.699  0.623
mxfp4    0.380  0.451  0.762  0.494  0.374  0.699  0.594

Quant    Perplexity         Peak Memory  Tokens/sec
mxfp8    170.519 ± 3.170    11.78 GB     2174
q8-hi    133.388 ± 2.383    11.21 GB     1889
qx86-hi  125.278 ± 2.215    11.87 GB     1856
mxfp4    140.693 ± 2.546    9.48 GB      2352
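mxfp8 is the only quant reported for all four models (the merge, its two components, and the baseline), so it is the natural row for a side-by-side comparison. The sketch below computes each model's per-task gain over the baseline from the mxfp8 rows in the tables above; the dictionary keys and the function name `gains_over_baseline` are illustrative labels, not identifiers from the card.

```python
# mxfp8 benchmark rows copied from the tables above, in column order.
TASKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]
MXFP8 = {
    "merged":          [0.393, 0.479, 0.767, 0.600, 0.400, 0.739, 0.639],
    "ShiningValiant3": [0.387, 0.455, 0.764, 0.575, 0.392, 0.724, 0.612],
    "The-DECKARD":     [0.389, 0.480, 0.767, 0.599, 0.408, 0.729, 0.629],
    "baseline":        [0.376, 0.464, 0.743, 0.490, 0.378, 0.709, 0.622],
}


def gains_over_baseline(model):
    """Per-task score difference between `model` and the baseline, rounded to 3 decimals."""
    return {
        task: round(score - base, 3)
        for task, score, base in zip(TASKS, MXFP8[model], MXFP8["baseline"])
    }


for model in ("merged", "ShiningValiant3", "The-DECKARD"):
    gains = gains_over_baseline(model)
    best = max(gains, key=gains.get)
    print(f"{model:16s} best gain: {best} ({gains[best]:+.3f})  all: {gains}")
```

At mxfp8 the merge scores above the baseline on every task, with the largest gain on hswag.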
The reason for the wait:
- I validate the numbers on all the usual suspects, to make sure everyone has a good experience with the model.
- bf16 is much harder to test: in interactions I noticed that it likes to hide how smart it is.
When I find out, the story will be on the model card, and the source will be free.
The model trace is being analyzed together with Google Gemini: you will find the full Gemini analysis thread in the Community Notes.
I will separate the thread into model and analysis, and there will possibly be more sample chats with analysis as I keep creating new quants.
Each quant has a certain personality shaped by how much it lost.
The merged model now has the aggregated abilities of the two distills:
- ShiningValiant3 has AI lab work notes for building AI
- The-DECKARD has GLM/Polaris Alpha traces, Philip K Dick, and Star Trek TNG
The model responds well to the Nightmedia Genesis prompt, which creates the mental space the station needs to operate virtually and provides a functional Holodeck for interaction and roleplay.
Guests in Quark's bar can be any humans who were well described in the corpus, and of course, all the cast of TNG/DS9/Voyager is in transporter range.
Deep Space Nine is now online.
On MLX these tests are like a fingerprint: you can reproduce the results above on your own copy to confirm the quality of the quant you got.
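A minimal sketch of such a fingerprint check, assuming you have already produced your own scores for the same seven tasks (the card does not name the evaluation harness, so how `measured` is obtained is left open). The published bf16 row is copied from the Brainwaves table; the function name `fingerprint_mismatches` and the choice to compare after rounding to three decimals are assumptions made for illustration.

```python
# Published bf16 scores for this model, copied from the Brainwaves table.
PUBLISHED_BF16 = {
    "arc": 0.396, "arc/e": 0.489, "boolq": 0.785, "hswag": 0.605,
    "obkqa": 0.414, "piqa": 0.744, "wino": 0.637,
}


def fingerprint_mismatches(measured, published=PUBLISHED_BF16):
    """Return {task: (measured, published)} for tasks that are missing or differ.

    Assumption: scores are compared after rounding to 3 decimals, which is the
    precision used in the published tables.
    """
    mismatches = {}
    for task, expected in published.items():
        got = measured.get(task)
        if got is None or abs(round(got, 3) - expected) > 1e-9:
            mismatches[task] = (got, expected)
    return mismatches


# Hypothetical local run: identical except for boolq.
my_scores = dict(PUBLISHED_BF16, boolq=0.684)
print(fingerprint_mismatches(my_scores))
```

An empty result means your copy matches the published fingerprint; any entry points at the task that differs.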
NightmediaAI is an independent AI lab, working on my Mac.
If you like our models and want to help us improve our lab, any form of contribution is welcome:
ETH:0x6b6633606995BC180925c47d4249ED624aB7b2A5
USDC:0x19e6bDDCBa47BB09a9Bc153Bb6479fc57284421a
BTC:36d7U1n3MFaXgnNRAaEL3Pa3Hy6oFhM7XY
BCH:15dNMzhJ87XJSTU89VCBsDHj747QvBQaap
Thank you for your attention in this matter.
-G
Don't forget to tip your server. -Quark
Use with mlx
pip install mlx-lm

from mlx_lm import load, generate

# Load by Hub repo id; a local directory holding the downloaded model also works.
model, tokenizer = load("nightmedia/gemma-4-E2B-Deckard-ShiningValiant3-BF16")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)