Instructions for using AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4 with libraries and local apps.
- Libraries
- Transformers
How to use AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4")
model = AutoModelForMultimodalLM.from_pretrained("AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Local Apps
- vLLM
How to use AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4
- SGLang
How to use AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4 with Docker Model Runner:
docker model run hf.co/AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4
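The curl calls shown for the vLLM and SGLang servers can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of any library, and the response parsing assumes the server returns the usual OpenAI-compatible chat completion shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one user message and return the assistant's reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4", "What is the capital of France?")` mirrors the vLLM curl call; for the SGLang examples, use port 30000 instead.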
Gemma-4-E4B-DECKARD-HERETIC-NVFP4
NVFP4-quantized EAGLE-style speculative-decoding drafter for the standard (non-uncensored) Gemma 4 31B DECKARD HERETIC. Pair with the matching target model for accelerated single-stream decode on Blackwell-class GPUs (DGX Spark / RTX PRO 6000 / RTX 5090 / B100 / B200).
For the abliterated/uncensored variant of this drafter, see AEON-7/Gemma-4-E4B-DECKARD-HERETIC-Uncensored-NVFP4. For the target it accelerates, see the gemma-4-31B-it-speculator.eagle3-NVFP4 collection on this profile.
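For readers new to the drafter/target pairing described above, the sketch below illustrates the general greedy draft-and-verify loop behind speculative decoding. It is a toy under stated assumptions: the "models" are hypothetical integer functions, verification is done token by token rather than in one batched forward pass, and it does not reproduce EAGLE-specific details such as drafting from the target's hidden features.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Return the tokens produced by one draft-and-verify round."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    proposed: List[int] = []
    draft_ctx = list(context)
    for _ in range(k):
        token = draft_next(draft_ctx)
        proposed.append(token)
        draft_ctx.append(token)

    # 2. The target checks each proposal against its own greedy choice.
    #    (A real engine scores all k positions in a single forward pass.)
    accepted: List[int] = []
    ctx = list(context)
    for token in proposed:
        expected = target_next(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(target_next: NextToken, draft_next: NextToken,
             prompt: List[int], max_new_tokens: int, k: int = 3) -> List[int]:
    """Greedy generation that yields the same tokens as target-only decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target_next, draft_next, out, k))
    return out[: len(prompt) + max_new_tokens]


# Hypothetical toy models: the target counts upward mod 10; the drafter
# agrees except after token 4, where it guesses wrong.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

The key property is that the output matches what the target alone would produce; the drafter only affects how many target tokens are confirmed per round, which is where the speedup comes from when the drafter is usually right.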
Files
- model.safetensors: NVFP4-quantized weights
- hf_quant_config.json: modelopt quant config
- chat_template.jinja: Gemma 4 chat template
- config.json / generation_config.json / tokenizer.* / processor_config.json
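As a convenience, a local download can be checked against the file list above. This is a small sketch, not an official tool: `missing_files` and `EXPECTED_PATTERNS` are hypothetical names, and the check only looks at file names in a single directory, not at file contents.

```python
import fnmatch
import os
from typing import List

# File names (or glob patterns) taken from the list above.
EXPECTED_PATTERNS = [
    "model.safetensors",
    "hf_quant_config.json",
    "chat_template.jinja",
    "config.json",
    "generation_config.json",
    "tokenizer.*",
    "processor_config.json",
]


def missing_files(directory: str) -> List[str]:
    """Return the expected patterns that match no file in `directory`."""
    present = os.listdir(directory)
    return [
        pattern
        for pattern in EXPECTED_PATTERNS
        if not fnmatch.filter(present, pattern)
    ]
```

An empty result from `missing_files("/path/to/local/model")` means every listed file, and at least one `tokenizer.*` file, is present.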
Quick start (vLLM, as drafter)
vllm serve <target-model-id> \
--speculative-config '{"method":"eagle3","model":"AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4","num_speculative_tokens":3}' \
--trust-remote-code
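Hand-writing the JSON passed to `--speculative-config` is error-prone once shell quoting is involved. The sketch below assembles the same command from Python; `build_vllm_command` is a hypothetical helper, and `<target-model-id>` stays a placeholder that must be replaced with the actual target model.

```python
import json
import shlex
from typing import List


def build_vllm_command(target_model: str, drafter: str,
                       num_speculative_tokens: int = 3) -> List[str]:
    """Assemble the argument list for the `vllm serve` quick-start command."""
    speculative_config = {
        "method": "eagle3",
        "model": drafter,
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", target_model,
        "--speculative-config",
        json.dumps(speculative_config, separators=(",", ":")),
        "--trust-remote-code",
    ]


command = build_vllm_command(
    "<target-model-id>",  # placeholder: substitute the real target model id
    "AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4",
)
# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command)` without any shell quoting, which sidesteps the nested-quote issue entirely.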
License
Inherits the Gemma Terms of Use. Use of this model is subject to those terms.
☕ Support the work
If this release has been useful, tips are deeply appreciated — they go directly toward more compute, more models, and more open releases.
- ₿ Bitcoin (BTC): bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4
- Ξ Ethereum (ETH): 0x1512667F6D61454ad531d2E45C0a5d1fd82D0500
- ◎ Solana (SOL): DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t
- ⓜ Monero (XMR): 836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd
Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens can be sent to the same Ethereum address.