How to use DarrenJiaImbue/gemma-4-e4b-it-bouncer-litertlm with LiteRT-LM:
```shell
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=DarrenJiaImbue/gemma-4-e4b-it-bouncer-litertlm \
  model.litertlm \
  --prompt="Write me a poem"
```
Gemma 4 E4B IT: Bouncer on-device classifier
A `.litertlm` bundle and hot-swappable LoRA adapter for the `imbue-ai/bouncer-private` iOS app, intended to be consumed by a forked LiteRT-LM runtime (`millanatimbue/LiteRT-LM` @ `expose-aux-tensor-outputs`).
Contents
| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank = 8), hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |
How it's used
```swift
// `engine`, `cfg`, `loraAdapterURL`, and `text` are set up elsewhere in the app.
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)            // hot-swap LoRA
try await conv.sendMessage(.text(text),               // prefill + 1 decode step
                           optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) -> bucket 0 (human) ... bucket 3 (AI)
```
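Once the auxiliary output has been read, the predicted bucket is simply the index of the largest logit. The sketch below assumes the `classifier_logits` values have already been copied into a `[Float]` of length 4 (the fork's actual return type may differ); `argmax` and `softmax` are hypothetical helpers, not part of the LiteRT-LM API. Buckets 1 and 2 lie between human and AI; their exact labels are not given on this card.

```swift
import Foundation

// Sketch only: assumes the "classifier_logits" auxiliary output has already
// been copied into a [Float] of length 4, ordered bucket 0 (human) ... bucket 3 (AI).

/// Index of the largest logit (the first one wins on ties), or nil if empty.
func argmax(_ logits: [Float]) -> Int? {
    guard var best = logits.indices.first else { return nil }
    for i in logits.indices where logits[i] > logits[best] {
        best = i
    }
    return best
}

/// Numerically stable softmax, useful if a confidence score is wanted in
/// addition to the bucket index. It does not change which index is largest.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }  // subtract max to avoid overflow
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// Example with made-up logits (not real model output).
let exampleLogits: [Float] = [-1.2, 0.3, 0.1, 2.4]
if let bucket = argmax(exampleLogits) {
    let probs = softmax(exampleLogits)
    print("bucket \(bucket), p = \(probs[bucket])")
}
```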
For generation (no classification), don't call `setScopedLoraFile`: the LoRA inputs then default to zero and the model runs as the base instruction-tuned (IT) model.
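A short derivation of why zero-valued LoRA inputs recover the base model. It assumes the standard LoRA formulation applied to each adapted attention projection with frozen weight $W$, input $x \in \mathbb{R}^{k}$ and output $h \in \mathbb{R}^{d}$; the scaling factor $\alpha$ is not stated on this card and is an assumption. Only the rank $r = 8$ comes from the file table.

```latex
\begin{aligned}
h &= W x + \frac{\alpha}{r}\, B A x,
  \qquad A \in \mathbb{R}^{r \times k},\; B \in \mathbb{R}^{d \times r},\; r = 8 \\
A = 0 \ \text{or}\ B = 0 \;&\Longrightarrow\; B A x = 0 \;\Longrightarrow\; h = W x
\end{aligned}
```

With the LoRA tensor slots left at zero, every adapted projection reduces to its frozen base weight, so the decoder computes the plain IT forward pass; presumably `setScopedLoraFile` fills those slots with the trained $A$ and $B$ from `lora_adapter.tflite`.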
Build details
- Quantization: `gemma4_mixed48` (Google's recommended Gemma 4 mixed int4/int8 recipe; same family as upstream `litert-community/gemma-4-E4B-it-litert-lm`)
- Cache length: 1024 tokens (matches iOS `EngineConfig.maxNumTokens`)
- Source: `google/gemma-4-E4B-it` text decoder, fine-tuned with PEFT attention-only LoRA + a 4-class NormedLinear head (LayerNorm + Linear) over the last input token's hidden state.
- Conversion: forked `google-ai-edge/litert-torch` with Gemma 4 `classifier_head` + LoRA-input wiring (see EditLens for the converter patches).
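The classifier head described under Source can be sketched in plain Swift as follows. This is an illustrative re-implementation under stated assumptions, not code extracted from `model.litertlm`: the struct name `NormedLinearHead`, its field names, the `eps` value (PyTorch's `LayerNorm` default), and the nested-array layout are all hypothetical. It applies LayerNorm followed by a 4-output Linear layer to the hidden state of the last input token only.

```swift
import Foundation

// Hypothetical re-implementation for illustration; field names, eps, and
// shapes are assumptions, not read from model.litertlm.
struct NormedLinearHead {
    var gamma: [Float]      // LayerNorm scale, length d
    var beta: [Float]       // LayerNorm shift, length d
    var weight: [[Float]]   // Linear weight, 4 rows x d columns
    var bias: [Float]       // Linear bias, length 4
    var eps: Float = 1e-5   // PyTorch's LayerNorm default; assumed here

    /// LayerNorm over one hidden-state vector (biased variance, as in PyTorch).
    func layerNorm(_ x: [Float]) -> [Float] {
        let d = Float(x.count)
        let mean = x.reduce(Float(0), +) / d
        let variance = x.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) } / d
        let invStd = 1 / (variance + eps).squareRoot()
        return x.indices.map { (x[$0] - mean) * invStd * gamma[$0] + beta[$0] }
    }

    /// Applies the head to the last token's hidden state only.
    /// `hiddenStates` is [sequenceLength][d]; returns one logit per class.
    func logits(hiddenStates: [[Float]]) -> [Float] {
        guard let last = hiddenStates.last else { return [] }
        let normed = layerNorm(last)
        return weight.indices.map { row in
            zip(weight[row], normed).reduce(bias[row]) { $0 + $1.0 * $1.1 }
        }
    }
}

// Toy example with d = 4 and an identity Linear layer (not real weights).
let head = NormedLinearHead(
    gamma: [1, 1, 1, 1], beta: [0, 0, 0, 0],
    weight: [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    bias: [0, 0, 0, 0])
let hidden: [[Float]] = [[9, 9, 9, 9], [1, 2, 3, 4]]  // toy 2-token sequence
print(head.logits(hiddenStates: hidden))
```

Because the head reads only the last row of `hiddenStates`, earlier tokens influence the logits solely through the decoder's attention, not through the head itself.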