How to use DarrenJiaImbue/gemma-4-e4b-it-bouncer-litertlm with LiteRT-LM:

```shell
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=DarrenJiaImbue/gemma-4-e4b-it-bouncer-litertlm \
  model.litertlm \
  --prompt="Write me a poem"
```
---
language: en
library_name: litert-lm
tags:
- on-device
- gemma
- gemma-4
- ai-text-detection
- bouncer
---

# Gemma 4 E4B IT: Bouncer on-device classifier

This repository contains a `.litertlm` bundle and a hot-swappable LoRA adapter for the
[imbue-ai/bouncer-private](https://github.com/imbue-ai/bouncer-private)
iOS app. Both are intended to be consumed by a forked LiteRT-LM runtime
(millanatimbue/LiteRT-LM @ expose-aux-tensor-outputs).

## Contents

| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank=8) hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |

## How it's used

```swift
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)       // hot-swap LoRA
try await conv.sendMessage(.text(text),          // prefill + 1 decode step
                           optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
```

For generation (no classification), don't call `setScopedLoraFile`. The
LoRA inputs then default to zero and the model runs as the base IT model.
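
The `argmax(logits)` step mentioned in the snippet's last comment could be sketched in plain Swift as follows. This is only an illustration: it assumes the auxiliary output has already been converted to a `[Float]` with one entry per bucket, and the `Bucket` enum, the names of its two middle cases, and the `softmax` and `classify` helpers are hypothetical (this card names only bucket 0 as human and bucket 3 as AI). None of these are part of the forked LiteRT-LM API.

```swift
import Foundation

/// Hypothetical labels for the four classifier buckets. Only the endpoints
/// (0 = human, 3 = AI) come from this card; the middle names are placeholders.
enum Bucket: Int, CaseIterable {
    case human = 0
    case bucket1 = 1
    case bucket2 = 2
    case ai = 3
}

/// Numerically stable softmax: subtract the max logit before exponentiating.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Returns the argmax bucket and its softmax probability, or nil when the
/// input does not contain exactly one logit per bucket.
func classify(_ logits: [Float]) -> (bucket: Bucket, probability: Float)? {
    guard logits.count == Bucket.allCases.count else { return nil }
    let probs = softmax(logits)
    // The index of the largest probability equals the argmax of the logits.
    guard let best = probs.indices.max(by: { probs[$0] < probs[$1] }),
          let bucket = Bucket(rawValue: best) else { return nil }
    return (bucket, probs[best])
}
```

The argmax alone does not need the softmax; it is included here only so a caller can also read off a relative score for the winning bucket.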

## Build details

- Quantization: `gemma4_mixed48` (Google's recommended Gemma 4 mixed
  int4/int8 recipe; same family as upstream
  `litert-community/gemma-4-E4B-it-litert-lm`)
- Cache length: 1024 tokens (matches iOS `EngineConfig.maxNumTokens`)
- Source: `google/gemma-4-E4B-it` text decoder, fine-tuned with PEFT
  attention-only LoRA + a 4-class NormedLinear head (LayerNorm +
  Linear) over the last input token's hidden state.
- Conversion: forked `google-ai-edge/litert-torch` with Gemma 4
  classifier_head + LoRA-input wiring (see
  [EditLens](https://github.com/pangramlabs/EditLens) for the
  converter patches).
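
To make the "LayerNorm + Linear over the last input token's hidden state" description concrete, here is a rough sketch of that computation in plain Swift with toy dimensions. It is not the code used to build or run this model: the `NormedLinearHead` struct, its parameter names, the presence of learned scale, shift, and bias terms, and the epsilon value are all assumptions made for illustration. The real head is baked into `model.litertlm`.

```swift
/// Hypothetical stand-in for a 4-class head: LayerNorm followed by Linear.
struct NormedLinearHead {
    var gamma: [Float]         // LayerNorm scale, length = hidden size (assumed)
    var beta: [Float]          // LayerNorm shift, length = hidden size (assumed)
    var weight: [[Float]]      // Linear weights, shape [numClasses][hiddenSize]
    var bias: [Float]          // Linear bias, length = numClasses (assumed)
    var epsilon: Float = 1e-5  // assumed value, not taken from this card

    /// Normalizes the vector to zero mean and unit variance, then applies
    /// the per-element scale and shift.
    func layerNorm(_ x: [Float]) -> [Float] {
        let n = Float(x.count)
        let mean = x.reduce(0, +) / n
        let variance = x.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
        let invStd = 1 / (variance + epsilon).squareRoot()
        return x.indices.map { i in (x[i] - mean) * invStd * gamma[i] + beta[i] }
    }

    /// Maps the hidden state of the last input token to one logit per class.
    func logits(lastTokenHidden h: [Float]) -> [Float] {
        let normed = layerNorm(h)
        return zip(weight, bias).map { row, b in
            // Dot product of one weight row with the normalized vector, plus bias.
            zip(row, normed).reduce(b) { $0 + $1.0 * $1.1 }
        }
    }
}
```

With a hidden size of 4 and an identity weight matrix, the logits are just the normalized hidden state, which makes the sketch easy to check by hand.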