---
language: en
library_name: litert-lm
tags:
- on-device
- gemma
- gemma-4
- ai-text-detection
- bouncer
---

# Gemma 4 E4B IT: Bouncer on-device classifier

A `.litertlm` bundle and a hot-swappable LoRA adapter for the [imbue-ai/bouncer-private](https://github.com/imbue-ai/bouncer-private) iOS app. Both are intended to be consumed by a forked LiteRT-LM runtime (`millanatimbue/LiteRT-LM` @ `expose-aux-tensor-outputs`).

## Contents

| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank=8) hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |

## How it's used

```swift
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)   // hot-swap LoRA
try await conv.sendMessage(.text(text),      // prefill + 1 decode step
                           optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
```

For generation without classification, skip the `setScopedLoraFile` call: the LoRA inputs default to zero and the model runs as the base IT model.

## Build details

- Quantization: `gemma4_mixed48` (Google's recommended Gemma 4 mixed int4/int8 recipe; same family as upstream `litert-community/gemma-4-E4B-it-litert-lm`)
- Cache length: 1024 tokens (matches iOS `EngineConfig.maxNumTokens`)
- Source: `google/gemma-4-E4B-it` text decoder, fine-tuned with PEFT attention-only LoRA plus a 4-class NormedLinear head (LayerNorm + Linear) over the hidden state of the last input token.
- Conversion: forked `google-ai-edge/litert-torch` with Gemma 4 `classifier_head` and LoRA-input wiring (see [EditLens](https://github.com/pangramlabs/EditLens) for the converter patches).
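
The usage snippet leaves the final `argmax(logits)` step as a comment. A minimal sketch of that step follows, assuming the logits arrive as a `[Float]` with one entry per bucket. The actual return type of `getAuxiliaryOutput` is not stated here, and `bucket(fromLogits:classCount:)` and `BucketError` are hypothetical names introduced for illustration, not part of the runtime.

```swift
// Hypothetical helper, not part of LiteRT-LM. Assumes the classifier logits
// have already been converted to [Float] with one value per bucket.
enum BucketError: Error {
    case unexpectedLogitCount(Int)
}

/// Returns the index of the largest logit: 0 (human) through 3 (AI).
/// Ties resolve to the lowest index.
func bucket(fromLogits logits: [Float], classCount: Int = 4) throws -> Int {
    guard classCount > 0, logits.count == classCount else {
        throw BucketError.unexpectedLogitCount(logits.count)
    }
    var best = 0
    for i in 1..<logits.count where logits[i] > logits[best] {
        best = i
    }
    return best
}
```

Validating the length before taking the argmax means a mismatched or empty tensor surfaces as an error instead of being silently mapped to a bucket.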