---
language: en
library_name: litert-lm
tags:
- on-device
- gemma
- gemma-4
- ai-text-detection
- bouncer
---

# Gemma 4 E4B IT — Bouncer on-device classifier

This repo holds the `.litertlm` bundle and hot-swappable LoRA adapter for the
[imbue-ai/bouncer-private](https://github.com/imbue-ai/bouncer-private)
iOS app. Both files are intended to be loaded by a forked LiteRT-LM runtime
(`millanatimbue/LiteRT-LM` @ `expose-aux-tensor-outputs`).

## Contents

| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank=8) hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |

## How it's used

```swift
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)          // hot-swap LoRA
try await conv.sendMessage(.text(text),             // prefill + 1 decode step
    optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
```
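
The last comment in the snippet above can be made concrete as follows. This is a minimal sketch, assuming the `classifier_logits` values have already been copied into a `[Float]` of length 4; the actual return type of `getAuxiliaryOutput` is not specified here, and `softmax` and `predictedBucket` are hypothetical helper names rather than runtime APIs.

```swift
import Foundation

/// Numerically stable softmax. Hypothetical helper, not part of the runtime API.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Index of the largest logit, or nil for empty input.
func predictedBucket(_ logits: [Float]) -> Int? {
    guard !logits.isEmpty else { return nil }
    var best = 0
    for i in 1..<logits.count where logits[i] > logits[best] {
        best = i
    }
    return best
}

// Made-up logits, for illustration only.
let exampleLogits: [Float] = [-1.2, 0.3, 0.1, 2.4]
let bucket = predictedBucket(exampleLogits)   // 3, the "AI" end of the scale
let probs = softmax(exampleLogits)            // per-bucket probabilities, summing to 1
```

The softmax step is optional: argmax over raw logits gives the same bucket, and the probabilities are only useful if the app wants a confidence value alongside the label.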

For plain generation (no classification), skip the `setScopedLoraFile` call:
the LoRA inputs default to zero, so the model runs as the base IT model.
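
Why zeroed LoRA inputs reproduce the base model can be seen on a toy projection. The sketch below assumes the standard LoRA formulation `y = W x + scale * B (A x)`; how the forked runtime actually wires its tensor slots is not shown in this card, and `matVec` and `loraProjection` are hypothetical names.

```swift
/// Dense matrix-vector product; each row of `m` must match `v` in length.
func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    return m.map { row in
        zip(row, v).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    }
}

/// y = W x + scale * B (A x), the usual LoRA update on a single projection.
func loraProjection(w: [[Float]], a: [[Float]], b: [[Float]],
                    scale: Float, x: [Float]) -> [Float] {
    let base = matVec(w, x)
    let delta = matVec(b, matVec(a, x))
    return zip(base, delta).map { $0 + scale * $1 }
}

let w: [[Float]] = [[1, 2], [3, 4]]   // toy 2x2 base weight
let x: [Float] = [1, -1]
let zeroA: [[Float]] = [[0, 0]]       // rank-1 adapter, all zeros
let zeroB: [[Float]] = [[0], [0]]
let y = loraProjection(w: w, a: zeroA, b: zeroB, scale: 1, x: x)
// y equals matVec(w, x): the zero adapter contributes nothing.
```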

## Build details

- Quantization: `gemma4_mixed48` (Google's recommended Gemma 4 mixed
  int4/int8 recipe; same family as upstream
  `litert-community/gemma-4-E4B-it-litert-lm`)
- Cache length: 1024 tokens (matches iOS `EngineConfig.maxNumTokens`)
- Source: the `google/gemma-4-E4B-it` text decoder, fine-tuned with a PEFT
  attention-only LoRA and a 4-class NormedLinear head (LayerNorm +
  Linear) applied to the hidden state of the last input token.
- Conversion: forked `google-ai-edge/litert-torch` with Gemma 4
  classifier_head + LoRA-input wiring (see
  [EditLens](https://github.com/pangramlabs/EditLens) for the
  converter patches).
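
The classifier head described in the list above (LayerNorm followed by Linear, applied to the last input token's hidden state) can be sketched in plain Swift. This is an illustration under stated assumptions, not the converted graph: the epsilon value, the absence of a learned LayerNorm scale and shift, the presence of a bias term, and all function names are assumptions.

```swift
/// LayerNorm without learned scale/shift (an assumption; the real head may have them).
func layerNorm(_ h: [Float], eps: Float = 1e-5) -> [Float] {
    guard !h.isEmpty else { return [] }
    let n = Float(h.count)
    let mean = h.reduce(0, +) / n
    let variance = h.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) } / n
    let invStd = 1 / (variance + eps).squareRoot()
    return h.map { ($0 - mean) * invStd }
}

/// Linear layer: one weight row and one bias value per output class.
func linear(_ x: [Float], weights: [[Float]], bias: [Float]) -> [Float] {
    var out: [Float] = []
    for (row, b) in zip(weights, bias) {
        var acc = b
        for (wi, xi) in zip(row, x) { acc += wi * xi }
        out.append(acc)
    }
    return out
}

/// Applies the head to the hidden state of the last input token only.
func classifierLogits(hiddenStates: [[Float]],
                      weights: [[Float]], bias: [Float]) -> [Float]? {
    guard let last = hiddenStates.last else { return nil }
    return linear(layerNorm(last), weights: weights, bias: bias)
}
```

With four weight rows the function returns four logits, one per bucket, which is the shape the `classifier_logits` auxiliary output is described as carrying.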