Gemma 4 E4B IT — Bouncer on-device classifier

This repository contains a .litertlm bundle and a hot-swappable LoRA adapter for the imbue-ai/bouncer-private iOS app. Both are intended to be consumed by a forked LiteRT-LM runtime (millanatimbue/LiteRT-LM @ expose-aux-tensor-outputs).

| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank=8) hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |

How it's used

let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)          // hot-swap LoRA
try await conv.sendMessage(.text(text),             // prefill + 1 decode step
    optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
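
The argmax step in the last comment can be sketched in plain Swift as follows. This is a minimal illustration, assuming the auxiliary output can be read as a `[Float]` of four logits; the card does not state the actual return type of `getAuxiliaryOutput`. The helper names `softmax` and `argmax` and the labels for buckets 1 and 2 are hypothetical and not part of the runtime.

```swift
import Foundation

/// Hypothetical labels: the card only names bucket 0 (human) and bucket 3 (AI).
let bucketLabels = ["human", "bucket 1", "bucket 2", "AI"]

/// Numerically stable softmax: subtract the max logit before exponentiating.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Index of the largest value, or nil for an empty array.
func argmax(_ values: [Float]) -> Int? {
    guard !values.isEmpty else { return nil }
    var best = 0
    for i in 1..<values.count where values[i] > values[best] {
        best = i
    }
    return best
}

// Example logits (made up for illustration, not real model output).
let exampleLogits: [Float] = [0.1, 2.0, -1.0, 0.5]
if let bucket = argmax(exampleLogits) {
    let confidence = softmax(exampleLogits)[bucket]
    print("bucket \(bucket) (\(bucketLabels[bucket])), p = \(confidence)")
}
```

Softmax is only needed if a confidence value is wanted; the bucket index itself depends on the raw logits alone.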

For plain generation (no classification), skip the setScopedLoraFile call: the LoRA inputs default to zero, so the model runs as the base IT model.
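
To see why zeroed LoRA inputs leave the base model unchanged, consider the standard LoRA formulation y = W x + B (A x). With A and B both zero, the low-rank term vanishes and y = W x. The sketch below uses toy dimensions and rank 1 for brevity; the bundle's actual tensor layout and any scaling factor are not described in this card, so the function names and shapes here are assumptions for illustration only.

```swift
/// Multiply a row-major matrix by a vector.
func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    m.map { row in zip(row, v).reduce(Float(0)) { $0 + $1.0 * $1.1 } }
}

/// y = W x + B (A x): base projection plus the low-rank LoRA update.
func loraProjection(w: [[Float]], a: [[Float]], b: [[Float]], x: [Float]) -> [Float] {
    let base = matVec(w, x)
    let delta = matVec(b, matVec(a, x))
    return zip(base, delta).map { $0 + $1 }
}

let w: [[Float]] = [[1, 2], [3, 4]]   // toy base weight (2 x 2)
let x: [Float] = [1, -1]              // toy input activation
let zeroA: [[Float]] = [[0, 0]]       // A: rank x in  (1 x 2), all zeros
let zeroB: [[Float]] = [[0], [0]]     // B: out x rank (2 x 1), all zeros

// With zeroed adapter tensors the output equals the base projection.
let y = loraProjection(w: w, a: zeroA, b: zeroB, x: x)
print(y == matVec(w, x))
```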

Build details

Quantization: gemma4_mixed48 (Google's recommended Gemma 4 mixed int4/int8 recipe; same family as upstream litert-community/gemma-4-E4B-it-litert-lm)
Cache length: 1024 tokens (matches iOS EngineConfig.maxNumTokens)
Source: google/gemma-4-E4B-it text decoder, fine-tuned with PEFT attention-only LoRA + a 4-class NormedLinear head (LayerNorm + Linear) over the last input token's hidden state.
Conversion: forked google-ai-edge/litert-torch with Gemma 4 classifier_head + LoRA-input wiring (see EditLens for the converter patches).
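
The NormedLinear head described under Source (LayerNorm followed by Linear, applied to the last input token's hidden state) can be sketched as below. This is a toy illustration, not the converter's or the training code's implementation: the hidden size of 4, the epsilon value, the weights, and all function names are assumptions chosen to keep the example small.

```swift
/// LayerNorm over a single hidden-state vector (eps is an assumed value).
func layerNorm(_ x: [Float], gamma: [Float], beta: [Float], eps: Float = 1e-5) -> [Float] {
    let n = Float(x.count)
    let mean = x.reduce(0, +) / n
    let variance = x.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
    let invStd = 1 / (variance + eps).squareRoot()
    return x.indices.map { (x[$0] - mean) * invStd * gamma[$0] + beta[$0] }
}

/// Linear layer with one weight row and one bias per class.
func linear(_ x: [Float], weights: [[Float]], bias: [Float]) -> [Float] {
    weights.indices.map { c in
        zip(weights[c], x).reduce(bias[c]) { $0 + $1.0 * $1.1 }
    }
}

/// Head = LayerNorm, then Linear, producing one logit per class.
func classifierHead(lastTokenHidden h: [Float], gamma: [Float], beta: [Float],
                    weights: [[Float]], bias: [Float]) -> [Float] {
    linear(layerNorm(h, gamma: gamma, beta: beta), weights: weights, bias: bias)
}

// Toy parameters: hidden size 4, 4 classes, identity weights.
let hidden: [Float] = [1, 2, 3, 4]
let ones = [Float](repeating: 1, count: 4)
let zeros = [Float](repeating: 0, count: 4)
let identity: [[Float]] = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
let headLogits = classifierHead(lastTokenHidden: hidden, gamma: ones, beta: zeros,
                                weights: identity, bias: zeros)
print(headLogits.count)
```

Because only the last input token's hidden state feeds the head, a single prefill pass plus one decode step is enough to obtain the four logits.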

Repository: DarrenJiaImbue/gemma-4-e4b-it-bouncer-litertlm
