Gemma 4 E4B IT: Bouncer on-device classifier

.litertlm bundle and hot-swappable LoRA adapter for the imbue-ai/bouncer-private iOS app, intended to be consumed by a forked LiteRT-LM runtime (millanatimbue/LiteRT-LM @ expose-aux-tensor-outputs).

Contents

File                                   Size    Purpose
model.litertlm                         3.9 GB  Gemma 4 E4B IT base + classifier head + LoRA tensor input slots
lora_adapter.tflite                    18 MB   Attention-only LoRA (rank=8), hot-swapped at session creation
tokenizer.json, tokenizer_config.json  -       Reference copies (also embedded in model.litertlm)

How it's used

let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)          // hot-swap LoRA
try await conv.sendMessage(.text(text),             // prefill + 1 decode step
    optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) -> bucket 0 (human) ... bucket 3 (AI)
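
The argmax step in the final comment can be sketched as below. This is a minimal illustration, assuming the auxiliary output has already been converted to a plain [Float] of four logits; the actual return type of getAuxiliaryOutput is not stated here, and the helper names argmax and softmax are hypothetical, not part of the runtime.

```swift
import Foundation

/// Index of the largest logit, or nil for an empty array.
/// Ties resolve to the lowest index.
func argmax(_ logits: [Float]) -> Int? {
    guard var best = logits.indices.first else { return nil }
    for i in logits.indices where logits[i] > logits[best] {
        best = i
    }
    return best
}

/// Numerically stable softmax, useful if a confidence score is wanted
/// alongside the predicted bucket (an assumption; the app may use raw logits).
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}
```

With four logits, argmax returns a bucket index from 0 (human) to 3 (AI), and softmax(logits)[bucket] gives a rough confidence for that bucket.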

For generation (no classification), don't call setScopedLoraFile; the LoRA inputs default to zero and the model runs as the base IT model.
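
Why zero LoRA inputs reproduce the base model can be seen in a toy example. The sketch below assumes the standard LoRA formulation y = W x + scale * B (A x); how the bundle actually wires its LoRA input slots is not documented here, and matVec and loraLinear are hypothetical names used only for illustration.

```swift
/// Matrix-vector product for a row-major matrix.
func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    m.map { row in zip(row, v).reduce(Float(0)) { $0 + $1.0 * $1.1 } }
}

/// LoRA-augmented linear layer: y = W x + scale * B (A x).
/// w is (out x in), a is (rank x in), b is (out x rank).
func loraLinear(x: [Float], w: [[Float]], a: [[Float]], b: [[Float]],
                scale: Float) -> [Float] {
    let base = matVec(w, x)               // frozen base projection
    let delta = matVec(b, matVec(a, x))   // low-rank update
    return zip(base, delta).map { $0 + scale * $1 }
}

let w: [[Float]] = [[1, 2], [3, 4]]
let x: [Float] = [1, 1]

// All-zero adapter tensors: the delta vanishes and the output equals W x.
let baseOnly = loraLinear(x: x, w: w, a: [[0, 0]], b: [[0], [0]], scale: 1)

// Non-zero rank-1 adapter: the output shifts away from the base projection.
let adapted = loraLinear(x: x, w: w, a: [[1, 1]], b: [[1], [0]], scale: 0.5)
```

Here baseOnly matches the plain projection W x, while adapted differs in its first component, which mirrors the difference between generation and classification sessions.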

Build details

  • Quantization: gemma4_mixed48 (Google's recommended Gemma 4 mixed int4/int8 recipe; same family as upstream litert-community/gemma-4-E4B-it-litert-lm)
  • Cache length: 1024 tokens (matches iOS EngineConfig.maxNumTokens)
  • Source: google/gemma-4-E4B-it text decoder, fine-tuned with PEFT attention-only LoRA + a 4-class NormedLinear head (LayerNorm + Linear) over the last input token's hidden state.
  • Conversion: forked google-ai-edge/litert-torch with Gemma 4 classifier_head + LoRA-input wiring (see EditLens for the converter patches).
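
The classifier head described under Source (LayerNorm followed by a Linear layer, applied to the last input token's hidden state) can be sketched as follows. This is an illustration under stated assumptions: a LayerNorm with learned gain and bias and an epsilon of 1e-5, plus a Linear layer with bias. The real head's parameters, epsilon, and hidden size are not given here, and layerNorm and classifierHead are hypothetical names.

```swift
/// Layer normalization over one hidden vector (assumes a non-empty input).
func layerNorm(_ h: [Float], gamma: [Float], beta: [Float],
               eps: Float = 1e-5) -> [Float] {
    let n = Float(h.count)
    let mean = h.reduce(0, +) / n
    let variance = h.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
    let invStd = 1 / (variance + eps).squareRoot()
    return h.indices.map { (h[$0] - mean) * invStd * gamma[$0] + beta[$0] }
}

/// NormedLinear-style head: LayerNorm, then one logit per weight row.
/// weight is (numClasses x hiddenSize); bias has numClasses entries.
func classifierHead(lastHidden: [Float], gamma: [Float], beta: [Float],
                    weight: [[Float]], bias: [Float]) -> [Float] {
    let normed = layerNorm(lastHidden, gamma: gamma, beta: beta)
    return weight.indices.map { i in
        zip(weight[i], normed).reduce(bias[i]) { $0 + $1.0 * $1.1 }
    }
}

// Toy example: hidden size 2, four classes (matching the 4-class head).
let headLogits = classifierHead(
    lastHidden: [1, 3],
    gamma: [1, 1],
    beta: [0, 0],
    weight: [[1, 0], [0, 1], [1, 1], [-1, -1]],
    bias: [0, 0, 0, 0]
)
```

The toy dimensions are only for readability; in the bundle the hidden vector is the decoder's last-token state and the four outputs are the classifier_logits read back by the app.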