---
language: en
library_name: litert-lm
tags:
  - on-device
  - gemma
  - gemma-4
  - ai-text-detection
  - bouncer
---

Gemma 4 E4B IT — Bouncer on-device classifier

This repo provides a .litertlm bundle and a hot-swappable LoRA adapter for the imbue-ai/bouncer-private iOS app. Both are intended to be consumed by a forked LiteRT-LM runtime (millanatimbue/LiteRT-LM @ expose-aux-tensor-outputs).

Contents

| File | Size | Purpose |
|---|---|---|
| model.litertlm | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| lora_adapter.tflite | 18 MB | Attention-only LoRA (rank=8), hot-swapped at session creation |
| tokenizer.json, tokenizer_config.json | | Reference copies (also embedded in model.litertlm) |

How it's used

```swift
// `engine`, `cfg`, `loraAdapterURL` (pointing at lora_adapter.tflite), and `text`
// are assumed to be set up beforehand.
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL)          // hot-swap LoRA
try await conv.sendMessage(.text(text),             // prefill + 1 decode step
    optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
```
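The argmax in the snippet's last comment can be done on the Swift side. This is a minimal sketch that assumes the `classifier_logits` output has already been copied into a `[Float]` of four logits, one per bucket; `argmaxBucket` is a hypothetical helper, not part of the LiteRT-LM API.

```swift
/// Returns the index of the largest logit: bucket 0 (human) ... bucket 3 (AI).
/// Hypothetical helper; assumes the logits were already read into a [Float].
func argmaxBucket(_ logits: [Float]) -> Int? {
    // indices.max(by:) picks the index whose logit is largest;
    // it returns nil for an empty array.
    return logits.indices.max(by: { logits[$0] < logits[$1] })
}

let exampleLogits: [Float] = [-1.2, 0.3, 2.5, 0.9]
if let bucket = argmaxBucket(exampleLogits) {
    print("bucket \(bucket)")  // prints "bucket 2"
}
```

Ties resolve to the lower bucket, because `max(by:)` only replaces its running result when a later element is strictly greater.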

For plain generation (no classification), don't call setScopedLoraFile: the LoRA inputs then default to zero, and the model runs as the base IT model.
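Why zeroed LoRA inputs leave the base model intact: in the standard LoRA formulation, an adapted projection computes W·x plus a scaled low-rank update (alpha / rank)·B·(A·x), so when the A and B inputs are all zeros the update vanishes. The sketch below shows this on plain Swift arrays. It assumes the standard formulation; the tiny matrix sizes, the `alpha` value, and the helper names are illustrative, not the shapes or scaling used in the converted graph.

```swift
typealias Matrix = [[Float]]  // row-major: matrix[row][col]

// Multiplies a matrix by a vector.
func matVec(_ m: Matrix, _ x: [Float]) -> [Float] {
    m.map { row in zip(row, x).reduce(0 as Float) { acc, pair in acc + pair.0 * pair.1 } }
}

// Standard LoRA projection: W·x + (alpha / rank) · B·(A·x).
// A is rank × dIn and B is dOut × rank, so rank = number of rows in A.
func loraProject(w: Matrix, a: Matrix, b: Matrix, alpha: Float, x: [Float]) -> [Float] {
    let scale = alpha / Float(a.count)
    let base = matVec(w, x)
    let delta = matVec(b, matVec(a, x))
    return zip(base, delta).map { pair in pair.0 + scale * pair.1 }
}

let w: Matrix = [[1, 2], [3, 4]]
let x: [Float] = [1, 1]
let zeroA: Matrix = [[0, 0], [0, 0]]  // rank 2 here; the shipped adapter uses rank 8
let zeroB: Matrix = [[0, 0], [0, 0]]
print(loraProject(w: w, a: zeroA, b: zeroB, alpha: 16, x: x))  // prints "[3.0, 7.0]", same as W·x
```

Per the Contents table, the shipped adapter only targets attention projections, so in this picture every non-attention layer has no LoRA term at all.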

Build details

  • Quantization: gemma4_mixed48 (Google's recommended Gemma 4 mixed int4/int8 recipe; same family as upstream litert-community/gemma-4-E4B-it-litert-lm)
  • Cache length: 1024 tokens (matches iOS EngineConfig.maxNumTokens)
  • Source: google/gemma-4-E4B-it text decoder, fine-tuned with a PEFT attention-only LoRA plus a 4-class NormedLinear head (LayerNorm + Linear) applied to the last input token's hidden state.
  • Conversion: a fork of google-ai-edge/litert-torch with Gemma 4 classifier_head and LoRA-input wiring added (see EditLens for the converter patches).
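The classifier head described under Source can be sketched as follows. This is a minimal sketch under stated assumptions: it treats NormedLinear as a standard LayerNorm (learnable gamma/beta, eps = 1e-5) followed by a Linear layer with bias, applied to the last row of a per-token hidden-state matrix. The helper names and the toy hidden size of 4 are hypothetical; the real head runs inside the converted graph on the model's full hidden size.

```swift
// Hypothetical sketch of a NormedLinear head: LayerNorm, then a Linear layer.
// Assumed (not confirmed by this card): learnable gamma/beta, eps = 1e-5,
// and a bias term on the Linear layer.

// Normalizes a vector to zero mean and unit variance, then scales and shifts it.
func layerNorm(_ h: [Float], gamma: [Float], beta: [Float], eps: Float = 1e-5) -> [Float] {
    let n = Float(h.count)
    let mean = h.reduce(0, +) / n
    // Population (biased) variance, as LayerNorm uses.
    let sumSq = h.reduce(0 as Float) { acc, v in acc + (v - mean) * (v - mean) }
    let invStd = 1 / (sumSq / n + eps).squareRoot()
    return h.indices.map { i in (h[i] - mean) * invStd * gamma[i] + beta[i] }
}

// weight is numClasses × hiddenSize; bias has numClasses entries.
func linear(_ x: [Float], weight: [[Float]], bias: [Float]) -> [Float] {
    zip(weight, bias).map { pair in
        zip(pair.0, x).reduce(pair.1) { acc, p in acc + p.0 * p.1 }
    }
}

// Uses only the last input token's hidden state; returns nil if there are no tokens.
func classifierLogits(hiddenStates: [[Float]], gamma: [Float], beta: [Float],
                      weight: [[Float]], bias: [Float]) -> [Float]? {
    guard let last = hiddenStates.last else { return nil }
    return linear(layerNorm(last, gamma: gamma, beta: beta), weight: weight, bias: bias)
}

// Toy example: hidden size 4 and 4 classes (bucket 0 = human ... bucket 3 = AI).
let identity: [[Float]] = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
let states: [[Float]] = [[9, 9, 9, 9], [1, 2, 3, 4]]  // only the last row is used
if let logits = classifierLogits(hiddenStates: states, gamma: [1, 1, 1, 1], beta: [0, 0, 0, 0],
                                 weight: identity, bias: [0, 0, 0, 0]) {
    print(logits)
}
```

Reading only the last token fits a causal decoder: it is the one position whose hidden state has attended to the entire input.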