---
language: en
library_name: litert-lm
tags:
- on-device
- gemma
- gemma-4
- ai-text-detection
- bouncer
---
# Gemma 4 E4B IT — Bouncer on-device classifier
A `.litertlm` bundle and hot-swappable LoRA adapter for the
[imbue-ai/bouncer-private](https://github.com/imbue-ai/bouncer-private)
iOS app. Both are intended to be consumed by a forked LiteRT-LM runtime
(`millanatimbue/LiteRT-LM` @ `expose-aux-tensor-outputs`).
## Contents
| File | Size | Purpose |
|---|---|---|
| `model.litertlm` | 3.9 GB | Gemma 4 E4B IT base + classifier head + LoRA tensor input slots |
| `lora_adapter.tflite` | 18 MB | Attention-only LoRA (rank=8) hot-swapped at session creation |
| `tokenizer.json`, `tokenizer_config.json` | | Reference copies (also embedded in `model.litertlm`) |
## How it's used
```swift
let conv = try await engine.createConversation(with: cfg)
try conv.setScopedLoraFile(loraAdapterURL) // hot-swap LoRA
try await conv.sendMessage(.text(text),   // prefill + 1 decode step
                           optionalArgs: .init(maxOutputTokens: 1))
let logits = try conv.getAuxiliaryOutput(name: "classifier_logits")
// argmax(logits) → bucket 0 (human) ... bucket 3 (AI)
```
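The bucket comment in the snippet above can be made concrete with a small helper. This is a minimal sketch, assuming `getAuxiliaryOutput` yields the four classifier logits as a `[Float]` (the forked runtime's actual return type is not stated here). `argmax` and `softmax` are hypothetical helper names, not part of LiteRT-LM.

```swift
import Foundation

/// Index of the largest logit, or nil for an empty array.
/// Ties resolve to the lowest index.
func argmax(_ logits: [Float]) -> Int? {
    guard var best = logits.indices.first else { return nil }
    for i in logits.indices where logits[i] > logits[best] {
        best = i
    }
    return best
}

/// Numerically stable softmax: subtracting the max logit before
/// exponentiating avoids overflow without changing the result.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// Assumed usage: bucket 0 is "human", bucket 3 is "AI" (per the comment above).
// let bucket = argmax(logits)
// let probabilities = softmax(logits)
```

`argmax` alone is enough for a hard label; `softmax` is only useful if the app wants a confidence score next to the bucket.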
For generation without classification, skip the `setScopedLoraFile` call:
the LoRA inputs then default to zero and the model runs as the base IT model.
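Why zeroed LoRA inputs leave the base model unchanged follows from the standard LoRA formulation, where a projection computes `W·x + scale · B·(A·x)`. The sketch below illustrates that arithmetic on plain arrays. It is not the runtime's implementation; `matVec`, `loraProjection`, and the `scale` parameter are hypothetical names introduced for illustration.

```swift
/// Dense matrix-vector product; `m` holds one inner array per output row.
func matVec(_ m: [[Float]], _ x: [Float]) -> [Float] {
    return m.map { row in
        zip(row, x).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    }
}

/// Base projection plus a low-rank LoRA update:
/// y = W·x + scale · B·(A·x), with A of shape [rank][in] and B of shape [out][rank].
func loraProjection(w: [[Float]], a: [[Float]], b: [[Float]],
                    x: [Float], scale: Float = 1) -> [Float] {
    let base = matVec(w, x)
    let delta = matVec(b, matVec(a, x))
    return zip(base, delta).map { $0 + scale * $1 }
}

let w: [[Float]] = [[1, 2], [3, 4]]
let x: [Float] = [1, 1]

// All-zero A and B: the update term vanishes and the output equals W·x.
let zeroA: [[Float]] = [[0, 0]]
let zeroB: [[Float]] = [[0], [0]]
let baseOnly = loraProjection(w: w, a: zeroA, b: zeroB, x: x)

// Non-zero adapter: the same projection is shifted by B·(A·x).
let adapted = loraProjection(w: w, a: [[1, 1]], b: [[1], [0]], x: x)
```

With the zero tensors, `baseOnly` matches `matVec(w, x)` exactly, which is the behavior the paragraph above relies on.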
## Build details
- Quantization: `gemma4_mixed48` (Google's recommended Gemma 4 mixed
int4/int8 recipe; same family as upstream
`litert-community/gemma-4-E4B-it-litert-lm`)
- Cache length: 1024 tokens (matches iOS `EngineConfig.maxNumTokens`)
- Source: `google/gemma-4-E4B-it` text decoder, fine-tuned with PEFT
attention-only LoRA + a 4-class NormedLinear head (LayerNorm +
Linear) over the last input token's hidden state.
- Conversion: forked `google-ai-edge/litert-torch` with Gemma 4
classifier_head + LoRA-input wiring (see
[EditLens](https://github.com/pangramlabs/EditLens) for the
converter patches).
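The classifier head described above (LayerNorm followed by a Linear layer, applied to the last input token's hidden state) can be sketched on plain arrays as follows. This is an illustration under assumptions, not the converted model's code: the `NormedLinearHead` type, its field names, the presence of a bias term, and the `epsilon` value are all hypothetical.

```swift
/// Hypothetical stand-in for the 4-class head: LayerNorm, then Linear.
struct NormedLinearHead {
    var gamma: [Float]       // LayerNorm scale, one entry per hidden unit
    var beta: [Float]        // LayerNorm shift, one entry per hidden unit
    var weight: [[Float]]    // [numClasses][hiddenSize]
    var bias: [Float]        // [numClasses]; assumed, the source does not say
    var epsilon: Float = 1e-5  // assumed value

    /// Maps the last token's hidden state to one logit per class.
    func logits(lastHidden h: [Float]) -> [Float] {
        let n = Float(h.count)
        let mean = h.reduce(0, +) / n
        let variance = h.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) } / n
        let invStd = 1 / (variance + epsilon).squareRoot()
        // Normalize, then apply the learned scale and shift.
        let normed = h.indices.map { (h[$0] - mean) * invStd * gamma[$0] + beta[$0] }
        // One dot product per class row, starting from that class's bias.
        return weight.indices.map { c in
            zip(weight[c], normed).reduce(bias[c]) { $0 + $1.0 * $1.1 }
        }
    }
}

// Toy example with hidden size 2 and 4 classes.
let head = NormedLinearHead(
    gamma: [1, 1],
    beta: [0, 0],
    weight: [[1, 0], [0, 1], [1, 1], [-1, 1]],
    bias: [0, 0, 0, 0]
)
let toyLogits = head.logits(lastHidden: [1, 3])
```

In the real model the hidden size is far larger and the weights come from fine-tuning; only the shape of the computation is meant to carry over.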