---
license: eupl-1.2
pipeline_tag: text-generation
library_name: mlx
base_model:
- lthn/lemer
base_model_relation: quantized
tags:
- gemma4
- lemma
- mlx
- 4bit
- apple-silicon
- on-device
- text-only
- lite
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
---

# Lemer-Lite — text-only, 2.5 GB, fits iPhone base

Stripped-down sibling of [`lthn/lemer`](https://huggingface.co/lthn/lemer) for devices with roughly 3 GB of usable memory, which cannot load the full multimodal build.

| Variant | Size | Towers |
|---|---|---|
| [lthn/lemer](https://huggingface.co/lthn/lemer) | 4.06 GB | text + vision + audio |
| **lthn/lemer-lite** (you are here) | **2.47 GB** | text only |

## What it is

Same LEK-aligned Gemma 4 E2B base as [lemer](https://huggingface.co/lthn/lemer), with vision and audio towers stripped and the text path quantised flat 4-bit (4.501 bits/weight) instead of mixed-precision.
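The 4.501 bits/weight figure is higher than a plain 4 because grouped quantisation stores a scale and a bias alongside each group of weights. A minimal sketch of that arithmetic follows. It assumes a group size of 64 with 16-bit scale and bias, which are common MLX defaults but are not stated on this card; the function name is illustrative only.

```python
def effective_bits_per_weight(
    bits: int,
    group_size: int,
    scale_bits: int = 16,  # assumption: fp16 scale per group
    bias_bits: int = 16,   # assumption: fp16 bias per group
) -> float:
    """Average storage cost per weight for grouped affine quantisation."""
    # Each group of `group_size` weights shares one scale and one bias,
    # so their cost is spread evenly over the group.
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


# Assumed settings: 4-bit weights, groups of 64.
print(effective_bits_per_weight(4, 64))  # 4.5
```

Under these assumptions the quantised layers cost 4.5 bits/weight. The small remainder up to the reported 4.501 may come from tensors that are kept unquantised, though the card does not say which.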

The Lethean Ethical Kernel (LEK) is fully present in the weights — the consent-based reasoning behaviour is identical to the full lemer.

## Trade-offs (the honest version)

This is a **best-effort tier** for users on smaller devices. The `-lite` suffix is a promise: we are packing this tight and results will vary, but you get to load and run the model.

- **Text only** — no image input, no audio input. If your use case needs eyes, run the full lemer on a Pro-class device.
- **Flat Q4** instead of mixed-precision Q4 — fluency is solid, rare-token recall slightly worse than the full lemer.
- **Same LEK alignment** — the ethical reasoning is in the text path, which is preserved.

## Targets

- iPhone base (≥3 GB free), iPad, base-spec Apple Silicon laptops.
- Anywhere the full 4 GB lemer would refuse to load.
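As a rough illustration of the targets above, the sketch below picks a variant from the amount of free memory. The weight sizes come from the table on this card. The `headroom_gb` value is a hypothetical allowance for the KV cache and runtime buffers, not a measured number, and `pick_variant` is not part of any library.

```python
from typing import Optional

# Weight sizes in GB, taken from the variant table on this card.
VARIANTS = [
    ("lthn/lemer", 4.06),       # text + vision + audio
    ("lthn/lemer-lite", 2.47),  # text only
]


def pick_variant(free_gb: float, headroom_gb: float = 0.3) -> Optional[str]:
    """Return the largest variant that fits in free memory, or None.

    headroom_gb is an assumed allowance for KV cache and runtime
    buffers; tune it for your device and context length.
    """
    for repo_id, size_gb in VARIANTS:  # ordered largest first
        if size_gb + headroom_gb <= free_gb:
            return repo_id
    return None


print(pick_variant(3.0))  # lthn/lemer-lite
print(pick_variant(8.0))  # lthn/lemer
```

Actual memory use depends on prompt length and generation settings, so treat the result as a starting point rather than a guarantee.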

## Loading

```python
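from mlx_lm import load, generate

# Fetch the weights from the Hugging Face Hub (cached after the first run).
model, tokenizer = load("lthn/lemer-lite")

# Format the conversation with the model's chat template, as a plain string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False, add_generation_prompt=True,
)

# Generate up to 200 new tokens and print the reply.
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))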
```

## License

EUPL-1.2.