---
license: eupl-1.2
pipeline_tag: text-generation
library_name: mlx
base_model:
- lthn/lemer
base_model_relation: quantized
tags:
- gemma4
- lemma
- mlx
- 4bit
- apple-silicon
- on-device
- text-only
- lite
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
---

# Lemer-Lite: text-only, 2.5 GB, fits iPhone base

Stripped-down sibling of [`lthn/lemer`](https://huggingface.co/lthn/lemer) for devices with a memory ceiling of roughly 3 GB, which cannot load the full 4.06 GB multimodal build.

| Variant | Size | Towers |
|---|---|---|
| [lthn/lemer](https://huggingface.co/lthn/lemer) | 4.06 GB | text + vision + audio |
| **lthn/lemer-lite** (you are here) | **2.47 GB** | text only |

## What it is

Same LEK-aligned Gemma 4 E2B base as [lemer](https://huggingface.co/lthn/lemer), with the vision and audio towers stripped and the text path quantised to flat 4-bit (4.501 bits/weight) instead of mixed precision.

The Lethean Ethical Kernel (LEK) is fully present in the weights; the consent-based reasoning behaviour is identical to the full lemer.

## Trade-offs (the honest version)

This is a **best-effort tier** for users on smaller devices. The `-lite` suffix is a promise: we are packing this tight, results will vary, but you get to load and run the model.

- **Text only**: no image input, no audio input. If your use case needs eyes, run the full lemer on a Pro-class device.
- **Flat Q4** instead of mixed-precision Q4: fluency is solid, rare-token recall is slightly worse than the full lemer.
- **Same LEK alignment**: the ethical reasoning is in the text path, which is preserved.

## Targets

- iPhone base (≥3 GB free), iPad, base-spec Apple Silicon laptops.
- Anywhere the full 4 GB lemer would refuse to load.

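As a rough sanity check on the figures above, the sketch below works through two pieces of arithmetic: the effective bits per weight of a flat 4-bit group quantisation, and the memory left over on a device with 3 GB free. The group size of 64 and the 16-bit scale and bias per group are assumptions (they match common MLX defaults but are not stated in this card), and the function names are hypothetical. The small gap between the computed 4.5 and the quoted 4.501 is not explained by this sketch; one possible cause is a handful of tensors left unquantised.

```python
def effective_bits_per_weight(
    bits: int, group_size: int, scale_bits: int = 16, bias_bits: int = 16
) -> float:
    """Bits stored per weight when each group shares one scale and one bias.

    Assumption: group-wise affine quantisation with a 16-bit scale and a
    16-bit bias per group; the card does not state these values.
    """
    return bits + (scale_bits + bias_bits) / group_size


def headroom_gb(free_gb: float, weights_gb: float) -> float:
    """Memory left for KV cache, activations and the runtime after loading weights."""
    return free_gb - weights_gb


# Flat 4-bit weights with an assumed group size of 64.
bpw = effective_bits_per_weight(bits=4, group_size=64)
print(f"effective bits/weight: {bpw:.3f}")  # 4.500

# 2.47 GB of weights (from the table above) on a device with 3 GB free.
print(f"headroom: {headroom_gb(3.0, 2.47):.2f} GB")  # 0.53 GB
```

The headroom figure is only a lower bound on what the device needs: context length and the runtime's own overhead both draw from it, so long prompts may still fail on a device that loads the weights.
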
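## Loading

```python
from mlx_lm import load, generate

# Fetch (or reuse the locally cached) weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("lthn/lemer-lite")

# Wrap the user message in the model's chat template and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```

## License

EUPL-1.2.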