How to use chenjn168/gemma-4-E2B-it-Uncensored-MAX-litert-lm with LiteRT-LM:
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.
# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=chenjn168/gemma-4-E2B-it-Uncensored-MAX-litert-lm \
  model.litertlm \
  --prompt="Write me a poem"
Commit b32625e
Duplicate from RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm

Co-authored-by: RedRedRed <RedSparkie@users.noreply.huggingface.co>
- .gitattributes +35 -0
- README.md +71 -0
- gemma4_to_litertlm.ipynb +306 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
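To see which files the `.gitattributes` rules would route through Git LFS, the matching can be approximated in Python. This is a rough sketch and not Git's own matcher: it assumes that comparing the file's base name with `fnmatchcase` is close enough for the slash-free patterns, it leaves out the `saved_model/**/*` rule, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Slash-free patterns from the .gitattributes file (saved_model/**/* omitted).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "*.tar.*", "*.tar",
    "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: compare the base name to each pattern."""
    name = basename(path)
    return any(fnmatchcase(name, pattern) for pattern in patterns)

for candidate in [
    "README.md",
    "gemma4_to_litertlm.ipynb",
    "model.safetensors",
    "gemma-4-E2B-it-Uncensored-MAX.litertlm",
]:
    status = "LFS" if is_lfs_tracked(candidate) else "regular Git"
    print(f"{candidate}: {status}")
```

Under this approximation no listed pattern matches a `.litertlm` file name, so these rules on their own do not cover the converted model file.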
README.md
ADDED
@@ -0,0 +1,71 @@
+---
+license: apache-2.0
+base_model:
+- prithivMLmods/gemma-4-E2B-it-Uncensored-MAX
+tags:
+- litert-lm
+- uncensored
+- abliterated
+- edge-gallery
+- on-device
+language:
+- en
+---
+
+# gemma-4-E2B-it-Uncensored-MAX → LiteRT-LM
+
+Conversion of [prithivMLmods/gemma-4-E2B-it-Uncensored-MAX](https://huggingface.co/prithivMLmods/gemma-4-E2B-it-Uncensored-MAX) to the `.litertlm` format for **Google AI Edge Gallery** on Android.
+
+## 🚀 How to convert (Google Colab, free)
+
+The notebook is **tested and ready** to run in Colab:
+
+📓 **[`gemma4_to_litertlm.ipynb`](https://huggingface.co/RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm/blob/main/gemma4_to_litertlm.ipynb)**
+
+### Steps:
+1. **Download** the notebook and open it in Google Colab
+2. Select the runtime: **GPU (T4) + High-RAM** (`hm`)
+   → Runtime → Change runtime type → T4 + High-RAM
+3. **Enter your Hugging Face token** (with write permission) in the first cell
+4. **Run** all cells (~30-45 min)
+5. The `.litertlm` file is uploaded here automatically
+
+### What does it do?
+1. **Extracts only the text decoder** from the multimodal model (4.8 GB vs 9.6 GB total)
+   → Keeps the correct key naming (`model.language_model.*`)
+2. **Creates a modified config** with `vision_config=None`, `audio_config=None`
+   → `Gemma4ForConditionalGeneration` then instantiates only the language model
+3. **Converts to TFLite** via `litert-torch` with INT8 quantization
+4. **Packages it as `.litertlm`** with `externalize_embedder=True` (required by Gemma4)
+5. **Uploads to Hugging Face**
+
+### If the file is larger than 2 GB
+Change `"dynamic_wi8_afp32"` → `"dynamic_wi4_afp32"` in cell 4 (INT4 instead of INT8, half the size)
+
+## 📱 Usage (once converted)
+
+### Edge Gallery (Android)
+1. Install [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
+2. Add the model via its Hugging Face URL
+3. Chat!
+
+### CLI
+```bash
+pip install litert-lm
+litert-lm import --from-huggingface-repo RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm gemma-4-E2B-it-Uncensored-MAX.litertlm uncensored-max
+litert-lm run uncensored-max
+```
+
+## Technical details
+
+| | |
+|---|---|
+| **Base model** | [prithivMLmods/gemma-4-E2B-it-Uncensored-MAX](https://huggingface.co/prithivMLmods/gemma-4-E2B-it-Uncensored-MAX) |
+| **Architecture** | Gemma 4 E2B (text decoder only, ~1.4B params) |
+| **Format** | LiteRT-LM (`.litertlm`) |
+| **Quantization** | INT8 dynamic (`dynamic_wi8_afp32`) |
+| **Context** | 4096 tokens |
+| **Estimated size** | ~1.5-2.0 GB |
+| **Converted with** | `litert-torch` v0.9.0 |
+
+⚠️ Abliterated/uncensored model. Use responsibly.
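The size figures in the README can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes that raw weight storage dominates the file and ignores quantization scales, the externalized embedder, and container metadata. `estimate_weight_gib` is a hypothetical helper, and the 1.4B parameter count is the README's own approximation.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in GiB; ignores scales, embedder, and metadata."""
    return num_params * bits_per_weight / 8 / 1024**3

params = 1.4e9  # "~1.4B params" from the README table

for label, bits in [("INT8 (dynamic_wi8_afp32)", 8), ("INT4 (dynamic_wi4_afp32)", 4)]:
    print(f"{label}: ~{estimate_weight_gib(params, bits):.2f} GiB of weights")
```

The INT4 estimate is half the INT8 one, which is the "half the size" claim in the README. The README's 1.5-2.0 GB estimate is larger than the raw INT8 figure, presumably because of the parts this sketch leaves out.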
gemma4_to_litertlm.ipynb
ADDED
@@ -0,0 +1,306 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4",
+      "machine_shape": "hm"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# 🚀 Convert Gemma 4 E2B Uncensored-MAX to LiteRT-LM\n",
+        "\n",
+        "Converts the model to the `.litertlm` format for **Google AI Edge Gallery** on Android.\n",
+        "\n",
+        "**⚠️ IMPORTANT:** Use a runtime with **GPU + High-RAM**: Runtime → Change runtime type → T4 + High-RAM (hm)\n",
+        "\n",
+        "### Instructions:\n",
+        "1. Run cell **1️⃣** → enter your token\n",
+        "2. Run cell **2️⃣** → installs dependencies. **The runtime will restart; this is expected.**\n",
+        "3. After the restart, run **3️⃣**, **4️⃣** and **5️⃣** in order\n",
+        "\n",
+        "**Time:** ~30-45 min"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title 1️⃣ Configuration\n",
+        "HF_TOKEN = \"\" #@param {type:\"string\"}\n",
+        "OUTPUT_REPO = \"RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm\" #@param {type:\"string\"}\n",
+        "SOURCE_MODEL = \"prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\" #@param {type:\"string\"}\n",
+        "\n",
+        "assert HF_TOKEN, '❌ Enter your Hugging Face token above!'\n",
+        "import json, os\n",
+        "os.makedirs('/content/cfg', exist_ok=True)\n",
+        "with open('/content/cfg/config.json', 'w') as f:\n",
+        "    json.dump({'HF_TOKEN': HF_TOKEN, 'OUTPUT_REPO': OUTPUT_REPO, 'SOURCE_MODEL': SOURCE_MODEL}, f)\n",
+        "print('✅ Config saved')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title 2️⃣ Install dependencies (restarts the runtime)\n",
+        "# Colab ships older torch/torchao/transformers builds that are incompatible.\n",
+        "# We need minimum versions that work together.\n",
+        "!pip install -q --upgrade \\\n",
+        "    \"transformers>=5.7.0\" \\\n",
+        "    \"torchao>=0.17.0\" \\\n",
+        "    litert-torch \\\n",
+        "    litert-lm \\\n",
+        "    huggingface_hub \\\n",
+        "    sentencepiece \\\n",
+        "    protobuf \\\n",
+        "    safetensors \\\n",
+        "    psutil\n",
+        "\n",
+        "# Verify\n",
+        "import torch, torchao, transformers\n",
+        "print(f'torch {torch.__version__} | torchao {torchao.__version__} | transformers {transformers.__version__}')\n",
+        "\n",
+        "# Quick test: does torchao.quantization.pt2e import?\n",
+        "try:\n",
+        "    import torchao.quantization.pt2e.quantize_pt2e\n",
+        "    print('✅ torchao.quantization.pt2e OK')\n",
+        "except ImportError:\n",
+        "    print('⚠️ torchao.quantization.pt2e not available, forcing reinstall...')\n",
+        "    import subprocess\n",
+        "    subprocess.check_call(['pip', 'install', '-q', '--force-reinstall', 'torchao>=0.17.0'])\n",
+        "\n",
+        "# Test: is Gemma4 available?\n",
+        "try:\n",
+        "    from transformers import Gemma4Config\n",
+        "    print('✅ Gemma4 available')\n",
+        "except ImportError:\n",
+        "    print('⚠️ Gemma4 not available, forcing reinstall...')\n",
+        "    import subprocess\n",
+        "    subprocess.check_call(['pip', 'install', '-q', '--force-reinstall', 'transformers>=5.7.0'])\n",
+        "\n",
+        "# Restart the runtime so everything loads cleanly\n",
+        "print('\\n🔄 Restarting runtime...')\n",
+        "print(' After the restart, continue from cell 3️⃣')\n",
+        "import IPython\n",
+        "IPython.Application.instance().kernel.do_shutdown(True)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title 3️⃣ Prepare model (extract text only)\n",
+        "# Restore config\n",
+        "import json, os\n",
+        "with open('/content/cfg/config.json') as f:\n",
+        "    cfg = json.load(f)\n",
+        "HF_TOKEN = cfg['HF_TOKEN']\n",
+        "OUTPUT_REPO = cfg['OUTPUT_REPO']\n",
+        "SOURCE_MODEL = cfg['SOURCE_MODEL']\n",
+        "\n",
+        "# Check versions\n",
+        "import torch, torchao, transformers\n",
+        "print(f'torch {torch.__version__} | torchao {torchao.__version__} | transformers {transformers.__version__}')\n",
+        "from transformers import Gemma4Config\n",
+        "import torchao.quantization.pt2e.quantize_pt2e\n",
+        "print('✅ All OK')\n",
+        "\n",
+        "import sys, gc, shutil, time\n",
+        "from huggingface_hub import hf_hub_download\n",
+        "from safetensors import safe_open\n",
+        "from safetensors.torch import save_file\n",
+        "import psutil\n",
+        "\n",
+        "def memlog(l=''):\n",
+        "    m = psutil.virtual_memory()\n",
+        "    print(f' [{l}] RAM: {m.available/(1024**3):.1f}/{m.total/(1024**3):.1f} GB')\n",
+        "\n",
+        "MODEL_DIR = '/content/model'\n",
+        "OUTPUT_DIR = '/content/output'\n",
+        "os.makedirs(MODEL_DIR, exist_ok=True)\n",
+        "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
+        "start_time = time.time()\n",
+        "memlog('start')\n",
+        "\n",
+        "print('📥 Downloading index...')\n",
+        "idx_path = hf_hub_download(SOURCE_MODEL, 'model.safetensors.index.json', token=HF_TOKEN)\n",
+        "with open(idx_path) as f:\n",
+        "    index = json.load(f)\n",
+        "\n",
+        "shard_lm = {}\n",
+        "for key, shard in index['weight_map'].items():\n",
+        "    if key.startswith('model.language_model.'):\n",
+        "        shard_lm.setdefault(shard, []).append(key)\n",
+        "\n",
+        "total_shards = len(shard_lm)\n",
+        "print(f' {sum(len(v) for v in shard_lm.values())} tensors in {total_shards} shards')\n",
+        "\n",
+        "weight_map = {}\n",
+        "for i, sn in enumerate(sorted(shard_lm)):\n",
+        "    keys = shard_lm[sn]\n",
+        "    out_name = f'model-{i+1:05d}-of-{total_shards:05d}.safetensors'\n",
+        "    out_path = os.path.join(MODEL_DIR, out_name)\n",
+        "\n",
+        "    if os.path.exists(out_path) and os.path.getsize(out_path) > 100:\n",
+        "        print(f' {out_name} already exists, skipping')\n",
+        "        with safe_open(out_path, framework='pt') as f:\n",
+        "            for k in f.keys(): weight_map[k] = out_name\n",
+        "        continue\n",
+        "\n",
+        "    print(f' 📦 {sn} → {out_name} ({len(keys)} tensors)')\n",
+        "    shard_path = hf_hub_download(SOURCE_MODEL, sn, token=HF_TOKEN)\n",
+        "\n",
+        "    tensors = {}\n",
+        "    with safe_open(shard_path, framework='pt') as f:\n",
+        "        for key in keys:\n",
+        "            tensors[key] = f.get_tensor(key)\n",
+        "\n",
+        "    save_file(tensors, out_path)\n",
+        "    for k in tensors: weight_map[k] = out_name\n",
+        "    print(f' 💾 {os.path.getsize(out_path)/(1024**2):.0f} MB')\n",
+        "    del tensors; gc.collect()\n",
+        "    memlog(f'shard {i+1}')\n",
+        "\n",
+        "with open(os.path.join(MODEL_DIR, 'model.safetensors.index.json'), 'w') as f:\n",
+        "    json.dump({'metadata': {}, 'weight_map': weight_map}, f)\n",
+        "\n",
+        "print('\\n📝 Config...')\n",
+        "config = transformers.AutoConfig.from_pretrained(SOURCE_MODEL, token=HF_TOKEN)\n",
+        "cd = config.to_dict()\n",
+        "cd['vision_config'] = None\n",
+        "cd['audio_config'] = None\n",
+        "for k in ['vision_soft_tokens_per_image','image_token_id','boi_token_id',\n",
+        "          'eoi_token_id','audio_token_id','boa_token_id','eoa_token_id',\n",
+        "          'eoa_token_index','video_token_id']:\n",
+        "    cd.pop(k, None)\n",
+        "with open(os.path.join(MODEL_DIR, 'config.json'), 'w') as f:\n",
+        "    json.dump(cd, f, indent=2)\n",
+        "\n",
+        "for fn in ['tokenizer.json','tokenizer_config.json','chat_template.jinja','generation_config.json']:\n",
+        "    try:\n",
+        "        shutil.copy(hf_hub_download(SOURCE_MODEL, fn, token=HF_TOKEN), os.path.join(MODEL_DIR, fn))\n",
+        "        print(f' ✓ {fn}')\n",
+        "    except Exception: pass  # optional file: skip it if the download fails\n",
+        "\n",
+        "del config; gc.collect()\n",
+        "cache_dir = os.path.expanduser('~/.cache/huggingface/hub')\n",
+        "if os.path.exists(cache_dir):\n",
+        "    for d in os.listdir(cache_dir):\n",
+        "        if d.startswith('models--'):\n",
+        "            shutil.rmtree(os.path.join(cache_dir, d), ignore_errors=True)\n",
+        "gc.collect()\n",
+        "print(f'\\n✅ Model ready')\n",
+        "memlog('done')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title 4️⃣ Convert to .litertlm\n",
+        "from litert_torch.generative.export_hf import export as export_lib\n",
+        "\n",
+        "print('🚀 Converting to LiteRT-LM...')\n",
+        "print(' This takes 15-30 min.')\n",
+        "memlog('pre-export')\n",
+        "conversion_start = time.time()\n",
+        "\n",
+        "export_lib.export(\n",
+        "    model=MODEL_DIR,\n",
+        "    output_dir=OUTPUT_DIR,\n",
+        "    task='text_generation',\n",
+        "    bundle_litert_lm=True,\n",
+        "    quantization_recipe='dynamic_wi8_afp32',\n",
+        "    cache_length=4096,\n",
+        "    prefill_lengths=[256],\n",
+        "    use_jinja_template=True,\n",
+        "    keep_temporary_files=True,\n",
+        "    trust_remote_code=False,\n",
+        "    experimental_lightweight_conversion=True,\n",
+        "    externalize_embedder=True,\n",
+        ")\n",
+        "\n",
+        "print(f'\\n✅ Conversion finished in {(time.time()-conversion_start)/60:.1f} min')\n",
+        "memlog('post-export')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "#@title 5️⃣ Verify and upload\n",
+        "litertlm = os.path.join(OUTPUT_DIR, 'model.litertlm')\n",
+        "\n",
+        "if not os.path.exists(litertlm):\n",
+        "    print('❌ model.litertlm not found. Files:')\n",
+        "    for r,d,fs in os.walk(OUTPUT_DIR):\n",
+        "        for f in fs:\n",
+        "            fp = os.path.join(r,f)\n",
+        "            print(f' {os.path.relpath(fp,OUTPUT_DIR)}: {os.path.getsize(fp)/(1024**2):.1f} MB')\n",
+        "else:\n",
+        "    size_gb = os.path.getsize(litertlm) / (1024**3)\n",
+        "    print(f'📊 model.litertlm: {size_gb:.2f} GB')\n",
+        "    if size_gb <= 2.0: print('✅ Fits in 2 GB!')\n",
+        "    else: print(f'⚠️ {size_gb:.2f} GB: switch to dynamic_wi4_afp32 in cell 4')\n",
+        "\n",
+        "    print(f'\\n📤 Uploading to {OUTPUT_REPO}...')\n",
+        "    from huggingface_hub import HfApi\n",
+        "    api = HfApi(token=HF_TOKEN)\n",
+        "    try: api.create_repo(OUTPUT_REPO, exist_ok=True)\n",
+        "    except Exception: pass\n",
+        "\n",
+        "    api.upload_file(path_or_fileobj=litertlm,\n",
+        "        path_in_repo='gemma-4-E2B-it-Uncensored-MAX.litertlm',\n",
+        "        repo_id=OUTPUT_REPO, commit_message='Add LiteRT-LM model')\n",
+        "\n",
+        "    readme = f\"\"\"---\\nlicense: apache-2.0\\nbase_model:\\n- {SOURCE_MODEL}\\ntags:\\n - litert-lm\\n - uncensored\\n - edge-gallery\\nlanguage:\\n- en\\n---\\n\\n# gemma-4-E2B-it-Uncensored-MAX (LiteRT-LM)\\n\\nLiteRT-LM conversion for **Google AI Edge Gallery**.\\n\\n| | |\\n|---|---|\\n| **Base** | [{SOURCE_MODEL}](https://huggingface.co/{SOURCE_MODEL}) |\\n| **Format** | `.litertlm` |\\n| **Quant** | INT8 |\\n| **Context** | 4096 |\\n| **Size** | {size_gb:.2f} GB |\\n\\n## Usage\\n1. Install [Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)\\n2. Add model via HF URL\\n3. Chat!\\n\\n⚠️ Uncensored. Use responsibly.\\n\"\"\"\n",
+        "    api.upload_file(path_or_fileobj=readme.encode(), path_in_repo='README.md',\n",
+        "        repo_id=OUTPUT_REPO, commit_message='README')\n",
+        "\n",
+        "    print(f'\\n🎉 DONE!')\n",
+        "    print(f'📱 https://huggingface.co/{OUTPUT_REPO}')\n",
+        "    print(f'📊 {size_gb:.2f} GB')\n",
+        "    print(f'⏱️ {(time.time()-start_time)/60:.0f} min total')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 🔧 Troubleshooting\n",
+        "\n",
+        "| Error | Solution |\n",
+        "|---|---|\n",
+        "| `KeyError: 'gemma4'` | Outdated `transformers`. Re-run cell 2️⃣ and restart the runtime |\n",
+        "| `No module 'torchao.quantization.pt2e'` | Outdated `torchao`. Re-run cell 2️⃣ and restart the runtime |\n",
+        "| OOM / runs out of memory | Use a **High-RAM** (hm) runtime |\n",
+        "| Model > 2 GB | Change `dynamic_wi8_afp32` → `dynamic_wi4_afp32` in cell 4️⃣ |\n",
+        "| `External embedder required` | Already handled by `externalize_embedder=True` |"
+      ]
+    }
+  ]
+}
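The filtering step in cell 3 can be tried without downloading anything. The sketch below runs the same `startswith` / `setdefault` grouping on a made-up `weight_map`. The tensor and shard names are invented for illustration, and `group_language_model_keys` is a hypothetical wrapper, not part of the notebook.

```python
def group_language_model_keys(weight_map, prefix="model.language_model."):
    """Map each source shard to the tensor keys under `prefix` that it holds."""
    shard_to_keys = {}
    for key, shard in weight_map.items():
        if key.startswith(prefix):
            shard_to_keys.setdefault(shard, []).append(key)
    return shard_to_keys

# Invented example; a real map comes from model.safetensors.index.json.
toy_weight_map = {
    "model.language_model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.vision_tower.patch_embed.weight": "model-00001-of-00003.safetensors",
    "model.audio_tower.conv.weight": "model-00002-of-00003.safetensors",
    "model.language_model.norm.weight": "model-00003-of-00003.safetensors",
}

groups = group_language_model_keys(toy_weight_map)
total_shards = len(groups)

# Renumber the surviving shards the same way the notebook names its outputs.
for i, source_shard in enumerate(sorted(groups)):
    out_name = f"model-{i+1:05d}-of-{total_shards:05d}.safetensors"
    print(f"{source_shard} -> {out_name}: {groups[source_shard]}")
```

A shard that holds no `model.language_model.*` tensors drops out of the mapping entirely, so the output shards are renumbered from 1 to the number of shards that remain.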