RedSparkie committed on
Commit 6c512ff · verified · 1 Parent(s): 6ec4762

Final tested Colab notebook with all fixes (correct weight naming, externalize_embedder, etc.)
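
The "correct weight naming" fix appears to come down to how tensor names are grouped per shard: the earlier revision stripped the `model.language_model.` prefix, while this one keeps it. A minimal sketch of that grouping step follows. The helper name `group_lm_keys`, the `strip_prefix` flag, and the tensor names in `example_map` are hypothetical and only illustrate the idea; they are not taken from the notebook.

```python
from collections import defaultdict

PREFIX = "model.language_model."


def group_lm_keys(weight_map, strip_prefix=False):
    """Group language-model tensor names by the shard file that stores them."""
    by_shard = defaultdict(list)
    for key, shard in weight_map.items():
        if not key.startswith(PREFIX):
            continue  # skip tensors outside the language model (vision, audio)
        name = key[len(PREFIX):] if strip_prefix else key
        by_shard[shard].append(name)
    return dict(by_shard)


# Hypothetical index entries, for illustration only.
example_map = {
    "model.language_model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.vision_tower.encoder.weight": "model-00001-of-00002.safetensors",
    "model.language_model.norm.weight": "model-00002-of-00002.safetensors",
}

kept = group_lm_keys(example_map)                         # new behavior: prefix kept
stripped = group_lm_keys(example_map, strip_prefix=True)  # old behavior: prefix removed
```

With the prefix kept, the extracted shards keep the names used by the full checkpoint, which is presumably what the exporter expects when given the full (non-text-only) config.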

Files changed (1)
  1. gemma4_to_litertlm.ipynb +118 -270
gemma4_to_litertlm.ipynb CHANGED
@@ -4,7 +4,8 @@
4
  "metadata": {
5
  "colab": {
6
  "provenance": [],
7
- "gpuType": "T4"
 
8
  },
9
  "kernelspec": {
10
  "name": "python3",
@@ -22,23 +23,11 @@
22
  "source": [
23
  "# 🚀 Convert Gemma 4 E2B Uncensored-MAX to LiteRT-LM\n",
24
  "\n",
25
- "This notebook converts [prithivMLmods/gemma-4-E2B-it-Uncensored-MAX](https://huggingface.co/prithivMLmods/gemma-4-E2B-it-Uncensored-MAX) to the `.litertlm` format for use with **Google AI Edge Gallery** on Android.\n",
26
  "\n",
27
- "**Requirements:**\n",
28
- "- Colab with a GPU (T4). The standard runtime works, but if you hit an OOM error, switch to \"High-RAM\" (Runtime → Change runtime type → High-RAM)\n",
29
- "- HuggingFace token with write permissions\n",
30
  "\n",
31
- "**Estimated time:** ~20-40 minutes"
32
- ]
33
- },
34
- {
35
- "cell_type": "markdown",
36
- "metadata": {},
37
- "source": [
38
- "## 1️⃣ Set up your HuggingFace token\n",
39
- "\n",
40
- "You need a token with write permissions to upload the model. \n",
41
- "Get one at: https://huggingface.co/settings/tokens"
42
  ]
43
  },
44
  {
@@ -47,24 +36,11 @@
47
  "metadata": {},
48
  "outputs": [],
49
  "source": [
50
- "# PUT YOUR TOKEN HERE ⬇️\n",
51
- "HF_TOKEN = \"\" # Paste your HuggingFace token here (hf_...)\n",
52
- "\n",
53
- "# Repo the converted model will be uploaded to\n",
54
- "# Change it to your own username if you like\n",
55
- "OUTPUT_REPO = \"RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm\"\n",
56
- "\n",
57
- "# Source model (the original safetensors weights)\n",
58
- "SOURCE_MODEL = \"prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\"\n",
59
- "\n",
60
- "assert HF_TOKEN, \"❌ Put your HuggingFace token above!\""
61
- ]
62
- },
63
- {
64
- "cell_type": "markdown",
65
- "metadata": {},
66
- "source": [
67
- "## 2️⃣ Install dependencies"
68
  ]
69
  },
70
  {
@@ -73,19 +49,9 @@
73
  "metadata": {},
74
  "outputs": [],
75
  "source": [
 
76
  "!pip install -q litert-torch litert-lm transformers huggingface_hub sentencepiece protobuf safetensors psutil\n",
77
- "print(\"✅ Dependencies installed\")"
78
- ]
79
- },
80
- {
81
- "cell_type": "markdown",
82
- "metadata": {},
83
- "source": [
84
- "## 3️⃣ Extract only the text decoder weights\n",
85
- "\n",
86
- "The original model is multimodal (text + vision + audio = 9.6 GB). \n",
87
- "We only need the text decoder (~4.8 GB in bf16). \n",
88
- "This saves a lot of RAM."
89
  ]
90
  },
91
  {
@@ -94,139 +60,106 @@
94
  "metadata": {},
95
  "outputs": [],
96
  "source": [
97
- "import os, json, gc, shutil, sys, time\n",
 
98
  "from huggingface_hub import hf_hub_download\n",
99
  "from safetensors import safe_open\n",
100
  "from safetensors.torch import save_file\n",
101
- "import torch\n",
102
- "import transformers\n",
103
- "import psutil\n",
104
  "\n",
105
- "def memlog(label=\"\"):\n",
106
- " m = psutil.virtual_memory()\n",
107
- " print(f\" [{label}] RAM: {m.available/(1024**3):.1f}/{m.total/(1024**3):.1f} GB available\")\n",
108
  "\n",
109
- "TEXT_MODEL_DIR = \"/content/text_model\"\n",
110
  "OUTPUT_DIR = \"/content/output\"\n",
111
- "os.makedirs(TEXT_MODEL_DIR, exist_ok=True)\n",
112
  "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
113
- "\n",
114
  "start_time = time.time()\n",
115
- "memlog(\"START\")\n",
116
  "\n",
117
- "# Download the shard index\n",
118
- "print(\"📥 Downloading weight index...\")\n",
119
- "idx_file = hf_hub_download(SOURCE_MODEL, \"model.safetensors.index.json\", token=HF_TOKEN)\n",
120
- "with open(idx_file) as f:\n",
121
  " index = json.load(f)\n",
122
  "\n",
123
- "# Identify the shards that contain language model weights\n",
124
- "shard_keys = {}\n",
125
  "for key, shard in index[\"weight_map\"].items():\n",
126
  " if key.startswith(\"model.language_model.\"):\n",
127
- " if shard not in shard_keys:\n",
128
- " shard_keys[shard] = []\n",
129
- " shard_keys[shard].append(key)\n",
130
- "\n",
131
- "print(f\" Found {sum(len(v) for v in shard_keys.values())} text tensors in {len(shard_keys)} shards\")\n",
132
  "\n",
133
- "# Process shard by shard: extract only the LM weights, save, and free memory\n",
134
- "new_weight_map = {}\n",
135
- "shard_idx = 0\n",
136
  "\n",
137
- "for shard_name in sorted(shard_keys.keys()):\n",
138
- " keys_in_shard = shard_keys[shard_name]\n",
139
- " print(f\"\\n📦 Processing {shard_name} ({len(keys_in_shard)} tensors)...\")\n",
 
 
 
140
  " \n",
141
- " # Download the shard\n",
142
- " shard_path = hf_hub_download(SOURCE_MODEL, shard_name, token=HF_TOKEN)\n",
 
 
 
143
  " \n",
144
- " # Extract only the language_model tensors, stripping the prefix\n",
145
- " lm_weights = {}\n",
146
- " with safe_open(shard_path, framework=\"pt\") as f:\n",
147
- " for key in keys_in_shard:\n",
148
- " new_key = key[len(\"model.language_model.\"):]\n",
149
- " lm_weights[new_key] = f.get_tensor(key)\n",
150
  " \n",
151
- " # Save as a new shard\n",
152
- " shard_idx += 1\n",
153
- " out_name = f\"model-{shard_idx:05d}-of-TEMP.safetensors\"\n",
154
- " out_path = os.path.join(TEXT_MODEL_DIR, out_name)\n",
155
- " save_file(lm_weights, out_path)\n",
156
  " \n",
157
- " for k in lm_weights:\n",
158
- " new_weight_map[k] = out_name\n",
159
  " \n",
160
  " size_mb = os.path.getsize(out_path) / (1024**2)\n",
161
- " print(f\" 💾 Saved {out_name}: {size_mb:.0f} MB\")\n",
162
- " \n",
163
- " del lm_weights\n",
164
- " gc.collect()\n",
165
- " memlog(f\"shard {shard_idx}\")\n",
166
- "\n",
167
- "# Rename the shards with the correct total\n",
168
- "total_shards = shard_idx\n",
169
- "final_weight_map = {}\n",
170
- "for i in range(1, total_shards + 1):\n",
171
- " old_name = f\"model-{i:05d}-of-TEMP.safetensors\"\n",
172
- " new_name = f\"model-{i:05d}-of-{total_shards:05d}.safetensors\"\n",
173
- " os.rename(os.path.join(TEXT_MODEL_DIR, old_name), os.path.join(TEXT_MODEL_DIR, new_name))\n",
174
- " for key, shard in new_weight_map.items():\n",
175
- " if shard == old_name:\n",
176
- " final_weight_map[key] = new_name\n",
177
  "\n",
178
  "# Write the index\n",
179
- "with open(os.path.join(TEXT_MODEL_DIR, \"model.safetensors.index.json\"), \"w\") as f:\n",
180
- " json.dump({\"metadata\": {}, \"weight_map\": final_weight_map}, f)\n",
181
  "\n",
182
- "# Config: use Gemma4TextConfig standalone\n",
 
183
  "config = transformers.AutoConfig.from_pretrained(SOURCE_MODEL, token=HF_TOKEN)\n",
184
- "tc = config.text_config.to_dict()\n",
185
- "tc[\"architectures\"] = [\"Gemma4ForCausalLM\"]\n",
186
- "tc[\"model_type\"] = \"gemma4_text\"\n",
187
- "tc[\"eos_token_id\"] = config.eos_token_id if hasattr(config, \"eos_token_id\") else [1, 106]\n",
188
- "tc[\"tie_word_embeddings\"] = config.tie_word_embeddings\n",
189
- "with open(os.path.join(TEXT_MODEL_DIR, \"config.json\"), \"w\") as f:\n",
190
- " json.dump(tc, f, indent=2)\n",
191
- "\n",
192
- "# Copy the tokenizer and templates\n",
 
 
193
  "for fn in [\"tokenizer.json\", \"tokenizer_config.json\", \"chat_template.jinja\", \"generation_config.json\"]:\n",
194
  " try:\n",
195
  " src = hf_hub_download(SOURCE_MODEL, fn, token=HF_TOKEN)\n",
196
- " shutil.copy(src, os.path.join(TEXT_MODEL_DIR, fn))\n",
197
- " print(f\" 📄 {fn}\")\n",
198
- " except:\n",
199
- " pass\n",
200
- "\n",
201
- "del config\n",
 
 
 
 
 
 
202
  "gc.collect()\n",
203
  "\n",
204
- "# Summary\n",
205
- "print(f\"\\n✅ Text model extracted to {TEXT_MODEL_DIR}:\")\n",
206
- "total_size = 0\n",
207
- "for f in sorted(os.listdir(TEXT_MODEL_DIR)):\n",
208
- " fp = os.path.join(TEXT_MODEL_DIR, f)\n",
209
- " if os.path.isfile(fp):\n",
210
- " s = os.path.getsize(fp)\n",
211
- " total_size += s\n",
212
- " print(f\" {f}: {s/(1024**2):.1f} MB\")\n",
213
- "print(f\" Total: {total_size/(1024**3):.2f} GB\")\n",
214
- "print(f\" Time: {(time.time()-start_time)/60:.1f} min\")"
215
- ]
216
- },
217
- {
218
- "cell_type": "markdown",
219
- "metadata": {},
220
- "source": [
221
- "## 4️⃣ Convert to LiteRT-LM (.litertlm)\n",
222
- "\n",
223
- "This is where the magic happens. The `litert-torch` pipeline:\n",
224
- "1. Loads the model in float32\n",
225
- "2. Exports to TFLite via torch.export\n",
226
- "3. Quantizes to INT8 (dynamic_wi8_afp32)\n",
227
- "4. Packages the result as `.litertlm`\n",
228
- "\n",
229
- "⚠️ **If you get an out-of-memory error**, go to: Runtime → Change runtime type → **High-RAM**"
230
  ]
231
  },
232
  {
@@ -235,99 +168,33 @@
235
  "metadata": {},
236
  "outputs": [],
237
  "source": [
238
- "import litert_torch.generative.export_hf.core.export_lib as elib\n",
239
- "from litert_torch.generative.export_hf.model_ext import patches as mpatches\n",
240
- "from litert_torch.generative.export_hf.core import exportable_module_config\n",
241
- "from litert_torch.generative.export_hf.core import utils\n",
242
- "from litert_torch import progress\n",
243
- "import huggingface_hub as hfhub\n",
244
- "\n",
245
- "ExportTask = exportable_module_config.ExportTask\n",
246
- "\n",
247
- "# Monkey-patch load_model to load Gemma4ForCausalLM from our directory\n",
248
- "@progress.task('Load source model')\n",
249
- "def patched_load_model(model_path, trust_remote_code=False, auto_model_override=None, task=ExportTask.TEXT_GENERATION):\n",
250
- " print(\" 🔧 Loading Gemma4ForCausalLM (text only)...\")\n",
251
- " \n",
252
- " config = transformers.AutoConfig.from_pretrained(model_path, trust_remote_code=trust_remote_code)\n",
253
- " config.model_type = \"gemma4\" # So the pipeline recognizes the architecture\n",
254
- " config._attn_implementation = 'lrt_transposed_attention'\n",
255
- " \n",
256
- " from transformers import Gemma4ForCausalLM\n",
257
- " with mpatches.get_patch_context(\"gemma4\"):\n",
258
- " model = Gemma4ForCausalLM.from_pretrained(\n",
259
- " model_path, config=config, torch_dtype=torch.float32,\n",
260
- " trust_remote_code=trust_remote_code, low_cpu_mem_usage=True,\n",
261
- " attn_implementation='eager',\n",
262
- " )\n",
263
- " \n",
264
- " memlog(\"model loaded\")\n",
265
- " \n",
266
- " model.generation_config.cache_implementation = 'static'\n",
267
- " model.generation_config.do_sample = False\n",
268
- " \n",
269
- " # The pipeline expects a config with text_config\n",
270
- " class FullConfig:\n",
271
- " def __init__(self, tc):\n",
272
- " self.model_type = \"gemma4\"\n",
273
- " self.text_config = tc\n",
274
- " self.eos_token_id = getattr(tc, 'eos_token_id', [1, 106])\n",
275
- " self.tie_word_embeddings = getattr(tc, 'tie_word_embeddings', True)\n",
276
- " \n",
277
- " full_config = FullConfig(config)\n",
278
- " model.config = full_config\n",
279
- " \n",
280
- " tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)\n",
281
- " \n",
282
- " # Load the chat template\n",
283
- " if not getattr(tokenizer, 'chat_template', None):\n",
284
- " jinja_path = os.path.join(model_path, 'chat_template.jinja')\n",
285
- " if os.path.exists(jinja_path):\n",
286
- " with open(jinja_path) as f:\n",
287
- " tokenizer.chat_template = f.read()\n",
288
- " \n",
289
- " return elib.SourceModelArtifacts(\n",
290
- " model=model, model_config=full_config,\n",
291
- " text_model_config=config, tokenizer=tokenizer, image_processor=None,\n",
292
- " )\n",
293
- "\n",
294
- "# Apply the patch\n",
295
- "elib.load_model = patched_load_model\n",
296
- "\n",
297
- "# Run the conversion\n",
298
  "from litert_torch.generative.export_hf import export as export_lib\n",
299
  "\n",
300
- "print(\"🚀 Starting conversion to LiteRT-LM...\")\n",
301
- "print(f\" Source: {TEXT_MODEL_DIR}\")\n",
302
- "print(f\" Destination: {OUTPUT_DIR}\")\n",
303
- "print(f\" Quantization: dynamic_wi8_afp32 (INT8)\")\n",
304
- "print(f\" Cache: 4096 tokens\")\n",
305
- "print()\n",
306
  "\n",
307
  "conversion_start = time.time()\n",
308
  "\n",
309
  "export_lib.export(\n",
310
- " model=TEXT_MODEL_DIR,\n",
311
  " output_dir=OUTPUT_DIR,\n",
312
  " task=\"text_generation\",\n",
313
  " bundle_litert_lm=True,\n",
314
- " quantization_recipe=\"dynamic_wi8_afp32\",\n",
315
  " cache_length=4096,\n",
316
  " prefill_lengths=[256],\n",
317
  " use_jinja_template=True,\n",
318
  " keep_temporary_files=True,\n",
319
  " trust_remote_code=False,\n",
320
  " experimental_lightweight_conversion=True,\n",
 
321
  ")\n",
322
  "\n",
323
- "print(f\"\\n✅ Conversion completed in {(time.time()-conversion_start)/60:.1f} minutes\")"
324
- ]
325
- },
326
- {
327
- "cell_type": "markdown",
328
- "metadata": {},
329
- "source": [
330
- "## 5️⃣ Verify and upload to HuggingFace"
331
  ]
332
  },
333
  {
@@ -336,57 +203,44 @@
336
  "metadata": {},
337
  "outputs": [],
338
  "source": [
339
- "litertlm_path = os.path.join(OUTPUT_DIR, \"model.litertlm\")\n",
340
- "\n",
341
- "if not os.path.exists(litertlm_path):\n",
342
- " print(\"❌ model.litertlm not found. Generated files:\")\n",
343
- " for root, dirs, files in os.walk(OUTPUT_DIR):\n",
344
- " for f in files:\n",
345
- " fp = os.path.join(root, f)\n",
346
- " print(f\" {os.path.relpath(fp, OUTPUT_DIR)}: {os.path.getsize(fp)/(1024**2):.1f} MB\")\n",
 
347
  "else:\n",
348
- " size_bytes = os.path.getsize(litertlm_path)\n",
349
- " size_gb = size_bytes / (1024**3)\n",
350
- " print(f\"📊 model.litertlm: {size_gb:.2f} GB ({size_bytes:,} bytes)\")\n",
351
  " if size_gb <= 2.0:\n",
352
- " print(f\"✅ Fits in 2 GB!\")\n",
353
  " else:\n",
354
- " print(f\"⚠️ Larger than 2 GB ({size_gb:.2f} GB)\")\n",
355
  " \n",
356
  " print(f\"\\n📤 Uploading to {OUTPUT_REPO}...\")\n",
357
  " from huggingface_hub import HfApi\n",
358
  " api = HfApi(token=HF_TOKEN)\n",
 
 
359
  " \n",
360
- " # Create the repo if it does not exist\n",
361
- " try:\n",
362
- " api.create_repo(OUTPUT_REPO, exist_ok=True)\n",
363
- " except:\n",
364
- " pass\n",
365
- " \n",
366
- " # Upload the model\n",
367
  " api.upload_file(\n",
368
- " path_or_fileobj=litertlm_path,\n",
369
  " path_in_repo=\"gemma-4-E2B-it-Uncensored-MAX.litertlm\",\n",
370
  " repo_id=OUTPUT_REPO,\n",
371
  " commit_message=\"Add LiteRT-LM model\",\n",
372
  " )\n",
373
  " \n",
374
- " # Upload the README\n",
375
- " readme = f\"\"\"---\\nlicense: apache-2.0\\nbase_model:\\n- prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\\ntags:\\n - litert-lm\\n - uncensored\\n - abliterated\\n - edge-gallery\\n - on-device\\nlanguage:\\n- en\\n---\\n\\n# gemma-4-E2B-it-Uncensored-MAX (LiteRT-LM)\\n\\nLiteRT-LM conversion of [prithivMLmods/gemma-4-E2B-it-Uncensored-MAX](https://huggingface.co/prithivMLmods/gemma-4-E2B-it-Uncensored-MAX) for **Google AI Edge Gallery** on Android.\\n\\n| | |\\n|---|---|\\n| **Base model** | [prithivMLmods/gemma-4-E2B-it-Uncensored-MAX](https://huggingface.co/prithivMLmods/gemma-4-E2B-it-Uncensored-MAX) |\\n| **Format** | LiteRT-LM (`.litertlm`) |\\n| **Quantization** | INT8 (`dynamic_wi8_afp32`) |\\n| **Task** | Text generation |\\n| **Context** | 4096 tokens |\\n| **Size** | {size_gb:.2f} GB |\\n\\n## Usage\\n\\n### Edge Gallery (Android)\\n1. Install [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)\\n2. Add model via HuggingFace URL\\n3. Chat!\\n\\n### CLI\\n```bash\\npip install litert-lm\\nlitert-lm import --from-huggingface-repo {OUTPUT_REPO} gemma-4-E2B-it-Uncensored-MAX.litertlm uncensored-max\\nlitert-lm run uncensored-max\\n```\\n\\n⚠️ Abliterated/uncensored model. Use responsibly.\\n\"\"\" \n",
376
- " api.upload_file(\n",
377
- " path_or_fileobj=readme.encode(),\n",
378
- " path_in_repo=\"README.md\",\n",
379
- " repo_id=OUTPUT_REPO,\n",
380
- " commit_message=\"Add README\",\n",
381
- " )\n",
382
  " \n",
383
- " total_time = (time.time() - start_time) / 60\n",
384
- " print(f\"\\n{'='*50}\")\n",
385
- " print(f\"🎉 DONE!\")\n",
386
- " print(f\"📱 Model: https://huggingface.co/{OUTPUT_REPO}\")\n",
387
- " print(f\"📊 Size: {size_gb:.2f} GB\")\n",
388
- " print(f\"⏱️ Total time: {total_time:.0f} minutes\")\n",
389
- " print(f\"{'='*50}\")"
390
  ]
391
  },
392
  {
@@ -395,17 +249,11 @@
395
  "source": [
396
  "## 🔧 Troubleshooting\n",
397
  "\n",
398
- "**Out-of-memory error (OOM):**\n",
399
- "- Go to **Runtime → Change runtime type** → enable **High-RAM**\n",
400
- "- If it still fails, restart the runtime and run all cells again\n",
401
  "\n",
402
- "**`attn_implementation` error:**\n",
403
- "- This is expected when the `transformers` and `litert-torch` versions are incompatible\n",
404
- "- Try: `!pip install transformers==5.7.0`\n",
405
  "\n",
406
- "**The model is larger than 2 GB:**\n",
407
- "- Change `quantization_recipe` to `\"dynamic_wi4_afp32\"` (INT4) in cell 4\n",
408
- "- This roughly halves the size, at some cost in quality"
409
  ]
410
  }
411
  ]
 
4
  "metadata": {
5
  "colab": {
6
  "provenance": [],
7
+ "gpuType": "T4",
8
+ "machine_shape": "hm"
9
  },
10
  "kernelspec": {
11
  "name": "python3",
 
23
  "source": [
24
  "# 🚀 Convert Gemma 4 E2B Uncensored-MAX to LiteRT-LM\n",
25
  "\n",
26
+ "Converts the model to the `.litertlm` format for **Google AI Edge Gallery** on Android.\n",
27
  "\n",
28
+ "**⚠️ IMPORTANT:** Use a runtime with **GPU + High-RAM**: Runtime → Change runtime type → T4 + High-RAM (hm)\n",
29
  "\n",
30
+ "**Estimated time:** ~30-45 minutes"
31
  ]
32
  },
33
  {
 
36
  "metadata": {},
37
  "outputs": [],
38
  "source": [
39
+ "#@title 1️⃣ Configuration\n",
40
+ "HF_TOKEN = \"\" #@param {type:\"string\"}\n",
41
+ "OUTPUT_REPO = \"RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm\" #@param {type:\"string\"}\n",
42
+ "SOURCE_MODEL = \"prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\" #@param {type:\"string\"}\n",
43
+ "assert HF_TOKEN, \"❌ Enter your HuggingFace token!\""
44
  ]
45
  },
46
  {
 
49
  "metadata": {},
50
  "outputs": [],
51
  "source": [
52
+ "#@title 2️⃣ Install dependencies\n",
53
  "!pip install -q litert-torch litert-lm transformers huggingface_hub sentencepiece protobuf safetensors psutil\n",
54
+ "print(\"✅ Installed\")"
55
  ]
56
  },
57
  {
 
60
  "metadata": {},
61
  "outputs": [],
62
  "source": [
63
+ "#@title 3️⃣ Prepare the model (extract text only, no vision/audio)\n",
64
+ "import os, sys, json, gc, shutil, time\n",
65
  "from huggingface_hub import hf_hub_download\n",
66
  "from safetensors import safe_open\n",
67
  "from safetensors.torch import save_file\n",
68
+ "import transformers, psutil\n",
 
 
69
  "\n",
70
+ "def memlog(l=\"\"):\n",
71
+ " m=psutil.virtual_memory()\n",
72
+ " print(f\" [{l}] RAM: {m.available/(1024**3):.1f}/{m.total/(1024**3):.1f} GB\")\n",
73
  "\n",
74
+ "MODEL_DIR = \"/content/model\"\n",
75
  "OUTPUT_DIR = \"/content/output\"\n",
76
+ "os.makedirs(MODEL_DIR, exist_ok=True)\n",
77
  "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
 
78
  "start_time = time.time()\n",
79
+ "memlog(\"start\")\n",
80
  "\n",
81
+ "# Download the index\n",
82
+ "print(\"📥 Downloading index...\")\n",
83
+ "idx_path = hf_hub_download(SOURCE_MODEL, \"model.safetensors.index.json\", token=HF_TOKEN)\n",
84
+ "with open(idx_path) as f:\n",
85
  " index = json.load(f)\n",
86
  "\n",
87
+ "# Group language model weights by shard\n",
88
+ "shard_lm = {}\n",
89
  "for key, shard in index[\"weight_map\"].items():\n",
90
  " if key.startswith(\"model.language_model.\"):\n",
91
+ " shard_lm.setdefault(shard, []).append(key)\n",
 
 
 
 
92
  "\n",
93
+ "total_shards = len(shard_lm)\n",
94
+ "print(f\" {sum(len(v) for v in shard_lm.values())} tensors in {total_shards} shards\")\n",
 
95
  "\n",
96
+ "# Extract shard by shard (KEEP the model.language_model. prefix)\n",
97
+ "weight_map = {}\n",
98
+ "for i, sn in enumerate(sorted(shard_lm)):\n",
99
+ " keys = shard_lm[sn]\n",
100
+ " out_name = f\"model-{i+1:05d}-of-{total_shards:05d}.safetensors\"\n",
101
+ " out_path = os.path.join(MODEL_DIR, out_name)\n",
102
  " \n",
103
+ " if os.path.exists(out_path) and os.path.getsize(out_path) > 100:\n",
104
+ " print(f\" {out_name} already exists, skipping\")\n",
105
+ " with safe_open(out_path, framework=\"pt\") as f:\n",
106
+ " for k in f.keys(): weight_map[k] = out_name\n",
107
+ " continue\n",
108
  " \n",
109
+ " print(f\" 📦 {sn} {out_name} ({len(keys)} tensors)\")\n",
110
+ " shard_path = hf_hub_download(SOURCE_MODEL, sn, token=HF_TOKEN)\n",
111
  " \n",
112
+ " # Extract tensors, KEEPING the original prefix\n",
113
+ " tensors = {}\n",
114
+ " with safe_open(shard_path, framework=\"pt\") as f:\n",
115
+ " for key in keys:\n",
116
+ " tensors[key] = f.get_tensor(key)\n",
117
  " \n",
118
+ " save_file(tensors, out_path)\n",
119
+ " for k in tensors: weight_map[k] = out_name\n",
120
  " \n",
121
  " size_mb = os.path.getsize(out_path) / (1024**2)\n",
122
+ " print(f\" 💾 {size_mb:.0f} MB\")\n",
123
+ " del tensors; gc.collect()\n",
124
+ " memlog(f\"shard {i+1}\")\n",
125
  "\n",
126
  "# Write the index\n",
127
+ "with open(os.path.join(MODEL_DIR, \"model.safetensors.index.json\"), \"w\") as f:\n",
128
+ " json.dump({\"metadata\": {}, \"weight_map\": weight_map}, f)\n",
129
  "\n",
130
+ "# Config: Gemma4 with vision=None, audio=None\n",
131
+ "print(\"\\n📝 Creating config...\")\n",
132
  "config = transformers.AutoConfig.from_pretrained(SOURCE_MODEL, token=HF_TOKEN)\n",
133
+ "cd = config.to_dict()\n",
134
+ "cd[\"vision_config\"] = None\n",
135
+ "cd[\"audio_config\"] = None\n",
136
+ "for k in [\"vision_soft_tokens_per_image\", \"image_token_id\", \"boi_token_id\",\n",
137
+ " \"eoi_token_id\", \"audio_token_id\", \"boa_token_id\", \"eoa_token_id\",\n",
138
+ " \"eoa_token_index\", \"video_token_id\"]:\n",
139
+ " cd.pop(k, None)\n",
140
+ "with open(os.path.join(MODEL_DIR, \"config.json\"), \"w\") as f:\n",
141
+ " json.dump(cd, f, indent=2)\n",
142
+ "\n",
143
+ "# Tokenizer and extra files\n",
144
  "for fn in [\"tokenizer.json\", \"tokenizer_config.json\", \"chat_template.jinja\", \"generation_config.json\"]:\n",
145
  " try:\n",
146
  " src = hf_hub_download(SOURCE_MODEL, fn, token=HF_TOKEN)\n",
147
+ " shutil.copy(src, os.path.join(MODEL_DIR, fn))\n",
148
+ " print(f\" {fn}\")\n",
149
+ " except Exception: pass # optional file missing from the source repo\n",
150
+ "\n",
151
+ "del config; gc.collect()\n",
152
+ "\n",
153
+ "# Clear the HF cache\n",
154
+ "cache_dir = os.path.expanduser(\"~/.cache/huggingface/hub\")\n",
155
+ "if os.path.exists(cache_dir):\n",
156
+ " for d in os.listdir(cache_dir):\n",
157
+ " if d.startswith(\"models--\"):\n",
158
+ " shutil.rmtree(os.path.join(cache_dir, d), ignore_errors=True)\n",
159
  "gc.collect()\n",
160
  "\n",
161
+ "print(f\"\\n✅ Model ready\")\n",
162
+ "memlog(\"done\")"
163
  ]
164
  },
165
  {
 
168
  "metadata": {},
169
  "outputs": [],
170
  "source": [
171
+ "#@title 4️⃣ Convert to .litertlm\n",
172
+ "import torch\n",
173
  "from litert_torch.generative.export_hf import export as export_lib\n",
174
  "\n",
175
+ "print(\"🚀 Converting to LiteRT-LM...\")\n",
176
+ "print(\" This takes 15-30 min. Please be patient.\")\n",
177
+ "memlog(\"pre-export\")\n",
178
  "\n",
179
  "conversion_start = time.time()\n",
180
  "\n",
181
  "export_lib.export(\n",
182
+ " model=MODEL_DIR,\n",
183
  " output_dir=OUTPUT_DIR,\n",
184
  " task=\"text_generation\",\n",
185
  " bundle_litert_lm=True,\n",
186
+ " quantization_recipe=\"dynamic_wi8_afp32\", # INT8 (same as the official model)\n",
187
  " cache_length=4096,\n",
188
  " prefill_lengths=[256],\n",
189
  " use_jinja_template=True,\n",
190
  " keep_temporary_files=True,\n",
191
  " trust_remote_code=False,\n",
192
  " experimental_lightweight_conversion=True,\n",
193
+ " externalize_embedder=True, # Required for Gemma4\n",
194
  ")\n",
195
  "\n",
196
+ "print(f\"\\n✅ Conversion finished in {(time.time()-conversion_start)/60:.1f} min\")\n",
197
+ "memlog(\"post-export\")"
198
  ]
199
  },
200
  {
 
203
  "metadata": {},
204
  "outputs": [],
205
  "source": [
206
+ "#@title 5️⃣ Verify and upload\n",
207
+ "litertlm = os.path.join(OUTPUT_DIR, \"model.litertlm\")\n",
208
+ "\n",
209
+ "if not os.path.exists(litertlm):\n",
210
+ " print(\"❌ model.litertlm not found. Files:\")\n",
211
+ " for r,d,fs in os.walk(OUTPUT_DIR):\n",
212
+ " for f in fs:\n",
213
+ " fp = os.path.join(r,f)\n",
214
+ " print(f\" {os.path.relpath(fp,OUTPUT_DIR)}: {os.path.getsize(fp)/(1024**2):.1f} MB\")\n",
215
  "else:\n",
216
+ " size_gb = os.path.getsize(litertlm) / (1024**3)\n",
217
+ " print(f\"📊 model.litertlm: {size_gb:.2f} GB\")\n",
 
218
  " if size_gb <= 2.0:\n",
219
+ " print(\"✅ Fits in 2 GB!\")\n",
220
  " else:\n",
221
+ " print(f\"⚠️ {size_gb:.2f} GB. If you need it smaller, switch to dynamic_wi4_afp32 in cell 4\")\n",
222
  " \n",
223
  " print(f\"\\n📤 Uploading to {OUTPUT_REPO}...\")\n",
224
  " from huggingface_hub import HfApi\n",
225
  " api = HfApi(token=HF_TOKEN)\n",
226
+ " try: api.create_repo(OUTPUT_REPO, exist_ok=True)\n",
227
+ " except Exception: pass\n",
228
  " \n",
229
  " api.upload_file(\n",
230
+ " path_or_fileobj=litertlm,\n",
231
  " path_in_repo=\"gemma-4-E2B-it-Uncensored-MAX.litertlm\",\n",
232
  " repo_id=OUTPUT_REPO,\n",
233
  " commit_message=\"Add LiteRT-LM model\",\n",
234
  " )\n",
235
  " \n",
236
+ " readme = f\"\"\"---\\nlicense: apache-2.0\\nbase_model:\\n- {SOURCE_MODEL}\\ntags:\\n - litert-lm\\n - uncensored\\n - edge-gallery\\nlanguage:\\n- en\\n---\\n\\n# gemma-4-E2B-it-Uncensored-MAX (LiteRT-LM)\\n\\nLiteRT-LM conversion for **Google AI Edge Gallery**.\\n\\n| | |\\n|---|---|\\n| **Base** | [{SOURCE_MODEL}](https://huggingface.co/{SOURCE_MODEL}) |\\n| **Format** | `.litertlm` |\\n| **Quant** | INT8 |\\n| **Context** | 4096 |\\n| **Size** | {size_gb:.2f} GB |\\n\\n## Usage\\n1. Install [Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)\\n2. Add model via HF URL\\n3. Chat!\\n\\n⚠️ Uncensored. Use responsibly.\\n\"\"\"\n",
237
+ " api.upload_file(path_or_fileobj=readme.encode(), path_in_repo=\"README.md\",\n",
238
+ " repo_id=OUTPUT_REPO, commit_message=\"README\")\n",
239
  " \n",
240
+ " print(f\"\\n🎉 DONE!\")\n",
241
+ " print(f\"📱 https://huggingface.co/{OUTPUT_REPO}\")\n",
242
+ " print(f\"📊 {size_gb:.2f} GB\")\n",
243
+ " print(f\"⏱️ {(time.time()-start_time)/60:.0f} min total\")"
244
  ]
245
  },
246
  {
 
249
  "source": [
250
  "## 🔧 Troubleshooting\n",
251
  "\n",
252
+ "**OOM:** Use a runtime with **High-RAM** (hm)\n",
253
  "\n",
254
+ "**>2 GB:** Change `dynamic_wi8_afp32` → `dynamic_wi4_afp32` in cell 4\n",
255
  "\n",
256
+ "**`External embedder required` error:** Already fixed by `externalize_embedder=True`"
257
  ]
258
  }
259
  ]
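
The config change in cell 3 (nulling the vision and audio sub-configs and dropping the multimodal token ids) can be tried on a plain dictionary without downloading anything. The sketch below is only an illustration: `text_only_config` is a hypothetical helper name and the values in `sample` are made up, while the list of dropped keys is copied from the notebook.

```python
import copy

# Keys the notebook removes from the top-level config.
MULTIMODAL_KEYS = (
    "vision_soft_tokens_per_image", "image_token_id", "boi_token_id",
    "eoi_token_id", "audio_token_id", "boa_token_id", "eoa_token_id",
    "eoa_token_index", "video_token_id",
)


def text_only_config(cfg):
    """Return a copy of cfg with vision/audio disabled and multimodal ids removed."""
    out = copy.deepcopy(cfg)  # leave the caller's dict untouched
    out["vision_config"] = None
    out["audio_config"] = None
    for key in MULTIMODAL_KEYS:
        out.pop(key, None)  # ignore keys that are already absent
    return out


# Made-up values, for illustration only.
sample = {
    "model_type": "gemma4",
    "text_config": {"hidden_size": 1536},
    "vision_config": {"hidden_size": 768},
    "audio_config": {"hidden_size": 1024},
    "image_token_id": 1,
    "audio_token_id": 2,
}

stripped_cfg = text_only_config(sample)
```

Working on a deep copy keeps the original dictionary intact, which helps when comparing the config before and after the change.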