Final tested Colab notebook with all fixes (correct weight naming, externalize_embedder, etc.)
gemma4_to_litertlm.ipynb CHANGED (+118 -270)
|
@@ -4,7 +4,8 @@
|
|
| 4 |
"metadata": {
|
| 5 |
"colab": {
|
| 6 |
"provenance": [],
|
| 7 |
-
"gpuType": "T4"
|
|
|
|
| 8 |
},
|
| 9 |
"kernelspec": {
|
| 10 |
"name": "python3",
|
|
@@ -22,23 +23,11 @@
|
|
| 22 |
"source": [
|
| 23 |
"# 🚀 Convert Gemma 4 E2B Uncensored-MAX to LiteRT-LM\n",
|
| 24 |
"\n",
|
| 25 |
-
"
|
| 26 |
"\n",
|
| 27 |
-
"**
|
| 28 |
-
"- Colab with a GPU (T4): the standard runtime works, but if you hit OOM, switch to \"High-RAM\" (Runtime → Change runtime type → High-RAM)\n",
|
| 29 |
-
"- HuggingFace token with write permissions\n",
|
| 30 |
"\n",
|
| 31 |
-
"**Estimated time:** ~
|
| 32 |
-
]
|
| 33 |
-
},
|
| 34 |
-
{
|
| 35 |
-
"cell_type": "markdown",
|
| 36 |
-
"metadata": {},
|
| 37 |
-
"source": [
|
| 38 |
-
"## 1️⃣ Set up your HuggingFace token\n",
|
| 39 |
-
"\n",
|
| 40 |
-
"You need a token with write permissions to upload the model. \n",
|
| 41 |
-
"Get one at: https://huggingface.co/settings/tokens"
|
| 42 |
]
|
| 43 |
},
|
| 44 |
{
|
|
@@ -47,24 +36,11 @@
|
|
| 47 |
"metadata": {},
|
| 48 |
"outputs": [],
|
| 49 |
"source": [
|
| 50 |
-
"#
|
| 51 |
-
"HF_TOKEN = \"\" #
|
| 52 |
-
"\n",
|
| 53 |
-
"
|
| 54 |
-
"
|
| 55 |
-
"OUTPUT_REPO = \"RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm\"\n",
|
| 56 |
-
"\n",
|
| 57 |
-
"# Source model (the original safetensors weights)\n",
|
| 58 |
-
"SOURCE_MODEL = \"prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\"\n",
|
| 59 |
-
"\n",
|
| 60 |
-
"assert HF_TOKEN, \"❌ Set your HuggingFace token above!\""
|
| 61 |
-
]
|
| 62 |
-
},
|
| 63 |
-
{
|
| 64 |
-
"cell_type": "markdown",
|
| 65 |
-
"metadata": {},
|
| 66 |
-
"source": [
|
| 67 |
-
"## 2️⃣ Install dependencies"
|
| 68 |
]
|
| 69 |
},
|
| 70 |
{
|
|
@@ -73,19 +49,9 @@
|
|
| 73 |
"metadata": {},
|
| 74 |
"outputs": [],
|
| 75 |
"source": [
|
|
|
|
| 76 |
"!pip install -q litert-torch litert-lm transformers huggingface_hub sentencepiece protobuf safetensors psutil\n",
|
| 77 |
-
"print(\"✅
|
| 78 |
-
]
|
| 79 |
-
},
|
| 80 |
-
{
|
| 81 |
-
"cell_type": "markdown",
|
| 82 |
-
"metadata": {},
|
| 83 |
-
"source": [
|
| 84 |
-
"## 3️⃣ Extract only the text decoder weights\n",
|
| 85 |
-
"\n",
|
| 86 |
-
"The original model is multimodal (text + vision + audio = 9.6 GB). \n",
|
| 87 |
-
"We only need the text decoder (~4.8 GB in bf16). \n",
|
| 88 |
-
"This saves a lot of RAM."
|
| 89 |
]
|
| 90 |
},
|
| 91 |
{
|
|
@@ -94,139 +60,106 @@
|
|
| 94 |
"metadata": {},
|
| 95 |
"outputs": [],
|
| 96 |
"source": [
|
| 97 |
-
"
|
|
|
|
| 98 |
"from huggingface_hub import hf_hub_download\n",
|
| 99 |
"from safetensors import safe_open\n",
|
| 100 |
"from safetensors.torch import save_file\n",
|
| 101 |
-
"import
|
| 102 |
-
"import transformers\n",
|
| 103 |
-
"import psutil\n",
|
| 104 |
"\n",
|
| 105 |
-
"def memlog(
|
| 106 |
-
" m
|
| 107 |
-
" print(f\" [{
|
| 108 |
"\n",
|
| 109 |
-
"
|
| 110 |
"OUTPUT_DIR = \"/content/output\"\n",
|
| 111 |
-
"os.makedirs(
|
| 112 |
"os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
|
| 113 |
-
"\n",
|
| 114 |
"start_time = time.time()\n",
|
| 115 |
-
"memlog(\"
|
| 116 |
"\n",
|
| 117 |
-
"# Descargar
|
| 118 |
-
"print(\"📥 Descargando índice
|
| 119 |
-
"
|
| 120 |
-
"with open(
|
| 121 |
" index = json.load(f)\n",
|
| 122 |
"\n",
|
| 123 |
-
"#
|
| 124 |
-
"
|
| 125 |
"for key, shard in index[\"weight_map\"].items():\n",
|
| 126 |
" if key.startswith(\"model.language_model.\"):\n",
|
| 127 |
-
"
|
| 128 |
-
" shard_keys[shard] = []\n",
|
| 129 |
-
" shard_keys[shard].append(key)\n",
|
| 130 |
-
"\n",
|
| 131 |
-
"print(f\" Found {sum(len(v) for v in shard_keys.values())} text tensors in {len(shard_keys)} shards\")\n",
|
| 132 |
"\n",
|
| 133 |
-
"
|
| 134 |
-
"
|
| 135 |
-
"shard_idx = 0\n",
|
| 136 |
"\n",
|
| 137 |
-
"
|
| 138 |
-
"
|
| 139 |
-
"
|
|
|
|
|
|
|
|
|
|
| 140 |
" \n",
|
| 141 |
-
"
|
| 142 |
-
"
|
|
|
|
|
|
|
|
|
|
| 143 |
" \n",
|
| 144 |
-
"
|
| 145 |
-
"
|
| 146 |
-
" with safe_open(shard_path, framework=\"pt\") as f:\n",
|
| 147 |
-
" for key in keys_in_shard:\n",
|
| 148 |
-
" new_key = key[len(\"model.language_model.\"):]\n",
|
| 149 |
-
" lm_weights[new_key] = f.get_tensor(key)\n",
|
| 150 |
" \n",
|
| 151 |
-
" #
|
| 152 |
-
"
|
| 153 |
-
"
|
| 154 |
-
"
|
| 155 |
-
"
|
| 156 |
" \n",
|
| 157 |
-
"
|
| 158 |
-
"
|
| 159 |
" \n",
|
| 160 |
" size_mb = os.path.getsize(out_path) / (1024**2)\n",
|
| 161 |
-
" print(f\"
|
| 162 |
-
" \n",
|
| 163 |
-
"
|
| 164 |
-
" gc.collect()\n",
|
| 165 |
-
" memlog(f\"shard {shard_idx}\")\n",
|
| 166 |
-
"\n",
|
| 167 |
-
"# Rename shards with the correct total\n",
|
| 168 |
-
"total_shards = shard_idx\n",
|
| 169 |
-
"final_weight_map = {}\n",
|
| 170 |
-
"for i in range(1, total_shards + 1):\n",
|
| 171 |
-
" old_name = f\"model-{i:05d}-of-TEMP.safetensors\"\n",
|
| 172 |
-
" new_name = f\"model-{i:05d}-of-{total_shards:05d}.safetensors\"\n",
|
| 173 |
-
" os.rename(os.path.join(TEXT_MODEL_DIR, old_name), os.path.join(TEXT_MODEL_DIR, new_name))\n",
|
| 174 |
-
" for key, shard in new_weight_map.items():\n",
|
| 175 |
-
" if shard == old_name:\n",
|
| 176 |
-
" final_weight_map[key] = new_name\n",
|
| 177 |
"\n",
|
| 178 |
"# Write index\n",
|
| 179 |
-
"with open(os.path.join(
|
| 180 |
-
" json.dump({\"metadata\": {}, \"weight_map\":
|
| 181 |
"\n",
|
| 182 |
-
"# Config:
|
|
|
|
| 183 |
"config = transformers.AutoConfig.from_pretrained(SOURCE_MODEL, token=HF_TOKEN)\n",
|
| 184 |
-
"
|
| 185 |
-
"
|
| 186 |
-
"
|
| 187 |
-
"
|
| 188 |
-
"
|
| 189 |
-
"
|
| 190 |
-
"
|
| 191 |
-
"\n",
|
| 192 |
-
"
|
|
|
|
|
|
|
| 193 |
"for fn in [\"tokenizer.json\", \"tokenizer_config.json\", \"chat_template.jinja\", \"generation_config.json\"]:\n",
|
| 194 |
" try:\n",
|
| 195 |
" src = hf_hub_download(SOURCE_MODEL, fn, token=HF_TOKEN)\n",
|
| 196 |
-
" shutil.copy(src, os.path.join(
|
| 197 |
-
" print(f\"
|
| 198 |
-
" except:\n",
|
| 199 |
-
"
|
| 200 |
-
"\n",
|
| 201 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 202 |
"gc.collect()\n",
|
| 203 |
"\n",
|
| 204 |
-
"
|
| 205 |
-
"
|
| 206 |
-
"total_size = 0\n",
|
| 207 |
-
"for f in sorted(os.listdir(TEXT_MODEL_DIR)):\n",
|
| 208 |
-
" fp = os.path.join(TEXT_MODEL_DIR, f)\n",
|
| 209 |
-
" if os.path.isfile(fp):\n",
|
| 210 |
-
" s = os.path.getsize(fp)\n",
|
| 211 |
-
" total_size += s\n",
|
| 212 |
-
" print(f\" {f}: {s/(1024**2):.1f} MB\")\n",
|
| 213 |
-
"print(f\" Total: {total_size/(1024**3):.2f} GB\")\n",
|
| 214 |
-
"print(f\" Time: {(time.time()-start_time)/60:.1f} min\")"
|
| 215 |
-
]
|
| 216 |
-
},
|
| 217 |
-
{
|
| 218 |
-
"cell_type": "markdown",
|
| 219 |
-
"metadata": {},
|
| 220 |
-
"source": [
|
| 221 |
-
"## 4️⃣ Convert to LiteRT-LM (.litertlm)\n",
|
| 222 |
-
"\n",
|
| 223 |
-
"This is where the conversion happens. The `litert-torch` pipeline:\n",
|
| 224 |
-
"1. Loads the model in float32\n",
|
| 225 |
-
"2. Exports to TFLite via torch.export\n",
|
| 226 |
-
"3. Quantizes to INT8 (dynamic_wi8_afp32)\n",
|
| 227 |
-
"4. Packages the result as `.litertlm`\n",
|
| 228 |
-
"\n",
|
| 229 |
-
"⚠️ **If you get an out-of-memory error**, go to: Runtime → Change runtime type → **High-RAM**"
|
| 230 |
]
|
| 231 |
},
|
| 232 |
{
|
|
@@ -235,99 +168,33 @@
|
|
| 235 |
"metadata": {},
|
| 236 |
"outputs": [],
|
| 237 |
"source": [
|
| 238 |
-
"
|
| 239 |
-
"
|
| 240 |
-
"from litert_torch.generative.export_hf.core import exportable_module_config\n",
|
| 241 |
-
"from litert_torch.generative.export_hf.core import utils\n",
|
| 242 |
-
"from litert_torch import progress\n",
|
| 243 |
-
"import huggingface_hub as hfhub\n",
|
| 244 |
-
"\n",
|
| 245 |
-
"ExportTask = exportable_module_config.ExportTask\n",
|
| 246 |
-
"\n",
|
| 247 |
-
"# Monkey-patch load_model to load Gemma4ForCausalLM from our directory\n",
|
| 248 |
-
"@progress.task('Load source model')\n",
|
| 249 |
-
"def patched_load_model(model_path, trust_remote_code=False, auto_model_override=None, task=ExportTask.TEXT_GENERATION):\n",
|
| 250 |
-
" print(\" 🔧 Loading Gemma4ForCausalLM (text only)...\")\n",
|
| 251 |
-
" \n",
|
| 252 |
-
" config = transformers.AutoConfig.from_pretrained(model_path, trust_remote_code=trust_remote_code)\n",
|
| 253 |
-
" config.model_type = \"gemma4\" # So the pipeline recognizes the architecture\n",
|
| 254 |
-
" config._attn_implementation = 'lrt_transposed_attention'\n",
|
| 255 |
-
" \n",
|
| 256 |
-
" from transformers import Gemma4ForCausalLM\n",
|
| 257 |
-
" with mpatches.get_patch_context(\"gemma4\"):\n",
|
| 258 |
-
" model = Gemma4ForCausalLM.from_pretrained(\n",
|
| 259 |
-
" model_path, config=config, torch_dtype=torch.float32,\n",
|
| 260 |
-
" trust_remote_code=trust_remote_code, low_cpu_mem_usage=True,\n",
|
| 261 |
-
" attn_implementation='eager',\n",
|
| 262 |
-
" )\n",
|
| 263 |
-
" \n",
|
| 264 |
-
" memlog(\"model loaded\")\n",
|
| 265 |
-
" \n",
|
| 266 |
-
" model.generation_config.cache_implementation = 'static'\n",
|
| 267 |
-
" model.generation_config.do_sample = False\n",
|
| 268 |
-
" \n",
|
| 269 |
-
" # The pipeline expects a config that has a text_config\n",
|
| 270 |
-
" class FullConfig:\n",
|
| 271 |
-
" def __init__(self, tc):\n",
|
| 272 |
-
" self.model_type = \"gemma4\"\n",
|
| 273 |
-
" self.text_config = tc\n",
|
| 274 |
-
" self.eos_token_id = getattr(tc, 'eos_token_id', [1, 106])\n",
|
| 275 |
-
" self.tie_word_embeddings = getattr(tc, 'tie_word_embeddings', True)\n",
|
| 276 |
-
" \n",
|
| 277 |
-
" full_config = FullConfig(config)\n",
|
| 278 |
-
" model.config = full_config\n",
|
| 279 |
-
" \n",
|
| 280 |
-
" tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)\n",
|
| 281 |
-
" \n",
|
| 282 |
-
" # Load the chat template\n",
|
| 283 |
-
" if not getattr(tokenizer, 'chat_template', None):\n",
|
| 284 |
-
" jinja_path = os.path.join(model_path, 'chat_template.jinja')\n",
|
| 285 |
-
" if os.path.exists(jinja_path):\n",
|
| 286 |
-
" with open(jinja_path) as f:\n",
|
| 287 |
-
" tokenizer.chat_template = f.read()\n",
|
| 288 |
-
" \n",
|
| 289 |
-
" return elib.SourceModelArtifacts(\n",
|
| 290 |
-
" model=model, model_config=full_config,\n",
|
| 291 |
-
" text_model_config=config, tokenizer=tokenizer, image_processor=None,\n",
|
| 292 |
-
" )\n",
|
| 293 |
-
"\n",
|
| 294 |
-
"# Apply the patch\n",
|
| 295 |
-
"elib.load_model = patched_load_model\n",
|
| 296 |
-
"\n",
|
| 297 |
-
"# Run the conversion\n",
|
| 298 |
"from litert_torch.generative.export_hf import export as export_lib\n",
|
| 299 |
"\n",
|
| 300 |
-
"print(\"🚀
|
| 301 |
-
"print(
|
| 302 |
-
"
|
| 303 |
-
"print(f\" Quantization: dynamic_wi8_afp32 (INT8)\")\n",
|
| 304 |
-
"print(f\" Cache: 4096 tokens\")\n",
|
| 305 |
-
"print()\n",
|
| 306 |
"\n",
|
| 307 |
"conversion_start = time.time()\n",
|
| 308 |
"\n",
|
| 309 |
"export_lib.export(\n",
|
| 310 |
-
" model=
|
| 311 |
" output_dir=OUTPUT_DIR,\n",
|
| 312 |
" task=\"text_generation\",\n",
|
| 313 |
" bundle_litert_lm=True,\n",
|
| 314 |
-
" quantization_recipe=\"dynamic_wi8_afp32\",\n",
|
| 315 |
" cache_length=4096,\n",
|
| 316 |
" prefill_lengths=[256],\n",
|
| 317 |
" use_jinja_template=True,\n",
|
| 318 |
" keep_temporary_files=True,\n",
|
| 319 |
" trust_remote_code=False,\n",
|
| 320 |
" experimental_lightweight_conversion=True,\n",
|
|
|
|
| 321 |
")\n",
|
| 322 |
"\n",
|
| 323 |
-
"print(f\"\\n✅ Conversión
|
| 324 |
-
|
| 325 |
-
},
|
| 326 |
-
{
|
| 327 |
-
"cell_type": "markdown",
|
| 328 |
-
"metadata": {},
|
| 329 |
-
"source": [
|
| 330 |
-
"## 5️⃣ Verify and upload to HuggingFace"
|
| 331 |
]
|
| 332 |
},
|
| 333 |
{
|
|
@@ -336,57 +203,44 @@
|
|
| 336 |
"metadata": {},
|
| 337 |
"outputs": [],
|
| 338 |
"source": [
|
| 339 |
-
"
|
| 340 |
-
"\n",
|
| 341 |
-
"
|
| 342 |
-
"
|
| 343 |
-
"
|
| 344 |
-
"
|
| 345 |
-
"
|
| 346 |
-
"
|
|
|
|
| 347 |
"else:\n",
|
| 348 |
-
"
|
| 349 |
-
"
|
| 350 |
-
" print(f\"📊 model.litertlm: {size_gb:.2f} GB ({size_bytes:,} bytes)\")\n",
|
| 351 |
" if size_gb <= 2.0:\n",
|
| 352 |
-
" print(
|
| 353 |
" else:\n",
|
| 354 |
-
" print(f\"⚠️
|
| 355 |
" \n",
|
| 356 |
" print(f\"\\n📤 Uploading to {OUTPUT_REPO}...\")\n",
|
| 357 |
" from huggingface_hub import HfApi\n",
|
| 358 |
" api = HfApi(token=HF_TOKEN)\n",
|
|
|
|
|
|
|
| 359 |
" \n",
|
| 360 |
-
" # Create the repo if it does not exist\n",
|
| 361 |
-
" try:\n",
|
| 362 |
-
" api.create_repo(OUTPUT_REPO, exist_ok=True)\n",
|
| 363 |
-
" except:\n",
|
| 364 |
-
" pass\n",
|
| 365 |
-
" \n",
|
| 366 |
-
" # Upload the model\n",
|
| 367 |
" api.upload_file(\n",
|
| 368 |
-
" path_or_fileobj=
|
| 369 |
" path_in_repo=\"gemma-4-E2B-it-Uncensored-MAX.litertlm\",\n",
|
| 370 |
" repo_id=OUTPUT_REPO,\n",
|
| 371 |
" commit_message=\"Add LiteRT-LM model\",\n",
|
| 372 |
" )\n",
|
| 373 |
" \n",
|
| 374 |
-
" #
|
| 375 |
-
"
|
| 376 |
-
"
|
| 377 |
-
" path_or_fileobj=readme.encode(),\n",
|
| 378 |
-
" path_in_repo=\"README.md\",\n",
|
| 379 |
-
" repo_id=OUTPUT_REPO,\n",
|
| 380 |
-
" commit_message=\"Add README\",\n",
|
| 381 |
-
" )\n",
|
| 382 |
" \n",
|
| 383 |
-
"
|
| 384 |
-
" print(f\"
|
| 385 |
-
" print(f\"
|
| 386 |
-
" print(f\"
|
| 387 |
-
" print(f\"📊 Size: {size_gb:.2f} GB\")\n",
|
| 388 |
-
" print(f\"⏱️ Total time: {total_time:.0f} minutes\")\n",
|
| 389 |
-
" print(f\"{'='*50}\")"
|
| 390 |
]
|
| 391 |
},
|
| 392 |
{
|
|
@@ -395,17 +249,11 @@
|
|
| 395 |
"source": [
|
| 396 |
"## 🔧 Troubleshooting\n",
|
| 397 |
"\n",
|
| 398 |
-
"**
|
| 399 |
-
"- Go to **Runtime → Change runtime type** → enable **High-RAM**\n",
|
| 400 |
-
"- If it still fails, restart the runtime and run all cells again\n",
|
| 401 |
"\n",
|
| 402 |
-
"**
|
| 403 |
-
"- This is expected when the `transformers` and `litert-torch` versions are incompatible\n",
|
| 404 |
-
"- Try: `!pip install transformers==5.7.0`\n",
|
| 405 |
"\n",
|
| 406 |
-
"**
|
| 407 |
-
"- Change `quantization_recipe` to `\"dynamic_wi4_afp32\"` (INT4) in cell 4\n",
|
| 408 |
-
"- This roughly halves the size at some cost in quality"
|
| 409 |
]
|
| 410 |
}
|
| 411 |
]
|
|
|
|
| 4 |
"metadata": {
|
| 5 |
"colab": {
|
| 6 |
"provenance": [],
|
| 7 |
+
"gpuType": "T4",
|
| 8 |
+
"machine_shape": "hm"
|
| 9 |
},
|
| 10 |
"kernelspec": {
|
| 11 |
"name": "python3",
|
|
|
|
| 23 |
"source": [
|
| 24 |
"# 🚀 Convert Gemma 4 E2B Uncensored-MAX to LiteRT-LM\n",
|
| 25 |
"\n",
|
| 26 |
+
"Converts the model to the `.litertlm` format for **Google AI Edge Gallery** on Android.\n",
|
| 27 |
"\n",
|
| 28 |
+
"**⚠️ IMPORTANT:** Use a runtime with **GPU + High-RAM**: Runtime → Change runtime type → T4 + High-RAM (hm)\n",
|
| 29 |
"\n",
|
| 30 |
+
"**Estimated time:** ~30-45 minutes"
|
| 31 |
]
|
| 32 |
},
|
| 33 |
{
|
|
|
|
| 36 |
"metadata": {},
|
| 37 |
"outputs": [],
|
| 38 |
"source": [
|
| 39 |
+
"#@title 1️⃣ Configuration\n",
|
| 40 |
+
"HF_TOKEN = \"\" #@param {type:\"string\"}\n",
|
| 41 |
+
"OUTPUT_REPO = \"RedSparkie/gemma-4-E2B-it-Uncensored-MAX-litert-lm\" #@param {type:\"string\"}\n",
|
| 42 |
+
"SOURCE_MODEL = \"prithivMLmods/gemma-4-E2B-it-Uncensored-MAX\" #@param {type:\"string\"}\n",
|
| 43 |
+
"assert HF_TOKEN, \"❌ Set your HuggingFace token!\""
|
| 44 |
]
|
| 45 |
},
|
| 46 |
{
|
|
|
|
| 49 |
"metadata": {},
|
| 50 |
"outputs": [],
|
| 51 |
"source": [
|
| 52 |
+
"#@title 2️⃣ Install dependencies\n",
|
| 53 |
"!pip install -q litert-torch litert-lm transformers huggingface_hub sentencepiece protobuf safetensors psutil\n",
|
| 54 |
+
"print(\"✅ Installed\")"
|
| 55 |
]
|
| 56 |
},
|
| 57 |
{
|
|
|
|
| 60 |
"metadata": {},
|
| 61 |
"outputs": [],
|
| 62 |
"source": [
|
| 63 |
+
"#@title 3️⃣ Prepare the model (extract text only, no vision/audio)\n",
|
| 64 |
+
"import os, sys, json, gc, shutil, time\n",
|
| 65 |
"from huggingface_hub import hf_hub_download\n",
|
| 66 |
"from safetensors import safe_open\n",
|
| 67 |
"from safetensors.torch import save_file\n",
|
| 68 |
+
"import transformers, psutil\n",
|
|
|
|
|
|
|
| 69 |
"\n",
|
| 70 |
+
"def memlog(l=\"\"):\n",
|
| 71 |
+
" m=psutil.virtual_memory()\n",
|
| 72 |
+
" print(f\" [{l}] RAM: {m.available/(1024**3):.1f}/{m.total/(1024**3):.1f} GB\")\n",
|
| 73 |
"\n",
|
| 74 |
+
"MODEL_DIR = \"/content/model\"\n",
|
| 75 |
"OUTPUT_DIR = \"/content/output\"\n",
|
| 76 |
+
"os.makedirs(MODEL_DIR, exist_ok=True)\n",
|
| 77 |
"os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
|
|
|
|
| 78 |
"start_time = time.time()\n",
|
| 79 |
+
"memlog(\"start\")\n",
|
| 80 |
"\n",
|
| 81 |
+
"# Download the index\n",
|
| 82 |
+
"print(\"📥 Downloading index...\")\n",
|
| 83 |
+
"idx_path = hf_hub_download(SOURCE_MODEL, \"model.safetensors.index.json\", token=HF_TOKEN)\n",
|
| 84 |
+
"with open(idx_path) as f:\n",
|
| 85 |
" index = json.load(f)\n",
|
| 86 |
"\n",
|
| 87 |
+
"# Group language-model weights by shard\n",
|
| 88 |
+
"shard_lm = {}\n",
|
| 89 |
"for key, shard in index[\"weight_map\"].items():\n",
|
| 90 |
" if key.startswith(\"model.language_model.\"):\n",
|
| 91 |
+
" shard_lm.setdefault(shard, []).append(key)\n",
|
|
|
|
|
|
|
|
|
|
|
|
|
| 92 |
"\n",
|
| 93 |
+
"total_shards = len(shard_lm)\n",
|
| 94 |
+
"print(f\" {sum(len(v) for v in shard_lm.values())} tensors in {total_shards} shards\")\n",
|
|
|
|
| 95 |
"\n",
|
| 96 |
+
"# Extract shard by shard (KEEP the model.language_model. prefix)\n",
|
| 97 |
+
"weight_map = {}\n",
|
| 98 |
+
"for i, sn in enumerate(sorted(shard_lm)):\n",
|
| 99 |
+
" keys = shard_lm[sn]\n",
|
| 100 |
+
" out_name = f\"model-{i+1:05d}-of-{total_shards:05d}.safetensors\"\n",
|
| 101 |
+
" out_path = os.path.join(MODEL_DIR, out_name)\n",
|
| 102 |
" \n",
|
| 103 |
+
" if os.path.exists(out_path) and os.path.getsize(out_path) > 100:\n",
|
| 104 |
+
" print(f\" {out_name} already exists, skipping\")\n",
|
| 105 |
+
" with safe_open(out_path, framework=\"pt\") as f:\n",
|
| 106 |
+
" for k in f.keys(): weight_map[k] = out_name\n",
|
| 107 |
+
" continue\n",
|
| 108 |
" \n",
|
| 109 |
+
" print(f\" 📦 {sn} → {out_name} ({len(keys)} tensors)\")\n",
|
| 110 |
+
" shard_path = hf_hub_download(SOURCE_MODEL, sn, token=HF_TOKEN)\n",
|
|
|
|
|
|
|
|
|
|
|
|
|
| 111 |
" \n",
|
| 112 |
+
" # Extract tensors, KEEPING the original prefix\n",
|
| 113 |
+
" tensors = {}\n",
|
| 114 |
+
" with safe_open(shard_path, framework=\"pt\") as f:\n",
|
| 115 |
+
" for key in keys:\n",
|
| 116 |
+
" tensors[key] = f.get_tensor(key)\n",
|
| 117 |
" \n",
|
| 118 |
+
" save_file(tensors, out_path)\n",
|
| 119 |
+
" for k in tensors: weight_map[k] = out_name\n",
|
| 120 |
" \n",
|
| 121 |
" size_mb = os.path.getsize(out_path) / (1024**2)\n",
|
| 122 |
+
" print(f\" 💾 {size_mb:.0f} MB\")\n",
|
| 123 |
+
" del tensors; gc.collect()\n",
|
| 124 |
+
" memlog(f\"shard {i+1}\")\n",
|
| 125 |
"\n",
|
| 126 |
"# Write index\n",
|
| 127 |
+
"with open(os.path.join(MODEL_DIR, \"model.safetensors.index.json\"), \"w\") as f:\n",
|
| 128 |
+
" json.dump({\"metadata\": {}, \"weight_map\": weight_map}, f)\n",
|
| 129 |
"\n",
|
| 130 |
+
"# Config: Gemma4 with vision=None, audio=None\n",
|
| 131 |
+
"print(\"\\n📝 Creating config...\")\n",
|
| 132 |
"config = transformers.AutoConfig.from_pretrained(SOURCE_MODEL, token=HF_TOKEN)\n",
|
| 133 |
+
"cd = config.to_dict()\n",
|
| 134 |
+
"cd[\"vision_config\"] = None\n",
|
| 135 |
+
"cd[\"audio_config\"] = None\n",
|
| 136 |
+
"for k in [\"vision_soft_tokens_per_image\", \"image_token_id\", \"boi_token_id\",\n",
|
| 137 |
+
" \"eoi_token_id\", \"audio_token_id\", \"boa_token_id\", \"eoa_token_id\",\n",
|
| 138 |
+
" \"eoa_token_index\", \"video_token_id\"]:\n",
|
| 139 |
+
" cd.pop(k, None)\n",
|
| 140 |
+
"with open(os.path.join(MODEL_DIR, \"config.json\"), \"w\") as f:\n",
|
| 141 |
+
" json.dump(cd, f, indent=2)\n",
|
| 142 |
+
"\n",
|
| 143 |
+
"# Tokenizer and extra files\n",
|
| 144 |
"for fn in [\"tokenizer.json\", \"tokenizer_config.json\", \"chat_template.jinja\", \"generation_config.json\"]:\n",
|
| 145 |
" try:\n",
|
| 146 |
" src = hf_hub_download(SOURCE_MODEL, fn, token=HF_TOKEN)\n",
|
| 147 |
+
" shutil.copy(src, os.path.join(MODEL_DIR, fn))\n",
|
| 148 |
+
" print(f\" ✓ {fn}\")\n",
|
| 149 |
+
" except Exception as e: print(f\" ⚠️ {fn} not copied: {e}\")\n",
|
| 150 |
+
"\n",
|
| 151 |
+
"del config; gc.collect()\n",
|
| 152 |
+
"\n",
|
| 153 |
+
"# Clear the HF cache\n",
|
| 154 |
+
"cache_dir = os.path.expanduser(\"~/.cache/huggingface/hub\")\n",
|
| 155 |
+
"if os.path.exists(cache_dir):\n",
|
| 156 |
+
" for d in os.listdir(cache_dir):\n",
|
| 157 |
+
" if d.startswith(\"models--\"):\n",
|
| 158 |
+
" shutil.rmtree(os.path.join(cache_dir, d), ignore_errors=True)\n",
|
| 159 |
"gc.collect()\n",
|
| 160 |
"\n",
|
| 161 |
+
"print(f\"\\n✅ Model ready\")\n",
|
| 162 |
+
"memlog(\"done\")"
|
| 163 |
]
|
| 164 |
},
|
| 165 |
{
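Review note on cell 3: the regrouping logic can be checked without downloading any weights. The sketch below is a standalone illustration, not the notebook's code. It assumes a toy `weight_map` with made-up tensor names, and `plan_text_shards` is a hypothetical helper name. It filters keys by the `model.language_model.` prefix, groups them by source shard, and renumbers the output shards, keeping the keys unchanged as the updated cell does.

```python
from collections import defaultdict

PREFIX = "model.language_model."  # same prefix the notebook filters on


def plan_text_shards(weight_map, prefix=PREFIX):
    """Return (plan, new_weight_map) for tensors whose key starts with prefix.

    plan maps each source shard to (output shard name, keys to copy).
    Keys are left unchanged, prefix included.
    """
    by_shard = defaultdict(list)
    for key, shard in weight_map.items():
        if key.startswith(prefix):
            by_shard[shard].append(key)

    total = len(by_shard)
    plan, new_weight_map = {}, {}
    # Sort source shards so the output numbering is deterministic.
    for i, shard in enumerate(sorted(by_shard), start=1):
        out_name = f"model-{i:05d}-of-{total:05d}.safetensors"
        plan[shard] = (out_name, by_shard[shard])
        for key in by_shard[shard]:
            new_weight_map[key] = out_name
    return plan, new_weight_map


# Toy index with hypothetical tensor names, for illustration only.
toy_index = {
    "model.language_model.layers.0.w": "model-00001-of-00003.safetensors",
    "model.vision_tower.patch.w": "model-00002-of-00003.safetensors",
    "model.language_model.norm.w": "model-00003-of-00003.safetensors",
}
plan, new_map = plan_text_shards(toy_index)
print(new_map)
```

Because the total is known before any file is written, the final shard names can be assigned up front, which is why the earlier `-of-TEMP` rename pass is no longer needed.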
|
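Review note on the config step in cell 3: the text-only config is produced by nulling the vision and audio sub-configs and dropping the multimodal token fields. A minimal sketch on a plain dict follows. The key list is copied from the cell, while `strip_multimodal` and the toy config values are assumptions made for illustration; whether the exporter accepts such a config is not verified here.

```python
import copy

# Key names copied from the notebook cell.
MULTIMODAL_KEYS = [
    "vision_soft_tokens_per_image", "image_token_id", "boi_token_id",
    "eoi_token_id", "audio_token_id", "boa_token_id", "eoa_token_id",
    "eoa_token_index", "video_token_id",
]


def strip_multimodal(config_dict):
    """Return a copy of config_dict with vision/audio parts disabled."""
    cd = copy.deepcopy(config_dict)  # leave the caller's dict untouched
    cd["vision_config"] = None
    cd["audio_config"] = None
    for key in MULTIMODAL_KEYS:
        cd.pop(key, None)  # ignore keys that are absent
    return cd


# Toy config with made-up values, for illustration only.
toy_config = {
    "model_type": "gemma4",
    "text_config": {"hidden_size": 16},
    "vision_config": {"hidden_size": 8},
    "audio_config": {"hidden_size": 4},
    "image_token_id": 123,
}
text_only = strip_multimodal(toy_config)
print(text_only)
```

Working on a deep copy keeps the original config available for comparison if the export later complains about a missing field.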
|
|
|
| 168 |
"metadata": {},
|
| 169 |
"outputs": [],
|
| 170 |
"source": [
|
| 171 |
+
"#@title 4️⃣ Convert to .litertlm\n",
|
| 172 |
+
"import torch\n",
|
| 173 |
"from litert_torch.generative.export_hf import export as export_lib\n",
|
| 174 |
"\n",
|
| 175 |
+
"print(\"🚀 Converting to LiteRT-LM...\")\n",
|
| 176 |
+
"print(\" This takes 15-30 min. Be patient.\")\n",
|
| 177 |
+
"memlog(\"pre-export\")\n",
|
|
|
|
|
|
|
|
|
|
| 178 |
"\n",
|
| 179 |
"conversion_start = time.time()\n",
|
| 180 |
"\n",
|
| 181 |
"export_lib.export(\n",
|
| 182 |
+
" model=MODEL_DIR,\n",
|
| 183 |
" output_dir=OUTPUT_DIR,\n",
|
| 184 |
" task=\"text_generation\",\n",
|
| 185 |
" bundle_litert_lm=True,\n",
|
| 186 |
+
" quantization_recipe=\"dynamic_wi8_afp32\", # INT8 (same as the official model)\n",
|
| 187 |
" cache_length=4096,\n",
|
| 188 |
" prefill_lengths=[256],\n",
|
| 189 |
" use_jinja_template=True,\n",
|
| 190 |
" keep_temporary_files=True,\n",
|
| 191 |
" trust_remote_code=False,\n",
|
| 192 |
" experimental_lightweight_conversion=True,\n",
|
| 193 |
+
" externalize_embedder=True, # Required for Gemma4\n",
|
| 194 |
")\n",
|
| 195 |
"\n",
|
| 196 |
+
"print(f\"\\n✅ Conversion finished in {(time.time()-conversion_start)/60:.1f} min\")\n",
|
| 197 |
+
"memlog(\"post-export\")"
|
| 198 |
]
|
| 199 |
},
|
| 200 |
{
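Review note on the quantization choice in cell 4: a back-of-the-envelope estimate shows why INT8 may land above the 2 GB mark that cell 5 checks. The sketch assumes the "~4.8 GB in bf16" figure from the removed markdown cell and scales it by weight bit-width only. It ignores the externalized embedder, metadata, and any layers left unquantized, so the real file size can differ noticeably. `estimate_size_gb` is a hypothetical helper.

```python
def estimate_size_gb(bf16_size_gb, weight_bits):
    """Scale a bf16 (16-bit) size to another weight bit-width."""
    return bf16_size_gb * weight_bits / 16


# Figure quoted in the earlier notebook text for the text decoder.
BF16_TEXT_DECODER_GB = 4.8

# Recipe names as used in the notebook, with their weight bit-widths.
recipes = {"dynamic_wi8_afp32": 8, "dynamic_wi4_afp32": 4}
estimates = {
    name: estimate_size_gb(BF16_TEXT_DECODER_GB, bits)
    for name, bits in recipes.items()
}
for name, size in estimates.items():
    print(f"{name}: about {size:.1f} GB")
```

Under these assumptions the INT8 estimate sits above 2 GB and the INT4 estimate below it, which matches the troubleshooting advice to switch recipes when the file is too large.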
|
|
|
|
| 203 |
"metadata": {},
|
| 204 |
"outputs": [],
|
| 205 |
"source": [
|
| 206 |
+
"#@title 5️⃣ Verify and upload\n",
|
| 207 |
+
"litertlm = os.path.join(OUTPUT_DIR, \"model.litertlm\")\n",
|
| 208 |
+
"\n",
|
| 209 |
+
"if not os.path.exists(litertlm):\n",
|
| 210 |
+
" print(\"❌ model.litertlm not found. Files:\")\n",
|
| 211 |
+
" for r,d,fs in os.walk(OUTPUT_DIR):\n",
|
| 212 |
+
" for f in fs:\n",
|
| 213 |
+
" fp = os.path.join(r,f)\n",
|
| 214 |
+
" print(f\" {os.path.relpath(fp,OUTPUT_DIR)}: {os.path.getsize(fp)/(1024**2):.1f} MB\")\n",
|
| 215 |
"else:\n",
|
| 216 |
+
" size_gb = os.path.getsize(litertlm) / (1024**3)\n",
|
| 217 |
+
" print(f\"📊 model.litertlm: {size_gb:.2f} GB\")\n",
|
|
|
|
| 218 |
" if size_gb <= 2.0:\n",
|
| 219 |
+
" print(\"✅ Fits in 2 GB!\")\n",
|
| 220 |
" else:\n",
|
| 221 |
+
" print(f\"⚠️ {size_gb:.2f} GB: if you need a smaller file, switch to dynamic_wi4_afp32 in cell 4\")\n",
|
| 222 |
" \n",
|
| 223 |
" print(f\"\\n📤 Uploading to {OUTPUT_REPO}...\")\n",
|
| 224 |
" from huggingface_hub import HfApi\n",
|
| 225 |
" api = HfApi(token=HF_TOKEN)\n",
|
| 226 |
+
" try: api.create_repo(OUTPUT_REPO, exist_ok=True)\n",
|
| 227 |
+
" except Exception as e: print(f\" ⚠️ create_repo failed: {e}\")\n",
|
| 228 |
" \n",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 229 |
" api.upload_file(\n",
|
| 230 |
+
" path_or_fileobj=litertlm,\n",
|
| 231 |
" path_in_repo=\"gemma-4-E2B-it-Uncensored-MAX.litertlm\",\n",
|
| 232 |
" repo_id=OUTPUT_REPO,\n",
|
| 233 |
" commit_message=\"Add LiteRT-LM model\",\n",
|
| 234 |
" )\n",
|
| 235 |
" \n",
|
| 236 |
+
" readme = f\"\"\"---\\nlicense: apache-2.0\\nbase_model:\\n- {SOURCE_MODEL}\\ntags:\\n - litert-lm\\n - uncensored\\n - edge-gallery\\nlanguage:\\n- en\\n---\\n\\n# gemma-4-E2B-it-Uncensored-MAX (LiteRT-LM)\\n\\nLiteRT-LM conversion for **Google AI Edge Gallery**.\\n\\n| | |\\n|---|---|\\n| **Base** | [{SOURCE_MODEL}](https://huggingface.co/{SOURCE_MODEL}) |\\n| **Format** | `.litertlm` |\\n| **Quant** | INT8 |\\n| **Context** | 4096 |\\n| **Size** | {size_gb:.2f} GB |\\n\\n## Usage\\n1. Install [Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)\\n2. Add model via HF URL\\n3. Chat!\\n\\n⚠️ Uncensored. Use responsibly.\\n\"\"\"\n",
|
| 237 |
+
" api.upload_file(path_or_fileobj=readme.encode(), path_in_repo=\"README.md\",\n",
|
| 238 |
+
" repo_id=OUTPUT_REPO, commit_message=\"README\")\n",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 239 |
" \n",
|
| 240 |
+
" print(f\"\\n🎉 DONE!\")\n",
|
| 241 |
+
" print(f\"📱 https://huggingface.co/{OUTPUT_REPO}\")\n",
|
| 242 |
+
" print(f\"📊 {size_gb:.2f} GB\")\n",
|
| 243 |
+
" print(f\"⏱️ {(time.time()-start_time)/60:.0f} min total\")"
|
|
|
|
|
|
|
|
|
|
| 244 |
]
|
| 245 |
},
|
| 246 |
{
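Review note on cell 5: the existence and size check can be exercised locally before anything is uploaded. The sketch below uses only the standard library and a small stand-in file. `check_artifact` is a hypothetical helper, and the 2.0 limit mirrors the threshold in the cell.

```python
import os
import tempfile

GIB = 1024 ** 3


def check_artifact(path, limit_gib=2.0):
    """Return (size_gib, message); size_gib is None if the file is missing."""
    if not os.path.exists(path):
        return None, f"{os.path.basename(path)} not found"
    size_gib = os.path.getsize(path) / GIB
    if size_gib <= limit_gib:
        return size_gib, "fits within the limit"
    return size_gib, "over the limit: consider dynamic_wi4_afp32"


with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "model.litertlm")
    with open(fake, "wb") as f:
        f.write(b"\0" * 1024)  # 1 KiB stand-in for the exported model
    size, message = check_artifact(fake)
    missing = check_artifact(os.path.join(tmp, "absent.litertlm"))

print(message)
print(missing[1])
```

Returning a value instead of printing inside the check makes it easy to skip the upload step when the file is missing.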
|
|
|
|
| 249 |
"source": [
|
| 250 |
"## 🔧 Troubleshooting\n",
|
| 251 |
"\n",
|
| 252 |
+
"**OOM:** Use a runtime with **High-RAM** (hm)\n",
|
| 253 |
"\n",
|
| 254 |
+
"**>2 GB:** Change `dynamic_wi8_afp32` → `dynamic_wi4_afp32` in cell 4\n",
|
| 255 |
"\n",
|
| 256 |
+
"**`External embedder required` error:** Already fixed by `externalize_embedder=True`"
|
|
|
|
|
|
|
| 257 |
]
|
| 258 |
}
|
| 259 |
]
|