blackcloud1199/SmolLM2-1.7B-Executorch-Q8DA4W

license: apache-2.0
library_name: executorch
tags:
  - android
  - ios
  - on-device
  - pytorch
  - react-native
  - smollm
  - llama
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct

SmolLM2-1.7B-Executorch-Q8DA4W

This repository contains the smollm2_1_7b_q8da4w.pte model, exported for use with ExecuTorch.

Details

Base Model: HuggingFaceTB/SmolLM2-1.7B-Instruct
Format: .pte (ExecuTorch)
Quantization: Q8DA4W (4-bit weights in linear layers, 8-bit dynamically quantized activations)
Architecture: llama (compatible with the Llama export pipeline)
File Size: ~1.7 GB
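
To illustrate what the Q8DA4W label means numerically, here is a minimal Python sketch. It is not the ExecuTorch export code. It assumes symmetric group-wise 4-bit weights and asymmetric 8-bit activations whose range is computed at run time; the group size of 32 and all function names are hypothetical choices for illustration, and the exact scheme used for this export may differ.

```python
from typing import List, Tuple


def quantize_weights_4bit(
    weights: List[float], group_size: int = 32  # group size is an assumption
) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization to signed 4-bit integers in [-8, 7]."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize_weights(
    q: List[int], scales: List[float], group_size: int = 32
) -> List[float]:
    """Map 4-bit integers back to floats using each group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def quantize_activations_8bit(acts: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric quantization to int8; the range is measured on this input,
    which is what makes the quantization 'dynamic'."""
    lo = min(min(acts), 0.0)  # keep 0.0 inside the representable range
    hi = max(max(acts), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(a / scale) + zero_point)) for a in acts]
    return q, scale, zero_point


def dequantize_activations(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to floats."""
    return [(v - zero_point) * scale for v in q]
```

Weights are quantized once at export time, so their scales ship inside the .pte file, while the activation scale and zero point are recomputed for every input at inference time.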

Features

🚀 Optimized for mobile/edge devices
📱 Compatible with react-native-executorch
💡 SmolLM2 is a compact 1.7B-parameter model suited to resource-constrained environments
🗣️ Instruct-tuned for conversational AI

Usage

The model can be used in iOS and Android applications via the ExecuTorch runtime or react-native-executorch.

Download smollm2_1_7b_q8da4w.pte and the tokenizer files (tokenizer.json, vocab.json, merges.txt).
Place them in your app's asset folder.
Load the .pte file with the ExecuTorch runtime.
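
The download and placement steps can be scripted on a development machine. The sketch below is one possible approach, not an official workflow: the repo id is assumed from this repository's owner and name, the helper names are hypothetical, and the download helper relies on `hf_hub_download` from the `huggingface_hub` package, imported lazily so the file check works without it.

```python
from pathlib import Path
from typing import List

# Assumed from this repository's owner and name.
REPO_ID = "blackcloud1199/SmolLM2-1.7B-Executorch-Q8DA4W"

# The model and tokenizer files listed in the steps above.
REQUIRED_FILES = [
    "smollm2_1_7b_q8da4w.pte",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
]


def missing_assets(asset_dir: str) -> List[str]:
    """Return the required files that are not yet present in asset_dir."""
    root = Path(asset_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def download_assets(asset_dir: str) -> None:
    """Fetch any missing files from the Hub into asset_dir (needs network access)."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    Path(asset_dir).mkdir(parents=True, exist_ok=True)
    for name in missing_assets(asset_dir):
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=asset_dir)
```

Calling `download_assets("assets")` followed by `missing_assets("assets")` should return an empty list once all four files are in place; the `assets` folder name is only an example.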

Notes

SmolLM2 uses a byte-level BPE tokenizer (similar to GPT-2), unlike the SentencePiece tokenizer used by earlier Llama models.
The tokenizer files are tokenizer.json, vocab.json, and merges.txt.
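
For readers unfamiliar with byte-level BPE, the following Python sketch shows the idea in the GPT-2 style: text is first turned into UTF-8 bytes, each byte is mapped to a printable character, and adjacent symbols are then merged by rank. The merge rules in the usage note are hypothetical toy values, not entries from this model's merges.txt, and the regex pre-splitting that real tokenizers perform is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2 style table mapping each of the 256 byte values to a printable character."""
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are moved to code points starting at 256.
            table[b] = chr(256 + shift)
            shift += 1
    return table


def to_symbols(text: str) -> List[str]:
    """Encode text as UTF-8 and map every byte to its printable symbol."""
    table = bytes_to_unicode()
    return [table[b] for b in text.encode("utf-8")]


def apply_merges(symbols: Sequence[str], merges: Sequence[Tuple[str, str]]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank (earliest in merges)."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    out = list(symbols)
    while len(out) > 1:
        ranked = [
            (ranks.get(pair, float("inf")), i)
            for i, pair in enumerate(zip(out, out[1:]))
        ]
        best_rank, i = min(ranked)
        if best_rank == float("inf"):
            break  # no applicable merge rule remains
        out[i:i + 2] = [out[i] + out[i + 1]]
    return out
```

With the toy rules `[("h", "i"), ("Ġ", "hi")]`, the input `" hi"` becomes the symbols `Ġ`, `h`, `i` and is then merged into the single token `Ġhi`. The leading `Ġ` is how a space byte appears in vocab.json and merges.txt.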