---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---
# VTXAI/Vortex-Embed-4.7M

**Native 4-bit quantized** static sentence embedding model.
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
Weighs only **4.7 MB** on disk: no transformers, no torch, no GPU needed.

## Model Size

| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 30.2 MB (28.8 MiB) | 1.0× |
| **LF4 (this model)** | **4.7 MB** | **6.4×** |
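
The sizes in this table can be cross-checked from the tensor shapes listed under Architecture. The sketch below is plain arithmetic on those shapes. It assumes the three LF4 tensors are the entire weight payload, and it ignores tokenizer files and file-header overhead.

```python
# Back-of-the-envelope size check from the shapes given in this card.
# Assumption: the three LF4 tensors below are the only weights in the file.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

blocks_per_row = DIM // BLOCK_SIZE          # 8 blocks of 32 values per row

fp32_bytes = VOCAB * DIM * 4                # 4 bytes per float32 value
packed_bytes = VOCAB * (DIM // 2)           # two 4-bit codes per uint8 byte
scale_bytes = VOCAB * blocks_per_row * 2    # one float16 scale per block
zero_bytes = VOCAB * blocks_per_row * 2     # one float16 zero-point per block
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

print(f"FP32: {fp32_bytes / 1e6:.1f} MB ({fp32_bytes / 2**20:.1f} MiB)")
print(f"LF4:  {lf4_bytes / 1e6:.1f} MB ({lf4_bytes / 2**20:.1f} MiB)")
print(f"Compression: {fp32_bytes / lf4_bytes:.1f}x")
```

Under these assumptions each row costs 1024 bytes in FP32 and 160 bytes in LF4 (128 packed + 16 scales + 16 zero-points), which is an effective 5 bits per weight and a 6.4× ratio.
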

## Architecture

Learned static embedding table with 4-bit per-block quantization (LF4):

```
LF4StaticEmbedding(
  vocab=29528, dim=256, bits=4,
  block_size=32, size=4.7MB
)
```

Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
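
The last two steps of that pipeline can be sketched in a few lines of NumPy. This is an illustration only, not the `lf4_model` implementation: it assumes token IDs are already available and that the table has already been dequantized to float32. The `encode_ids` helper, the toy table, and the token IDs are hypothetical.

```python
import numpy as np

def encode_ids(token_ids, table):
    """Mean-pool the rows of `table` selected by `token_ids`, then L2-normalize."""
    vectors = table[np.asarray(token_ids)]      # lookup: (n_tokens, dim)
    pooled = vectors.mean(axis=0)               # mean pool: (dim,)
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)            # guard against a zero vector

# Toy stand-in for a dequantized embedding table (10 tokens, 4 dims).
rng = np.random.default_rng(0)
toy_table = rng.normal(size=(10, 4)).astype(np.float32)

sentence_vec = encode_ids([1, 5, 7], toy_table)
print(sentence_vec.shape, float(np.linalg.norm(sentence_vec)))
```

Because the output has unit length, cosine similarity between two sentence vectors reduces to a dot product.
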

Weights stored as:

- `embedding_packed`: uint8 (29528 × 128), 4-bit packed, 2 values per byte
- `embedding_scales`: float16 (29528 × 8), one scale per block
- `embedding_zeros`: float16 (29528 × 8), one zero-point per block
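
To make the storage layout concrete, here is a toy round trip through 4-bit block quantization in NumPy. It is a sketch under stated assumptions, not the LF4 code: the card does not specify the nibble order or the exact dequantization formula, so the example assumes the low nibble holds the even-indexed value and that `value = code * scale + zero`, with `zero` being the block minimum. Both function names are hypothetical.

```python
import numpy as np

BLOCK_SIZE = 32  # values sharing one scale / zero-point, as in this card

def quantize_rows(weights):
    """Toy 4-bit asymmetric block quantizer (illustrative, not the LF4 trainer)."""
    rows, dim = weights.shape
    blocks = weights.reshape(rows, dim // BLOCK_SIZE, BLOCK_SIZE)
    lo = blocks.min(axis=2, keepdims=True)
    hi = blocks.max(axis=2, keepdims=True)
    scales = np.maximum((hi - lo) / 15.0, 1e-8)          # 16 levels: codes 0..15
    codes = np.clip(np.round((blocks - lo) / scales), 0, 15).astype(np.uint8)
    codes = codes.reshape(rows, dim)
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)      # low nibble first (assumed)
    return packed, scales[..., 0].astype(np.float16), lo[..., 0].astype(np.float16)

def dequantize_rows(packed, scales, zeros):
    """Unpack two 4-bit codes per byte and apply the per-block scale and offset."""
    rows = packed.shape[0]
    codes = np.empty((rows, packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed & 0x0F                       # low nibble
    codes[:, 1::2] = packed >> 4                         # high nibble
    blocks = codes.reshape(rows, -1, BLOCK_SIZE).astype(np.float32)
    # Assumed convention: value = code * scale + zero (zero = block minimum).
    scale32 = scales[..., None].astype(np.float32)
    zero32 = zeros[..., None].astype(np.float32)
    return (blocks * scale32 + zero32).reshape(rows, -1)

rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(4, 256)).astype(np.float32)  # 4 toy rows, dim=256
packed, scales, zeros = quantize_rows(table)
restored = dequantize_rows(packed, scales, zeros)
print(packed.shape, scales.shape, zeros.shape)
print("max abs error:", float(np.abs(restored - table).max()))
```

With `dim=256` and `block_size=32`, each row packs into 128 bytes plus 8 scales and 8 zero-points, matching the per-row shapes listed above.
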

## Usage

### Python inference (lightweight, no torch)

```python
from lf4_model import LF4StaticEmbedding

model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

# Encode documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find today's headlines"])

# Cosine similarity search over the encoded documents
# (top_k is kept at the number of documents in this small example)
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
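
For intuition, a cosine-similarity top-k search reduces to a normalized matrix product followed by a sort. The NumPy sketch below illustrates the idea; `cosine_top_k` is a hypothetical helper, and the actual behavior and return shapes of `model.search` may differ.

```python
import numpy as np

def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k most similar docs for each query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                                   # (n_queries, n_docs) cosine scores
    k = min(top_k, sims.shape[1])                    # never ask for more docs than exist
    indices = np.argsort(-sims, axis=1)[:, :k]       # best match first
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

# Toy 3-dim vectors standing in for real 256-dim embeddings.
docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
query = np.array([[0.9, 0.1, 0.0]])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices[0], scores[0])
```

A full `argsort` is fine at this scale; for large document sets, `np.argpartition` avoids sorting every score.
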

### With sentence-transformers (torch)

```python
from sentence_transformers import SentenceTransformer

# Requires a sentence-transformers version that accepts backend="static".
model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
```

## Quality

- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15 on benchmarks)
- **Codebase indexing**: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2707 chunks)
- **Training data**: CornStack (Python/JS/Java) + Glaive function-calling
- **Base**: VTXAI/Vortex-Embed → fine-tuned → LF4 quantized
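
The card does not state how the cosine-preservation and MSE figures were computed. One plausible way to compute metrics of this kind is to compare a reference table with its approximation row by row. The sketch below does that on synthetic data; `preservation_metrics` is a hypothetical helper, and the random matrices are stand-ins, not the model's weights.

```python
import numpy as np

def preservation_metrics(reference, approx):
    """Mean row-wise cosine similarity and mean squared error between two tables."""
    dot = np.sum(reference * approx, axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(approx, axis=1)
    cosine = float(np.mean(dot / np.maximum(norms, 1e-12)))
    mse = float(np.mean((reference - approx) ** 2))
    return cosine, mse

# Synthetic stand-ins: a random "FP32" table and a slightly perturbed copy.
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 256)).astype(np.float32)
noise = rng.normal(scale=0.05, size=reference.shape).astype(np.float32)
cosine, mse = preservation_metrics(reference, reference + noise)
print(f"cosine={cosine:.4f}  mse={mse:.6f}")
```

Note that MSE depends on the scale of the compared vectors, while cosine similarity does not, so the two numbers are not directly comparable.
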

## Why Static Embedding?

| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference speed | **0.15 ms** | ~50 ms |
| Load time | **144 ms** | ~5 s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |

\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

## Minimal Dependencies

```bash
pip install numpy safetensors tokenizers
```

The model loads and runs with just `numpy`, `safetensors`, and Hugging Face `tokenizers`.
PyTorch, transformers, and sentence-transformers are not required for basic inference.