---
library_name: lf4
tags:
- lf4
- static-embedding
- 4-bit
- quantized
- sentence-similarity
- code-search
- tool-search
- sentence-transformers
- embedding
language: en
license: mit
pipeline_tag: sentence-similarity
---
# VTXAI/Vortex-Embed-4.7M
**Native 4-bit quantized** static sentence embedding model.
Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
It weighs only **4.7 MB** on disk and needs no transformers, no torch, and no GPU.
## Model Size
| Format | Size | Compression |
|--------|------|-------------|
| FP32 (original) | 30.2 MB (28.8 MiB) | 1.0× |
| **LF4 (this model)** | **4.7 MB** | **6.4×** |
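The sizes in the table follow directly from the tensor shapes listed under Architecture. The sketch below reproduces them; it counts only raw tensor bytes and ignores file headers and the tokenizer, which is an assumption about how the table was computed.

```python
# Sketch: derive the size figures from the tensor shapes (raw tensor bytes only).
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32

fp32_bytes = VOCAB * DIM * 4            # one float32 (4 bytes) per weight
packed_bytes = VOCAB * (DIM // 2)       # two 4-bit values per uint8 byte
n_blocks = DIM // BLOCK_SIZE            # 8 quantization blocks per row
scale_bytes = VOCAB * n_blocks * 2      # float16 per-block scales
zero_bytes = VOCAB * n_blocks * 2       # float16 per-block zero-points
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

print(f"FP32: {fp32_bytes / 1e6:.1f} MB ({fp32_bytes / 2**20:.1f} MiB)")  # FP32: 30.2 MB (28.8 MiB)
print(f"LF4:  {lf4_bytes / 1e6:.1f} MB")                                # LF4:  4.7 MB
print(f"Compression: {fp32_bytes / lf4_bytes:.1f}x")                    # Compression: 6.4x
```

Per row, FP32 uses 1024 bytes while LF4 uses 128 + 16 + 16 = 160 bytes, which is where the 6.4× ratio comes from.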
## Architecture
Learned static embedding table with 4-bit per-block quantization (LF4):
```
LF4StaticEmbedding(
vocab=29528, dim=256, bits=4,
block_size=32, size=4.7MB
)
```
Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
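The pooling and normalization steps can be sketched in NumPy as follows. This assumes tokenization has already produced integer IDs and that the table is already dequantized to floats; `encode_ids` is a hypothetical helper, and whether the real encoder skips special or padding tokens is not stated in this card.

```python
import numpy as np

def encode_ids(token_ids: list[int], table: np.ndarray) -> np.ndarray:
    """Mean-pool the rows of `table` selected by `token_ids`, then L2-normalize.

    `table` is a dense (vocab, dim) float array standing in for the dequantized
    LF4 embeddings (hypothetical helper, not the lf4_model API).
    """
    if not token_ids:
        # No tokens: return a zero vector instead of dividing by zero
        return np.zeros(table.shape[1], dtype=np.float32)
    pooled = table[np.asarray(token_ids)].mean(axis=0)   # lookup + mean pool
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled         # L2 normalize

# Toy example: 4-token vocabulary with 3-dim embeddings
table = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]], dtype=np.float32)
vec = encode_ids([0, 1], table)
print(vec)
```

Because the output is unit-length, cosine similarity between two encoded sentences reduces to a dot product.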
Weights are stored as:
- `embedding_packed`: uint8 (29528 × 128), 4-bit values packed 2 per byte (256 / 2 = 128 bytes per row)
- `embedding_scales`: float16 (29528 × 8), one scale per 32-value block (256 / 32 = 8 blocks per row)
- `embedding_zeros`: float16 (29528 × 8), one zero-point per 32-value block
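A minimal sketch of how such a layout could be packed and unpacked, assuming (1) the low nibble of each byte holds the even-indexed value and (2) dequantization is the asymmetric affine form `w = (q - zero) * scale` per 32-value block. Neither the nibble order nor the exact formula is stated in this card, and `quantize_rows` / `dequantize_rows` are hypothetical names, not the `lf4_model` implementation.

```python
import numpy as np

BLOCK_SIZE = 32  # values per quantization block (block_size from Architecture)

def quantize_rows(w: np.ndarray, block_size: int = BLOCK_SIZE):
    """Asymmetric 4-bit per-block quantization (assumed scheme, used for round-tripping)."""
    rows, dim = w.shape
    blocks = w.reshape(rows, dim // block_size, block_size)
    lo, hi = blocks.min(axis=2), blocks.max(axis=2)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)   # 16 levels: 0..15
    zero = -lo / scale                                  # float zero-point
    q = np.round(blocks / scale[..., None] + zero[..., None])
    q = np.clip(q, 0, 15).astype(np.uint8).reshape(rows, dim)
    # Assumed order: low nibble = even index, high nibble = odd index
    packed = (q[:, 0::2] | (q[:, 1::2] << 4)).astype(np.uint8)
    return packed, scale.astype(np.float16), zero.astype(np.float16)

def dequantize_rows(packed, scales, zeros, block_size: int = BLOCK_SIZE):
    """Unpack two 4-bit values per byte and apply w = (q - zero) * scale per block."""
    rows = packed.shape[0]
    q = np.empty((rows, packed.shape[1] * 2), dtype=np.float32)
    q[:, 0::2] = packed & 0x0F   # low nibble -> even positions
    q[:, 1::2] = packed >> 4     # high nibble -> odd positions
    q = q.reshape(rows, -1, block_size)
    z = zeros.astype(np.float32)[..., None]
    s = scales.astype(np.float32)[..., None]
    return ((q - z) * s).reshape(rows, -1)

# Round-trip a few random 256-dim rows
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
packed, scales, zeros = quantize_rows(w)
w_hat = dequantize_rows(packed, scales, zeros)
```

With this scheme a row of 256 values yields 128 packed bytes plus 8 scales and 8 zero-points, matching the shapes listed above; each reconstructed value is off by at most about half a block's scale.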
## Usage
### Python inference (lightweight, no torch)
```python
from lf4_model import LF4StaticEmbedding

# Load the 4-bit quantized model
model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

# Encode documents and a query to 256-dim vectors
doc_emb = model.encode(["search the web for news", "read file contents"])
query_emb = model.encode(["find the latest headlines"])

# Cosine similarity search: top-scoring documents for the query
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
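For readers without `lf4_model` at hand, the sketch below shows what a cosine top-k search computes, in plain NumPy. `cosine_top_k` is a hypothetical helper that mirrors the `(scores, indices)` return shape used above; whether `model.search` normalizes its inputs or breaks ties the same way is an assumption.

```python
import numpy as np

def cosine_top_k(query_emb: np.ndarray, doc_emb: np.ndarray, top_k: int = 10):
    """Return (scores, indices) of the top_k documents for each query row.

    Rows are L2-normalized first, so the dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                               # (n_queries, n_docs) cosine scores
    k = min(top_k, d.shape[0])                   # never ask for more docs than exist
    indices = np.argsort(-sims, axis=1)[:, :k]   # highest similarity first
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

query = np.array([[1.0, 0.0]])
docs = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices)  # [[2 1]]
```

For a few thousand documents (such as the 2707-chunk index mentioned below), a full `argsort` is cheap; `np.argpartition` would only pay off for much larger corpora.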
### With sentence-transformers (torch)
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
embeddings = model.encode(["search the web for news", "read file contents"])
```
## Quality
- **Cosine preservation vs FP32**: 0.9969
- **MSE**: 0.256990
- **Tool search accuracy**: 100% (15/15 benchmark queries)
- **Codebase indexing**: 12.5 s to build the index, 14.6 ms P50 search latency (JARVIS codebase, 2707 chunks)
- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
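The card does not say exactly how "cosine preservation" and MSE are computed (over sentence embeddings or raw table rows, and how they are averaged). The sketch below assumes the common definitions: mean row-wise cosine similarity between FP32 and LF4 vectors, and element-wise mean squared error. `quantization_metrics` is a hypothetical helper.

```python
import numpy as np

def quantization_metrics(emb_fp32: np.ndarray, emb_lf4: np.ndarray):
    """Mean row-wise cosine similarity and element-wise MSE between two embedding sets."""
    a = emb_fp32.astype(np.float64)
    b = emb_lf4.astype(np.float64)
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    mse = np.mean((a - b) ** 2)
    return float(cos.mean()), float(mse)

# Toy example: the second row keeps its direction but doubles in length
cos_mean, mse = quantization_metrics(np.array([[1.0, 0.0], [0.0, 1.0]]),
                                     np.array([[1.0, 0.0], [0.0, 2.0]]))
print(cos_mean, mse)  # 1.0 0.25
```

The toy example shows why both numbers are reported: cosine ignores magnitude, while MSE penalizes it.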
## Why Static Embedding?
| Feature | Static (this) | Transformer (BERT) |
|---|---|---|
| Inference latency | **0.15 ms** | ~50 ms |
| Load time | **144 ms** | ~5 s |
| Disk size | **4.7 MB** | ~400 MB |
| GPU needed | **No** | Recommended |
| Accuracy | Comparable* | Higher for complex semantics |
\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
## Minimal Dependencies (No PyTorch Required)
```bash
pip install numpy safetensors tokenizers
```
The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`.
No PyTorch, no transformers, no sentence-transformers required for basic inference.