bbkdevops/Vortex-Embed-4.7M-bucket / README.md


	---
	library_name: lf4
	tags:
	- lf4
	- static-embedding
	- 4-bit
	- quantized
	- sentence-similarity
	- code-search
	- tool-search
	- sentence-transformers
	- embedding
	language: en
	license: mit
	pipeline_tag: sentence-similarity
	---

	# VTXAI/Vortex-Embed-4.7M

	Native 4-bit quantized static sentence embedding model.
	Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.

	The model occupies only 4.7 MB on disk and needs no transformers, no PyTorch, and no GPU.

	## Model Size

	| Format | Size | Compression |
	|--------|------|-------------|
	| FP32 (original) | 30.2 MB (28.8 MiB) | 1.0× |
	| LF4 (this model) | 4.7 MB (4.5 MiB) | 6.4× |
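The sizes and the compression ratio can be checked from the tensor shapes listed under Architecture. The sketch below is a back-of-the-envelope calculation that counts only the three embedding tensors; ignoring the tokenizer and file metadata is an assumption, and the function name `table_sizes` is illustrative rather than part of the model's code.

```python
def table_sizes(vocab=29528, dim=256, block_size=32):
    """Return (fp32_bytes, lf4_bytes) for the embedding table alone."""
    fp32_bytes = vocab * dim * 4                  # 4 bytes per float32 value
    packed_bytes = vocab * dim // 2               # two 4-bit values per byte
    blocks_per_row = dim // block_size            # 8 blocks for dim=256
    scale_bytes = vocab * blocks_per_row * 2      # one float16 scale per block
    zero_bytes = vocab * blocks_per_row * 2       # one float16 zero-point per block
    return fp32_bytes, packed_bytes + scale_bytes + zero_bytes


fp32_bytes, lf4_bytes = table_sizes()
print(f"FP32: {fp32_bytes / 1e6:.1f} MB, LF4: {lf4_bytes / 1e6:.1f} MB, "
      f"ratio: {fp32_bytes / lf4_bytes:.1f}x")
```

In decimal megabytes this gives about 30.2 MB for FP32 (28.8 MiB in binary units) and 4.7 MB for LF4. Each block of 32 values shrinks from 128 bytes to 16 + 2 + 2 = 20 bytes, which is where the 6.4× ratio comes from.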

	## Architecture

	Learned static embedding table with 4-bit per-block quantization (LF4):

	```
	LF4StaticEmbedding(
	vocab=29528, dim=256, bits=4,
	block_size=32, size=4.7MB
	)
	```

	Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
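The pooling and normalization steps of this pipeline can be sketched in NumPy. This is a minimal illustration, not the code shipped in `lf4_model`: it assumes the token ids have already been produced by a tokenizer and that `table` is an already dequantized float array of shape (vocab, dim). The name `encode_ids` and the toy table are hypothetical.

```python
import numpy as np


def encode_ids(token_ids, table, eps=1e-12):
    """Mean-pool the rows for `token_ids`, then L2-normalize the result."""
    if len(token_ids) == 0:
        # No tokens: return a zero vector rather than dividing by zero.
        return np.zeros(table.shape[1], dtype=np.float32)
    vectors = table[np.asarray(token_ids)]        # lookup: (n_tokens, dim)
    pooled = vectors.mean(axis=0)                 # mean pool: (dim,)
    norm = np.linalg.norm(pooled)
    return (pooled / max(norm, eps)).astype(np.float32)


# Toy table with 5 "tokens" and 4 dimensions, for illustration only.
toy_table = np.arange(20, dtype=np.float32).reshape(5, 4)
sentence_vector = encode_ids([0, 2, 4], toy_table)
print(sentence_vector.shape, float(np.linalg.norm(sentence_vector)))
```

Because the output has unit length, a dot product between two sentence vectors is already their cosine similarity.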

	Weights stored as:
	- `embedding_packed`: uint8 (29528 × 128) — 4-bit packed, 2 values/byte
	- `embedding_scales`: float16 (29528 × 8) — per-block scale
	- `embedding_zeros`: float16 (29528 × 8) — per-block zero-point
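A hedged sketch of how these three tensors could be turned back into float rows follows. The README does not specify the nibble order or the dequantization formula, so two assumptions are made here: the low nibble of each byte holds the first of the two values, and each block uses the affine form `(q - zero) * scale`. The function names are hypothetical.

```python
import numpy as np

BLOCK_SIZE = 32  # values per quantization block, taken from the model repr


def unpack_nibbles(packed):
    """Split each uint8 into two 4-bit values (assumed: low nibble first)."""
    low = packed & 0x0F
    high = packed >> 4
    # Interleave so that byte i yields values 2*i and 2*i + 1.
    return np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)


def dequantize_rows(packed, scales, zeros, block_size=BLOCK_SIZE):
    """Assumed affine scheme, applied per block: value = (q - zero) * scale."""
    q = unpack_nibbles(packed).astype(np.float32)         # (rows, dim)
    rows, dim = q.shape
    q = q.reshape(rows, dim // block_size, block_size)    # (rows, blocks, block)
    scale = scales.astype(np.float32)[:, :, None]         # broadcast over block
    zero = zeros.astype(np.float32)[:, :, None]
    return ((q - zero) * scale).reshape(rows, dim)


# Tiny synthetic example: 2 rows, dim=64 (2 blocks of 32), every code equal to 5.
packed = np.full((2, 32), 0x55, dtype=np.uint8)
scales = np.full((2, 2), 0.5, dtype=np.float16)
zeros = np.full((2, 2), 1.0, dtype=np.float16)
table = dequantize_rows(packed, scales, zeros)
print(table.shape, table[0, 0])
```

With the real tensors, `packed` would have shape (29528, 128) and the result (29528, 256). If the actual format stores the zero-point as an additive offset instead, only the last line of `dequantize_rows` would change.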

	## Usage

	### Python inference (lightweight, no torch)

	```python
	from lf4_model import LF4StaticEmbedding

	model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
	print(model) # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)

	# Encode documents and a query to 256-dim vectors
	doc_emb = model.encode(["search the web for news", "read file contents"])
	query_emb = model.encode(["find recent headlines"])

	# Cosine similarity search over the encoded documents
	scores, indices = model.search(query_emb, doc_emb, top_k=2)
	```
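For readers who want to see what a cosine similarity search involves, here is a minimal NumPy sketch for a single query. It is not the implementation behind `model.search`, whose internals are not shown in this README; the function name `cosine_top_k` and the toy vectors are hypothetical, and non-zero input vectors are assumed.

```python
import numpy as np


def cosine_top_k(query_emb, doc_emb, top_k=10):
    """Return (scores, indices) of the top_k documents by cosine similarity."""
    # Normalize both sides so the dot product equals the cosine similarity.
    query = query_emb / np.linalg.norm(query_emb)
    docs = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = docs @ query                         # (n_docs,)
    k = min(top_k, scores.shape[0])               # never ask for more than exist
    order = np.argsort(-scores)[:k]               # highest score first
    return scores[order], order


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
scores, indices = cosine_top_k(query, docs, top_k=2)
print(indices, scores)
```

A full sort is fine for a few thousand chunks; for much larger collections `np.argpartition` would avoid sorting every score.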

	### With sentence-transformers (torch)

	```python
	from sentence_transformers import SentenceTransformer

	model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
	embeddings = model.encode(["search the web for news", "read file contents"])
	```

	## Quality

	- Cosine preservation vs FP32: 0.9969
	- MSE: 0.256990
	- Tool search accuracy: 100% (15/15, benchmarks)
	- Codebase indexing: 12.5 s to index, 14.6 ms P50 search latency (JARVIS codebase, 2707 chunks)
	- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
	- Base: VTXAI/Vortex-Embed → fine-tuned → LF4 quantized
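The README does not define how cosine preservation and MSE were measured. One plausible reading, offered here only as an assumption, is a row-by-row comparison between the FP32 table and the dequantized LF4 table: the mean cosine similarity of matching rows, and the mean squared element-wise difference. The sketch below uses synthetic data and hypothetical function names, so its numbers are unrelated to the figures above.

```python
import numpy as np


def cosine_preservation(original, reconstructed):
    """Mean cosine similarity between matching rows of two tables."""
    dot = np.sum(original * reconstructed, axis=1)
    norms = np.linalg.norm(original, axis=1) * np.linalg.norm(reconstructed, axis=1)
    return float(np.mean(dot / norms))


def mean_squared_error(original, reconstructed):
    """Mean of the squared element-wise differences."""
    return float(np.mean((original - reconstructed) ** 2))


# Synthetic stand-ins: a random "FP32" table and a slightly perturbed copy.
rng = np.random.default_rng(0)
original = rng.normal(size=(100, 256)).astype(np.float32)
noisy = original + rng.normal(scale=0.05, size=original.shape).astype(np.float32)
print(cosine_preservation(original, noisy), mean_squared_error(original, noisy))
```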

	## Why Static Embedding?

	| Feature | Static (this model) | Transformer (BERT) |
	|---|---|---|
	| Inference latency | 0.15 ms | ~50 ms |
	| Load time | 144 ms | ~5 s |
	| Disk size | 4.7 MB | ~400 MB |
	| GPU needed | No | Recommended |
	| Accuracy | Comparable* | Higher for complex semantics |

	\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.

	## No Dependencies Beyond NumPy, safetensors, and tokenizers

	```bash
	pip install numpy safetensors tokenizers
	```

	The model loads and runs with only `numpy`, `safetensors`, and Hugging Face `tokenizers`.
	PyTorch, transformers, and sentence-transformers are not required for basic inference.
