Octen-Embedding-0.6B ONNX
ONNX export of Octen/Octen-Embedding-0.6B for inference with ONNX Runtime (Python, Web/WASM, etc.).
- Base model: Octen/Octen-Embedding-0.6B
- Pooling: last token, L2-normalized
- Max sequence length: 512
- Dynamic batch: True
- Hidden size: 1024
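To make the "last token, L2-normalized" bullet concrete, here is a minimal NumPy sketch of that pooling step. It is an illustration only: the function name `last_token_pool` and the toy shapes are assumptions, and it does not claim to reproduce the exact operations inside the exported graph.

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Pick each sequence's last real token and L2-normalize it.

    hidden_states: (batch, seq_len, hidden) float array
    attention_mask: (batch, seq_len) array of 0/1
    """
    if attention_mask[:, -1].all():
        # Left padding (or no padding): the last position is always a real token
        pooled = hidden_states[:, -1]
    else:
        # Right padding: the last real token sits at index (number of real tokens - 1)
        last_idx = attention_mask.sum(axis=1) - 1
        pooled = hidden_states[np.arange(hidden_states.shape[0]), last_idx]
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: batch of 2, sequence length 3, hidden size 2
hidden = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = last_token_pool(hidden, mask)
print(pooled.shape)
```

With the real model the hidden size would be 1024 rather than 2, so each pooled row would be a unit-length vector of 1024 values.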
Files
| File | Description |
|---|---|
| `model.fp16.onnx` (+ `.onnx.data`) | FP16 weights, ~1.1 GB |
| `model.int8.onnx` | INT8 quantized, ~560 MB |
| `tokenizer/` | Hugging Face tokenizer (same as base model) |
| `conversion-metadata.json` | Export config |
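Because the FP16 model keeps its weights in a separate `.onnx.data` file, both files need to end up in the same local directory before ONNX Runtime can load the graph. One possible way to do that is `snapshot_download` with a filename pattern. The sketch below assumes the data file's name starts with `model.fp16.onnx` (the table does not spell out the full name), and `fetch_fp16` is a hypothetical helper, not something shipped with this repo.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumption: the external-data file shares the "model.fp16.onnx" prefix
FP16_PATTERNS = ["model.fp16.onnx*"]

def fetch_fp16(repo_id="geoffsee/octen-embedding-0.6b-onnx"):
    """Download the FP16 graph plus its external weights; return the .onnx path."""
    # Imported here so the pattern above can be inspected without the dependency
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id=repo_id, allow_patterns=FP16_PATTERNS)
    return Path(local_dir) / "model.fp16.onnx"

# Quick check of which filenames the pattern would select
for name in ["model.fp16.onnx", "model.fp16.onnx.data", "model.int8.onnx"]:
    print(name, fnmatch(name, FP16_PATTERNS[0]))
```

The returned path can then be passed to `onnxruntime.InferenceSession`, which looks for external data next to the `.onnx` file.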
Usage
Python (ONNX Runtime)
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Download from this repo
repo_id = "geoffsee/octen-embedding-0.6b-onnx"
# Note: the FP16 model also needs its external .onnx.data file in the same directory (see Files)
path_fp16 = hf_hub_download(repo_id=repo_id, filename="model.fp16.onnx")
path_int8 = hf_hub_download(repo_id=repo_id, filename="model.int8.onnx")

# The tokenizer files live under tokenizer/ in this repo
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="tokenizer")

# Run the INT8 model on CPU
session = ort.InferenceSession(path_int8, providers=["CPUExecutionProvider"])
encoded = tokenizer(["Your text here"], return_tensors="np", padding=True, truncation=True, max_length=512)
outputs = session.run(
    None,
    {
        "input_ids": encoded["input_ids"].astype(np.int64),
        "attention_mask": encoded["attention_mask"].astype(np.int64),
    },
)
embeddings = outputs[0]  # (batch, 1024)
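Since the embeddings are L2-normalized, cosine similarity between two of them reduces to a dot product. A minimal sketch of ranking documents against a query this way follows. The helper name `rank_by_similarity` is an assumption, and the small hand-written unit vectors stand in for real model outputs.

```python
import numpy as np

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity.

    Assumes every vector is already L2-normalized, so a dot product
    equals the cosine similarity.
    """
    scores = doc_vecs @ query_vec
    order = np.argsort(-scores)
    return order, scores[order]

# Stand-in unit vectors; in practice these would be rows of `embeddings`
docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.8, 0.6])

order, scores = rank_by_similarity(query, docs)
print(order, scores)
```

For real use, the query and the documents would be embedded in one or more `session.run` calls and the resulting rows passed in unchanged.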
JavaScript / ONNX Runtime Web
Use model.fp16.onnx or model.int8.onnx with onnxruntime-web. Load the tokenizer from tokenizer/ (e.g. with a compatible JS tokenizer).