cstr committed on
Commit 7354204 · verified · 1 Parent(s): c6d7fb5

Update README: v2 Ollama format, retrieval LoRA merged

Files changed (1)
  1. README.md +27 -57
README.md CHANGED
@@ -1,70 +1,40 @@
  ---
- license: cc-by-nc-4.0
- language: [multilingual]
- tags: [embeddings, gguf, ggml, text-embeddings, qwen3, crispembed, ollama]
  pipeline_tag: feature-extraction
  base_model: jinaai/jina-embeddings-v5-text-small
  ---
 
- # jina-v5-small GGUF
 
- GGUF format of [jinaai/jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed) and [Ollama](https://ollama.com).
 
- ## Files
-
- | File | Quantization | Size |
- |------|-------------|------|
- | [jina-v5-small-q4_k.gguf](https://huggingface.co/cstr/jina-v5-small-GGUF/resolve/main/jina-v5-small-q4_k.gguf) | Q4_K | 0 MB |
- | [jina-v5-small-q8_0.gguf](https://huggingface.co/cstr/jina-v5-small-GGUF/resolve/main/jina-v5-small-q8_0.gguf) | Q8_0 | 0 MB |
- | [jina-v5-small.gguf](https://huggingface.co/cstr/jina-v5-small-GGUF/resolve/main/jina-v5-small.gguf) | F32 | 0 MB |
-
- **Recommended:** Q8_0 for quality (cos vs HF: L2=1.0), Q4_K for size (L2=1.0).
-
- ## Quick Start
 
- ### CrispEmbed
- ```bash
- ./crispembed -m jina-v5-small "Hello world"
- ./crispembed-server -m jina-v5-small --port 8080
- ```
-
- ### Ollama (with [CrispStrobe fork](https://github.com/CrispStrobe/ollama/tree/feat/xlmr-embedding))
- ```bash
- echo "FROM jina-v5-small-q8_0.gguf" > Modelfile
- ollama create jina-v5-small -f Modelfile
- curl http://localhost:11434/api/embed -d '{"model":"jina-v5-small","input":["Hello world"]}'
- ```
-
- ### Python (CrispEmbed)
- ```python
- from crispembed import CrispEmbed
- model = CrispEmbed("jina-v5-small-q8_0.gguf")
- vectors = model.encode(["Hello world", "Goodbye world"])
- ```
-
- ## Model Details
 
- | Property | Value |
- |----------|-------|
- | Architecture | Qwen3 |
- | Parameters | 600M |
- | Embedding Dimension | 1024 |
- | Layers | 28 |
- | Pooling | last-token |
- | Tokenizer | BPE |
- | Language | multilingual |
- | Q8_0 vs HuggingFace | L2=1.0 |
- | Q4_K vs HuggingFace | L2=1.0 |
 
- ## Server API
 
- CrispEmbed server supports four API dialects:
- - `POST /embed` -- native
- - `POST /v1/embeddings` -- OpenAI-compatible
- - `POST /api/embed` -- Ollama-compatible
- - `POST /api/embeddings` -- Ollama legacy
 
- ## Credits
 
- - Original model: [jinaai/jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small)
- - Inference: [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed) (MIT, ggml-based)
 
  ---
+ license: apache-2.0
+ language:
+ - multilingual
+ tags:
+ - embeddings
+ - gguf
+ - text-embeddings
+ - jina
+ - crispembed
  pipeline_tag: feature-extraction
  base_model: jinaai/jina-embeddings-v5-text-small
  ---
 
+ # jina-embeddings-v5-text-small GGUF
 
+ GGUF format of [jinaai/jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed) and Ollama-compatible runtimes.
 
+ **Note:** These GGUFs have the `retrieval` LoRA adapter merged into the base weights. The original model supports 4 task-specific adapters (retrieval, text-matching, clustering, classification); this GGUF uses the retrieval adapter which is the most common use case.
 
+ ## Files
 
+ | File | Quantization | Size | Parity (cos vs HF) |
+ |------|-------------|------|---------------------|
+ | jina-v5-small.gguf | F32 | ~2.3 GB | 0.9999 |
+ | jina-v5-small-q8_0.gguf | Q8_0 | ~631 MB | 0.9995 |
+ | jina-v5-small-q5_k.gguf | Q5_K | ~489 MB | 0.9926 |
+ | jina-v5-small-q4_k.gguf | Q4_K | ~419 MB | 0.9725 |
 
+ ## Architecture
 
+ - **Base:** Qwen3-style transformer (28 layers, 1024 dims)
+ - **Embedding dimension:** 1024
+ - **Pooling:** Last-token + L2 normalize
+ - **Context length:** 8,192 tokens
+ - **License:** Apache 2.0
 
+ ## Notes
 
+ Ollama-compatible format (`qwen3.*` namespace). Bidirectional attention (`is_bidirectional=1`).
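
The updated README names "Last-token + L2 normalize" as the pooling step and reports parity as cosine similarity against the Hugging Face reference. The following is a minimal sketch of those two operations in plain Python on toy data. The function names are hypothetical, right-padded input is assumed, and this is not CrispEmbed's actual implementation.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token.

    hidden_states: list of per-token vectors, shape seq_len x dim.
    attention_mask: list of 0/1 flags; assumes right padding and at
    least one real token (both are assumptions of this sketch).
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(hidden_states[last])


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy sequence: two real tokens followed by one padding position.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

embedding = l2_normalize(last_token_pool(hidden, mask))

# Stand-in for a reference embedding; a real parity check would compare
# the GGUF output with the Hugging Face model's output for the same text.
reference = l2_normalize([3.0, 4.1])
parity = cosine_similarity(embedding, reference)
```

Because both vectors are unit length after normalization, the cosine similarity reduces to a dot product, so a parity value near 1.0 means the quantized embedding points in almost the same direction as the reference.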