Fix paper link, add project page and sample usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +58 -2
README.md CHANGED
@@ -2,6 +2,7 @@
2
  language:
3
  - en
4
  - zh
 
5
  pipeline_tag: feature-extraction
6
  tags:
7
  - embedding
@@ -9,15 +10,70 @@ tags:
9
  - long-context
10
  - rag
11
  - qwen
12
- license: apache-2.0
13
  ---
14
 
15
  # EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
16
 
17
- 🔗 **[GitHub Repository](https://github.com/MiG-NJU/EvoEmbedding)** | 📚 **[Training Dataset](https://huggingface.co/datasets/MiG-NJU/EvoTrain-180K)** | 📑 **[Paper (https://arxiv.org/abs/2606.21649)]()**
18
 
19
  **EvoEmbedding** is a novel embedding model designed for long-context and dynamic retrieval scenarios. Unlike static embedding models that chunk text in isolation, EvoEmbedding maintains a continuously updated **Latent Memory Queue**. This allows it to capture temporal dynamics and generate *context-aware, evolvable embeddings* for precise retrieval in agentic workflows and long-conversations.
20
 
21
  ## 📦 Model Family
22
 
23
  We provide EvoEmbedding in three sizes based on the Qwen architecture:
 
2
  language:
3
  - en
4
  - zh
5
+ license: apache-2.0
6
  pipeline_tag: feature-extraction
7
  tags:
8
  - embedding
 
10
  - long-context
11
  - rag
12
  - qwen
 
13
  ---
14
 
15
  # EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
16
 
17
+ 🔗 **[GitHub Repository](https://github.com/MiG-NJU/EvoEmbedding)** | 🌐 **[Project Page](https://clare-nie.github.io/EvoEmbedding/)** | 📚 **[Training Dataset](https://huggingface.co/datasets/MiG-NJU/EvoTrain-180K)** | 📄 **[Paper](https://huggingface.co/papers/2606.21649)**
18
 
19
  **EvoEmbedding** is a novel embedding model designed for long-context and dynamic retrieval scenarios. Unlike static embedding models that chunk text in isolation, EvoEmbedding maintains a continuously updated **Latent Memory Queue**. This allows it to capture temporal dynamics and generate *context-aware, evolvable embeddings* for precise retrieval in agentic workflows and long-conversations.
20
 
21
+ ## 🚀 Quick Start
22
+
23
+ ### Installation
24
+
25
+ First, clone the repository and install the dependencies:
26
+
27
+ ```bash
28
+ git clone https://github.com/MiG-NJU/EvoEmbedding.git
29
+ cd EvoEmbedding
30
+ conda create -n evoemb python=3.10 -y
31
+ conda activate evoemb
32
+ pip install -r requirements-evoembedding-lite.txt
33
+ ```
34
+
35
+ ### Usage
36
+
37
+ #### As an Embedding Model
38
+
39
+ ```python
40
+ from model.client import EvoEmbeddingClient
41
+
42
+ client = EvoEmbeddingClient()
43
+
44
+ messages = [
45
+     {"role": "user", "content": "I visited Paris in April."},
46
+     {"role": "assistant", "content": "Noted."},
47
+     {"role": "user", "content": "I bought a new laptop yesterday."},
48
+     {"role": "assistant", "content": "Got it."},
49
+     {"role": "user", "content": "Where did I travel in spring?"},
50
+ ]
51
+
52
+ embeddings = client.encode_messages(messages)
53
+ ```
54
+
55
+ The `messages` input preserves the original dialogue order. `encode_messages` returns normalized embeddings for the history turns and the final query.
56
+
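Since `encode_messages` is described as returning normalized embeddings, cosine similarity between the final query and each history turn reduces to a plain dot product. The sketch below illustrates that retrieval step on made-up unit vectors. It assumes, without confirmation from the README, that the result can be treated as a sequence of equal-length vectors with the query last; `rank_history` is a hypothetical helper, not part of the EvoEmbedding API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_history(embeddings):
    """Rank history vectors by similarity to the last vector (the query).

    Hypothetical helper: assumes every vector is already unit-normalized,
    so the dot product equals cosine similarity.
    """
    *history, query = embeddings
    scores = [dot(h, query) for h in history]
    # Highest score first; ties keep their original dialogue order.
    order = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)
    return order, [scores[i] for i in order]


# Placeholder unit vectors standing in for real model output (assumption).
placeholder = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
    [0.8, 0.6, 0.0],  # last row plays the role of the query
]
order, scores = rank_history(placeholder)
```

With real model output, `placeholder` would be replaced by whatever `encode_messages` returns, converted to plain lists if necessary.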
57
+ #### As a Reranker
58
+
59
+ ```python
60
+ candidates = [
61
+     "I visited Paris in April.",
62
+     "I bought a new laptop yesterday.",
63
+     "The meeting was moved to Friday.",
64
+ ]
65
+ query = "Where did I travel in spring?"
66
+
67
+ ranked_candidates, ranked_indices = client.rerank(
68
+     query,
69
+     candidates,
70
+     top_k=1,
71
+     return_indices=True,
72
+ )
73
+ ```
74
+
75
+ The reranker takes a direct list of candidate strings and returns them in relevance order.
76
+
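To make the return shape of the call above concrete, here is a toy stand-in that sorts candidates by a relevance score and truncates to `top_k`. It is only a sketch: `toy_rerank` and the scores are invented for illustration, the real `client.rerank` computes relevance with the model, and it is an assumption that `ranked_indices` index into the original `candidates` list.

```python
def toy_rerank(candidates, scores, top_k=None, return_indices=False):
    """Sort candidates by descending score (hypothetical stand-in for rerank)."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    ranked = [candidates[i] for i in order]
    # Mirror the tuple form used in the README when indices are requested.
    return (ranked, order) if return_indices else ranked


candidates = [
    "I visited Paris in April.",
    "I bought a new laptop yesterday.",
    "The meeting was moved to Friday.",
]
made_up_scores = [0.91, 0.12, 0.05]  # invented values, not model output

ranked_candidates, ranked_indices = toy_rerank(
    candidates,
    made_up_scores,
    top_k=1,
    return_indices=True,
)
```

Under that assumption, `candidates[ranked_indices[0]]` and `ranked_candidates[0]` refer to the same string.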
77
  ## 📦 Model Family
78
 
79
  We provide EvoEmbedding in three sizes based on the Qwen architecture: