Bonus depth: t-SNE, DBSCAN, per-cluster grids, query montage, FAISS benchmark, business/ethics; app stays on numpy
- .gitattributes +4 -0
- Assignment_3_NoamFuchs.ipynb +0 -0
- README.md +56 -0
- app.py +22 -4
- assets/cluster_examples.png +3 -0
- assets/dbscan.png +3 -0
- assets/recommend_examples.png +3 -0
- assets/tsne_category.png +3 -0
.gitattributes
CHANGED
|
@@ -37,3 +37,7 @@ assets/pca_category.png filter=lfs diff=lfs merge=lfs -text
|
|
| 37 |
assets/umap_category.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
assets/umap_cluster.png filter=lfs diff=lfs merge=lfs -text
|
| 39 |
assets/eda_sample_grid.png filter=lfs diff=lfs merge=lfs -text
|
| 37 |
assets/umap_category.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
assets/umap_cluster.png filter=lfs diff=lfs merge=lfs -text
|
| 39 |
assets/eda_sample_grid.png filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
assets/cluster_examples.png filter=lfs diff=lfs merge=lfs -text
|
| 41 |
+
assets/dbscan.png filter=lfs diff=lfs merge=lfs -text
|
| 42 |
+
assets/recommend_examples.png filter=lfs diff=lfs merge=lfs -text
|
| 43 |
+
assets/tsne_category.png filter=lfs diff=lfs merge=lfs -text
|
Assignment_3_NoamFuchs.ipynb
CHANGED
|
The diff for this file is too large to render.
|
|
|
README.md
CHANGED
|
@@ -199,6 +199,31 @@ The clusters were found from purely visual embeddings with no access to the labe
|
|
| 199 |
human-meaningful groupings. That is exactly the property that makes nearest-neighbour recommendation
|
| 200 |
work: similar-looking products really are near each other in the space.
|
| 201 |
|
| 202 |
## The Recommender (Inputs and Outputs)
|
| 203 |
|
| 204 |
The recommendation itself is four steps, kept as small standalone functions:
|
|
@@ -212,6 +237,20 @@ The recommendation itself is four steps, kept as small standalone functions:
|
|
| 212 |
(similarity > 0.985) so the three results are genuinely different products.
|
| 213 |
4. **Return the Top-3** with their thumbnails, categories and similarity scores.
|
| 214 |
|
| 215 |
## Evaluation
|
| 216 |
|
| 217 |
To put a number on quality I ran **image-to-image** retrieval on 80 held-out products (products the
|
|
@@ -238,6 +277,23 @@ The app has three tabs: a **Recommender** (upload a photo or type a query, get T
|
|
| 238 |
The Space loads `catalog.parquet` and the same CLIP model used to build it, so the live results are
|
| 239 |
exactly the pipeline described above.
|
| 240 |
|
| 241 |
## Final Conclusions
|
| 242 |
|
| 243 |
CLIP gives a single shared space for images and text, and the clustering confirmed that space is
|
|
|
|
| 199 |
human-meaningful groupings. That is exactly the property that makes nearest-neighbour recommendation
|
| 200 |
work: similar-looking products really are near each other in the space.
|
| 201 |
|
| 202 |
+
### 11. A second projection: t-SNE
|
| 203 |
+
|
| 204 |
+

|
| 205 |
+
|
| 206 |
+
PCA is a global linear view and UMAP balances local and global structure; t-SNE emphasises local neighbourhoods. I ran it as a cross-check, and
|
| 207 |
+
it shows the same per-category grouping, so the structure is not a UMAP artefact.
|
| 208 |
+
|
| 209 |
+
### 12. A second clustering algorithm: DBSCAN
|
| 210 |
+
|
| 211 |
+

|
| 212 |
+
|
| 213 |
+
K-Means forces every product into one of K round clusters; DBSCAN instead finds dense regions and
|
| 214 |
+
labels the rest as noise. I chose `eps` from a **k-distance plot** (left) rather than
|
| 215 |
+
guessing. DBSCAN breaks the catalogue into roughly two dozen fine clusters with almost no noise,
|
| 216 |
+
which says the space is densely packed with small visual neighbourhoods. K-Means K=4 is the coarse,
|
| 217 |
+
interpretable summary of that same structure. The two algorithms agree the space is well clustered.
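The k-distance curve mentioned above can be sketched in plain numpy. This is a minimal illustration, not the notebook's code: the function name `k_distance_curve`, the choice `k=5`, and the toy data are assumptions. It builds the full pairwise distance matrix, so it is only practical on a sample of a few thousand rows.

```python
import numpy as np

def k_distance_curve(X, k=5):
    """Sorted distance from every point to its k-th nearest neighbour."""
    X = np.asarray(X, dtype=np.float64)
    sq = np.sum(X * X, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from rounding
    d = np.sqrt(d2)
    np.fill_diagonal(d, 0.0)         # a point is at distance 0 from itself
    # After sorting each row, column 0 is the point itself, so column k
    # holds the distance to the k-th nearest other point.
    kth = np.sort(d, axis=1)[:, k]
    return np.sort(kth)

# Toy data (hypothetical): two tight blobs and one far-away outlier.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.05, (50, 8)),
    rng.normal(3.0, 0.05, (50, 8)),
    np.full((1, 8), 10.0),
])
curve = k_distance_curve(X, k=5)
```

Plotting `curve` against its index gives the k-distance plot; a value near the point where the curve bends sharply upward is a common candidate for `eps`, with `k` reused as the minimum-samples setting of whichever DBSCAN implementation is used.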
|
| 218 |
+
|
| 219 |
+
### 13. What each cluster actually contains
|
| 220 |
+
|
| 221 |
+

|
| 222 |
+
|
| 223 |
+
Labels and numbers are one thing; the honest test is to look at the products closest to each
|
| 224 |
+
centroid. Each row is clearly one visual family (packaged goods, tech, furnishings, small colourful
|
| 225 |
+
items). This is what convinced me the space was worth recommending from.
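Picking the products closest to each centroid can be sketched as follows. This is a hedged illustration rather than the notebook's implementation: `closest_to_centroids`, `n_show`, and the toy arrays are hypothetical, and the centroid is taken as the mean of a cluster's members, which matches K-Means but is an assumption here.

```python
import numpy as np

def closest_to_centroids(emb, labels, n_show=5):
    """Map each cluster label to the row indices nearest that cluster's centroid."""
    emb = np.asarray(emb, dtype=np.float64)
    labels = np.asarray(labels)
    out = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)          # rows in this cluster
        centroid = emb[members].mean(axis=0)           # mean of the members
        dist = np.linalg.norm(emb[members] - centroid, axis=1)
        out[int(c)] = members[np.argsort(dist)[:n_show]]
    return out

# Toy example (hypothetical data): two clusters in 2-D.
emb = np.array([[0, 0], [0, 1], [0, 2], [10, 0], [10, 1], [10, 5]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
nearest = closest_to_centroids(emb, labels, n_show=1)
```

The returned indices would then be used to look up thumbnails and lay them out one row per cluster.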
|
| 226 |
+
|
| 227 |
## The Recommender (Inputs and Outputs)
|
| 228 |
|
| 229 |
The recommendation itself is four steps, kept as small standalone functions:
|
|
|
|
| 237 |
(similarity > 0.985) so the three results are genuinely different products.
|
| 238 |
4. **Return the Top-3** with their thumbnails, categories and similarity scores.
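The scoring and near-duplicate steps above can be sketched in numpy. This is a minimal sketch under stated assumptions, not the app's exact code: it assumes all vectors are L2-normalised, and it assumes the 0.985 threshold is applied between a candidate and the results already chosen. The `emb` argument and `dup_threshold` name are hypothetical.

```python
import numpy as np

def top_matches(emb, qvec, k=3, dup_threshold=0.985):
    """Return up to k (index, similarity) pairs, skipping near-duplicates."""
    sims = emb @ qvec                      # cosine similarity for unit vectors
    order = np.argsort(-sims)              # best match first
    chosen = []
    for idx in order:
        # Assumption: a candidate is a near-duplicate if it is almost
        # identical to a result that has already been accepted.
        if any(emb[idx] @ emb[j] > dup_threshold for j in chosen):
            continue
        chosen.append(int(idx))
        if len(chosen) == k:
            break
    return [(i, float(sims[i])) for i in chosen]

# Toy catalogue (hypothetical): row 1 is a near-copy of row 0.
raw = np.array([
    [1.0, 0.0, 0.0],
    [0.9999, 0.01, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.0, 0.8],
])
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)
result = top_matches(emb, np.array([1.0, 0.0, 0.0]))
```

In the toy catalogue the near-copy is skipped, so the three results are three distinct items rather than the same product twice.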
|
| 239 |
|
| 240 |
+
Here is what it actually returns for five held-out photos (products the catalogue never saw):
|
| 241 |
+
|
| 242 |
+

|
| 243 |
+
|
| 244 |
+
The camera row is the clearest: a Canon body retrieves three other cameras at ~0.94 similarity.
|
| 245 |
+
|
| 246 |
+
### Bonus: a faster backend with FAISS
|
| 247 |
+
|
| 248 |
+
A linear `EMB @ q` scan is fine for 12K items, but a production catalogue can hold millions. I index the
|
| 249 |
+
embeddings with **FAISS** (a widely used vector-search library) and confirm it returns the **same**
|
| 250 |
+
Top-3 as the brute-force scan, only faster. On 500 queries FAISS was about **50x faster** with
|
| 251 |
+
**100% agreement**, so it is exact here, just quicker. That is the piece that would let the same app
|
| 252 |
+
scale to a production catalogue.
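The agreement check itself does not depend on the backend. The sketch below is numpy-only and compares a full-sort scan against a partition-based scan; it does not use FAISS. With FAISS, the second function would be replaced by a search on an index (an exact inner-product index such as `faiss.IndexFlatIP` is one candidate, but the index type used in the notebook is not shown here, so that is an assumption). All function names and the random data are hypothetical.

```python
import numpy as np

def topk_full_sort(emb, queries, k=3):
    """Brute force: score everything, fully sort, keep the first k."""
    sims = queries @ emb.T
    return np.argsort(-sims, axis=1)[:, :k]

def topk_partition(emb, queries, k=3):
    """Same scores, but only the top k are selected and then ordered."""
    sims = queries @ emb.T
    part = np.argpartition(-sims, k - 1, axis=1)[:, :k]   # unordered top k
    rows = np.arange(len(queries))[:, None]
    order = np.argsort(-sims[rows, part], axis=1)         # order within top k
    return part[rows, order]

def agreement(a, b):
    """Fraction of queries whose ranked top-k index lists match exactly."""
    return float(np.mean(np.all(a == b, axis=1)))

# Random unit vectors standing in for catalogue and query embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
queries = rng.normal(size=(50, 64))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

a = topk_full_sort(emb, queries)
b = topk_partition(emb, queries)
score = agreement(a, b)
```

Timing each function over the same query batch and dividing the two totals gives the speed-up figure; an agreement of 1.0 means the faster backend is exact on that batch.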
|
| 253 |
+
|
| 254 |
## Evaluation
|
| 255 |
|
| 256 |
To put a number on quality I ran **image-to-image** retrieval on 80 held-out products (products the
|
|
|
|
| 277 |
The Space loads `catalog.parquet` and the same CLIP model used to build it, so the live results are
|
| 278 |
exactly the pipeline described above.
|
| 279 |
|
| 280 |
+
## Business and Ethical Considerations
|
| 281 |
+
|
| 282 |
+
**Business value.** Visual similarity search is the engine behind "shop the look" and "more like
|
| 283 |
+
this" features. It needs no manual tagging (it runs on the product image alone), works across
|
| 284 |
+
languages (useful here, since titles are multilingual), and helps cold-start items that have no
|
| 285 |
+
clicks yet. The same `catalog.parquet` + FAISS setup would directly power a related-items carousel
|
| 286 |
+
or a visual search bar on a store.
|
| 287 |
+
|
| 288 |
+
**Limits and ethics.**
|
| 289 |
+
- **Visual, not semantic.** The model matches appearance, so it can pair items that look alike but
|
| 290 |
+
serve different purposes. Fine for shopping, but it should not be trusted where the *function*
|
| 291 |
+
matters (medical or safety products).
|
| 292 |
+
- **Representation bias.** CLIP is trained on web images and inherits their biases; a product shot in
|
| 293 |
+
an unusual style, or from an under-represented region, may embed poorly and be under-recommended.
|
| 294 |
+
- **Catalogue gaps.** Recommendations can only point inside the catalogue, so sparse categories give
|
| 295 |
+
weak results no matter how good the model is.
|
| 296 |
+
|
| 297 |
## Final Conclusions
|
| 298 |
|
| 299 |
CLIP gives a single shared space for images and text, and the clustering confirmed that space is
|
app.py
CHANGED
|
@@ -46,7 +46,7 @@ def encode_image(img):
|
|
| 46 |
|
| 47 |
|
| 48 |
def top_matches(qvec, k=3):
|
| 49 |
-
sims = EMB @ qvec
|
| 50 |
order = np.argsort(-sims)
|
| 51 |
chosen = []
|
| 52 |
for idx in order:
|
|
@@ -181,12 +181,30 @@ with gr.Blocks(title="Visual Product Recommender", theme=gr.themes.Soft()) as de
|
|
| 181 |
"human categories, as the heatmap shows. The silhouette is modest because most products sit on white "
|
| 182 |
"studio backgrounds and overlap visually, but the structure is real, which is what makes "
|
| 183 |
"nearest-neighbour recommendation work.\n\n"
|
| 184 |
"## 4. Recommender and Evaluation\n"
|
| 185 |
"Embeddings are saved to **`catalog.parquet`**. A query is encoded with the same CLIP model, scored by "
|
| 186 |
"**cosine similarity** against the catalogue, and the **Top-3** are returned (near-duplicates filtered). "
|
| 187 |
-
"
|
| 188 |
-
"
|
| 189 |
-
"
|
|
| 190 |
)
|
| 191 |
|
| 192 |
# ---------------------------------------------------------------- TAB 3
|
|
|
|
| 46 |
|
| 47 |
|
| 48 |
def top_matches(qvec, k=3):
|
| 49 |
+
sims = EMB @ qvec # cosine (vectors are L2-normalized)
|
| 50 |
order = np.argsort(-sims)
|
| 51 |
chosen = []
|
| 52 |
for idx in order:
|
|
|
|
| 181 |
"human categories, as the heatmap shows. The silhouette is modest because most products sit on white "
|
| 182 |
"studio backgrounds and overlap visually, but the structure is real, which is what makes "
|
| 183 |
"nearest-neighbour recommendation work.\n\n"
|
| 184 |
+
"**Going deeper.** I cross-checked with a second projection (**t-SNE**) and a second clustering "
|
| 185 |
+
"algorithm (**DBSCAN**, eps chosen from a k-distance plot), and looked at the actual products closest "
|
| 186 |
+
"to each cluster centroid."
|
| 187 |
+
)
|
| 188 |
+
with gr.Row():
|
| 189 |
+
gr.Image(load_plot("tsne_category.png"), label="t-SNE (second projection)", show_label=True)
|
| 190 |
+
gr.Image(load_plot("dbscan.png"), label="DBSCAN (second clustering algorithm)", show_label=True)
|
| 191 |
+
gr.Image(load_plot("cluster_examples.png"), label="Representative products per cluster", show_label=True)
|
| 192 |
+
|
| 193 |
+
gr.Markdown(
|
| 194 |
"## 4. Recommender and Evaluation\n"
|
| 195 |
"Embeddings are saved to **`catalog.parquet`**. A query is encoded with the same CLIP model, scored by "
|
| 196 |
"**cosine similarity** against the catalogue, and the **Top-3** are returned (near-duplicates filtered). "
|
| 197 |
+
"In the notebook I also benchmark a **FAISS** index (the standard vector-search library) as a scaling "
|
| 198 |
+
"option: about 50x faster than the brute-force scan with identical results. On 80 held-out products, "
|
| 199 |
+
"image-to-image retrieval reaches **precision@1 ≈ 0.39**, about 3x the random baseline. Example queries "
|
| 200 |
+
"and their Top-3:"
|
| 201 |
+
)
|
| 202 |
+
gr.Image(load_plot("recommend_examples.png"), label="Query (held-out photo) to Top-3", show_label=True)
|
| 203 |
+
gr.Markdown(
|
| 204 |
+
"**Business & ethics:** visual search powers 'shop the look' features, needs no manual tags, and works "
|
| 205 |
+
"across languages, but it matches *appearance not function* (avoid for safety-critical items), can inherit "
|
| 206 |
+
"CLIP's web-image biases, and can only recommend products that exist in the catalogue. Full coding work is "
|
| 207 |
+
"in the notebook (**Files** tab, `Assignment_3_NoamFuchs.ipynb`)."
|
| 208 |
)
|
| 209 |
|
| 210 |
# ---------------------------------------------------------------- TAB 3
|
assets/cluster_examples.png
ADDED (Git LFS)
|
assets/dbscan.png
ADDED (Git LFS)
|
assets/recommend_examples.png
ADDED (Git LFS)
|
assets/tsne_category.png
ADDED (Git LFS)