Noam12345 committed on
Commit 8b18247 · verified · 1 Parent(s): 17e5ce6

Bonus depth: t-SNE, DBSCAN, per-cluster grids, query montage, FAISS benchmark, business/ethics; app stays on numpy

.gitattributes CHANGED
@@ -37,3 +37,7 @@ assets/pca_category.png filter=lfs diff=lfs merge=lfs -text
37
  assets/umap_category.png filter=lfs diff=lfs merge=lfs -text
38
  assets/umap_cluster.png filter=lfs diff=lfs merge=lfs -text
39
  assets/eda_sample_grid.png filter=lfs diff=lfs merge=lfs -text
37
  assets/umap_category.png filter=lfs diff=lfs merge=lfs -text
38
  assets/umap_cluster.png filter=lfs diff=lfs merge=lfs -text
39
  assets/eda_sample_grid.png filter=lfs diff=lfs merge=lfs -text
40
+ assets/cluster_examples.png filter=lfs diff=lfs merge=lfs -text
41
+ assets/dbscan.png filter=lfs diff=lfs merge=lfs -text
42
+ assets/recommend_examples.png filter=lfs diff=lfs merge=lfs -text
43
+ assets/tsne_category.png filter=lfs diff=lfs merge=lfs -text
Assignment_3_NoamFuchs.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -199,6 +199,31 @@ The clusters were found from purely visual embeddings with no access to the labe
199
  human-meaningful groupings. That is exactly the property that makes nearest-neighbour recommendation
200
  work: similar-looking products really are near each other in the space.
201
 
202
  ## The Recommender (Inputs and Outputs)
203
 
204
  The recommendation itself is four steps, kept as small standalone functions:
@@ -212,6 +237,20 @@ The recommendation itself is four steps, kept as small standalone functions:
212
  (similarity > 0.985) so the three results are genuinely different products.
213
  4. **Return the Top-3** with their thumbnails, categories and similarity scores.
214
 
215
  ## Evaluation
216
 
217
  To put a number on quality I ran **image-to-image** retrieval on 80 held-out products (products the
@@ -238,6 +277,23 @@ The app has three tabs: a **Recommender** (upload a photo or type a query, get T
238
  The Space loads `catalog.parquet` and the same CLIP model used to build it, so the live results are
239
  exactly the pipeline described above.
240
 
241
  ## Final Conclusions
242
 
243
  CLIP gives a single shared space for images and text, and the clustering confirmed that space is
 
199
  human-meaningful groupings. That is exactly the property that makes nearest-neighbour recommendation
200
  work: similar-looking products really are near each other in the space.
201
 
202
+ ### 11. A second projection: t-SNE
203
+
204
+ ![t-SNE](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/tsne_category.png)
205
+
206
+ PCA is a linear, global view and UMAP balances local and global structure; t-SNE emphasises local
207
+ neighbourhoods. I ran it as a cross-check, and it shows the same per-category grouping, so the structure is not a UMAP artefact.
208
+
209
+ ### 12. A second clustering algorithm: DBSCAN
210
+
211
+ ![DBSCAN](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/dbscan.png)
212
+
213
+ K-Means forces every product into one of K round clusters; DBSCAN instead finds dense regions and
214
+ labels the rest as noise. I chose `eps` properly from a **k-distance plot** (left) rather than
215
+ guessing. DBSCAN breaks the catalogue into roughly two dozen fine clusters with almost no noise,
216
+ which says the space is densely packed with small visual neighbourhoods. K-Means K=4 is the coarse,
217
+ interpretable summary of that same structure. The two algorithms agree the space is well clustered.
218
+
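A minimal sketch of how a k-distance curve can be computed, for readers who want to reproduce the `eps` choice. It assumes L2-normalised embeddings and cosine distance; the function name `k_distance_curve`, the value `k=5` and the random stand-in data are illustrative assumptions, not the notebook's actual settings.

```python
import numpy as np

def k_distance_curve(emb, k=5):
    """Sorted distance from each point to its k-th nearest neighbour.

    Assumes rows of `emb` are L2-normalised, so cosine distance is 1 - dot.
    """
    dist = 1.0 - emb @ emb.T          # (n, n) pairwise cosine distances
    np.fill_diagonal(dist, np.inf)    # ignore each point's distance to itself
    kth = np.partition(dist, k - 1, axis=1)[:, k - 1]
    return np.sort(kth)

rng = np.random.default_rng(0)
# Stand-in data, not the real catalogue: random unit vectors.
emb = rng.normal(size=(200, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

curve = k_distance_curve(emb, k=5)
print(curve[:3], curve[-3:])
```

Plotting `curve` and reading the distance at its elbow gives a candidate `eps`; `k` is usually set to the `min_samples` value that will be passed to DBSCAN.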
219
+ ### 13. What each cluster actually contains
220
+
221
+ ![Cluster examples](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/cluster_examples.png)
222
+
223
+ Labels and numbers are one thing; the honest test is to look at the products closest to each
224
+ centroid. Each row is clearly one visual family (packaged goods, tech, furnishings, small colourful
225
+ items). This is what convinced me the space was worth recommending from.
226
+
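The "closest to each centroid" check can be sketched as follows. This is an illustration under assumptions: `closest_to_centroids`, `n_show` and the synthetic two-blob data are hypothetical, and Euclidean distance to the cluster mean is used here, which may differ from the notebook's choice.

```python
import numpy as np

def closest_to_centroids(emb, labels, n_show=5):
    """For each cluster label, return indices of the n_show rows nearest its centroid."""
    out = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)        # row indices in this cluster
        centroid = emb[members].mean(axis=0)
        dist = np.linalg.norm(emb[members] - centroid, axis=1)
        out[int(c)] = members[np.argsort(dist)[:n_show]]
    return out

rng = np.random.default_rng(0)
# Stand-in data: two separated blobs with known labels.
emb = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(5.0, 1.0, (50, 8))])
labels = np.repeat([0, 1], 50)

rows = closest_to_centroids(emb, labels, n_show=3)
print(rows)
```

The returned indices are what would be used to pull thumbnails for each row of the grid.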
227
  ## The Recommender (Inputs and Outputs)
228
 
229
  The recommendation itself is four steps, kept as small standalone functions:
 
237
  (similarity > 0.985) so the three results are genuinely different products.
238
  4. **Return the Top-3** with their thumbnails, categories and similarity scores.
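The scoring and near-duplicate steps above can be sketched as below. This is a hedged reconstruction, not the app's code: it passes the embedding matrix as an argument instead of using a global `EMB`, and it assumes a candidate counts as a near-duplicate when its cosine similarity to an already selected item exceeds 0.985.

```python
import numpy as np

DUP_THRESHOLD = 0.985  # near-duplicate cutoff quoted in the README

def top_matches(emb, qvec, k=3, dup_threshold=DUP_THRESHOLD):
    """Return up to k (index, similarity) pairs, skipping near-duplicates.

    Assumes rows of `emb` and `qvec` are L2-normalised, so dot product = cosine.
    """
    sims = emb @ qvec
    chosen = []
    for idx in np.argsort(-sims, kind="stable"):
        # Assumption: skip a candidate that is almost identical to one
        # that has already been selected.
        if any(emb[idx] @ emb[j] > dup_threshold for j, _ in chosen):
            continue
        chosen.append((int(idx), float(sims[idx])))
        if len(chosen) == k:
            break
    return chosen

# Toy catalogue: row 1 is an exact duplicate of row 0.
emb = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.8, 0.6, 0.0],
                [0.0, 0.0, 1.0]])
q = np.array([1.0, 0.0, 0.0])

result = top_matches(emb, q, k=3)
print(result)
```

In the toy catalogue the duplicate row is skipped, so the three results are distinct items even though two rows tie for the best score.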
239
 
240
+ Here is what it actually returns for five held-out photos (products the catalogue never saw):
241
+
242
+ ![Query to Top-3](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/recommend_examples.png)
243
+
244
+ The camera row is the clearest: a Canon body retrieves three other cameras at ~0.94 similarity.
245
+
246
+ ### Bonus: a faster backend with FAISS
247
+
248
+ A linear `EMB @ q` scan is fine for 12K items, but a real catalogue has millions. I index the
249
+ embeddings with **FAISS** (the standard vector-search library) and confirm it returns the **same**
250
+ Top-3 as the brute-force scan, only faster. On 500 queries FAISS was about **50x faster** with
251
+ **100% agreement**, so it is exact here, just quicker. That is the piece that would let the same app
252
+ scale to a production catalogue.
253
+
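A minimal sketch of the agreement check, assuming a flat inner-product index (`faiss.IndexFlatIP`), which is exact; the index type used in the notebook is not shown here, so that choice is an assumption. The data is random stand-in vectors, and `brute_force_topk` / `faiss_topk` are illustrative names.

```python
import numpy as np

def brute_force_topk(emb, queries, k=3):
    """Exact top-k by a full matrix product (the `EMB @ q` scan, batched)."""
    sims = queries @ emb.T
    return np.argsort(-sims, axis=1)[:, :k]

def faiss_topk(emb, queries, k=3):
    """Exact top-k by inner product using a flat FAISS index."""
    import faiss  # lazy import: the rest of the sketch runs without FAISS
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb)                      # FAISS expects float32 rows
    _, idx = index.search(queries, k)
    return idx

rng = np.random.default_rng(0)
# Stand-in catalogue: random unit vectors in float32.
emb = rng.normal(size=(2000, 64)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
queries = emb[:50].copy()

expected = brute_force_topk(emb, queries)
try:
    got = faiss_topk(emb, queries)
    print("agreement:", float(np.mean(got == expected)))
except ImportError:
    print("faiss not installed; only the brute-force path ran")
```

Wrapping each call in a timer gives the speed comparison; the figures quoted above come from the notebook, not from this sketch.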
254
  ## Evaluation
255
 
256
  To put a number on quality I ran **image-to-image** retrieval on 80 held-out products (products the
 
277
  The Space loads `catalog.parquet` and the same CLIP model used to build it, so the live results are
278
  exactly the pipeline described above.
279
 
280
+ ## Business and Ethical Considerations
281
+
282
+ **Business value.** Visual similarity search is the engine behind "shop the look" and "more like
283
+ this" features. It needs no manual tagging (it runs on the product image alone), works across
284
+ languages (useful here, since titles are multilingual), and helps cold-start items that have no
285
+ clicks yet. The same `catalog.parquet` + FAISS setup would directly power a related-items carousel
286
+ or a visual search bar on a store.
287
+
288
+ **Limits and ethics.**
289
+ - **Visual, not semantic.** The model matches appearance, so it can pair items that look alike but
290
+ serve different purposes. Fine for shopping, but it should not be trusted where the *function*
291
+ matters (medical or safety products).
292
+ - **Representation bias.** CLIP is trained on web images and inherits their biases; a product shot in
293
+ an unusual style, or from an under-represented region, may embed poorly and be under-recommended.
294
+ - **Catalogue gaps.** Recommendations can only point inside the catalogue, so sparse categories give
295
+ weak results no matter how good the model is.
296
+
297
  ## Final Conclusions
298
 
299
  CLIP gives a single shared space for images and text, and the clustering confirmed that space is
app.py CHANGED
@@ -46,7 +46,7 @@ def encode_image(img):
46
 
47
 
48
  def top_matches(qvec, k=3):
49
-     sims = EMB @ qvec
50
      order = np.argsort(-sims)
51
      chosen = []
52
      for idx in order:
@@ -181,12 +181,30 @@ with gr.Blocks(title="Visual Product Recommender", theme=gr.themes.Soft()) as de
181
  "human categories, as the heatmap shows. The silhouette is modest because most products sit on white "
182
  "studio backgrounds and overlap visually, but the structure is real, which is what makes "
183
  "nearest-neighbour recommendation work.\n\n"
184
  "## 4. Recommender and Evaluation\n"
185
  "Embeddings are saved to **`catalog.parquet`**. A query is encoded with the same CLIP model, scored by "
186
  "**cosine similarity** against the catalogue, and the **Top-3** are returned (near-duplicates filtered). "
187
- "On 80 held-out products, image-to-image retrieval reaches **precision@1 ≈ 0.39**, about 3x the random "
188
- "baseline. The full coding work is in the notebook in the **Files** tab "
189
- "(`Assignment_3_NoamFuchs.ipynb`)."
190
  )
191
 
192
  # ---------------------------------------------------------------- TAB 3
 
46
 
47
 
48
  def top_matches(qvec, k=3):
49
+     sims = EMB @ qvec  # cosine (vectors are L2-normalized)
50
      order = np.argsort(-sims)
51
      chosen = []
52
      for idx in order:
 
181
  "human categories, as the heatmap shows. The silhouette is modest because most products sit on white "
182
  "studio backgrounds and overlap visually, but the structure is real, which is what makes "
183
  "nearest-neighbour recommendation work.\n\n"
184
+ "**Going deeper.** I cross-checked with a second projection (**t-SNE**) and a second clustering "
185
+ "algorithm (**DBSCAN**, eps chosen from a k-distance plot), and looked at the actual products closest "
186
+ "to each cluster centroid."
187
+ )
188
+ with gr.Row():
189
+ gr.Image(load_plot("tsne_category.png"), label="t-SNE (second projection)", show_label=True)
190
+ gr.Image(load_plot("dbscan.png"), label="DBSCAN (second clustering algorithm)", show_label=True)
191
+ gr.Image(load_plot("cluster_examples.png"), label="Representative products per cluster", show_label=True)
192
+
193
+ gr.Markdown(
194
  "## 4. Recommender and Evaluation\n"
195
  "Embeddings are saved to **`catalog.parquet`**. A query is encoded with the same CLIP model, scored by "
196
  "**cosine similarity** against the catalogue, and the **Top-3** are returned (near-duplicates filtered). "
197
+ "In the notebook I also benchmark a **FAISS** index (the standard vector-search library) as a scaling "
198
+ "option: about 50x faster than the brute-force scan with identical results. On 80 held-out products, "
199
+ "image-to-image retrieval reaches **precision@1 ≈ 0.39**, about 3x the random baseline. Example queries "
200
+ "and their Top-3:"
201
+ )
202
+ gr.Image(load_plot("recommend_examples.png"), label="Query (held-out photo) to Top-3", show_label=True)
203
+ gr.Markdown(
204
+ "**Business & ethics:** visual search powers 'shop the look' features, needs no manual tags, and works "
205
+ "across languages, but it matches *appearance not function* (avoid for safety-critical items), can inherit "
206
+ "CLIP's web-image biases, and can only recommend products that exist in the catalogue. Full coding work is "
207
+ "in the notebook (**Files** tab, `Assignment_3_NoamFuchs.ipynb`)."
208
  )
209
 
210
  # ---------------------------------------------------------------- TAB 3
assets/cluster_examples.png ADDED

Git LFS Details

  • SHA256: ebd3de33cb2be108e2105aed911eed946b01f91a76c246591074890c2e9ff9f1
  • Pointer size: 131 Bytes
  • Size of remote file: 375 kB
assets/dbscan.png ADDED

Git LFS Details

  • SHA256: cd7a065df5822f2db8a867efbce877ed3ff884370f1f3e60ed03d6bfc73f0240
  • Pointer size: 131 Bytes
  • Size of remote file: 227 kB
assets/recommend_examples.png ADDED

Git LFS Details

  • SHA256: 2d798d2525ed84aa2c6dc4b95f7ecb63b37fbe8926a163526d7d637b34b68844
  • Pointer size: 132 Bytes
  • Size of remote file: 1.26 MB
assets/tsne_category.png ADDED

Git LFS Details

  • SHA256: 2dedfa7c757d40d0addff88a239b6d4cbc6c125dcceac13f8fb56b4e8418de71
  • Pointer size: 131 Bytes
  • Size of remote file: 603 kB