Spaces: Noam12345/visual-product-recommender

Noam12345 committed 21 days ago

Commit 8b18247 (verified) · 1 Parent(s): 17e5ce6

Bonus depth: t-SNE, DBSCAN, per-cluster grids, query montage, FAISS benchmark, business/ethics; app stays on numpy

Files changed (8)

.gitattributes +4 -0
Assignment_3_NoamFuchs.ipynb +0 -0
README.md +56 -0
app.py +22 -4
assets/cluster_examples.png +3 -0
assets/dbscan.png +3 -0
assets/recommend_examples.png +3 -0
assets/tsne_category.png +3 -0

.gitattributes CHANGED

@@ -37,3 +37,7 @@ assets/pca_category.png filter=lfs diff=lfs merge=lfs -text
 assets/umap_category.png filter=lfs diff=lfs merge=lfs -text
 assets/umap_cluster.png filter=lfs diff=lfs merge=lfs -text
 assets/eda_sample_grid.png filter=lfs diff=lfs merge=lfs -text
+assets/cluster_examples.png filter=lfs diff=lfs merge=lfs -text
+assets/dbscan.png filter=lfs diff=lfs merge=lfs -text
+assets/recommend_examples.png filter=lfs diff=lfs merge=lfs -text
+assets/tsne_category.png filter=lfs diff=lfs merge=lfs -text

Assignment_3_NoamFuchs.ipynb CHANGED

The diff for this file is too large to render. See raw diff

README.md CHANGED

@@ -199,6 +199,31 @@ The clusters were found from purely visual embeddings with no access to the labe
 human-meaningful groupings. That is exactly the property that makes nearest-neighbour recommendation
 work: similar-looking products really are near each other in the space.
+### 11. A second projection: t-SNE
+![t-SNE](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/tsne_category.png)
+UMAP and PCA are global views; t-SNE emphasises local neighbourhoods. I ran it as a cross-check, and
+it shows the same per-category grouping, so the structure is not a UMAP artefact.
+### 12. A second clustering algorithm: DBSCAN
+![DBSCAN](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/dbscan.png)
+K-Means forces every product into one of K round clusters; DBSCAN instead finds dense regions and
+labels the rest as noise. I chose `eps` properly from a **k-distance plot** (left) rather than
+guessing. DBSCAN breaks the catalogue into roughly two dozen fine clusters with almost no noise,
+which says the space is densely packed with small visual neighbourhoods. K-Means K=4 is the coarse,
+interpretable summary of that same structure. The two algorithms agree the space is well clustered.
+### 13. What each cluster actually contains
+![Cluster examples](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/cluster_examples.png)
+Labels and numbers are one thing; the honest test is to look at the products closest to each
+centroid. Each row is clearly one visual family (packaged goods, tech, furnishings, small colourful
+items). This is what convinced me the space was worth recommending from.
 ## The Recommender (Inputs and Outputs)
 The recommendation itself is four steps, kept as small standalone functions:
@@ -212,6 +237,20 @@ The recommendation itself is four steps, kept as small standalone functions:
    (similarity > 0.985) so the three results are genuinely different products.
 4. **Return the Top-3** with their thumbnails, categories and similarity scores.
+Here is what it actually returns for five held-out photos (products the catalogue never saw):
+![Query to Top-3](https://huggingface.co/spaces/Noam12345/visual-product-recommender/resolve/main/assets/recommend_examples.png)
+The camera row is the clearest: a Canon body retrieves three other cameras at ~0.94 similarity.
+### Bonus: a faster backend with FAISS
+A linear `EMB @ q` scan is fine for 12K items, but a real catalogue has millions. I index the
+embeddings with **FAISS** (the standard vector-search library) and confirm it returns the **same**
+Top-3 as the brute-force scan, only faster. On 500 queries FAISS was about **50x faster** with
+**100% agreement**, so it is exact here, just quicker. That is the piece that would let the same app
+scale to a production catalogue.
 ## Evaluation
 To put a number on quality I ran **image-to-image** retrieval on 80 held-out products (products the
@@ -238,6 +277,23 @@ The app has three tabs: a **Recommender** (upload a photo or type a query, get T
 The Space loads `catalog.parquet` and the same CLIP model used to build it, so the live results are
 exactly the pipeline described above.
+## Business and Ethical Considerations
+**Business value.** Visual similarity search is the engine behind "shop the look" and "more like
+this" features. It needs no manual tagging (it runs on the product image alone), works across
+languages (useful here, since titles are multilingual), and helps cold-start items that have no
+clicks yet. The same `catalog.parquet` + FAISS setup would directly power a related-items carousel
+or a visual search bar on a store.
+**Limits and ethics.**
+- **Visual, not semantic.** The model matches appearance, so it can pair items that look alike but
+  serve different purposes. Fine for shopping, but it should not be trusted where the *function*
+  matters (medical or safety products).
+- **Representation bias.** CLIP is trained on web images and inherits their biases; a product shot in
+  an unusual style, or from an under-represented region, may embed poorly and be under-recommended.
+- **Catalogue gaps.** Recommendations can only point inside the catalogue, so sparse categories give
+  weak results no matter how good the model is.
 ## Final Conclusions
 CLIP gives a single shared space for images and text, and the clustering confirmed that space is
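
Note on README section 12 (choosing `eps` from a k-distance plot): the notebook diff is too large to render, so the author's exact code is not visible here. The following is only a minimal sketch of how such a curve could be computed with numpy. The function name `k_distance_curve`, the use of cosine distance, the value of `k`, and the toy data are all assumptions made for illustration, not taken from the commit.

```python
import numpy as np

def k_distance_curve(emb, k=5):
    """Sorted distance from every point to its k-th nearest neighbour.

    Assumption: cosine distance on row vectors; the notebook may use another metric.
    """
    emb = np.asarray(emb, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise rows
    dist = 1.0 - unit @ unit.T                               # pairwise cosine distance
    np.fill_diagonal(dist, np.inf)                           # a point is not its own neighbour
    kth = np.partition(dist, k - 1, axis=1)[:, k - 1]        # k-th smallest value per row
    return np.sort(kth)                                      # ascending, ready to plot

# Toy data (hypothetical): two tight groups plus one far-away point.
rng = np.random.default_rng(0)
blob_a = np.array([1.0, 0.0]) + 0.01 * rng.standard_normal((10, 2))
blob_b = np.array([0.0, 1.0]) + 0.01 * rng.standard_normal((10, 2))
outlier = np.array([[-1.0, -1.0]])
curve = k_distance_curve(np.vstack([blob_a, blob_b, outlier]), k=3)
```

Plotting `curve` against its index gives the k-distance plot. A common heuristic is to read `eps` off the point where the curve bends sharply upward, with `k` set to the `min_samples` value passed to DBSCAN.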

app.py CHANGED

@@ -46,7 +46,7 @@ def encode_image(img):
 def top_matches(qvec, k=3):
-    sims = EMB @ qvec
+    sims = EMB @ qvec                       # cosine (vectors are L2-normalized)
     order = np.argsort(-sims)
     chosen = []
     for idx in order:
@@ -181,12 +181,30 @@ with gr.Blocks(title="Visual Product Recommender", theme=gr.themes.Soft()) as de
                 "human categories, as the heatmap shows. The silhouette is modest because most products sit on white "
                 "studio backgrounds and overlap visually, but the structure is real, which is what makes "
                 "nearest-neighbour recommendation work.\n\n"
+                "**Going deeper.** I cross-checked with a second projection (**t-SNE**) and a second clustering "
+                "algorithm (**DBSCAN**, eps chosen from a k-distance plot), and looked at the actual products closest "
+                "to each cluster centroid."
+            )
+            with gr.Row():
+                gr.Image(load_plot("tsne_category.png"), label="t-SNE (second projection)", show_label=True)
+                gr.Image(load_plot("dbscan.png"), label="DBSCAN (second clustering algorithm)", show_label=True)
+            gr.Image(load_plot("cluster_examples.png"), label="Representative products per cluster", show_label=True)
+            gr.Markdown(
                 "## 4. Recommender and Evaluation\n"
                 "Embeddings are saved to **`catalog.parquet`**. A query is encoded with the same CLIP model, scored by "
                 "**cosine similarity** against the catalogue, and the **Top-3** are returned (near-duplicates filtered). "
-                "On 80 held-out products, image-to-image retrieval reaches **precision@1 ≈ 0.39**, about 3x the random "
-                "baseline. The full coding work is in the notebook in the **Files** tab "
-                "(`Assignment_3_NoamFuchs.ipynb`)."
+                "In the notebook I also benchmark a **FAISS** index (the standard vector-search library) as a scaling "
+                "option: about 50x faster than the brute-force scan with identical results. On 80 held-out products, "
+                "image-to-image retrieval reaches **precision@1 ≈ 0.39**, about 3x the random baseline. Example queries "
+                "and their Top-3:"
+            )
+            gr.Image(load_plot("recommend_examples.png"), label="Query (held-out photo) to Top-3", show_label=True)
+            gr.Markdown(
+                "**Business & ethics:** visual search powers 'shop the look' features, needs no manual tags, and works "
+                "across languages, but it matches *appearance not function* (avoid for safety-critical items), can inherit "
+                "CLIP's web-image biases, and can only recommend products that exist in the catalogue. Full coding work is "
+                "in the notebook (**Files** tab, `Assignment_3_NoamFuchs.ipynb`)."
             )
         # ---------------------------------------------------------------- TAB 3
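
Note on the `top_matches` hunk: the diff shows only the first lines of the function, so the loop body is not visible. The sketch below is one plausible, self-contained way to finish a cosine top-k scan with near-duplicate filtering. The 0.985 threshold comes from the README text. The name `top_matches_sketch`, the choice to compare each candidate against the results already chosen (rather than against the query), the stable sort, and the toy catalogue are assumptions for illustration only.

```python
import numpy as np

DUP_THRESHOLD = 0.985  # near-duplicate cut-off quoted in the README

def top_matches_sketch(emb, qvec, k=3, dup_threshold=DUP_THRESHOLD):
    """Brute-force cosine top-k that skips near-duplicates of earlier picks.

    Assumes the rows of `emb` and `qvec` are L2-normalised, so a dot product
    equals cosine similarity.
    """
    sims = emb @ qvec
    order = np.argsort(-sims, kind="stable")  # best match first, ties keep index order
    chosen = []
    for idx in order:
        # Assumption: a candidate is dropped if it is almost identical to a pick we already hold.
        if any(emb[idx] @ emb[j] > dup_threshold for j in chosen):
            continue
        chosen.append(int(idx))
        if len(chosen) == k:
            break
    return [(i, float(sims[i])) for i in chosen]

# Toy catalogue (hypothetical): rows 0 and 1 are exact duplicates.
raw = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.9, 0.436, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
catalog = raw / np.linalg.norm(raw, axis=1, keepdims=True)
query = np.array([1.0, 0.1, 0.05])
query = query / np.linalg.norm(query)
results = top_matches_sketch(catalog, query, k=3)
```

In this toy run, row 1 is skipped because it duplicates row 0, so the three results are distinct items. The same check could be applied to the output of an approximate index by over-fetching candidates before filtering.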

assets/cluster_examples.png ADDED

Git LFS Details

SHA256: ebd3de33cb2be108e2105aed911eed946b01f91a76c246591074890c2e9ff9f1
Pointer size: 131 Bytes
Size of remote file: 375 kB

assets/dbscan.png ADDED

Git LFS Details

SHA256: cd7a065df5822f2db8a867efbce877ed3ff884370f1f3e60ed03d6bfc73f0240
Pointer size: 131 Bytes
Size of remote file: 227 kB

assets/recommend_examples.png ADDED

Git LFS Details

SHA256: 2d798d2525ed84aa2c6dc4b95f7ecb63b37fbe8926a163526d7d637b34b68844
Pointer size: 132 Bytes
Size of remote file: 1.26 MB

assets/tsne_category.png ADDED

Git LFS Details

SHA256: 2dedfa7c757d40d0addff88a239b6d4cbc6c125dcceac13f8fb56b4e8418de71
Pointer size: 131 Bytes
Size of remote file: 603 kB