Commit 05f47ba
1 Parent(s): a4590c9
Update README
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
README.md
CHANGED
|
@@ -1,8 +1,8 @@
|
|
| 1 |
---
|
| 2 |
title: Multimodal Image Search Engine
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 4.13.0
|
| 8 |
app_file: app.py
|
|
@@ -10,4 +10,44 @@ pinned: false
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
-
| 1 |
---
|
| 2 |
title: Multimodal Image Search Engine
|
| 3 |
+
emoji: 🔍
|
| 4 |
+
colorFrom: yellow
|
| 5 |
+
colorTo: yellow
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 4.13.0
|
| 8 |
app_file: app.py
|
|
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
+
<p align="center">
|
| 14 |
+
<h1 align="center">Multi-Modal Image Search Engine</h1>
|
| 15 |
+
<p align="center">
|
| 16 |
+
A Semantic Search Engine that understands the Content & Context of your Queries.
|
| 17 |
+
<br>
|
| 18 |
+
Search a Vector Database of over 15k Images with Multi-Modal inputs: a Text query (Text-to-Image Search) or an Image (Reverse Image Search). <a href="https://huggingface.co/spaces/Snehil-Shah/Multimodal-Image-Search-Engine">Try it Out!</a>
|
| 19 |
+
<br><br>
|
| 20 |
+
<img src="https://github.com/Snehil-Shah/Multimodal-Image-Search-Engine/blob/main/assets/demo.gif?raw=true">
|
| 21 |
+
</p>
|
| 22 |
+
</p>
|
| 23 |
+
|
| 24 |
+
<h3>• About The Project</h3>
|
| 25 |
+
|
| 26 |
+
At its core, the Search Engine is built upon the concept of **Vector Similarity Search**.
|
| 27 |
+
All the Images are encoded into vector embeddings based on their semantic meaning using a Transformer Model, and these embeddings are stored in a vector space.
|
| 28 |
+
When searched with a query, the engine returns the query's nearest neighbors in that vector space, which are the relevant search results.
|
| 29 |
+
|
| 30 |
+
<p align="center"><img src="https://raw.githubusercontent.com/Snehil-Shah/Multimodal-Image-Search-Engine/main/assets/encoding_flow.png"></p>
|
| 31 |
+
|
| 32 |
+
We use the Contrastive Language-Image Pre-Training (CLIP) Model by OpenAI, a Pre-trained Multi-Modal Vision Transformer that can semantically encode Words, Sentences & Images into a 512-Dimensional Vector. This Vector encapsulates the meaning & context of the entity in a *Mathematically Measurable* format.
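
The encoding step above can be sketched with the Sentence-Transformers library listed under Technologies Used. This is a minimal sketch, not the project's actual code: the checkpoint name `clip-ViT-B-32` and the helper names `load_clip` and `embed` are assumptions made for illustration. `clip-ViT-B-32` is a CLIP checkpoint published for Sentence-Transformers that produces 512-dimensional vectors, but whether the project uses this exact checkpoint is not stated.

```python
from typing import Any, List, Sequence

# Assumption: the README does not name the checkpoint; this one yields 512-D vectors.
CLIP_MODEL_NAME = "clip-ViT-B-32"


def load_clip(model_name: str = CLIP_MODEL_NAME) -> Any:
    """Load a CLIP checkpoint through Sentence-Transformers.

    The import is done lazily because the library is a heavy dependency
    and the weights are downloaded on first use.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed(model: Any, items: Sequence[Any]) -> List[List[float]]:
    """Encode a batch of texts (str) or images (PIL.Image) into float vectors.

    `model` only needs an `encode(list)` method that returns one vector
    per input item, which is what SentenceTransformer provides.
    """
    vectors = model.encode(list(items))
    return [[float(x) for x in vector] for vector in vectors]


# Hypothetical usage (needs sentence-transformers, and Pillow for images):
#   model = load_clip()
#   text_vector = embed(model, ["a dog playing in the snow"])[0]
#   image_vector = embed(model, [Image.open("photo.jpg")])[0]
```

Because text and images are mapped into the same vector space, a text query and an image query can both be compared against the stored image embeddings.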
|
| 33 |
+
|
| 34 |
+
<p align="center"><p align="center"><img src="https://raw.githubusercontent.com/Snehil-Shah/Multimodal-Image-Search-Engine/main/assets/Visualization.png" width=1000></p>
|
| 35 |
+
<p align="center"><i>2-D Visualization of 500 Images in a 512-D Vector Space</i></p></p>
|
| 36 |
+
|
| 37 |
+
The Images are stored as vector embeddings in a Collection in Qdrant, a Vector Database. The Search Term is encoded and sent to Qdrant as a query, and Qdrant returns the Nearest Neighbors ranked by their Cosine Similarity to the Search Query.
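
To make the ranking step concrete, here is a minimal brute-force sketch of cosine-similarity nearest-neighbor search in plain Python. It is a stand-in for the idea only and does not use Qdrant's API; the function names, file names, and 3-D toy vectors are hypothetical, and real embeddings in this project are 512-dimensional.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the top k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical file names mapped to made-up 3-D embeddings.
toy_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}

results = nearest_neighbors([1.0, 0.0, 0.0], toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

This version scans every stored vector, which is fine for a toy index; a vector database exists precisely so that this lookup stays fast as the collection grows.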
|
| 38 |
+
|
| 39 |
+
<p align="center"><img src="https://raw.githubusercontent.com/Snehil-Shah/Multimodal-Image-Search-Engine/main/assets/retrieval_flow.png"></p>
|
| 40 |
+
|
| 41 |
+
**The Dataset**: All images are sourced from the [Open Images Dataset](https://github.com/cvdfoundation/open-images-dataset) by the Common Visual Data Foundation.
|
| 42 |
+
|
| 43 |
+
<h3>• Technologies Used</h3>
|
| 44 |
+
|
| 45 |
+
- Python
|
| 46 |
+
- Jupyter Notebooks
|
| 47 |
+
- Qdrant - Vector Database
|
| 48 |
+
- Sentence-Transformers - Library
|
| 49 |
+
- CLIP by OpenAI - ViT Model
|
| 50 |
+
- Gradio - UI
|
| 51 |
+
- HuggingFace Spaces - Deployment
|
| 52 |
+
|
| 53 |
+
|