Text To Image Search QuickStart#
Try the image search live in the Relevance AI Dashboard.
In this notebook we will show you how to create and experiment with a powerful text-to-image search engine using OpenAI's CLIP and Relevance AI.
What I Need#
Project & API Key (The SDK will link you to the corresponding page or you can grab your API key from https://cloud.tryrelevance.com/ in the settings area)
Python 3
Relevance AI installed as shown below. For more information, visit the Installation guide
Installation Requirements#
# Relevance AI installation
# remove `!` if running the line in a terminal
!pip install -U RelevanceAI[notebook]==2.0.0
!pip install ftfy regex tqdm
!pip install git+https://github.com/openai/CLIP.git
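As an optional sanity check (not part of the original notebook), you can confirm the installs succeeded before moving on:

# Verify that the packages import correctly
import torch
import clip
import relevanceai
print(getattr(relevanceai, "__version__", "unknown"))  # expected: 2.0.0
print(clip.available_models())  # "ViT-B/32" should be listed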
Client Setup#
You can sign up/login and find your credentials here: https://cloud.tryrelevance.com/sdk/api. Once you have signed up, click on the value under Activation token and paste it here.
from relevanceai import Client
client = Client()
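Running Client() with no arguments will prompt you to paste the activation token interactively. If you prefer to skip the prompt (for example in a script), the token can usually be passed in directly; the token keyword below is an assumption about your SDK version, so check the Installation guide if it differs:

# Assumption: the 2.x SDK accepts the activation token as a keyword argument
client = Client(token="<your-activation-token>")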
Text-to-image search#
To enable text-to-image search we use Relevance AI as the vector database and OpenAI's CLIP as the vectorizer, turning both text and images into CLIP vector embeddings.
1) Data#
For this quickstart we will be using a sample e-commerce dataset. Alternatively, you can use your own dataset for the different steps.
import pandas as pd
from relevanceai.utils.datasets import get_ecommerce_dataset_clean
# Retrieve our sample dataset. It comes as a list of documents (Python dictionaries).
documents = get_ecommerce_dataset_clean()
pd.DataFrame.from_dict(documents).head()
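To get a feel for the data, you can also inspect a single document; product_image and product_title are the fields used later in this quickstart:

# Peek at the fields of one sample document
sample = documents[0]
print(list(sample.keys()))
print(sample.get("product_title"))
print(sample.get("product_image"))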
2) Encode / Vectorize with CLIP#
CLIP is a vectorizer from OpenAI that is trained to find similarities between text and image pairs. In the code below we set up CLIP.
import torch
import clip
import requests
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
# First - let's encode the image based on CLIP
def encode_image(image):
    # Let us download the image and then preprocess it
    image = (
        preprocess(Image.open(requests.get(image, stream=True).raw))
        .unsqueeze(0)
        .to(device)
    )
    # We then feed our processed image through the neural net to get a vector
    with torch.no_grad():
        image_features = model.encode_image(image)
    # Lastly we convert it to a list so that we can send it through the SDK
    return image_features.tolist()[0]

# Next - let's encode text based on CLIP
def encode_text(text):
    # Let us take the text and tokenize it
    text = clip.tokenize([text]).to(device)
    # We then feed our processed text through the neural net to get a vector
    with torch.no_grad():
        text_features = model.encode_text(text)
    return text_features.tolist()[0]
100%|████████████████████████████████████████| 338M/338M [00:06<00:00, 52.0MiB/s]
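As a quick sanity check (an addition to this notebook), you can encode a short string and confirm the vector length; the ViT-B/32 model produces 512-dimensional embeddings:

# CLIP ViT-B/32 embeddings are 512-dimensional
vector = encode_text("a red dress")
print(len(vector))  # 512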
We then encode our data into vectors; this will take a couple of minutes.
documents = documents[:500] # only 500 docs to make the process faster
def encode_image_document(d):
    try:
        d["product_image_clip_vector_"] = encode_image(d["product_image"])
    except Exception:
        # Skip documents whose image cannot be downloaded or decoded
        pass
# Let's import TQDM for a nice progress bar!
from tqdm.auto import tqdm
[encode_image_document(d) for d in tqdm(documents)]
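Since failed image downloads are silently skipped above, it is worth dropping any documents that did not receive a vector before inserting (a small addition to the original flow):

# Keep only documents that were successfully vectorized
documents = [d for d in documents if "product_image_clip_vector_" in d]
print(f"{len(documents)} documents have a CLIP image vector")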
3) Insert#
Upload our documents into the dataset quickstart_clip.

In case you are uploading your own dataset, keep in mind that each document should have a field called '_id'. Such an id can be generated with the uuid package, or you can let the SDK allocate one automatically by passing create_id=True:
ds.insert_documents(documents, create_id=True)
ds = client.Dataset("quickstart_clip")
ds.insert_documents(documents)
Once we have uploaded the data, we can see the dataset on the dashboard, which provides a useful overview and statistics of the dataset.
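You can also verify the upload from the SDK. The schema attribute below is an assumption about the 2.x Dataset API; if it is not available in your version, use the dashboard instead:

# Assumption: the Dataset object exposes its schema (field names and types)
print(ds.schema)  # product_image_clip_vector_ should appear as a vector field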
4) Search#
This step is to run a simple vector search; you can read more about vector search and how to construct a multi-vector query here.
Note that our dataset contains vectors generated by the CLIP encoder, so we first vectorize the query with the same encoder in order to search against them.
query = "for my baby daughter"
query_vector = encode_text(query)
multivector_query = [{"vector": query_vector, "fields": ["product_image_clip_vector_"]}]
results = ds.vector_search(multivector_query=multivector_query, page_size=5)
You can use our show_json helper to display the search results in a notebook, as shown below:
from relevanceai import show_json
print("=== QUERY === ")
print(query)
print("=== RESULTS ===")
show_json(results, image_fields=["product_image"], text_fields=["product_title"])
=== QUERY ===
for my baby daughter
=== RESULTS ===
| | product_image | product_title | _id |
|---|---|---|---|
| 0 | (image) | Crocs Girl (Infant) 'Littles Hover' Leather Athletic Shoe | cdf48ecc-882a-45ab-b625-ba86bf8cffa4 |
| 1 | (image) | The New York Doll Collection Double Stroller | ae2915f9-d7bb-4e0c-8a05-65682cd5a6d3 |
| 2 | (image) | Badger Basket Envee Baby High Chair/ Play Table in Pink | 585e7877-95eb-4864-9d89-03d5369c08fa |
| 3 | (image) | Crocs Girl (Toddler) 'CC Magical Day Princess' Synthetic Casual Shoes (Size 6 ) | 14c3ad94-3ecd-438b-b00e-1ce5b0eed4e3 |
| 4 | (image) | Crocs Girl (Toddler) 'CC Magical Day Princess' Synthetic Casual Shoes (Size 6 ) | 30809211-dbcd-4b15-8c0a-7702dfe9e30f |
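Any free-text description can be used as a query, since the query text and the product images live in the same CLIP embedding space. For example, with an illustrative query string:

query = "red running shoes"
query_vector = encode_text(query)
multivector_query = [{"vector": query_vector, "fields": ["product_image_clip_vector_"]}]
results = ds.vector_search(multivector_query=multivector_query, page_size=5)
show_json(results, image_fields=["product_image"], text_fields=["product_title"])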