⚑ How To Vectorize


Installation

You can install the Relevance AI SDK using pip:

pip install -q RelevanceAI

Encoding an entire dataset

The easiest way to update an existing dataset with encoding results is to run encode_documents. This function fetches all the data points in a dataset, runs the specified function (i.e., encoding in this case), and writes the result back to the dataset.
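To make the fetch, run, and write-back cycle concrete, here is a minimal sketch in plain Python that does not use the Relevance AI SDK at all. The names toy_encode and encode_field, the batch size, and the `_vector_` suffix on the output field are assumptions made for illustration; the in-memory list stands in for a real dataset and the toy encoder stands in for a real model.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]


def toy_encode(text: str, dims: int = 4) -> List[float]:
    """Hypothetical stand-in for a real model: folds characters into a fixed-size vector."""
    vector = [0.0] * dims
    for i, ch in enumerate(text):
        vector[i % dims] += ord(ch) / 1000.0
    return vector


def encode_field(
    documents: List[Document],
    field: str,
    encoder: Callable[[str], List[float]],
    batch_size: int = 2,
) -> int:
    """Walk the documents in batches, encode one field, and store the vector on each document."""
    vector_field = f"{field}_vector_"  # assumed naming convention for the output field
    updated = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]  # "fetch" one batch
        for doc in batch:
            if field not in doc:
                continue  # skip documents that lack the field
            doc[vector_field] = encoder(str(doc[field]))  # run the function, write the result back
            updated += 1
    return updated


# Illustrative in-memory "dataset"; the third document has no text to encode.
dataset: List[Document] = [
    {"_id": "1", "product_text": "red shoes"},
    {"_id": "2", "product_text": "blue hat"},
    {"_id": "3"},
]
count = encode_field(dataset, "product_text", toy_encode)
```

After this runs, each document that had a product_text field also carries a product_text_vector_ field, while the document without the field is left untouched. A hosted dataset would page through stored documents and push updates over the network instead, but the loop has the same shape.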

For instance, the sample code below uses a dataset called ecommerce-2 and encodes the product_text field using the princeton-nlp/sup-simcse-roberta-large model. You can find the list of models available for vectorising in Vectorhub, or feel free to bring your own model(s).

from relevanceai import Client
from relevanceai.utils import get_ecommerce_1_dataset

# You can sign up/login and find your credentials here: https://cloud.relevance.ai/sdk/api
# Once you have signed up, click on the value under `Activation token` and paste it here
client = Client()

# Load 100 sample e-commerce documents
dataset_id = "ecommerce-2"
documents = get_ecommerce_1_dataset(number_of_documents=100)

# Start from a clean dataset: delete any existing copy, then insert the documents
ds = client.Dataset(dataset_id)
ds.delete()
ds.insert_documents(documents, create_id=True)

# Encode the product_text field with the chosen model
ds.vectorize_text(
    models=["princeton-nlp/sup-simcse-roberta-large"],
    fields=["product_text"]
)