How To Vectorize
Installation
You can install Relevance AI with pip:
pip install -q RelevanceAI
Encoding an entire dataset
The easiest way to update an existing dataset with encoding results is
to run encode_documents. This function fetches all the data points
in a dataset, runs the specified function (i.e. encoding in this case),
and writes the result back to the dataset.
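The fetch, encode, and write-back cycle described above can be pictured with a small, library-free sketch. It does not use or reproduce the Relevance AI API: `toy_encoder`, `encode_field`, and the `_vector_` suffix on the output field are all hypothetical names chosen for illustration, and the "dataset" is just an in-memory list of dictionaries.

```python
from typing import Callable, Dict, List, Optional

Document = Dict[str, object]


def toy_encoder(text: str, dims: int = 4) -> List[float]:
    """Hypothetical stand-in for a real model: maps text to a fixed-length vector."""
    vector = [0.0] * dims
    for i, ch in enumerate(text):
        # Spread character codes across the vector slots; deterministic, not meaningful.
        vector[i % dims] += ord(ch) / 1000.0
    return vector


def encode_field(
    documents: List[Document],
    field: str,
    encoder: Callable[[str], List[float]],
    vector_field: Optional[str] = None,
) -> List[Document]:
    """Run `encoder` on `field` of each document and store the vector on that document."""
    # Assumed naming scheme for the output field, for illustration only.
    vector_field = vector_field or f"{field}_vector_"
    for doc in documents:
        value = doc.get(field)
        # Skip documents where the field is missing or is not text.
        if isinstance(value, str):
            doc[vector_field] = encoder(value)
    return documents


# "Fetch" step: an in-memory list stands in for the dataset.
docs: List[Document] = [
    {"_id": "1", "product_text": "red running shoes"},
    {"_id": "2", "name": "document without the target field"},
]

# "Encode" and "write back" steps: vectors are stored on the documents themselves.
encode_field(docs, "product_text", toy_encoder)
```

After the call, the first document carries a new `product_text_vector_` entry of length 4, while the second is left unchanged because it has no `product_text` field.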
For instance, the sample code below inserts documents into a dataset called
ecommerce-2 and encodes the product_text field using the
princeton-nlp/sup-simcse-roberta-large model. You can find the list of
models available for vectorising in Vectorhub, or feel free to
bring your own model(s).
from relevanceai import Client
from relevanceai.utils import get_ecommerce_1_dataset

"""
You can sign up/login and find your credentials here: https://cloud.tryrelevance.com/sdk/api
Once you have signed up, click on the value under `Activation token` and paste it here
"""
client = Client()

dataset_id = "ecommerce-2"
documents = get_ecommerce_1_dataset(number_of_documents=100)

ds = client.Dataset(dataset_id)
# Delete the dataset first so the example starts from a clean state
ds.delete()
ds.insert_documents(documents, create_id=True)

# Encode the product_text field with the chosen model
ds.vectorize_text(
    models=["princeton-nlp/sup-simcse-roberta-large"],
    fields=["product_text"],
)