Inserting Data#

Inserting CSVs#

from relevanceai import Client
client = Client()
df = client.Dataset("sample_dataset_id")

csv_filename = "temp.csv"
df.insert_csv(csv_filename)

Inserting Pandas Dataframes#

Insert a dataframe into the dataset. Takes additional args and kwargs based on insert_documents.

from relevanceai import Client
client = Client()
df = client.Dataset("sample_dataset_id")
pandas_df = pd.DataFrame({"value": [3, 2, 1], "_id": ["10", "11", "12"]})
df.insert_pandas_dataframe(pandas_df)

Inserting Media#

Given a path to a directory, this method loads all media-related files into a Dataset.

from relevanceai import Client
client = Client()
ds = client.Dataset("dataset_id")

from pathlib import Path
path = Path("medias/")
# list(path.iterdir()) returns
# [
#    PosixPath('media.jpg'),
#    PosixPath('more-medias'), # a directory
# ]

get_all_medias: bool = True
if get_all_medias:
    # Inserts all medias, even those in the more-medias directory
    ds.insert_media_folder(
        field="medias", path=path, recurse=True
    )
else:
    # Only inserts media.jpg
    ds.insert_media_folder(
        field="medias", path=path, recurse=False
    )

Inserting Documents (JSON-like objects)#

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

documents = [
    {
        "_id": "10",
        "value": 5
    },
    {
        "_id": "332",
        "value": 10
    }
]

df.insert_documents(documents)

Insert a list of documents with multi-threading automatically enabled.

When inserting the document you can optionally specify your own id for a document by using the field name “_id”, if not specified a random id is assigned.

When inserting or specifying vectors in a document use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”.

When inserting or specifying chunks in a document the suffix (ends with) “_chunk_” for the field name. e.g. “products_chunk_”.

When inserting or specifying chunk vectors in a document’s chunks use the suffix (ends with) “_chunkvector_” for the field name. e.g. “products_chunk_.product_description_chunkvector_”.