Useful Utilities

Preview Your Dataset

Read.head(n=5, raw_json=False, select_fields=None, **kw)

Return the first n rows of your dataset. This is useful for quickly checking that your object contains the right type of data.

Parameters
  • n (int, default 5) – Number of rows to select.

  • raw_json (bool, default False) – If True, return raw JSON instead of a pandas DataFrame.

  • kw – Additional keyword arguments passed to show_json.

Returns

The first ‘n’ rows of the caller object.

Return type

pandas DataFrame, or Dict if raw_json is True

Example

from relevanceai import Client

client = Client()

df = client.Dataset("sample_dataset_id", image_fields=["image_url"])

df.head()
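
To make the n and select_fields parameters concrete, here is a minimal offline sketch that runs without a client. It is not the library's implementation: preview and sample_docs are hypothetical names, and the sketch assumes that select_fields restricts the fields returned for each row, which the signature suggests but the description above does not state.

```python
from typing import Any, Dict, List, Optional


def preview(
    docs: List[Dict[str, Any]],
    n: int = 5,
    select_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: return the first n documents,
    optionally keeping only the fields named in select_fields."""
    rows = docs[:n]
    if select_fields is None:
        # No field selection: return shallow copies of the rows as they are.
        return [dict(d) for d in rows]
    # Assumed behaviour: keep only the requested fields that exist.
    return [{k: d[k] for k in select_fields if k in d} for d in rows]


# Hypothetical in-memory documents standing in for a dataset.
sample_docs = [
    {"_id": str(i), "sample_label": f"label_{i}", "value": i}
    for i in range(10)
]

first_five = preview(sample_docs)
two_labels = preview(sample_docs, n=2, select_fields=["sample_label"])
print(len(first_five))
print(two_labels)
```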

Info

Read.info(dtype_count=False)

Return information about the Dataset, including the index dtype, the columns, and the non-null values.

Parameters

dtype_count (bool, default False) – If True, print a value count of the data types.

Returns

A pandas DataFrame of information about the Dataset.

Return type

pd.DataFrame

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
df.info()
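
As an illustration of what a value count of data types looks like, the sketch below tallies the types in a small schema-like dictionary using only the standard library. The schema contents are hypothetical, and this is an assumption about the kind of summary dtype_count produces, not the library's own code.

```python
from collections import Counter

# Hypothetical schema-like mapping of field name to field type.
schema = {
    "_id": "text",
    "sample_label": "text",
    "image_url": "text",
    "value": "numeric",
}

# Count how many fields share each type, analogous to a value count.
dtype_counts = Counter(schema.values())

for dtype, count in dtype_counts.most_common():
    print(f"{dtype}: {count}")
```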

Schema

Read.schema()

Returns the schema of a dataset. Refer to datasets.create for different field types available in a Relevance schema.

Example

from relevanceai import Client
client = Client()
dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
df.schema

Return type

Dict

Shape

Read.shape()

Returns the shape (N x C) of a dataset, where:

N = number of samples in the Dataset
C = number of columns in the Dataset

Returns

(N, C)

Return type

Tuple

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

length, width = df.shape

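Chunk

Read.chunk_dataset(select_fields=None, chunksize=100, filters=None, after_id=None)

Iterate over a dataset in chunks of up to chunksize documents. In the example below, each chunk is a dictionary with 'cursor' and 'documents' keys.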

Example

from relevanceai import Client
client = Client()
ds = client.Dataset("sample")
for c in ds.chunk_dataset(
    select_fields=["sample_label"],
    chunksize=100
):
    # Returns a dictionary with 'cursor' and 'documents' keys
    docs = c['documents']
    cursor = c['cursor']
    for d in docs:
        d.update({"value": 3})
    ds.upsert_documents(docs)
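
The chunking pattern itself can be tried offline. The sketch below is a hypothetical in-memory stand-in, not the library's implementation: chunk_documents is an invented name, and the cursor here is simply the offset of the next chunk, whereas the real cursor format is not described above.

```python
from typing import Any, Dict, Iterator, List


def chunk_documents(
    docs: List[Dict[str, Any]], chunksize: int = 100
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in: yield dictionaries with 'cursor' and
    'documents' keys, each holding at most chunksize documents."""
    for start in range(0, len(docs), chunksize):
        batch = docs[start:start + chunksize]
        # Assumed cursor: the offset at which the next chunk would begin.
        yield {"cursor": start + len(batch), "documents": batch}


# Hypothetical documents standing in for a dataset of 250 rows.
docs = [{"_id": str(i), "sample_label": f"label_{i}"} for i in range(250)]

chunks = list(chunk_documents(docs, chunksize=100))
sizes = [len(c["documents"]) for c in chunks]

# Update every document chunk by chunk, as in the example above.
for c in chunks:
    for d in c["documents"]:
        d.update({"value": 3})

print(sizes)
```

Processing in chunks keeps memory use bounded by chunksize rather than by the size of the whole dataset, which is the main reason to prefer it over loading every document at once.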