Useful Utilities
Preview Your Dataset
- Read.head(n=5, raw_json=False, select_fields=None, **kw)
Returns the first n rows of your dataset. It is useful for quickly checking that your object contains the right type of data.
- Parameters
n (int, default 5) – Number of rows to select.
raw_json (bool) – If True, returns raw JSON instead of a pandas DataFrame.
kw – Additional arguments to feed into show_json
- Returns
The first ‘n’ rows of the caller object.
- Return type
Pandas DataFrame or Dict, depending on args
Example
from relevanceai import Client
client = Client()
# image_fields marks which fields hold image URLs
df = client.Dataset("sample_dataset_id", image_fields=["image_url"])
df.head()
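To make the meaning of n and select_fields concrete without connecting to a dataset, here is a minimal local sketch. The helper head_local and the sample documents are hypothetical and are not part of the relevanceai API; the sketch only assumes that select_fields restricts each returned row to the listed field names.

```python
from typing import Any, Dict, List, Optional


def head_local(
    documents: List[Dict[str, Any]],
    n: int = 5,
    select_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: return the first n documents,
    optionally keeping only the fields named in select_fields."""
    first = documents[:n]
    if select_fields is None:
        return first
    # Keep only the requested fields that are present in each document
    return [{k: d[k] for k in select_fields if k in d} for d in first]


# Made-up sample documents for illustration only
docs = [{"_id": str(i), "label": f"item-{i}", "price": i * 10} for i in range(8)]

preview = head_local(docs, n=3, select_fields=["_id", "label"])
print(preview)
```

The real method returns a pandas DataFrame unless raw_json is True; the sketch returns plain dictionaries to stay dependency-free.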
Info
- Read.info(dtype_count=False)
Returns information about the Dataset, including the index dtype, the columns, and the number of non-null values.
- Parameters
dtype_count (bool) – If True, prints the value counts of the data types.
- Returns
A pandas DataFrame of information about the dataset.
- Return type
pd.DataFrame
Example
from relevanceai import Client
client = Client()
dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
df.info()
Schema
- Read.schema()
Returns the schema of a dataset. Refer to datasets.create for different field types available in a Relevance schema.
Example
from relevanceai import Client
client = Client()
dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
df.schema
- Return type
Dict
Shape
- Read.shape()
Returns the shape (N x C) of a dataset, where:
N = number of samples in the Dataset
C = number of columns in the Dataset
- Returns
(N, C)
- Return type
Tuple
Example
from relevanceai import Client
client = Client()
dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
length, width = df.shape
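As a rough illustration of the (N, C) tuple, the sketch below computes a shape for a local list of documents. The helper shape_local is hypothetical, and it assumes that a column is any distinct top-level field name appearing in at least one document; how the library itself counts nested fields is not stated here.

```python
from typing import Any, Dict, List, Tuple


def shape_local(documents: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Hypothetical stand-in: N = number of documents,
    C = number of distinct top-level field names (an assumption)."""
    columns = set()
    for d in documents:
        columns.update(d.keys())
    return len(documents), len(columns)


# Made-up documents; not every document has every field
docs = [
    {"_id": "a", "label": "x"},
    {"_id": "b", "price": 3},
    {"_id": "c", "label": "y", "price": 5},
]

length, width = shape_local(docs)
print(length, width)
```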
Chunk
- Read.chunk_dataset(select_fields=None, chunksize=100, filters=None, after_id=None)
Iterates over a dataset in chunks. Each chunk is a dictionary with 'cursor' and 'documents' keys.
Example
from relevanceai import Client
client = Client()
ds = client.Dataset("sample")
for c in ds.chunk_dataset(
    select_fields=["sample_label"],
    chunksize=100
):
    # Returns a dictionary with 'cursor' and 'documents' keys
    docs = c['documents']
    cursor = c['cursor']
    for d in docs:
        d.update({"value": 3})
    ds.upsert_documents(docs)
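The chunked read-modify-write pattern above can be tried locally with a small generator. This is only a sketch: chunk_local is a hypothetical helper, and the cursor it yields (the index to resume from, or None once the data is exhausted) is an assumption made for illustration, not the format the library uses.

```python
from typing import Any, Dict, Iterator, List


def chunk_local(
    documents: List[Dict[str, Any]], chunksize: int = 100
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in: yield dicts with 'cursor' and 'documents' keys."""
    total = len(documents)
    for start in range(0, total, chunksize):
        batch = documents[start:start + chunksize]
        end = start + len(batch)
        # Assumed cursor: next index to read from, None when nothing is left
        cursor = end if end < total else None
        yield {"cursor": cursor, "documents": batch}


# Made-up documents for illustration only
docs = [{"_id": str(i), "sample_label": i % 2} for i in range(250)]

sizes = []
cursors = []
for c in chunk_local(docs, chunksize=100):
    for d in c["documents"]:
        d.update({"value": 3})  # same per-document update as in the example above
    sizes.append(len(c["documents"]))
    cursors.append(c["cursor"])

print(sizes, cursors)
```

Processing one chunk at a time keeps memory use bounded by chunksize, which is the main reason to prefer this pattern over loading the whole dataset before updating it.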