relevanceai.dataset.series

Module Contents

class relevanceai.dataset.series.Series(dataset, field: str, image_fields: Optional[List[str]] = None, audio_fields: Optional[List[str]] = None, highlight_fields: Optional[Dict[str, List]] = None, text_fields: Optional[List[str]] = None)

A wrapper class that allows documents to be vectorized over a single field.

Parameters
  • project (str) – Project name on RelevanceAI

  • api_key (str) – API key for RelevanceAI

  • dataset_id (str) – ID of the dataset that the Series is drawn from.

  • field (str) – The name of the field within the Dataset.

Examples

Assuming the following code has been executed:

from relevanceai import Client
from relevanceai.package_utils.datasets import get_dummy_ecommerce_dataset

documents = get_dummy_ecommerce_dataset()
client = Client()

df = client.Dataset('ecommerce')
df.create()
df.insert_documents(documents)

Retrieve a Series from your dataset

product_images = df['product_image'] # A Series object containing every product image URL in the dataset
head
list_aliases(self)
sample(self, n: int = 1, frac: float = None, filters: Optional[list] = None, random_state: int = 0, include_vector: bool = True, output_format='pandas')

Return a random sample of items from a dataset.

Parameters
  • n (int) – Number of items to return. Cannot be used with frac.

  • frac (float) – Fraction of items to return. Cannot be used with n.

  • filters (list) – Query for filtering the search results

  • random_state (int) – Random seed used when retrieving random documents.

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)
df.sample(n=3)
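
The interaction between n, frac and random_state can be illustrated without a RelevanceAI account. The following is a minimal local sketch over a plain list of documents, not the library's implementation: sample_documents is a hypothetical helper, and unlike the real signature it defaults n to None so that passing both n and frac can be detected.

```python
import random
from typing import Any, Dict, List, Optional


def sample_documents(
    documents: List[Dict[str, Any]],
    n: Optional[int] = None,
    frac: Optional[float] = None,
    random_state: int = 0,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: draw a reproducible random sample of documents."""
    if n is not None and frac is not None:
        raise ValueError("Specify either n or frac, not both.")
    if frac is not None:
        if not 0 <= frac <= 1:
            raise ValueError("frac must be between 0 and 1.")
        n = int(len(documents) * frac)  # convert the fraction to a count
    elif n is None:
        n = 1  # mirror the documented default of a single item
    rng = random.Random(random_state)  # seeded, so repeated calls agree
    return rng.sample(documents, min(n, len(documents)))


docs = [{"_id": str(i), "price": i * 10} for i in range(10)]
three = sample_documents(docs, n=3)      # three documents
half = sample_documents(docs, frac=0.5)  # five of the ten documents
```

Because the generator is seeded with random_state, calling the helper twice with the same arguments returns the same documents.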
all(self, chunksize: int = 1000, filters: Optional[List] = None, sort: Optional[List] = None, include_vector: bool = True, show_progress_bar: bool = True)
apply(self, func: Callable, output_field: str, filters: list = [], axis: int = 0, **kwargs)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Note

We recommend using the bulk_apply functionality if you are looking to have faster processing.

Parameters
  • func (function) – Function to apply to each document

  • axis (int) – Axis along which the function is applied. 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply function to each row.

  • output_field (str) – The field in which to store the output

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

df["sample_1_label"].apply(lambda x: x + 3, output_field="output_field")
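
To make the role of func and output_field concrete, here is a minimal local sketch of the same idea on a plain list of documents. It is an illustration under assumptions, not the library's code: apply_to_field is a hypothetical helper, and skipping documents that lack the field is a choice made for this example.

```python
from typing import Any, Callable, Dict, List


def apply_to_field(
    documents: List[Dict[str, Any]],
    field: str,
    func: Callable[[Any], Any],
    output_field: str,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: run func on one field and store the result in output_field."""
    updated = []
    for doc in documents:
        new_doc = dict(doc)  # shallow copy so the input list is left untouched
        if field in doc:  # documents missing the field pass through unchanged
            new_doc[output_field] = func(doc[field])
        updated.append(new_doc)
    return updated


docs = [
    {"_id": "a", "sample_1_label": 1},
    {"_id": "b", "sample_1_label": 4},
    {"_id": "c"},  # has no sample_1_label
]
result = apply_to_field(docs, "sample_1_label", lambda x: x + 3, "output_field")
```

Each document that has sample_1_label gains an output_field holding the function's return value, while the source field itself is left as it was.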
numpy(self) → numpy.ndarray

Iterates over all documents in the dataset and returns all numeric values in a numpy array.

Parameters

None

Returns

vectors – an array/matrix of all numeric values selected

Return type

np.ndarray

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

field = "sample_field"
arr = df[field].numpy()
value_counts(self, normalize: bool = False, ascending: bool = False, sort: bool = False, bins: Optional[int] = None)

Return a Series containing counts of unique values (or values within a range if bins is set).

Parameters
  • normalize (bool, default False) – If True then the object returned will contain the relative frequencies of the unique values.

  • ascending (bool, default False) – Sort in ascending order.

  • bins (int, optional) – Groups values into ‘bins’. Bins are useful for representing groups within a continuous series.

Return type

Series

Example

from relevanceai import Client

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

field = "sample_field"
value_counts_df = df[field].value_counts()
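
The effect of normalize, ascending and bins can be sketched locally with the standard library. This is a rough illustration rather than the library's implementation: count_values is a hypothetical helper, equal-width bins are an assumption about how binning might work, and the sort parameter from the signature is not modelled.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def count_values(
    values: List[Any],
    normalize: bool = False,
    ascending: bool = False,
    bins: Optional[int] = None,
) -> Dict[Any, float]:
    """Hypothetical helper: count unique values, or equal-width bins of numbers."""
    if bins is not None:
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
        labelled = []
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
            labelled.append((lo + idx * width, lo + (idx + 1) * width))
        values = labelled  # count (start, end) bin labels instead of raw values
    counts = Counter(values)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    return {k: (c / total if normalize else c) for k, c in ordered}


colours = ["red", "blue", "red", "green", "red", "blue"]
plain = count_values(colours)                    # raw counts, most frequent first
relative = count_values(colours, normalize=True)  # shares that sum to 1
binned = count_values([1, 2, 3, 4, 5, 6, 7, 8], bins=2)  # two equal-width ranges
```

With normalize=True each count is divided by the total number of values, and with bins set the keys become (start, end) ranges instead of individual values.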
contains(self, other: Any)
exists(self)
not_exists(self)
date(self, other: Any)
categories(self, other: List[Any])
filter(self, **kwargs)
set_dtype(self, dtype)
property values(self)