relevanceai.operations.cluster.centroids#

Module Contents#

class relevanceai.operations.cluster.centroids.Centroids(credentials: relevanceai.client.helpers.Credentials, dataset_id: str)#

Batch API client

closest(self, cluster_ids: Optional[List] = None, centroid_vector_fields: Optional[List] = None, select_fields: Optional[List] = None, approx: int = 0, sum_fields: bool = True, page_size: int = 1, page: int = 1, similarity_metric: str = 'cosine', filters: Optional[List] = None, min_score: int = 0, include_vector: bool = False, include_count: bool = True)#

List of documents closest from the centre.

Parameters
  • cluster_ids (list) – Any of the cluster ids

  • centroid_vector_fields (list) – Vector fields stored

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields

  • approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate

  • sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or seperate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters (list) – Query for filtering the search results

  • min_score (int) – Minimum score for similarity metric

  • include_vectors (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

Example

from relevanceai import Client
from relevanceai.ops.clusterer import ClusterOps
from relevanceai.ops.clusterer.kmeans_clusterer import KMeansModel

client = Client()

dataset_id = "sample_dataset_id"
df = client.Dataset(dataset_id)

vector_field = "vector_field_"
n_clusters = 10

model = KMeansModel(k=n_clusters)

df.cluster(model=model, alias=f"kmeans-{n_clusters}", vector_fields=[vector_field])
furthest(self, cluster_ids: Optional[List] = None, centroid_vector_fields: Optional[List] = None, select_fields: Optional[List] = None, approx: int = 0, sum_fields: bool = True, page_size: int = 1, page: int = 1, similarity_metric: str = 'cosine', filters: Optional[List] = None, min_score: int = 0, include_vector: bool = False, include_count: bool = True)#

List of documents furthest from the centre.

Parameters
  • cluster_ids (list) – Any of the cluster ids

  • select_fields (list) – Fields to include in the search results, empty array/list means all fields

  • approx (int) – Used for approximate search to speed up search. The higher the number, faster the search but potentially less accurate

  • sum_fields (bool) – Whether to sum the multiple vectors similarity search score as 1 or seperate

  • page_size (int) – Size of each page of results

  • page (int) – Page of the results

  • similarity_metric (string) – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters (list) – Query for filtering the search results

  • min_score (int) – Minimum score for similarity metric

  • include_vectors (bool) – Include vectors in the search results

  • include_count (bool) – Include the total count of results in the search results

update(self, dataset_id: str, vector_fields: List[str], centroid_vector_fields: List[str], alias: str, cluster_centers: List[Dict[str, List[float]]])#

API reference link: https://api.us-east-1.tryrelevance.com/latest/core/documentation#operation/UpdateClusterCentroids

Update the centroids contained within your dataset

Parameters
  • dataset_id (str) – The name of the dataset

  • vector_fields (List[str]) – A list of the vectors fields in your dataset that have cluster centroids you wish to update

  • alias (str) – The alias that was used to cluster

  • cluster_centers (List[Dict[str: List[float]]]) – A List containing dictionaries of cluster id’s to be updated, with their keys being the new centroids

delete_centroid_by_id(self, centroid_id: str, dataset_id: str, vector_field: str, alias: str)#

OLD API reference link: https://api.us-east-1.tryrelevance.com/latest/documentation#operation/delete_centroids_api_services_cluster_centroids__centroid_id__delete_post

Delete a centroid by ID

Parameters
  • centroid_id (str) – The id of the centroid

  • dataset_id (str) – The name of the dataset

  • vector_field (str) – The vector_field that contains the cluster id

  • alias (str) – The alias that was used to cluster