Base class for clustering

Module Contents#

class relevanceai.operations_new.cluster.transform.ClusterTransform(vector_fields: List[str], alias: str, model: Any, model_kwargs: Optional[dict] = None, cluster_field: str = '_cluster_', include_cluster_report: bool = False, **kwargs)#

To write your own operation, you need to add:

- name
- transform

model :relevanceai.operations_new.cluster.models.base.ClusterModelBase#
property name(self)#

Abstract property that returns the name of the operation.

fit_predict_documents(self, documents, warm_start=False)#

If warm_start=True, copies the values from the previous fit. Only works for cluster models that use centroids. You should not have to use this parameter.
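As a rough illustration of what a warm start means for a centroid-based model, the sketch below reuses the centroids from a previous fit as the starting point for the next one. `TinyCentroidModel` and its `fit_predict` signature are hypothetical stand-ins written for this example; they are not part of relevanceai, and the library's own warm-start logic may differ.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids
    ]
    return distances.index(min(distances))


class TinyCentroidModel:
    """Hypothetical centroid-based model, used only to illustrate warm starts."""

    def __init__(self, n_clusters: int, n_iter: int = 10):
        self.n_clusters = n_clusters
        self.n_iter = n_iter
        self.centroids: Optional[List[List[float]]] = None

    def fit_predict(self, vectors: List[Vector], warm_start: bool = False) -> List[int]:
        if warm_start and self.centroids is not None:
            # Warm start: copy the centroids left over from the previous fit.
            centroids = [list(c) for c in self.centroids]
        else:
            # Cold start: seed the centroids with the first n_clusters vectors.
            centroids = [list(v) for v in vectors[: self.n_clusters]]

        for _ in range(self.n_iter):
            labels = [_nearest(v, centroids) for v in vectors]
            for k in range(self.n_clusters):
                members = [v for v, label in zip(vectors, labels) if label == k]
                if members:  # keep the old centroid if a cluster is empty
                    centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]

        self.centroids = centroids
        return [_nearest(v, centroids) for v in vectors]


first_batch = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
second_batch = [[4.9, 5.2], [0.2, 0.1]]

model = TinyCentroidModel(n_clusters=2)
first_labels = model.fit_predict(first_batch)                   # [0, 0, 1, 1]
warm_labels = model.fit_predict(second_batch, warm_start=True)  # [1, 0]
cold_labels = TinyCentroidModel(n_clusters=2).fit_predict(second_batch)  # [0, 1]
```

In this toy setup the warm start keeps cluster indices consistent with the first fit, while the cold start assigns them according to the order of the new batch. The approach relies on the model exposing centroids, in line with the restriction noted above.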

transform(self, documents: List[Dict[str, Any]]) → List[Dict[str, Any]]#

Takes a list of documents, runs each document through each of the models in the pipeline, and returns the updated documents.


Parameters

documents (List[Dict[str, Any]]) – the documents to transform

Return type

A list of dictionaries.
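The documents-in, documents-out shape of transform can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `transform_documents` and the `predict` callable are hypothetical, and storing the label under `doc[cluster_field][alias]` is an assumed layout chosen for the example.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def transform_documents(
    documents: List[Document],
    vector_field: str,
    alias: str,
    predict: Callable[[List[List[float]]], List[int]],
    cluster_field: str = "_cluster_",
) -> List[Document]:
    """Hypothetical helper: attach a cluster label to a copy of each document."""
    vectors = [doc[vector_field] for doc in documents]
    labels = predict(vectors)  # one label per document, in the same order

    updated = []
    for doc, label in zip(documents, labels):
        new_doc = dict(doc)  # shallow copy so the input list is left untouched
        clusters = dict(new_doc.get(cluster_field, {}))
        clusters[alias] = f"cluster-{label}"  # assumed storage layout
        new_doc[cluster_field] = clusters
        updated.append(new_doc)
    return updated


docs = [
    {"_id": "a", "text_vector_": [0.1, 0.2]},
    {"_id": "b", "text_vector_": [0.9, 0.8]},
]

# Stand-in "model": label 1 if the first component exceeds 0.5, otherwise 0.
result = transform_documents(
    docs,
    vector_field="text_vector_",
    alias="demo",
    predict=lambda vectors: [int(v[0] > 0.5) for v in vectors],
)
```

Returning copies rather than mutating the input is a design choice made for this sketch; the source does not state whether transform modifies documents in place.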

static calculate_silhouette(vectors, labels)#
static calculate_squared_error(vectors, labels, centroids)#
format_cluster_label(self, label)#

If the label is an integer, return the string “cluster-” followed by the integer. If the label is a string, return it unchanged. If the label is neither, raise an error.


Parameters

label – the label of the cluster. This can be a string or an integer.

Return type

A string.
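The labeling rule described for format_cluster_label can be sketched as a standalone function. This is an illustration rather than the library's code; in particular, raising `TypeError` is an assumption, since the source only says that an error is raised.

```python
from typing import Union


def format_cluster_label(label: Union[int, str]) -> str:
    """Sketch of the rule: integers become "cluster-<n>", strings pass through."""
    if isinstance(label, str):
        return label
    if isinstance(label, int):
        return f"cluster-{label}"
    # Assumption: the exact exception type is not given in the source.
    raise TypeError(f"Unsupported cluster label type: {type(label).__name__}")


int_label = format_cluster_label(3)        # "cluster-3"
str_label = format_cluster_label("noise")  # "noise"
```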

format_cluster_labels(self, labels)#