☀️ Cluster Centroid Heat Maps

To interpret your clusters, it can help to visualise their centroids as a heatmap. The heatmap shows how similar each pair of cluster centroids is, so you can see at a glance which clusters are closest to each other.

%load_ext autoreload
%autoreload 2

Installation

!pip install -q jsonshower
from relevanceai import Client
client = Client()

You can retrieve the ecommerce dataset from https://relevanceai.readthedocs.io/en/development/core/available_datasets.html#relevanceai.utils.datasets.get_ecommerce_1_dataset.

ds = client.Dataset("ecommerce")

Centroid Heatmap

from relevanceai.operations.viz.cluster import ClusterVizOps

cluster_ops = ClusterVizOps.from_dataset(
    ds, alias="main-cluster", vector_fields=["product_image_clip_vector_"]
)
cluster_ops.centroid_heatmap()
Your closest centroids are:
0.74 cluster-5, cluster-1
0.73 cluster-5, cluster-4
0.71 cluster-4, cluster-1
0.65 cluster-4, cluster-2
0.65 cluster-7, cluster-2
0.64 cluster-7, cluster-4
0.64 cluster-7, cluster-5
0.63 cluster-5, cluster-2
[Figure: centroid heatmap, titled "cosine plot"]
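
The ranked list above pairs each score with two cluster IDs, and the plot title suggests the scores are cosine similarities between centroid vectors. The following is a minimal sketch of how such a ranking could be computed, not the SDK's actual implementation. The helper names (`cosine_similarity`, `closest_centroid_pairs`) and the toy 3-dimensional centroids are assumptions made up for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_centroid_pairs(centroids, top_n=8):
    """Score every pair of clusters and return the top_n most similar pairs."""
    pairs = [
        (cosine_similarity(centroids[a], centroids[b]), a, b)
        for a, b in combinations(sorted(centroids), 2)
    ]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:top_n]


# Toy centroids, invented for this example (real centroids would be
# high-dimensional vectors such as CLIP image embeddings).
toy_centroids = {
    "cluster-1": [1.0, 0.0, 0.0],
    "cluster-2": [0.0, 1.0, 0.0],
    "cluster-3": [1.0, 1.0, 0.0],
}

for score, a, b in closest_centroid_pairs(toy_centroids):
    print(f"{score:.2f} {a}, {b}")
```

The same pairwise scores, arranged as a square matrix with one row and one column per cluster, are what a centroid heatmap would colour.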

We can now check whether these clusters are useful by inspecting the closest ones in the dashboard:

closest = cluster_ops.closest()["results"]
You can now visit the dashboard at https://cloud.tryrelevance.com/sdk/cluster/centroids/closest

Below, we can see that these are two separate clusters, one for boots and one for shoes, and decide whether we need that level of granularity.

cluster_ops.show_closest(
    cluster_ids=["cluster-1", "cluster-5"], image_fields=["product_image"]
)
You can now visit the dashboard at https://cloud.tryrelevance.com/sdk/cluster/centroids/closest
   product_image  cluster_id  _id
0  [image]        cluster-1   931f907b-13f1-41e5-92fe-c8007cdedada
1  [image]        cluster-1   93734870-b304-4426-9cd4-d906fea340b8
2  [image]        cluster-1   6416c33d-3287-446c-90d3-ea220bf6312b
3  [image]        cluster-5   8f5dfc61-6fd1-422e-9682-7df039b8c099
4  [image]        cluster-5   65082728-720b-4604-8ee4-f7d0ecab0e7f
5  [image]        cluster-5   7ace5350-1487-44d3-9840-2b89183f3117
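
When there are many clusters, one way to decide which pairs deserve this kind of side-by-side inspection is to filter the centroid scores by a cutoff. The sketch below uses the scores printed earlier in this guide. The 0.70 threshold and the `merge_candidates` helper are hypothetical, chosen only for illustration, and are not part of the relevanceai SDK.

```python
# Scores copied from the "Your closest centroids are" output in this guide.
closest_pairs = [
    (0.74, "cluster-5", "cluster-1"),
    (0.73, "cluster-5", "cluster-4"),
    (0.71, "cluster-4", "cluster-1"),
    (0.65, "cluster-4", "cluster-2"),
    (0.65, "cluster-7", "cluster-2"),
    (0.64, "cluster-7", "cluster-4"),
    (0.64, "cluster-7", "cluster-5"),
    (0.63, "cluster-5", "cluster-2"),
]

# Hypothetical cutoff: pairs at or above it are worth a manual look.
SIMILARITY_THRESHOLD = 0.70


def merge_candidates(pairs, threshold):
    """Return the cluster pairs whose centroid similarity meets the threshold."""
    return [(a, b) for score, a, b in pairs if score >= threshold]


for a, b in merge_candidates(closest_pairs, SIMILARITY_THRESHOLD):
    print(f"Inspect {a} and {b} together")
```

Each flagged pair can then be passed as `cluster_ids` to `cluster_ops.show_closest`, as in the call above. A high score only marks a pair as worth inspecting; whether the two clusters should stay separate is still a judgment made from the documents themselves.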